Sanger Institute - Publications 2011

Number of papers published in 2011: 413

  • Genomics in 2011: challenges and opportunities.

    Adams DJ, Berger B, Harismendy O, Huttenhower C, Liu XS, Myers CL, Oshlack A, Rinn JL and Walhout AJ

    Wellcome Trust Sanger Institute.

    As we come to the end of 2011, Genome Biology has asked some members of our Editorial Board for their views on the state of play in genomics. What was their favorite paper of 2011? What are the challenges in their particular research area? Who has had the biggest influence on their careers? What advice would they give to young researchers embarking on a career in research?

    Genome biology 2011;12;12;137

  • The future of model organisms in human disease research.

    Aitman TJ, Boone C, Churchill GA, Hengartner MO, Mackay TF and Stemple DL

    Medical Research Council Clinical Sciences Centre, Imperial College, London, UK. t.aitman@csc.mrc.ac.uk

    Model organisms have played a huge part in the history of studies of human genetic disease, both in identifying disease genes and characterizing their normal and abnormal functions. But is the importance of model organisms diminishing? The direct discovery of disease genes and variants in humans has been revolutionized, first by genome-wide association studies and now by whole-genome sequencing. Not only is it now much easier to directly identify potential disease genes in humans, but the genetic architecture that is being revealed in many cases is hard to replicate in model organisms. Furthermore, disease modelling can be done with increasing effectiveness using human cells. Where does this leave non-human models of disease?

    Funded by: Medical Research Council; NIGMS NIH HHS: GM076468, GM45146, R01 GM045146-20, R01 GM070683

    Nature reviews. Genetics 2011;12;8;575-82

  • Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome.

    Albers CA, Cvejic A, Favier R, Bouwmans EE, Alessi MC, Bertone P, Jordan G, Kettleborough RN, Kiddle G, Kostadima M, Read RJ, Sipos B, Sivapalaratnam S, Smethurst PA, Stephens J, Voss K, Nurden A, Rendon A, Nurden P and Ouwehand WH

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. caa@sanger.ac.uk

    Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.

    Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: MC_U105260799; Wellcome Trust: 082597, 082961, 084183

    Nature genetics 2011;43;8;735-7

  • Dindel: accurate indel calls from short-read data.

    Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, United Kingdom. caa@sanger.ac.uk

    Small insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most of the focus of studies of sequence variation is on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow identification of indels just as SNVs. However, inference of indels from next-generation sequence data is challenging, and so far methods for identifying indels lag behind methods for calling SNVs in terms of sensitivity and specificity. We propose a Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included. In our probabilistic realignment model we account for base-calling errors, mapping errors, and also, importantly, for increased sequencing error indel rates in long homopolymer runs. We show that our method is sensitive and achieves low false discovery rates on simulated and real data sets, although challenges remain. The algorithm is implemented in the program Dindel, which has been used in the 1000 Genomes Project call sets.

    Funded by: British Heart Foundation: RG/09/012/28096; Wellcome Trust: 086084, 090532, WT089088/Z/09/Z

    Genome research 2011;21;6;961-73

  • Transcription factor Bcl11b controls selection of invariant natural killer T-cells by regulating glycolipid presentation in double-positive thymocytes.

    Albu DI, VanValkenburgh J, Morin N, Califano D, Jenkins NA, Copeland NG, Liu P and Avram D

    Center for Cell Biology and Cancer Research, Albany Medical College, Albany, NY 12208, USA.

    Invariant natural killer T cells (iNKT cells) are innate-like T cells important in immune regulation, antimicrobial protection, and anti-tumor responses. They express semi-invariant T cell receptors, which recognize glycolipid antigens. Their positive selection is mediated by double-positive (DP) thymocytes, which present glycolipid self-antigens through the noncanonical MHC class I-like molecule CD1d. Here we provide genetic and biochemical evidence that removal of the transcription factor Bcl11b in DP thymocytes leads to an early block in iNKT cell development, caused by both iNKT cell extrinsic and intrinsic defects. Specifically, Bcl11b-deficient DP thymocytes failed to support Bcl11b-sufficient iNKT precursor development due to defective glycolipid self-antigen presentation, and showed enlarged lysosomes and accumulation of glycosphingolipids. Expression of genes encoding lysosomal proteins with roles in sphingolipid metabolism and glycolipid presentation was found to be altered in Bcl11b-deficient DP thymocytes. These include cathepsins and Niemann-Pick disease type A, B, and C genes. Thus, Bcl11b plays a central role in presentation of glycolipid self-antigens by DP thymocytes, and regulates directly or indirectly expression of lysosomal genes, exerting a critical extrinsic role in development of iNKT lineage, in addition to the intrinsic role in iNKT precursors. These studies demonstrate a unique and previously undescribed role of Bcl11b in DP thymocytes, in addition to the critical function in positive selection of conventional CD4 and CD8 single-positive thymocytes.

    Funded by: NIAID NIH HHS: AI067846, AI078273

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;15;6211-6

  • The genome of the green anole lizard and a comparative analysis with birds and mammals.

    Alföldi J, Di Palma F, Grabherr M, Williams C, Kong L, Mauceli E, Russell P, Lowe CB, Glor RE, Jaffe JD, Ray DA, Boissinot S, Shedlock AM, Botka C, Castoe TA, Colbourne JK, Fujita MK, Moreno RG, ten Hallers BF, Haussler D, Heger A, Heiman D, Janes DE, Johnson J, de Jong PJ, Koriabine MY, Lara M, Novick PA, Organ CL, Peach SE, Poe S, Pollock DD, de Queiroz K, Sanger T, Searle S, Smith JD, Smith Z, Swofford R, Turner-Maier J, Wade J, Young S, Zadissa A, Edwards SV, Glenn TC, Schneider CJ, Losos JB, Lander ES, Breen M, Ponting CP and Lindblad-Toh K

    Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA. jalfoldi@broadinstitute.org

    The evolution of the amniotic egg was one of the great evolutionary innovations in the history of life, freeing vertebrates from an obligatory connection to water and thus permitting the conquest of terrestrial environments. Among amniotes, genome sequences are available for mammals and birds, but not for non-avian reptiles. Here we report the genome sequence of the North American green anole lizard, Anolis carolinensis. We find that A. carolinensis microchromosomes are highly syntenic with chicken microchromosomes, yet do not exhibit the high GC and low repeat content that are characteristic of avian microchromosomes. Also, A. carolinensis mobile elements are very young and diverse, more so than in any other sequenced amniote genome. The GC content of this lizard genome is also unusual in its homogeneity, unlike the regionally variable GC content found in mammals and birds. We describe and assign sequence to the previously unknown A. carolinensis X chromosome. Comparative gene analysis shows that amniote egg proteins have evolved significantly more rapidly than other proteins. An anole phylogeny resolves basal branches to illuminate the history of their repeated adaptive radiations.

    Funded by: NHGRI NIH HHS: U54 HG003067-08

    Nature 2011;477;7366;587-91

  • High-throughput phenotyping using parallel sequencing of RNA interference targets in the African trypanosome.

    Alsford S, Turner DJ, Obado SO, Sanchez-Flores A, Glover L, Berriman M, Hertz-Fowler C and Horn D

    London School of Hygiene & Tropical Medicine, London WC1E 7HT, United Kingdom.

    African trypanosomes are major pathogens of humans and livestock and represent a model for studies of unusual protozoal biology. We describe a high-throughput phenotyping approach termed RNA interference (RNAi) target sequencing, or RIT-seq that, using Illumina sequencing, maps fitness-costs associated with RNAi. We scored the abundance of >90,000 integrated RNAi targets recovered from trypanosome libraries before and after induction of RNAi. Data are presented for 7435 protein coding sequences, >99% of a non-redundant set in the Trypanosoma brucei genome. Analysis of bloodstream and insect life-cycle stages and differentiated libraries revealed genome-scale knockdown profiles of growth and development, linking thousands of previously uncharacterized and "hypothetical" genes to essential functions. Genes underlying prominent features of trypanosome biology are highlighted, including the constitutive emphasis on post-transcriptional gene expression control, the importance of flagellar motility and glycolysis in the bloodstream, and of carboxylic acid metabolism and phosphorylation during differentiation from the bloodstream to the insect stage. The current data set also provides much needed genetic validation to identify new drug targets. RIT-seq represents a versatile new tool for genome-scale functional analyses and for the exploitation of genome sequence data.

    Funded by: Wellcome Trust: 079643, 083648, 085775/Z/08/Z, 089172

    Genome research 2011;21;6;915-24

  • IDH1 and IDH2 mutations are frequent events in central chondrosarcoma and central and periosteal chondromas but not in other mesenchymal tumours.

    Amary MF, Bacsi K, Maggiani F, Damato S, Halai D, Berisha F, Pollock R, O'Donnell P, Grigoriadis A, Diss T, Eskandarpour M, Presneau N, Hogendoorn PC, Futreal A, Tirabosco R and Flanagan AM

    Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, Middlesex HA7 4LP, UK.

    Somatic mutations in isocitrate dehydrogenase 1 (IDH1) and IDH2 occur in gliomas and acute myeloid leukaemia (AML). Since patients with multiple enchondromas have occasionally been reported to have these conditions, we hypothesized that the same mutations would occur in cartilaginous neoplasms. Approximately 1200 mesenchymal tumours, including 220 cartilaginous tumours, 222 osteosarcomas and another ∼750 bone and soft tissue tumours, were screened for IDH1 R132 mutations, using Sequenom® mass spectrometry. Cartilaginous tumours and chondroblastic osteosarcomas, wild-type for IDH1 R132, were analysed for IDH2 (R172, R140) mutations. Validation was performed by capillary sequencing and restriction enzyme digestion. Heterozygous somatic IDH1/IDH2 mutations, which result in the production of a potential oncometabolite, 2-hydroxyglutarate, were only detected in central and periosteal cartilaginous tumours, and were found in at least 56% of these, ∼40% of which were represented by R132C. IDH1 R132H mutations were confirmed by immunoreactivity for this mutant allele. The ratio of IDH1:IDH2 mutation was 10.6:1. No IDH2 R140 mutations were detected. Mutations were detected in enchondromas through to conventional central and dedifferentiated chondrosarcomas, in patients with both solitary and multiple neoplasms. No germline mutations were detected. No mutations were detected in peripheral chondrosarcomas and osteochondromas. In conclusion, IDH1 and IDH2 mutations represent the first common genetic abnormalities to be identified in conventional central and periosteal cartilaginous tumours. As in gliomas and AML, the mutations appear to occur early in tumourigenesis. We speculate that a mosaic pattern of IDH-mutation-bearing cells explains the reports of diverse tumours (gliomas, AML, multiple cartilaginous neoplasms, haemangiomas) occurring in the same patient.

    Funded by: Wellcome Trust: WT077012

    The Journal of pathology 2011;224;3;334-43

  • Ollier disease and Maffucci syndrome are caused by somatic mosaic mutations of IDH1 and IDH2.

    Amary MF, Damato S, Halai D, Eskandarpour M, Berisha F, Bonar F, McCarthy S, Fantin VR, Straley KS, Lobo S, Aston W, Green CL, Gale RE, Tirabosco R, Futreal A, Campbell P, Presneau N and Flanagan AM

    Histopathology Unit, Royal National Orthopaedic Hospital National Health Service Trust, Stanmore, UK. a.flanagan@ucl.ac.uk

    Ollier disease and Maffucci syndrome are characterized by multiple central cartilaginous tumors that are accompanied by soft tissue hemangiomas in Maffucci syndrome. We show that in 37 of 40 individuals with these syndromes, at least one tumor has a mutation in isocitrate dehydrogenase 1 (IDH1) or in IDH2, 65% of which result in a R132C substitution in the protein. In 18 of 19 individuals with more than one tumor analyzed, all tumors from a given individual shared the same IDH1 mutation affecting Arg132. In 2 of 12 subjects, a low level of mutated DNA was identified in non-neoplastic tissue. The levels of the metabolite 2HG were measured in a series of central cartilaginous and vascular tumors, including samples from syndromic and nonsyndromic subjects, and these levels correlated strongly with the presence of IDH1 mutations. The findings are compatible with a model in which IDH1 or IDH2 mutations represent early post-zygotic occurrences in individuals with these syndromes.

    Funded by: Wellcome Trust: WT077012

    Nature genetics 2011;43;12;1262-5

  • Synthetic associations are unlikely to account for many common disease genome-wide association signals.

    Anderson CA, Soranzo N, Zeggini E and Barrett JC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom. statgen-faculty@sanger.ac.uk

    Funded by: Wellcome Trust: WT089120/Z/09/Z, WT091745/Z/10/Z

    PLoS biology 2011;9;1;e1000580

  • Towards an understanding of genetic predisposition to migraine.

    Anttila V, Wessman M, Kallela M and Palotie A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK. aarno.palotie@sanger.ac.uk

    Plausible genome-wide associations for episodic neurological diseases (such as migraine, epilepsy and ataxias) have been slow to emerge. The first such association was reported in a recent genome-wide association study of migraine, with quantitative expression analysis linking the variant to a nearby regulatory gene, MTDH/AEG-1. This putative mechanism, regulating the expression of the primary glutamate transporter in the brain, EAAT2/GLT-1, has interesting implications bridging the gap between Mendelian and common forms in this key group of disorders.

    Genome medicine 2011;3;3;17

  • Comparative whole genome sequence analysis of the carcinogenic bacterial model pathogen Helicobacter felis.

    Arnold IC, Zigova Z, Holden M, Lawley TD, Rad R, Dougan G, Falkow S, Bentley SD and Müller A

    Institute of Molecular Cancer Research, University of Zürich, Switzerland.

    The gram-negative bacterium Helicobacter felis naturally colonizes the gastric mucosa of dogs and cats. Due to its ability to persistently infect laboratory mice, H. felis has been used extensively to experimentally model gastric disorders induced in humans by H. pylori. We determined the 1.67 Mb genome sequence of H. felis using combined Solexa and 454 pyrosequencing, annotated the genome, and compared it with multiple previously published Helicobacter genomes. About 1,063 (63.6%) of the 1,671 genes identified in the H. felis genome have orthologues in H. pylori, its closest relative among the fully sequenced Helicobacter species. Many H. pylori virulence factors are shared by H. felis: these include the gamma-glutamyl transpeptidase GGT, the immunomodulator NapA, and the secreted enzymes collagenase and HtrA. Helicobacter felis lacks a Cag pathogenicity island and the vacuolating cytotoxin VacA but possesses a complete comB system conferring natural competence. Remarkable features of the H. felis genome include its paucity of transcriptional regulators and an extraordinary abundance of chemotaxis sensors and restriction/modification systems. Helicobacter felis possesses an episomally replicating 6.7-kb plasmid and harbors three chromosomal regions with deviating GC content. These putative horizontally acquired regions show homology and synteny with the recently isolated H. pylori plasmid pHPPC4 and homology to Campylobacter bacteriophage genes (transposases, structural, and lytic genes), respectively. In summary, the H. felis genome harbors a variety of putative mobile elements that are unique among Helicobacter species and may contribute to this pathogen's carcinogenic properties.

    Funded by: Wellcome Trust: 076964

    Genome biology and evolution 2011;3;302-8

  • Sequence-based analysis uncovers an abundance of non-coding RNA in the total transcriptome of Mycobacterium tuberculosis.

    Arnvig KB, Comas I, Thomson NR, Houghton J, Boshoff HI, Croucher NJ, Rose G, Perkins TT, Parkhill J, Dougan G and Young DB

    Division of Mycobacterial Research, MRC National Institute for Medical Research, London, United Kingdom. karnvig@nimr.mrc.ac.uk

    RNA sequencing provides a new perspective on the genome of Mycobacterium tuberculosis by revealing an extensive presence of non-coding RNA, including long 5' and 3' untranslated regions, antisense transcripts, and intergenic small RNA (sRNA) molecules. More than a quarter of all sequence reads mapping outside of ribosomal RNA genes represent non-coding RNA, and the density of reads mapping to intergenic regions was more than two-fold higher than that mapping to annotated coding sequences. Selected sRNAs were found at increased abundance in stationary phase cultures and accumulated to remarkably high levels in the lungs of chronically infected mice, indicating a potential contribution to pathogenesis. The ability of tubercle bacilli to adapt to changing environments within the host is critical to their ability to cause disease and to persist during drug treatment; it is likely that novel post-transcriptional regulatory networks will play an important role in these adaptive responses.

    Funded by: Medical Research Council: U117581288; Wellcome Trust

    PLoS pathogens 2011;7;11;e1002342

  • Enterotypes of the human gut microbiome.

    Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, Fernandes GR, Tap J, Bruls T, Batto JM, Bertalan M, Borruel N, Casellas F, Fernandez L, Gautier L, Hansen T, Hattori M, Hayashi T, Kleerebezem M, Kurokawa K, Leclerc M, Levenez F, Manichanh C, Nielsen HB, Nielsen T, Pons N, Poulain J, Qin J, Sicheritz-Ponten T, Tims S, Torrents D, Ugarte E, Zoetendal EG, Wang J, Guarner F, Pedersen O, de Vos WM, Brunak S, Doré J, MetaHIT Consortium, Antolín M, Artiguenave F, Blottiere HM, Almeida M, Brechot C, Cara C, Chervaux C, Cultrone A, Delorme C, Denariaz G, Dervyn R, Foerstner KU, Friss C, van de Guchte M, Guedon E, Haimet F, Huber W, van Hylckama-Vlieg J, Jamet A, Juste C, Kaci G, Knol J, Lakhdari O, Layec S, Le Roux K, Maguin E, Mérieux A, Melo Minardi R, M'rini C, Muller J, Oozeer R, Parkhill J, Renault P, Rescigno M, Sanchez N, Sunagawa S, Torrejon A, Turner K, Vandemeulebrouck G, Varela E, Winogradsky Y, Zeller G, Weissenbach J, Ehrlich SD and Bork P

    European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.

    Our knowledge of species and functional composition of the human gut microbiome is rapidly increasing, but it is still based on very few cohorts and little is known about variation across the world. By combining 22 newly sequenced faecal metagenomes of individuals from four countries with previously published data sets, here we identify three robust clusters (referred to as enterotypes hereafter) that are not nation or continent specific. We also confirmed the enterotypes in two published, larger cohorts, indicating that intestinal microbiota variation is generally stratified, not continuous. This indicates further the existence of a limited number of well-balanced host-microbial symbiotic states that might respond differently to diet and drug intake. The enterotypes are mostly driven by species composition, but abundant molecular functions are not necessarily provided by abundant species, highlighting the importance of a functional analysis to understand microbial communities. Although individual host properties such as body mass index, age, or gender cannot explain the observed enterotypes, data-driven marker genes or functional modules can be identified for each of these host properties. For example, twelve genes significantly correlate with age and three functional modules with the body mass index, hinting at a diagnostic potential of microbial markers.

    Funded by: Wellcome Trust: 076964, 082372

    Nature 2011;473;7346;174-80

  • Comprehensive comparison of three commercial human whole-exome capture platforms.

    Asan, Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H and Zhang X

    Beijing Institute of Genomics, Chinese Academy of Sciences, No.7 Beitucheng West Road, Chaoyang District, Beijing 100029, China. asan@genomics.org.cn

    Background: Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.

    Results: We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

    Conclusions: We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

    Funded by: Wellcome Trust

    Genome biology 2011;12;9;R95

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, Macinnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770

    PloS one 2011;6;7;e22213

  • Male lineages in the Himalayan foothills: a commentary on Y-chromosome haplogroup diversity in the sub-Himalayan Terai and Duars populations of East India.

    Ayub Q

    Journal of human genetics 2011;56;12;813-4

  • Combined high-resolution genotyping and geospatial analysis reveals modes of endemic urban typhoid fever transmission.

    Baker S, Holt KE, Clements AC, Karkey A, Arjyal A, Boni MF, Dongol S, Hammond N, Koirala S, Duy PT, Nga TV, Campbell JI, Dolecek C, Basnyat B, Dougan G and Farrar JJ

    The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, 190 Ben Ham Tu, Quan 5, Ho Chi Minh City, Vietnam.

    Typhoid is a systemic infection caused by Salmonella Typhi and Salmonella Paratyphi A, human-restricted bacteria that are transmitted faeco-orally. Salmonella Typhi and S. Paratyphi A are clonal, and their limited genetic diversity has precluded the identification of long-term transmission networks in areas with a high disease burden. To improve our understanding of typhoid transmission we have taken a novel approach, performing a longitudinal spatial case-control study for typhoid in Nepal, combining single-nucleotide polymorphism genotyping and case localization via global positioning. We show extensive clustering of typhoid occurring independent of population size and density. For the first time, we demonstrate an extensive range of genotypes existing within typhoid clusters, and even within individual households, including some resulting from clonal expansion. Furthermore, although the data provide evidence for direct human-to-human transmission, we demonstrate an overwhelming contribution of indirect transmission, potentially via contaminated water. Consistent with this, we detected S. Typhi and S. Paratyphi A in water supplies and found that typhoid was spatially associated with public water sources and low elevation. These findings have implications for typhoid-control strategies, and our innovative approach may be applied to other diseases caused by other monophyletic or emerging pathogens.

    Open biology 2011;1;2;110008

  • Parallel evolution of genes and languages in the Caucasus region.

    Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, Pocheshkhova E, Haber M, Platt D, Schurr T, Haak W, Kuznetsova M, Radzhabov M, Balaganskaya O, Romanov A, Zakharova T, Soria Hernanz DF, Zalloua P, Koshel S, Ruhlen M, Renfrew C, Wells RS, Tyler-Smith C, Balanovska E and Genographic Consortium

    Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia. balanovsky@inbox.ru

    We analyzed 40 single nucleotide polymorphism and 19 short tandem repeat Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation, and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language coevolution occurred within geographically isolated populations, probably due to its mountainous terrain.

    Funded by: Wellcome Trust: 077009

    Molecular biology and evolution 2011;28;10;2905-20

  • Gene inactivation and its implications for annotation in the era of personal genomics.

    Balasubramanian S, Habegger L, Frankish A, MacArthur DG, Harte R, Tyler-Smith C, Harrow J and Gerstein M

    Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

    The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.

    Funded by: Wellcome Trust

    Genes & development 2011;25;1;1-10

  • CCR4-associated factor 1 coordinates the expression of Plasmodium falciparum egress and invasion proteins.

    Balu B, Maher SP, Pance A, Chauhan C, Naumov AV, Andrews RM, Ellis PD, Khan SM, Lin JW, Janse CJ, Rayner JC and Adams JH

    Department of Global Health, College of Public Health, University of South Florida, 3720 Spectrum Blvd., Suite 304, Tampa, FL, USA.

    Coordinated regulation of gene expression is a hallmark of the Plasmodium falciparum asexual blood-stage development cycle. We report that carbon catabolite repressor protein 4 (CCR4)-associated factor 1 (CAF1) is critical in regulating more than 1,000 genes during malaria parasites' intraerythrocytic stages, especially egress and invasion proteins. CAF1 knockout results in mistimed expression, aberrant accumulation and localization of proteins involved in parasite egress, and invasion of new host cells, leading to premature release of predominantly half-finished merozoites, drastically reducing the intraerythrocytic growth rate of the parasite. This study demonstrates that CAF1 of the CCR4-Not complex is a significant gene regulatory mechanism needed for Plasmodium development within the human host.

    Funded by: NIAID NIH HHS: R01 AI094973-01, R01 AI094973-02, R01AI033656, R01AI094973; Wellcome Trust

    Eukaryotic cell 2011;10;9;1257-63

  • Association of genetic loci with glucose levels in childhood and adolescence: a meta-analysis of over 6,000 children.

    Barker A, Sharp SJ, Timpson NJ, Bouatia-Naji N, Warrington NM, Kanoni S, Beilin LJ, Brage S, Deloukas P, Evans DM, Grontved A, Hassanali N, Lawlor DA, Lecoeur C, Loos RJ, Lye SJ, McCarthy MI, Mori TA, Ndiaye NC, Newnham JP, Ntalla I, Pennell CE, St Pourcain B, Prokopenko I, Ring SM, Sattar N, Visvikis-Siest S, Dedoussis GV, Palmer LJ, Froguel P, Smith GD, Ekelund U, Wareham NJ and Langenberg C

    Medical Research Council Epidemiology Unit, Addenbrooke’s Hospital, Institute of Metabolic Science, Cambridge, U.K.

    Objective: To investigate whether associations of common genetic variants recently identified for fasting glucose or insulin levels in nondiabetic adults are detectable in healthy children and adolescents.

    Research design and methods: A total of 16 single nucleotide polymorphisms (SNPs) associated with fasting glucose were genotyped in six studies of children and adolescents of European origin, including over 6,000 boys and girls aged 9-16 years. We performed meta-analyses to test associations of individual SNPs and a weighted risk score of the 16 loci with fasting glucose.

    Results: Nine loci were associated with glucose levels in healthy children and adolescents, with four of these associations reported in previous studies and five reported here for the first time (GLIS3, PROX1, SLC2A2, ADCY5, and CRY2). Effect sizes were similar to those in adults, suggesting age-independent effects of these fasting glucose loci. Children and adolescents carrying glucose-raising alleles of G6PC2, MTNR1B, GCK, and GLIS3 also showed reduced β-cell function, as indicated by homeostasis model assessment of β-cell function. Analysis using a weighted risk score showed an increase [β (95% CI)] in fasting glucose level of 0.026 mmol/L (0.021-0.031) for each unit increase in the score.

    Conclusions: Novel fasting glucose loci identified in genome-wide association studies of adults are associated with altered fasting glucose levels in healthy children and adolescents with effect sizes comparable to adults. In nondiabetic adults, fasting glucose changes little over time, and our results suggest that age-independent effects of fasting glucose loci contribute to long-term interindividual differences in glucose levels from childhood onwards.

    Funded by: NIDDK NIH HHS: R01-DK-077659; Wellcome Trust: 076467, 74882

    Diabetes 2011;60;6;1805-12

  • RNAcentral: A vision for an international database of RNA sequences.

    Bateman A, Agrawal S, Birney E, Bruford EA, Bujnicki JM, Cochrane G, Cole JR, Dinger ME, Enright AJ, Gardner PP, Gautheret D, Griffiths-Jones S, Harrow J, Herrero J, Holmes IH, Huang HD, Kelly KA, Kersey P, Kozomara A, Lowe TM, Marz M, Moxon S, Pruitt KD, Samuelsson T, Stadler PF, Vilella AJ, Vogel JH, Williams KP, Wright MW and Zwieb C

    During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.

    RNA (New York, N.Y.) 2011;17;11;1941-6

  • Characterization of the proteome, diseases and evolution of the human postsynaptic density.

    Bayés A, van de Lagemaat LN, Collins MO, Croning MD, Whittle IR, Choudhary JS and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, UK.

    We isolated the postsynaptic density from human neocortex (hPSD) and identified 1,461 proteins. hPSD mutations cause 133 neurological and psychiatric diseases and were enriched in cognitive, affective and motor phenotypes underpinned by sets of genes. Strong protein sequence conservation in mammalian lineages, particularly in hub proteins, indicates conserved function and organization in primate and rodent models. The hPSD is an important structure for nervous system disease and behavior.

    Funded by: Chief Scientist Office: CZB/4/486; Medical Research Council: G0802238, G0802238(89569); Wellcome Trust: 066717, 077155

    Nature neuroscience 2011;14;1;19-21

  • How I treat essential thrombocythemia.

    Beer PA, Erber WN, Campbell PJ and Green AR

    Cambridge Institute for Medical Research and Department of Haematology, University of Cambridge, Cambridge, United Kingdom.

    In the past 5 years we have witnessed significant advances in both the diagnostic process and optimal therapy for patients with essential thrombocythemia (ET). Insights into the underlying molecular mechanisms have been accompanied by the development of new diagnostic tests and by an improved understanding of the relationship between ET and other related myeloproliferative neoplasms, such as polycythemia vera and primary myelofibrosis. In the first part of this review, we describe how recent molecular and histologic studies can be integrated into a streamlined diagnostic process that is applicable to everyday clinical practice. We also address areas of current diagnostic controversy, including heterogeneity within ET and the phenotypic overlap between ET, polycythemia vera, and primary myelofibrosis. In the second part, we provide an overview of our current approach to the treatment of ET, including risk stratification, choice of cytoreductive agent, and a consideration of special situations such as the pregnant or perioperative patient. Areas of controversy discussed include the identification of those at high risk of complications and therapeutic decisions in the younger patient.

    Funded by: Cancer Research UK; Wellcome Trust: 088340

    Blood 2011;117;5;1472-82

  • Essential thrombocythemia: seeing the wood for the trees Response

    Beer PA, Erber WN, Campbell PJ and Green AR

    Blood 2011;118;1180-1

  • Genome-wide association scan allowing for epistasis in type 2 diabetes.

    Bell JT, Timpson NJ, Rayner NW, Zeggini E, Frayling TM, Hattersley AT, Morris AP and McCarthy MI

    Wellcome Trust Centre for Human Genetics, University of Oxford, UK. jordana@well.ox.ac.uk

    In the presence of epistasis, multilocus association tests of human complex traits can provide powerful methods to detect susceptibility variants. We undertook multilocus analyses in 1924 type 2 diabetes cases and 2938 controls from the Wellcome Trust Case Control Consortium (WTCCC). We performed a two-dimensional genome-wide association (GWA) scan using joint two-locus tests of association including main and epistatic effects in 70,236 markers tagging common variants. We found two-locus association at 79 SNP-pairs at a Bonferroni-corrected P-value = 0.05 (uncorrected P-value = 2.14 × 10⁻¹¹). The 79 pair-wise results always contained rs11196205 in TCF7L2 paired with 79 variants including confirmed variants in FTO, TSPAN8, and CDKAL1, which are associated in the absence of epistasis. However, the majority (82%) of the 79 variants did not have compelling single-locus association signals (P-value = 5 × 10⁻⁴). Analyses conditional on the single-locus effects at TCF7L2 established that the joint two-locus results could be attributed to single-locus association at TCF7L2 alone. Interaction analyses among the peak 80 regions and among 23 previously established diabetes candidate genes identified five SNP-pairs with case-control and case-only epistatic signals. Our results demonstrate the feasibility of systematic scans in GWA data, but confirm that single-locus association can underlie and obscure multilocus findings.

    Funded by: Medical Research Council: G0600705; Wellcome Trust: 076113, 088885, WT081682/Z/06/Z, WT088885/Z/09/Z

    Annals of human genetics 2011;75;1;10-9

  • Metagenomics and the molecular identification of novel viruses.

    Bexfield N and Kellam P

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK. nb289@cam.ac.uk

    There have been rapid recent developments in establishing methods for identifying and characterising viruses associated with animal and human diseases. These methodologies, commonly based on hybridisation or PCR techniques, are combined with advanced sequencing techniques termed 'next generation sequencing'. Allied advances in data analysis, including the use of computational transcriptome subtraction, have also impacted the field of viral pathogen discovery. This review details these molecular detection techniques, discusses their application in viral discovery, and provides an overview of some of the novel viruses discovered. The problems encountered in attributing disease causality to a newly identified virus are also considered.

    Veterinary journal (London, England : 1997) 2011;190;2;191-8

  • The genomic rate of molecular adaptation of the human influenza A virus.

    Bhatt S, Holmes EC and Pybus OG

    Department of Zoology, University of Oxford, Oxford, United Kingdom.

    Quantifying adaptive evolution at the genomic scale is an essential yet challenging aspect of evolutionary biology. Here, we develop a method that extends and generalizes previous approaches to estimate the rate of genomic adaptation in rapidly evolving populations and apply it to a large data set of complete human influenza A virus genome sequences. In accord with previous studies, we observe particularly high rates of adaptive evolution in domain 1 of the viral hemagglutinin (HA1). However, our novel approach also reveals previously unseen adaptation in other viral genes. Notably, we find that the rate of adaptation (per codon per year) is higher in surface residues of the viral neuraminidase than in HA1, indicating strong antibody-mediated selection on the former. We also observed high rates of adaptive evolution in several nonstructural proteins, which may relate to viral evasion of T-cell and innate immune responses. Furthermore, our analysis provides strong quantitative support for the hypothesis that human H1N1 influenza experiences weaker antigenic selection than H3N2. As well as shedding new light on the dynamics and determinants of positive Darwinian selection in influenza viruses, the approach introduced here is applicable to other pathogens for which densely sampled genome sequences are available, and hence is ideally suited to the interpretation of next-generation genome sequencing data.

    Funded by: NIGMS NIH HHS: R01 GM080533

    Molecular biology and evolution 2011;28;9;2443-51

  • Meeting report of the RNA Ontology Consortium January 8-9, 2011.

    Birmingham A, Clemente JC, Desai N, Gilbert J, Gonzalez A, Kyrpides N, Meyer F, Nawrocki E, Sterk P, Stombaugh J, Weinberg Z, Wendel D, Leontis NB, Zirbel C, Knight R and Laederach A

    This report summarizes the proceedings of the structure mapping working group meeting of the RNA Ontology Consortium (ROC), held in Kona, Hawaii on January 8-9, 2011. The ROC hosted this workshop to facilitate collaborations among those researchers formalizing concepts in RNA, those developing RNA-related software, and those performing genome annotation and standardization. The workshop included three software presentations, extended round-table discussions, and the constitution of two new working groups, the first to address the need for better software integration and the second to discuss standardization and benchmarking of existing RNA annotation pipelines. These working groups have subsequently pursued concrete implementation of actions suggested during the discussion. Further information about the ROC and its activities can be found at http://roc.bgsu.edu/.

    Standards in genomic sciences 2011;4;2;252-6

  • Meta-analysis of genome-wide association studies from the CHARGE consortium identifies common variants associated with carotid intima media thickness and plaque.

    Bis JC, Kavousi M, Franceschini N, Isaacs A, Abecasis GR, Schminke U, Post WS, Smith AV, Cupples LA, Markus HS, Schmidt R, Huffman JE, Lehtimäki T, Baumert J, Münzel T, Heckbert SR, Dehghan A, North K, Oostra B, Bevan S, Stoegerer EM, Hayward C, Raitakari O, Meisinger C, Schillert A, Sanna S, Völzke H, Cheng YC, Thorsson B, Fox CS, Rice K, Rivadeneira F, Nambi V, Halperin E, Petrovic KE, Peltonen L, Wichmann HE, Schnabel RB, Dörr M, Parsa A, Aspelund T, Demissie S, Kathiresan S, Reilly MP, Taylor K, Uitterlinden A, Couper DJ, Sitzer M, Kähönen M, Illig T, Wild PS, Orru M, Lüdemann J, Shuldiner AR, Eiriksdottir G, White CC, Rotter JI, Hofman A, Seissler J, Zeller T, Usala G, Ernst F, Launer LJ, D'Agostino RB, O'Leary DH, Ballantyne C, Thiery J, Ziegler A, Lakatta EG, Chilukoti RK, Harris TB, Wolf PA, Psaty BM, Polak JF, Li X, Rathmann W, Uda M, Boerwinkle E, Klopp N, Schmidt H, Wilson JF, Viikari J, Koenig W, Blankenberg S, Newman AB, Witteman J, Heiss G, Duijn Cv, Scuteri A, Homuth G, Mitchell BD, Gudnason V, O'Donnell CJ and CARDIoGRAM Consortium

    Cardiovascular Health Research Unit and Department of Medicine, University of Washington, Seattle, Washington, USA. joshbis@uw.edu

    Carotid intima media thickness (cIMT) and plaque determined by ultrasonography are established measures of subclinical atherosclerosis that each predicts future cardiovascular disease events. We conducted a meta-analysis of genome-wide association data in 31,211 participants of European ancestry from nine large studies in the setting of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. We then sought additional evidence to support our findings among 11,273 individuals using data from seven additional studies. In the combined meta-analysis, we identified three genomic regions associated with common carotid intima media thickness and two different regions associated with the presence of carotid plaque (P < 5 × 10⁻⁸). The associated SNPs mapped in or near genes related to cellular signaling, lipid metabolism and blood pressure homeostasis, and two of the regions were associated with coronary artery disease (P < 0.006) in the Coronary Artery Disease Genome-Wide Replication and Meta-Analysis (CARDIoGRAM) consortium. Our findings may provide new insight into pathways leading to subclinical atherosclerosis and subsequent cardiovascular events.

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: MC_U127561128; NCRR NIH HHS: M01 RR 16500, M01RR00069, UL1RR025005; NHGRI NIH HHS: HG005581, U01HG004402; NHLBI NIH HHS: HL075366, HL080295, HL084729, HL087652, HL105756, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-85239, N02-HL-6-4278, R01HL086694, R01HL087641, R01HL59367, U01 HL072515-06, Z01 HL006002-01, Z99 HL999999; NIA NIH HHS: AG-023629, AG-027058, AG-15928, AG-20098, AG033193, AG08122, AG16495, N01-AG-1-2109, N01-AG-12100, R01 AG18728; NIDDK NIH HHS: DK063491, P30 DK072488; NIGMS NIH HHS: U01 GM074518-04; NINDS NIH HHS: NS17950; PHS HHS: 268200625226C

    Nature genetics 2011;43;10;940-7

  • Genetic mapping identifies novel highly protective antigens for an apicomplexan parasite.

    Blake DP, Billington KJ, Copestake SL, Oakes RD, Quail MA, Wan KL, Shirley MW and Smith AL

    Institute for Animal Health, Compton, Berkshire, United Kingdom. dblake@rvc.ac.uk

    Apicomplexan parasites are responsible for a myriad of diseases in humans and livestock; yet despite intensive effort, development of effective sub-unit vaccines remains a long-term goal. Antigenic complexity and our inability to identify protective antigens from the pool that induce response are serious challenges in the development of new vaccines. Using a combination of parasite genetics and selective barriers with population-based genetic fingerprinting, we have identified that immunity against the most important apicomplexan parasite of livestock (Eimeria spp.) was targeted against a few discrete regions of the genome. Herein we report the identification of six genomic regions and, within two of those loci, the identification of true protective antigens that confer immunity as sub-unit vaccines. The first of these is an Eimeria maxima homologue of apical membrane antigen-1 (AMA-1) and the second is a previously uncharacterised gene that we have termed 'immune mapped protein-1' (IMP-1). Significantly, homologues of the AMA-1 antigen are protective with a range of apicomplexan parasites including Plasmodium spp., which suggests that there may be some characteristic(s) of protective antigens shared across this diverse group of parasites. Interestingly, homologues of the IMP-1 antigen, which is protective against E. maxima infection, can be identified in Toxoplasma gondii and Neospora caninum. Overall, this study documents the discovery of novel protective antigens using a population-based genetic mapping approach allied with a protection-based screen of candidate genes. The identification of AMA-1 and IMP-1 represents a substantial step towards development of an effective anti-eimerian sub-unit vaccine and raises the possibility of identification of novel antigens for other apicomplexan parasites. Moreover, validation of the parasite genetics approach to identify effective antigens supports its adoption in other parasite systems where legitimate protective antigen identification is difficult.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE01089X1, S20198; Wellcome Trust: 085775/Z/08/Z

    PLoS pathogens 2011;7;2;e1001279

  • Soluble flagellin, FliC, induces an Ag-specific Th2 response, yet promotes T-bet-regulated Th1 clearance of Salmonella typhimurium infection.

    Bobat S, Flores-Langarica A, Hitchcock J, Marshall JL, Kingsley RA, Goodall M, Gil-Cruz C, Serre K, Leyton DL, Letran SE, Gaspal F, Chester R, Chamberlain JL, Dougan G, López-Macías C, Henderson IR, Alexander J, MacLennan IC and Cunningham AF

    MRC Centre for Immune Regulation, University of Birmingham, Birmingham, UK.

    Clearance of disseminated Salmonella infection requires bacterial-specific Th1 cells and IFN-γ production, and Th1-promoting vaccines are likely to help control these infections. Consequently, vaccine design has focused on developing Th1-polarizing adjuvants or Ag that naturally induce Th1 responses. In this study, we show that, in mice, immunization with soluble, recombinant FliC protein flagellin (sFliC) induces Th2 responses as evidenced by Ag-specific GATA-3, IL-4 mRNA, and protein induction in CD62Llo CD4+ T cells without associated IFN-γ production. Despite these Th2 features, sFliC immunization can enhance the development of protective Th1 immunity during subsequent Salmonella infection in an Ab-independent, T-cell-dependent manner. Salmonella infection in sFliC-immunized mice resulted in augmented Th1 responses, with greater bacterial clearance and increased numbers of IFN-γ-producing CD4+ T cells, despite the early induction of Th2 features to sFliC. The augmented Th1 immunity after sFliC immunization was regulated by T-bet although T-bet is dispensable for primary responses to sFliC. These findings show that there can be flexibility in T-cell responses to some subunit vaccines. These vaccines may induce Th2-type immunity during primary immunization yet promote Th1-dependent responses during later infection. This suggests that designing Th1-inducing subunit vaccines may not always be necessary since this can occur naturally during subsequent infection.

    Funded by: Biotechnology and Biological Sciences Research Council

    European journal of immunology 2011;41;6;1606-18

  • Abdominal aortic aneurysm is associated with a variant in low-density lipoprotein receptor-related protein 1.

    Bown MJ, Jones GT, Harrison SC, Wright BJ, Bumpstead S, Baas AF, Gretarsdottir S, Badger SA, Bradley DT, Burnand K, Child AH, Clough RE, Cockerill G, Hafez H, Scott DJ, Futers S, Johnson A, Sohrabi S, Smith A, Thompson MM, van Bockxmeer FM, Waltham M, Matthiasson SE, Thorleifsson G, Thorsteinsdottir U, Blankensteijn JD, Teijink JA, Wijmenga C, de Graaf J, Kiemeney LA, Assimes TL, McPherson R, CARDIoGRAM Consortium, Global BPgen Consortium, DIAGRAM Consortium, VRCNZ Consortium, Folkersen L, Franco-Cereceda A, Palmen J, Smith AJ, Sylvius N, Wild JB, Refstrup M, Edkins S, Gwilliam R, Hunt SE, Potter S, Lindholt JS, Frikke-Schmidt R, Tybjærg-Hansen A, Hughes AE, Golledge J, Norman PE, van Rij A, Powell JT, Eriksson P, Stefansson K, Thompson JR, Humphries SE, Sayers RD, Deloukas P and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Leicester LE2 7LX, UK. m.bown@le.ac.uk

    Abdominal aortic aneurysm (AAA) is a common cause of morbidity and mortality and has a significant heritability. We carried out a genome-wide association discovery study of 1866 patients with AAA and 5435 controls and replication of promising signals (lead SNP with a p value < 1 × 10⁻⁵) in 2871 additional cases and 32,687 controls and performed further follow-up in 1491 AAA and 11,060 controls. In the discovery study, nine loci demonstrated association with AAA (p < 1 × 10⁻⁵). In the replication sample, the lead SNP at one of these loci, rs1466535, located within intron 1 of low-density-lipoprotein receptor-related protein 1 (LRP1) demonstrated significant association (p = 0.0042). We confirmed the association of rs1466535 and AAA in our follow-up study (p = 0.035). In a combined analysis (6228 AAA and 49182 controls), rs1466535 had a consistent effect size and direction in all sample sets (combined p = 4.52 × 10⁻¹⁰, odds ratio 1.15 [1.10-1.21]). No associations were seen for either rs1466535 or the 12q13.3 locus in independent association studies of coronary artery disease, blood pressure, diabetes, or hyperlipidaemia, suggesting that this locus is specific to AAA. Gene-expression studies demonstrated a trend toward increased LRP1 expression for the rs1466535 CC genotype in arterial tissues; there was a significant (p = 0.029) 1.19-fold (1.04-1.36) increase in LRP1 expression in CC homozygotes compared to TT homozygotes in aortic adventitia. Functional studies demonstrated that rs1466535 might alter a SREBP-1 binding site and influence enhancer activity at the locus. In conclusion, this study has identified a biologically plausible genetic variant associated specifically with AAA, and we suggest that this variant has a possible functional role in LRP1 expression.

    Funded by: British Heart Foundation: FS/11/16/28696, PG/10/001/28098, RG2008/08; Wellcome Trust: 076113, 084695, 085475

    American journal of human genetics 2011;89;5;619-27

  • Joint thrombus and vessel segmentation using dynamic texture likelihoods and shape prior.

    Brieu N, Groher M, Serbanovic-Canic J, Cvejic A, Ouwehand W and Navab N

    Computer Aided Medical Procedures, Technische Universität München, Germany. brieu@in.tum.de

    The segmentation of thrombus and vessel in microscopic image sequences is of high interest for identifying genes linked to cardiovascular diseases. This task is however challenging because of the low contrast and the highly dynamic conditions observed in time-lapse DIC in-vivo microscopic scenes. In this work, we introduce a probabilistic framework for the joint segmentation of thrombus and vessel regions. Modeling the scene with dynamic textures, we derive two likelihood functions to account for both spatial and temporal discrepancies of the motion patterns. A tubular shape prior is moreover introduced to constrain the aortic region. Extensive experiments on microscopic sequences quantitatively show the good performance of our approach.

    Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention 2011;14;Pt 3;579-86

  • TSIDER1, a short and non-autonomous Salivarian trypanosome-specific retroposon related to the ingi6 subclade.

    Bringaud F, Berriman M and Hertz-Fowler C

    Centre de Résonance Magnétique des Systèmes Biologiques, UMR 5536, Université Bordeaux Segalen, CNRS, 146 rue Léo Saignat, 33076 Bordeaux, France. bringaud@rmsb.u-bordeaux2.fr

    Retroposons of the ingi clade are the most abundant transposable elements identified in the trypanosomatid genomes. Some are long autonomous elements (ingi, L1Tc) while others, such as RIME and NARTc, are short non-coding elements that parasitize the retrotransposition machinery of the active autonomous ones for their own mobilization. Here, we identified a new family of short non-autonomous retroposons of the ingi clade, called TSIDER1, which are present in the genome of Salivarian (African) trypanosomes, Trypanosoma brucei, T. congolense and T. vivax, but absent in the T. cruzi and Leishmania spp. genomes and, as such, TSIDER1 is the only retroposon subfamily conserved at the nucleotide level between African trypanosome species. We identified three TvSIDER1 families within the genome of T. vivax and the high level of sequence conservation within the TvSIDER1a and TvSIDER1b groups suggests that they are still active. We propose that TvSIDER1a/b elements are using the Tvingi retrotransposition machinery, as they are preceded by the same conserved pattern characteristic of the ingi6 subclade, which corresponds to the retroposon-encoded endonuclease binding site. In contrast, TcoSIDER1, TbSIDER1 and TvSIDER1c are too divergent to be considered as active retroposons. The relatively low number of SIDER elements identified in the T. congolense (70 copies), T. vivax (32 copies) and T. brucei (22 copies) genomes confirms that trypanosomes have not expanded short transposable elements, which is in contrast to Leishmania spp. (∼2000 copies), where SIDER play a role in the regulation of gene expression.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Molecular and biochemical parasitology 2011;179;1;30-6

  • Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.

    Brosch M, Saunders GI, Frankish A, Collins MO, Yu L, Wright J, Verstraten R, Adams DJ, Harrow J, Choudhary JS and Hubbard T

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching in excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).

    Funded by: Cancer Research UK; Wellcome Trust: 077198

    Genome research 2011;21;5;756-67

  • A missense mutation in Fgfr1 causes ear and skull defects in hush puppy mice.

    Calvert JA, Dedos SG, Hawker K, Fleming M, Lewis MA and Steel KP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The hush puppy mouse mutant has been shown previously to have skull and outer, middle, and inner ear defects, and an increase in hearing threshold. The fibroblast growth factor receptor 1 (Fgfr1) gene is located in the region of chromosome 8 containing the mutation. Sequencing of the gene in hush puppy heterozygotes revealed a missense mutation in the kinase domain of the protein (W691R). Homozygotes were found to die during development, at approximately embryonic day 8.5, and displayed a phenotype similar to null mutants. Reverse transcription PCR indicated a decrease in Fgfr1 transcript in heterozygotes and homozygotes. Generation of a construct containing the mutation allowed the function of the mutated receptor to be studied. Immunocytochemistry showed that the mutant receptor protein was present at the cell membrane, suggesting normal expression and trafficking. Measurements of changes in intracellular calcium concentration showed that the mutated receptor could not activate the IP₃ pathway, in contrast to the wild-type receptor, nor could it initiate activation of the Ras/MAP kinase pathway. Thus, the hush puppy mutation in fibroblast growth factor receptor 1 appears to cause a loss of receptor function. The mutant protein appears to have a dominant negative effect, which could be due to it dimerising with the wild-type protein and inhibiting its activity, thus further reducing the levels of functional protein. A dominant modifier, Mhspy, which reduces the effect of the hush puppy mutation on pinna and stapes development, has been mapped to the distal end of chromosome 7 and may show imprinting.

    Funded by: Medical Research Council; Wellcome Trust: 072084, 077189

    Mammalian genome : official journal of the International Mammalian Genome Society 2011;22;5-6;290-305

  • A novel role for PSD-95 in mediating ethanol intoxication, drinking and place preference.

    Camp MC, Feyder M, Ihne J, Palachick B, Hurd B, Karlsson RM, Noronha B, Chen YC, Coba MP, Grant SG and Holmes A

    Section on Behavioral Science and Genetics, Laboratory for Integrative Neuroscience, National Institute on Alcohol Abuse and Alcoholism/NIH, 5625 Fishers Ln., Rockville, MD 20852-1798, USA. campmc@mail.nih.gov

    The synaptic signaling mechanisms mediating the behavioral effects of ethanol (EtOH) remain poorly understood. Post-synaptic density 95 (PSD-95, SAP-90, Dlg4) is a key orchestrator of N-methyl-D-aspartate receptors (NMDAR) and glutamatergic synapses, which are known to be major sites of EtOH's behavioral actions. However, the potential contribution of PSD-95 to EtOH-related behaviors has not been established. Here, we evaluated knockout (KO) mice lacking PSD-95 for multiple measures of sensitivity to the acute intoxicating effects of EtOH (ataxia, hypothermia, sedation/hypnosis), EtOH drinking under conditions of free access and following deprivation, acquisition and long-term retention of EtOH conditioned place preference (CPP) (and lithium chloride-induced conditioned taste aversion), and intoxication-potentiating responses to NMDAR antagonism. PSD-95 KO exhibited increased sensitivity to the sedative/hypnotic, but not ataxic or hypothermic, effects of acute EtOH relative to wild-type controls (WT). PSD-95 KO consumed less EtOH than WT, particularly at higher EtOH concentrations, although increases in KO drinking could be induced by concentration-fading and deprivation. PSD-95 KO showed normal EtOH CPP 1 day after conditioning, but showed significant aversion 2 weeks later. Lithium chloride-induced taste aversion was impaired in PSD-95 KO at both time points. Finally, the EtOH-potentiating effects of the NMDAR antagonist MK-801 were intact in PSD-95 KO at the dose tested. These data reveal a major, novel role for PSD-95 in mediating EtOH behaviors, and add to growing evidence that PSD-95 is a key mediator of the effects of multiple abused drugs.

    Funded by: NIAAA NIH HHS: Z01-AA000411; Wellcome Trust

    Addiction biology 2011;16;3;428-39

  • Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.

    Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom. sc11@sanger.ac.uk

    The diversity in the Plasmodium falciparum genome can be used to explore parasite population dynamics, with practical applications to malaria control. The ability to identify the geographic origin and trace the migratory patterns of parasites with clinically important phenotypes such as drug resistance is particularly relevant. With increasing single-nucleotide polymorphism (SNP) discovery from ongoing Plasmodium genome sequencing projects, genotyping platforms with high SNP and sample throughput are required for large-scale population genetic studies. Low parasitaemias and multiple clone infections present a number of challenges to genotyping P. falciparum. We addressed some of these issues using a custom 384-SNP Illumina GoldenGate assay on P. falciparum DNA from laboratory clones (long-term culture-adapted parasite clones), short-term cultured parasite isolates and clinical samples (non-cultured isolates) from East and West Africa, Southeast Asia and Oceania. Eighty percent of the SNPs (n = 306) produced reliable genotype calls on samples containing as little as 2 ng of total genomic DNA and on whole genome amplified DNA. Analysis of artificial mixtures of laboratory clones demonstrated high genotype calling specificity and moderate sensitivity to call minor frequency alleles. Clear resolution of geographically distinct populations was demonstrated using Principal Components Analysis (PCA), and global patterns of population genetic diversity were consistent with previous reports. These results validate the utility of the platform in performing population genetic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0600718, G19/9; NIAID NIH HHS: R37 AI048071; Wellcome Trust: 090532, 093956

    PloS one 2011;6;6;e20251

  • Differences between Trypanosoma brucei gambiense groups 1 and 2 in their resistance to killing by trypanolytic factor 1.

    Capewell P, Veitch NJ, Turner CM, Raper J, Berriman M, Hajduk SL and MacLeod A

    College of Medical, Veterinary and Biological Sciences, Wellcome Trust Centre for Molecular Parasitology, University of Glasgow, Glasgow, United Kingdom.

    Background: The three sub-species of Trypanosoma brucei are important pathogens of sub-Saharan Africa. T. b. brucei is unable to infect humans due to sensitivity to trypanosome lytic factors (TLF) 1 and 2 found in human serum. T. b. rhodesiense and T. b. gambiense are able to resist lysis by TLF. There are two distinct sub-groups of T. b. gambiense that differ genetically and by human serum resistance phenotypes. Group 1 T. b. gambiense have an invariant phenotype whereas group 2 show variable resistance. Previous data indicated that group 1 T. b. gambiense are resistant to TLF-1 due in part to reduced uptake of TLF-1 mediated by reduced expression of the TLF-1 receptor (the haptoglobin-hemoglobin receptor (HpHbR)) gene. Here we investigate if this is also true in group 2 parasites.

    Methodology: Isogenic resistant and sensitive group 2 T. b. gambiense were derived and compared to other T. brucei parasites. Both resistant and sensitive lines express the HpHbR gene at similar levels and internalized fluorescently labeled TLF-1 in a similar fashion to T. b. brucei. Both resistant and sensitive group 2, as well as group 1 T. b. gambiense, internalize recombinant APOL1, but only sensitive group 2 parasites are lysed.

    Conclusions: Our data indicate that, despite group 1 T. b. gambiense avoiding TLF-1, it is resistant to the main lytic component, APOL1. Similarly group 2 T. b. gambiense is innately resistant to APOL1, which could be based on the same mechanism. However, group 2 T. b. gambiense variably displays this phenotype and expression does not appear to correlate with a change in expression site or expression of HpHbR. Thus there are differences in the mechanism of human serum resistance between T. b. gambiense groups 1 and 2.

    Funded by: Biotechnology and Biological Sciences Research Council; NIAID NIH HHS: AI039033, AI041233; Wellcome Trust: 079703, 095201

    PLoS neglected tropical diseases 2011;5;9;e1287

  • Determinants of bluetongue virus virulence in murine models of disease.

    Caporale M, Wash R, Pini A, Savini G, Franchi P, Golder M, Patterson-Kane J, Mertens P, Di Gialleonardo L, Armillotta G, Lelli R, Kellam P and Palmarini M

    Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

    Bluetongue is a major infectious disease of ruminants that is caused by bluetongue virus (BTV). In this study, we analyzed virulence and genetic differences of (i) three BTV field strains from Italy maintained at either a low (L strains) or high (H strains) passage number in cell culture and (ii) three South African "reference" wild-type strains and their corresponding live attenuated vaccine strains. The Italian BTV L strains, in general, were lethal for both newborn NIH-Swiss mice inoculated intracerebrally and adult type I interferon receptor-deficient (IFNAR(-/-)) mice, while the virulence of the H strains was attenuated significantly in both experimental models. Similarly, the South African vaccine strains were not pathogenic for IFNAR(-/-) mice, while the corresponding wild-type strains were virulent. Thus, attenuation of the virulence of the BTV strains used in this study is not mediated by the presence of an intact interferon system. No clear distinction in virulence was observed for the South African BTV strains in newborn NIH-Swiss mice. Full genomic sequencing revealed relatively few amino acid substitutions, scattered in several different viral proteins, for the strains found to be attenuated in mice compared to the pathogenic related strains. However, only the genome segments encoding VP1, VP2, and NS2 consistently showed nonsynonymous changes between all virulent and attenuated strain pairs. This study established an experimental platform for investigating the determinants of BTV virulence. Future studies using reverse genetics will allow researchers to precisely map and "weight" the relative influences of the various genome segments and viral proteins on BTV virulence.

    Funded by: Medical Research Council: G0801822; Wellcome Trust

    Journal of virology 2011;85;21;11479-89

  • Microbial sequences benefit health now.

    Cartwright EJ, Köser CU and Peacock SJ

    Nature 2011;471;7340;578

  • A modified vaccinia Ankara virus (MVA) vaccine expressing African horse sickness virus (AHSV) VP2 protects against AHSV challenge in an IFNAR -/- mouse model.

    Castillo-Olivares J, Calvo-Pinilla E, Casanova I, Bachanek-Bankowska K, Chiam R, Maan S, Nieto JM, Ortego J and Mertens PP

    Institute for Animal Health, Pirbright, Woking, Surrey, United Kingdom. javier.castillo-olivares@bbsrc.ac.uk

    African horse sickness (AHS) is a lethal viral disease of equids, which is transmitted by Culicoides midges that become infected after biting a viraemic host. The use of live attenuated vaccines has been vital for the control of this disease in endemic regions. However, there are safety concerns over their use in non-endemic countries. Research efforts over the last two decades have therefore focused on developing alternative vaccines based on recombinant baculovirus or live viral vectors expressing structural components of the AHS virion. However, ethical and financial considerations, relating to the use of infected horses in high biosecurity installations, have made progress very slow. We have therefore assessed the potential of an experimental mouse model of AHSV infection for vaccine and immunology research. We initially characterised AHSV infection in this model, then tested the protective efficacy of a recombinant vaccine based on modified vaccinia Ankara expressing AHS-4 VP2 (MVA-VP2).

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/00654

    PloS one 2011;6;1;e16503

  • The impact of recombination on dN/dS within recently emerged bacterial clones.

    Castillo-Ramírez S, Harris SR, Holden MT, He M, Parkhill J, Bentley SD and Feil EJ

    Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom.

    The development of next-generation sequencing platforms is set to reveal an unprecedented level of detail on short-term molecular evolutionary processes in bacteria. Here we re-analyse genome-wide single nucleotide polymorphism (SNP) datasets for recently emerged clones of methicillin-resistant Staphylococcus aureus (MRSA) and Clostridium difficile. We note a highly significant enrichment of synonymous SNPs in those genes which have been affected by recombination, i.e. those genes on mobile elements designated "non-core" (in the case of S. aureus), or those core genes which have been affected by homologous replacements (S. aureus and C. difficile). This observation suggests that the previously documented decrease in dN/dS over time in bacteria applies not only to genomes of differing levels of divergence overall, but also to horizontally acquired genes of differing levels of divergence within a single genome. We also consider the role of increased drift acting on recently emerged, highly specialised clones, and the impact of recombination on selection at linked sites. This work has implications for a wide range of genomic analyses.

    PLoS pathogens 2011;7;7;e1002129

  • Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma.

    Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, Van der Harst P, Holm H, Sanna S, Kavousi M, Baumeister SE, Coin LJ, Deng G, Gieger C, Heard-Costa NL, Hottenga JJ, Kühnel B, Kumar V, Lagou V, Liang L, Luan J, Vidal PM, Mateo Leach I, O'Reilly PF, Peden JF, Rahmioglu N, Soininen P, Speliotes EK, Yuan X, Thorleifsson G, Alizadeh BZ, Atwood LD, Borecki IB, Brown MJ, Charoen P, Cucca F, Das D, de Geus EJ, Dixon AL, Döring A, Ehret G, Eyjolfsson GI, Farrall M, Forouhi NG, Friedrich N, Goessling W, Gudbjartsson DF, Harris TB, Hartikainen AL, Heath S, Hirschfield GM, Hofman A, Homuth G, Hyppönen E, Janssen HL, Johnson T, Kangas AJ, Kema IP, Kühn JP, Lai S, Lathrop M, Lerch MM, Li Y, Liang TJ, Lin JP, Loos RJ, Martin NG, Moffatt MF, Montgomery GW, Munroe PB, Musunuru K, Nakamura Y, O'Donnell CJ, Olafsson I, Penninx BW, Pouta A, Prins BP, Prokopenko I, Puls R, Ruokonen A, Savolainen MJ, Schlessinger D, Schouten JN, Seedorf U, Sen-Chowdhry S, Siminovitch KA, Smit JH, Spector TD, Tan W, Teslovich TM, Tukiainen T, Uitterlinden AG, Van der Klauw MM, Vasan RS, Wallace C, Wallaschofski H, Wichmann HE, Willemsen G, Würtz P, Xu C, Yerges-Armstrong LM, Alcohol Genome-wide Association (AlcGen) Consortium, Diabetes Genetics Replication and Meta-analyses (DIAGRAM+) Study, Genetic Investigation of Anthropometric Traits (GIANT) Consortium, Global Lipids Genetics Consortium, Genetics of Liver Disease (GOLD) Consortium, International Consortium for Blood Pressure (ICBP-GWAS), Meta-analyses of Glucose and Insulin-Related Traits Consortium (MAGIC), Abecasis GR, Ahmadi KR, Boomsma DI, Caulfield M, Cookson WO, van Duijn CM, Froguel P, Matsuda K, McCarthy MI, Meisinger C, Mooser V, Pietiläinen KH, Schumann G, Snieder H, Sternberg MJ, Stolk RP, Thomas HC, Thorsteinsdottir U, Uda M, Waeber G, Wareham NJ, Waterworth DM, Watkins H, Whitfield JB, Witteman JC, Wolffenbuttel BH, Fox CS, Ala-Korpela M, Stefansson K, Vollenweider P, Völzke H, Schadt EE, Scott J, Järvelin MR, Elliott P and Kooner JS

    Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London, UK. john.chambers@ic.ac.uk

    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10⁻⁸ to P = 10⁻¹⁹⁰). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.

    Funded by: British Heart Foundation: FS/10/011/27881, PG/09/002/26056, PG/09/023/26806, RG/07/008/23674; Medical Research Council: G0401527, G0601653, G0601966, G0700931, G0701863, G0902037, G1000143, G19/35, G8802774, G9521010, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647; NIAAA NIH HHS: K05 AA017688; NIDDK NIH HHS: Z99 DK999999, ZIA DK075013-05, ZIA DK075013-07; Wellcome Trust: 090532

    Nature genetics 2011;43;11;1131-8

  • Common variants show predicted polygenic effects on height in the tails of the distribution, except in extremely short individuals.

    Chan Y, Holmen OL, Dauber A, Vatten L, Havulinna AS, Skorpen F, Kvaløy K, Silander K, Nguyen TT, Willer C, Boehnke M, Perola M, Palotie A, Salomaa V, Hveem K, Frayling TM, Hirschhorn JN and Weedon MN

    Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America.

    Common genetic variants have been shown to explain a fraction of the inherited variation for many common diseases and quantitative traits, including height, a classic polygenic trait. The extent to which common variation determines the phenotype of highly heritable traits such as height is uncertain, as is the extent to which common variation is relevant to individuals with more extreme phenotypes. To address these questions, we studied 1,214 individuals from the top and bottom extremes of the height distribution (tallest and shortest ∼1.5%), drawn from ∼78,000 individuals from the HUNT and FINRISK cohorts. We found that common variants still influence height at the extremes of the distribution: common variants (49/141) were nominally associated with height in the expected direction more often than is expected by chance (p<5×10⁻²⁸), and the odds ratios in the extreme samples were consistent with the effects estimated previously in population-based data. To examine more closely whether the common variants have the expected effects, we calculated a weighted allele score (WAS), which is a weighted prediction of height for each individual based on the previously estimated effect sizes of the common variants in the overall population. The average WAS is consistent with expectation in the tall individuals, but was not as extreme as expected in the shortest individuals (p<0.006), indicating that some of the short stature is explained by factors other than common genetic variation. The discrepancy was more pronounced (p<10⁻⁶) in the most extreme individuals (height<0.25 percentile). The results at the extreme short tails are consistent with a large number of models incorporating either rare genetic non-additive or rare non-genetic factors that decrease height. We conclude that common genetic variants are associated with height at the extremes as well as across the population, but that additional factors become more prominent at the shorter extreme.

    Funded by: NIDDK NIH HHS: 1R01DK075787; Wellcome Trust: 085301/Z/08/Z

    PLoS genetics 2011;7;12;e1002439

  • Antimicrobial resistance to ceftazidime involving loss of penicillin-binding protein 3 in Burkholderia pseudomallei.

    Chantratita N, Rholl DA, Sim B, Wuthiekanun V, Limmathurotsakul D, Amornchai P, Thanwisai A, Chua HH, Ooi WF, Holden MT, Day NP, Tan P, Schweizer HP and Peacock SJ

    Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand.

    Known mechanisms of resistance to β-lactam antibiotics include β-lactamase expression, altered drug target, decreased bacterial permeability, and increased drug efflux. Here, we describe a unique mechanism of β-lactam resistance in the biothreat organism Burkholderia pseudomallei (the cause of melioidosis), associated with treatment failure during prolonged ceftazidime therapy of natural infection. Detailed comparisons of the initial ceftazidime-susceptible infecting isolate and subsequent ceftazidime-resistant variants from six patients led us to identify a common, large-scale genomic loss involving a minimum of 49 genes in all six resistant strains. Mutational analysis of wild-type B. pseudomallei demonstrated that ceftazidime resistance was due to deletion of a gene encoding a penicillin-binding protein 3 (BPSS1219) present within the region of genomic loss. The clinical ceftazidime-resistant variants failed to grow using commonly used laboratory culture media, including commercial blood cultures, rendering the variants almost undetectable in the diagnostic laboratory. Melioidosis is notoriously difficult to cure and clinical treatment failure is common in patients treated with ceftazidime, the drug of first choice across most of Southeast Asia where the majority of cases are reported. The mechanism described here represents an explanation for ceftazidime treatment failure, and may be a frequent but undetected resistance event.

    Funded by: NIAID NIH HHS: AI065357; Wellcome Trust: 087769/Z/08/

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;41;17165-70

  • Defining the power limits of genome-wide association scan meta-analyses.

    Chapman K, Ferreira T, Morris A, Asimit J and Zeggini E

    Wellcome Trust Centre for Human Genetics, Roosevelt Drive, University of Oxford, Oxford, United Kingdom.

    Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits of GWAS meta-analyses, in the presence and absence of modest levels of heterogeneity and across a range of different allelic architectures. We find that data combination through large-scale collaboration is vital in the quest for complex trait susceptibility loci, but that effect size heterogeneity across meta-analyzed studies drawn from similar populations does not appear to have a profound effect on sample size requirements.

    Funded by: Wellcome Trust: 088885, 090532, WT079557MA, WT081682/Z/06/Z, WT088885/Z/09/Z

    Genetic epidemiology 2011;35;8;781-9

  • Expressions of individuality.

    Chappell L

    Nature reviews. Microbiology 2011;9;10;701

  • Genome-wide association study reveals three susceptibility loci for common migraine in the general population.

    Chasman DI, Schürks M, Anttila V, de Vries B, Schminke U, Launer LJ, Terwindt GM, van den Maagdenberg AM, Fendrich K, Völzke H, Ernst F, Griffiths LR, Buring JE, Kallela M, Freilinger T, Kubisch C, Ridker PM, Palotie A, Ferrari MD, Hoffmann W, Zee RY and Kurth T

    Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

    Migraine is a common, heterogeneous and heritable neurological disorder. Its pathophysiology is incompletely understood, and its genetic influences at the population level are unknown. In a population-based genome-wide analysis including 5,122 migraineurs and 18,108 non-migraineurs, rs2651899 (1p36.32, PRDM16), rs10166942 (2q37.1, TRPM8) and rs11172113 (12q13.3, LRP1) were among the top seven associations (P < 5 × 10⁻⁶) with migraine. These SNPs were significant in a meta-analysis among three replication cohorts and met genome-wide significance in a meta-analysis combining the discovery and replication cohorts (rs2651899, odds ratio (OR) = 1.11, P = 3.8 × 10⁻⁹; rs10166942, OR = 0.85, P = 5.5 × 10⁻¹²; and rs11172113, OR = 0.90, P = 4.3 × 10⁻⁹). The associations at rs2651899 and rs10166942 were specific for migraine compared with non-migraine headache. None of the three SNP associations was preferential for migraine with aura or without aura, nor were any associations specific for migraine features. TRPM8 has been the focus of neuropathic pain models, whereas LRP1 modulates neuronal glutamate signaling, plausibly linking both genes to migraine pathophysiology.

    Funded by: NCI NIH HHS: CA-47988, R01 CA047988, R01 CA047988-21; NHLBI NIH HHS: HL-043851, HL-080467, HL-099355, R01 HL043851, R01 HL043851-10, R01 HL080467, R01 HL080467-05, RC1 HL099355, RC1 HL099355-02; NINDS NIH HHS: NS-061836, R01 NS061836, R01 NS061836-03

    Nature genetics 2011;43;7;695-8

  • Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sex-specific admixture.

    Chaubey G, Metspalu M, Choi Y, Mägi R, Romero IG, Soares P, van Oven M, Behar DM, Rootsi S, Hudjashov G, Mallick CB, Karmin M, Nelis M, Parik J, Reddy AG, Metspalu E, van Driem G, Xue Y, Tyler-Smith C, Thangaraj K, Singh L, Remm M, Richards MB, Lahr MM, Kayser M, Villems R and Kivisild T

    Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia.

    The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remain disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17-28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and "structure-like" analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components: one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.

    Funded by: Wellcome Trust: 077009

    Molecular biology and evolution 2011;28;2;1013-24

  • Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.

    Chaudhuri RR, Yu L, Kanji A, Perkins TT, Gardner PP, Choudhary J, Maskell DJ and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.

    Funded by: Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z

    Microbiology (Reading, England) 2011;157;Pt 10;2922-32

  • Genetic screens using the piggyBac transposon.

    Chew SK, Rad R, Futreal PA, Bradley A and Liu P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Transposons are an attractive system to use in genetic screens as they are molecularly tractable and the disrupted loci that give rise to the desired phenotype are easily mapped. We consider herein the characteristics of the piggyBac transposon system in complementing existing mammalian screen strategies, including the Sleeping Beauty transposon system. We also describe the design of the piggyBac resources that we have developed for both forward and reverse genetic screens, and the protocols we use in these experiments.

    Funded by: Wellcome Trust

    Methods (San Diego, Calif.) 2011;53;4;366-71

  • Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute.

    Chiang GT, Clapham P, Qi G, Sale K and Coates G

    Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. gtc@sanger.ac.uk

    Background: Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data.

    Results: We have chosen a data grid system, iRODS (Rule-Oriented Data management systems), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data. The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced.

    Conclusions: iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata.

    BMC bioinformatics 2011;12;361

  • Cancer genomics: from discovery science to personalized medicine.

    Chin L, Andersen JN and Futreal PA

    Belfer Institute for Applied Cancer Science, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. lynda_chin@dfci.harvard.edu

    Recent advances in genome technologies and the ensuing outpouring of genomic information related to cancer have accelerated the convergence of discovery science and clinical medicine. Successful examples of translating cancer genomics into therapeutics and diagnostics reinforce its potential to make possible personalized cancer medicine. However, the bottlenecks along the path of converting a genome discovery into a tangible clinical endpoint are numerous and formidable. In this Perspective, we emphasize the importance of establishing the biological relevance of a cancer genomic discovery in realizing its clinical potential and discuss some of the major obstacles to moving from the bench to the bedside.

    Nature medicine 2011;17;3;297-303

  • A deep sequencing approach to comparatively analyze the transcriptome of lifecycle stages of the filarial worm, Brugia malayi.

    Choi YJ, Ghedin E, Berriman M, McQuillan J, Holroyd N, Mayhew GF, Christensen BM and Michalski ML

    Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA.

    Background: Developing intervention strategies for the control of parasitic nematodes continues to be a significant challenge. Genomic and post-genomic approaches play an increasingly important role for providing fundamental molecular information about these parasites, thus enhancing basic as well as translational research. Here we report a comprehensive genome-wide survey of the developmental transcriptome of the human filarial parasite Brugia malayi.

    Methodology/principal findings: Using deep sequencing, we profiled the transcriptome of eggs and embryos, immature (≤3 days of age) and mature microfilariae (MF), third- and fourth-stage larvae (L3 and L4), and adult male and female worms. Comparative analysis across these stages provided a detailed overview of the molecular repertoires that define and differentiate distinct lifecycle stages of the parasite. Genome-wide assessment of the overall transcriptional variability indicated that the cuticle collagen family and those implicated in molting exhibit noticeably dynamic stage-dependent patterns. Of particular interest was the identification of genes displaying sex-biased or germline-enriched profiles due to their potential involvement in reproductive processes. The study also revealed discrete transcriptional changes during larval development, namely those accompanying the maturation of MF and the L3 to L4 transition that are vital in establishing successful infection in mosquito vectors and vertebrate hosts, respectively.

    Conclusions/significance: Characterization of the transcriptional program of the parasite's lifecycle is an important step toward understanding the developmental processes required for the infectious cycle. We find that the transcriptional program has a number of stage-specific pathways activated during worm development. In addition to advancing our understanding of transcriptome dynamics, these data will aid in the study of genome structure and organization by facilitating the identification of novel transcribed elements and splice variants.

    Funded by: NIAID NIH HHS: AI019769, AI067295

    PLoS neglected tropical diseases 2011;5;12;e1409

  • Modernizing reference genome assemblies.

    Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen HC, Agarwala R, McLaren WM, Ritchie GR, Albracht D, Kremitzki M, Rock S, Kotkiewicz H, Kremitzki C, Wollam A, Trani L, Fulton L, Fulton R, Matthews L, Whitehead S, Chow W, Torrance J, Dunn M, Harden G, Threadgold G, Wood J, Collins J, Heath P, Griffiths G, Pelan S, Grafham D, Eichler EE, Weinstock G, Mardis ER, Wilson RK, Howe K, Flicek P and Hubbard T

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America. church@ncbi.nlm.nih.gov

    Funded by: Wellcome Trust: 077198, 095908

    PLoS biology 2011;9;7;e1001091

  • Single nucleotide polymorphism (SNP) panels for rapid positional cloning in zebrafish.

    Clark MD, Guryev V, Bruijn Ed, Nijman IJ, Tada M, Wilson C, Deloukas P, Postlethwait JH, Cuppen E and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Despite considerable genetic and genomic resources the positional cloning of forward mutations remains a slow and manually intensive task, typically using gel based genotyping and sequential rounds of mapping. We have used the latest genetic resources and genotyping technologies to develop two commercially available SNP panels of thousands of markers that can be used to speed up positional cloning.

    Methods in cell biology 2011;104;219-35

  • Basic statistical analysis in genetic case-control studies.

    Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP and Zondervan KT

    Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. gclarke@well.ox.ac.uk

    This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. The steps described involve the (i) appropriate selection of measures of association and relevance of disease models; (ii) appropriate selection of tests of association; (iii) visualization and interpretation of results; (iv) consideration of appropriate methods to control for multiple testing; and (v) replication strategies. Assuming no previous experience with software such as PLINK, R or Haploview, we describe how to use these popular tools for handling single-nucleotide polymorphism data in order to carry out tests of association and visualize and interpret results. This protocol assumes that data quality assessment and control has been performed, as described in a previous protocol, so that samples and markers deemed to have the potential to introduce bias to the study have been identified and removed. Study design, marker selection and quality control of case-control studies have also been discussed in earlier protocols. The protocol should take ~1 h to complete.

    Funded by: Wellcome Trust: 081682, 085235, WT91745/Z/10/Z

    Nature protocols 2011;6;2;121-33
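
    The protocol above works through PLINK, R and Haploview. As a rough illustration only of the kind of calculation behind steps (i), (ii) and (iv), the sketch below computes an allelic odds ratio, a 1-degree-of-freedom chi-square test and a Bonferroni threshold in plain Python. The function names and the example counts are hypothetical and are not taken from the protocol.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic test on a 2x2 table of allele counts.

    case_a / case_b: counts of allele A / allele B in cases
    ctrl_a / ctrl_b: counts of allele A / allele B in controls
    Assumes every count is non-zero (no continuity correction is applied).
    Returns (odds_ratio, chi_square, p_value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)

    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b)
        * (ctrl_a + ctrl_b)
        * (case_a + ctrl_a)
        * (case_b + ctrl_b)
    )
    chi_square = numerator / denominator

    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    or_, chi2, p = allelic_association(30, 70, 20, 80)
    print(f"OR={or_:.3f} chi2={chi2:.3f} p={p:.4f}")
    print("threshold for 1000 tests:", bonferroni_threshold(0.05, 1000))
```

    A dedicated tool such as PLINK remains the better choice for real data, since it also handles genotype-based tests, missing data and covariates.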

  • The GENCODE exome: sequencing the complete human exome.

    Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S, Lehesjoki AE, Turner DJ, Hubbard TJ and Palotie A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing.

    Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT062023, WT077198, WT089062

    European journal of human genetics : EJHG 2011;19;7;827-31

  • A world in a grain of sand: human history from genetic data.

    Colonna V, Pagani L, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Genome-wide genotypes and sequences are enriching our understanding of the past 50,000 years of human history and providing insights into earlier periods largely inaccessible to mitochondrial DNA and Y-chromosomal studies. "To see a world in a grain of sand ..." William Blake, Auguries of Innocence.

    Funded by: Wellcome Trust

    Genome biology 2011;12;11;234

  • Variation in genome-wide mutation rates within and between human families.

    Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, Zilversmit M, Cartwright R, Rouleau GA, Daly M, Stone EA, Hurles ME, Awadalla P and 1000 Genomes Project

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.

    Funded by: NHGRI NIH HHS: R01 HG004960; NIGMS NIH HHS: R01 GM070806; Wellcome Trust: 077014, 077014/Z/05/Z, 085532, 090532

    Nature genetics 2011;43;7;712-4

  • Inherited variation in vitamin D genes is associated with predisposition to autoimmune disease type 1 diabetes.

    Cooper JD, Smyth DJ, Walker NM, Stevens H, Burren OS, Wallace C, Greissl C, Ramos-Lopez E, Hyppönen E, Dunger DB, Spector TD, Ouwehand WH, Wang TJ, Badenhoop K and Todd JA

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    Objective: Vitamin D deficiency (25-hydroxyvitamin D [25(OH)D] <50 nmol/L) is commonly reported in both children and adults worldwide, and growing evidence indicates that vitamin D deficiency is associated with many extraskeletal chronic disorders, including the autoimmune diseases type 1 diabetes and multiple sclerosis.

    Research design and methods: We measured 25(OH)D concentrations in 720 case and 2,610 control plasma samples and genotyped single nucleotide polymorphisms from seven vitamin D metabolism genes in 8,517 case, 10,438 control, and 1,933 family samples. We tested genetic variants influencing 25(OH)D metabolism for an association with both circulating 25(OH)D concentrations and disease status.

    Results: Type 1 diabetic patients have lower circulating levels of 25(OH)D than similarly aged subjects from the British population. Only 4.3 and 18.6% of type 1 diabetic patients reached optimal levels (≥75 nmol/L) of 25(OH)D for bone health in the winter and summer, respectively. We replicated the associations of four vitamin D metabolism genes (GC, DHCR7, CYP2R1, and CYP24A1) with 25(OH)D in control subjects. In addition to the previously reported association between type 1 diabetes and CYP27B1 (P = 1.4 × 10⁻⁴), we obtained consistent evidence of type 1 diabetes being associated with DHCR7 (P = 1.2 × 10⁻³) and CYP2R1 (P = 3.0 × 10⁻³).

    Conclusions: Circulating levels of 25(OH)D in children and adolescents with type 1 diabetes vary seasonally and are under the same genetic control as in the general population but are much lower. Three key 25(OH)D metabolism genes show consistent evidence of association with type 1 diabetes risk, indicating a genetic etiological role for vitamin D deficiency in type 1 diabetes.

    Funded by: Department of Health; Medical Research Council: G0000934, G0601653; Wellcome Trust: 061858, 068545/Z/02, 076113, 076113/C/04/Z, 079895

    Diabetes 2011;60;5;1624-31

  • A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease.

    Coronary Artery Disease (C4D) Genetics Consortium

    Genome-wide association studies have identified 11 common variants convincingly associated with coronary artery disease (CAD)¹⁻⁷, a modest number considering the apparent heritability of CAD⁸. All of these variants have been discovered in European populations. We report a meta-analysis of four large genome-wide association studies of CAD, with ∼575,000 genotyped SNPs in a discovery dataset comprising 15,420 individuals with CAD (cases) (8,424 Europeans and 6,996 South Asians) and 15,062 controls. There was little evidence for ancestry-specific associations, supporting the use of combined analyses. Replication in an independent sample of 21,408 cases and 19,185 controls identified five loci newly associated with CAD (P < 5 × 10⁻⁸ in the combined discovery and replication analysis): LIPA on 10q23, PDGFD on 11q22, ADAMTS7-MORF4L1 on 15q25, a gene rich locus on 7q22 and KIAA1462 on 10p11. The CAD-associated SNP in the PDGFD locus showed tissue-specific cis expression quantitative trait locus effects. These findings implicate new pathways for CAD susceptibility.

    Funded by: British Heart Foundation: RG/08/014/24067, SP/08/010/25939; Cancer Research UK: 10293; Medical Research Council: G0601966, G0700931, G0801056, G9521010, MC_U137686854, MC_U137686857

    Nature genetics 2011;43;4;339-44

  • Bamako 2009 conference on the bioinformatics of infectious diseases.

    Corpas M, Doumbia S, Gascuel O and Mulder N

    Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 2011;11;4;695-7

  • Pervasive sharing of genetic effects in autoimmune disease.

    Cotsapas C, Voight BF, Rossin E, Lage K, Neale BM, Wallace C, Abecasis GR, Barrett JC, Behrens T, Cho J, De Jager PL, Elder JT, Graham RR, Gregersen P, Klareskog L, Siminovitch KA, van Heel DA, Wijmenga C, Worthington J, Todd JA, Hafler DA, Rich SS, Daly MJ and FOCiS Network of Consortia

    Center For Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America.

    Genome-wide association (GWA) studies have identified numerous, replicable, genetic associations between common single nucleotide polymorphisms (SNPs) and risk of common autoimmune and inflammatory (immune-mediated) diseases, some of which are shared between two diseases. Along with epidemiological and clinical evidence, this suggests that some genetic risk factors may be shared across diseases, as is the case with alleles in the Major Histocompatibility Locus. In this work we evaluate the extent of this sharing for 107 immune disease-risk SNPs in seven diseases: celiac disease, Crohn's disease, multiple sclerosis, psoriasis, rheumatoid arthritis, systemic lupus erythematosus, and type 1 diabetes. We have developed a novel statistic for Cross Phenotype Meta-Analysis (CPMA) which detects association of a SNP to multiple, but not necessarily all, phenotypes. With it, we find evidence that 47/107 (44%) immune-mediated disease risk SNPs are associated to multiple (but not all) immune-mediated diseases (SNP-wise P(CPMA) < 0.01). We also show that distinct groups of interacting proteins are encoded near SNPs which predispose to the same subsets of diseases; we propose these as the mechanistic basis of shared disease risk. We are thus able to leverage genetic data across diseases to construct biological hypotheses about the underlying mechanism of pathogenesis.

    Funded by: Arthritis Research UK: 17552

    PLoS genetics 2011;7;8;e1002254

  • Complete sequence and molecular epidemiology of IncK epidemic plasmid encoding blaCTX-M-14.

    Cottell JL, Webber MA, Coldham NG, Taylor DL, Cerdeño-Tárraga AM, Hauser H, Thomson NR, Woodward MJ and Piddock LJ

    The University of Birmingham, Birmingham, UK.

    Antimicrobial drug resistance is a global challenge for the 21st century with the emergence of resistant bacterial strains worldwide. Transferable resistance to β-lactam antimicrobial drugs, mediated by production of extended-spectrum β-lactamases (ESBLs), is of particular concern. In 2004, an ESBL-carrying IncK plasmid (pCT) was isolated from cattle in the United Kingdom. The sequence was a 93,629-bp plasmid encoding a single antimicrobial drug resistance gene, blaCTX-M-14. From this information, PCRs identifying novel features of pCT were designed and applied to isolates from several countries, showing that the plasmid has disseminated worldwide in bacteria from humans and animals. Complete DNA sequences can be used as a platform to develop rapid epidemiologic tools to identify and trace the spread of plasmids in clinically relevant pathogens, thus facilitating a better understanding of their distribution and ability to transfer between bacteria of humans and animals.

    Emerging infectious diseases 2011;17;4;645-52

  • Basigin is a receptor essential for erythrocyte invasion by Plasmodium falciparum.

    Crosnier C, Bustamante LY, Bartholdson SJ, Bei AK, Theron M, Uchikawa M, Mboup S, Ndir O, Kwiatkowski DP, Duraisingh MT, Rayner JC and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Erythrocyte invasion by Plasmodium falciparum is central to the pathogenesis of malaria. Invasion requires a series of extracellular recognition events between erythrocyte receptors and ligands on the merozoite, the invasive form of the parasite. None of the few known receptor-ligand interactions involved are required in all parasite strains, indicating that the parasite is able to access multiple redundant invasion pathways. Here, we show that we have identified a receptor-ligand pair that is essential for erythrocyte invasion in all tested P. falciparum strains. By systematically screening a library of erythrocyte proteins, we have found that the Ok blood group antigen, basigin, is a receptor for PfRh5, a parasite ligand that is essential for blood stage growth. Erythrocyte invasion was potently inhibited by soluble basigin or by basigin knockdown, and invasion could be completely blocked using low concentrations of anti-basigin antibodies; importantly, these effects were observed across all laboratory-adapted and field strains tested. Furthermore, Ok(a-) erythrocytes, which express a basigin variant that has a weaker binding affinity for PfRh5, had reduced invasion efficiencies. Our discovery of a cross-strain dependency on a single extracellular receptor-ligand pair for erythrocyte invasion by P. falciparum provides a focus for new anti-malarial therapies.

    Funded by: Medical Research Council: G19/9; NCEZID CDC HHS: R36 CK000119-01; NIAID NIH HHS: 2T32 AI007535-12, R01 AI057919, R01 AI057919-05, R01AI057919; Wellcome Trust: 077108, 089084, 090532

    Nature 2011;480;7378;534-7

  • Disruption of mouse Slx4, a regulator of structure-specific nucleases, phenocopies Fanconi anemia.

    Crossan GP, van der Weyden L, Rosado IV, Langevin F, Gaillard PH, McIntyre RE, Sanger Mouse Genetics Project, Gallagher F, Kettunen MI, Lewis DY, Brindle K, Arends MJ, Adams DJ and Patel KJ

    Medical Research Council, Laboratory of Molecular Biology, Cambridge, UK.

    The evolutionarily conserved SLX4 protein, a key regulator of nucleases, is critical for DNA damage response. SLX4 nuclease complexes mediate repair during replication and can also resolve Holliday junctions formed during homologous recombination. Here we describe the phenotype of the Btbd12 knockout mouse, the mouse ortholog of SLX4, which recapitulates many key features of the human genetic illness Fanconi anemia. Btbd12-deficient animals are born at sub-Mendelian ratios, have greatly reduced fertility, are developmentally compromised and are prone to blood cytopenias. Btbd12(-/-) cells prematurely senesce, spontaneously accumulate damaged chromosomes and are particularly sensitive to DNA crosslinking agents. Genetic complementation reveals a crucial requirement for Btbd12 (also known as Slx4) to interact with the structure-specific endonuclease Xpf-Ercc1 to promote crosslink repair. The Btbd12 knockout mouse therefore establishes a disease model for Fanconi anemia and genetically links a regulator of nuclease incision complexes to the Fanconi anemia DNA crosslink repair pathway.

    Funded by: Cancer Research UK: 12401, A11073, A11376, A12401, A8449; Medical Research Council: MC_U105178811, U.1051.03.009(78811); Wellcome Trust: 098051

    Nature genetics 2011;43;2;147-52

  • Rapid pneumococcal evolution in response to clinical interventions.

    Croucher NJ, Harris SR, Fraser C, Quail MA, Burton J, van der Linden M, McGee L, von Gottberg A, Song JH, Ko KS, Pichon B, Baker S, Parry CM, Lambertsen LM, Shahinas D, Pillai DR, Mitchell TJ, Dougan G, Tomasz A, Klugman KP, Parkhill J, Hanage WP and Bentley SD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Epidemiological studies of the naturally transformable bacterial pathogen Streptococcus pneumoniae have previously been confounded by high rates of recombination. Sequencing 240 isolates of the PMEN1 (Spain(23F)-1) multidrug-resistant lineage enabled base substitutions to be distinguished from polymorphisms arising through horizontal sequence transfer. More than 700 recombinations were detected, with genes encoding major antigens frequently affected. Among these were 10 capsule-switching events, one of which accompanied a population shift as vaccine-escape serotype 19A isolates emerged in the USA after the introduction of the conjugate polysaccharide vaccine. The evolution of resistance to fluoroquinolones, rifampicin, and macrolides was observed to occur on multiple occasions. This study details how genomic plasticity within lineages of recombinogenic bacteria can permit adaptation to clinical interventions over remarkably short time scales.

    Funded by: Medical Research Council: G0800596; Wellcome Trust: 076962, 076964

    Science (New York, N.Y.) 2011;331;6016;430-4

  • Identification, variation and transcription of pneumococcal repeat sequences.

    Croucher NJ, Vernikos GS, Parkhill J and Bentley SD

    Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. nc3@sanger.ac.uk

    Background: Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics.

    Results: Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR.

    Conclusions: BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/.

    Funded by: Wellcome Trust

    BMC genomics 2011;12;120

  • Assessing the complex architecture of polygenic traits in diverged yeast populations.

    Cubillos FA, Billi E, Zörgö E, Parts L, Fargier P, Omholt S, Blomberg A, Warringer J, Louis EJ and Liti G

    Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, UK.

    Phenotypic variation arising from populations adapting to different niches has a complex underlying genetic architecture. A major challenge in modern biology is to identify the causative variants driving phenotypic variation. Recently, the baker's yeast, Saccharomyces cerevisiae has emerged as a powerful model for dissecting complex traits. However, past studies using a laboratory strain were unable to reveal the complete architecture of polygenic traits. Here, we present a linkage study using 576 recombinant strains obtained from crosses of isolates representative of the major lineages. The meiotic recombinational landscape appears largely conserved between populations; however, strain-specific hotspots were also detected. Quantitative measurements of growth in 23 distinct ecologically relevant environments show that our recombinant population recapitulates most of the standing phenotypic variation described in the species. Linkage analysis detected an average of 6.3 distinct QTLs for each condition tested in all crosses, explaining on average 39% of the phenotypic variation. The QTLs detected are not constrained to a small number of loci, and the majority are specific to a single cross-combination and to a specific environment. Moreover, crosses between strains of similar phenotypes generate greater variation in the offspring, suggesting the presence of many antagonistic alleles and epistatic interactions. We found that subtelomeric regions play a key role in defining individual quantitative variation, emphasizing the importance of the adaptive nature of these regions in natural populations. This set of recombinant strains is a powerful tool for investigating the complex architecture of polygenic traits.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F015216/1, BB/G01616X/1, BBF0152161; Wellcome Trust: WT 084507MA, WT077192/Z/05/Z

    Molecular ecology 2011;20;7;1401-13

  • The role of meis1 in primitive and definitive hematopoiesis during zebrafish development.

    Cvejic A, Serbanovic-Canic J, Stemple DL and Ouwehand WH

    Department of Haematology, University of Cambridge, Long Road, Cambridge CB2 0PT, UK.

    Background: The Meis1 protein represents an important cofactor for Hox and Pbx1 and is implicated in human and murine leukemias. Though much is known about the role of meis1 in leukemogenesis, its function in normal hematopoiesis remains largely unclear. Here we characterized the role of the proto-oncogene, meis1, during zebrafish primitive and definitive hematopoiesis.

    Design and methods: Zebrafish embryos were stained with o-dianisidine to detect hemoglobin-containing cells and Sudan black to quantify neutrophils. The numbers of other cells (scl-, gata1- and alas2-positive cells) were also quantified by measuring the corresponding stained areas of the embryos. We used anti-Meis1 antibody and whole mount immunohistochemistry to determine the pattern of expression of Meis1 during zebrafish development and then analyzed the functional role of Meis1 by knocking-down the meis1 gene.

    Results: Using antisense morpholino oligomers to interrupt meis1 expression we found that, although primitive macrophage development could occur unhampered, posterior erythroid differentiation required meis1, and its absence resulted in a severe decrease in the number of mature erythrocytes. Furthermore a picture emerged that meis1 exerts important effects on later stages of erythrocyte maturation and that these effects are independent of gata1, but under the control of scl. In addition, meis1 morpholino knock-down led to dramatic single arteriovenous tube formation. We also found that knock-down of pbx1 resulted in a phenotype that was strikingly similar to that of meis1 knock-down zebrafish.

    Conclusions: These results imply that meis1, jointly with pbx1, regulates primitive hematopoiesis as well as vascular development.

    Funded by: Wellcome Trust: 077037/Z/05/Z, 077047/Z/05/Z, 082597/Z/07/Z

    Haematologica 2011;96;2;190-8

  • A viral discovery methodology for clinical biopsy samples utilising massively parallel next generation sequencing.

    Daly GM, Bexfield N, Heaney J, Stubbs S, Mayer AP, Palser A, Kellam P, Drou N, Caccamo M, Tiley L, Alexander GJ, Bernal W and Heeney JL

    Department of Veterinary Medicine, The University of Cambridge, Cambridge, United Kingdom.

    Here we describe a virus discovery protocol for a range of different virus genera, that can be applied to biopsy-sized tissue samples. Our viral enrichment procedure, validated using canine and human liver samples, significantly improves viral read copy number and increases the length of viral contigs that can be generated by de novo assembly. This in turn enables the Illumina next generation sequencing (NGS) platform to be used as an effective tool for viral discovery from tissue samples.

    Funded by: Wellcome Trust

    PloS one 2011;6;12;e28879

  • The variant call format and VCFtools.

    Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R and 1000 Genomes Project Analysis Group

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

    Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

    Funded by: British Heart Foundation: RG/09/012/28096; NHGRI NIH HHS: 54 HG003067, R01 HG004719, U01 HG005208; Wellcome Trust: 075491/Z/04, RG/09/012/28096

    Bioinformatics (Oxford, England) 2011;27;15;2156-8
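
    To make the record layout described above concrete, here is a minimal sketch of reading the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) in plain Python. It is not part of VCFtools and does not use its Perl API; the function names and the example record are illustrative assumptions, and per-sample genotype columns are ignored.

```python
def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into a dictionary.

    Only the eight fixed columns are read; FORMAT and sample
    columns, if present, are ignored in this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


def read_vcf(lines):
    """Yield parsed records, skipping header lines (those starting with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


if __name__ == "__main__":
    # Illustrative input only; not data from the paper.
    example = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n",
    ]
    for record in read_vcf(example):
        print(record["chrom"], record["pos"], record["ref"], record["alt"])
```

    For compressed, indexed files a maintained parser is the safer route; this sketch only shows how the columns fit together.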

  • The effect of next-generation sequencing technology on complex trait research.

    Day-Williams AG and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Background: Advances in the understanding of complex trait genetics have always been enabled by advances in genomic technology. Next-generation sequencing (NGS) is set to revolutionize the way complex trait genetics research is carried out.

    Results: NGS has multiple applications in the field of human genetics, but is accompanied by substantial study design, analysis and interpretation challenges. This review discusses key aspects of study design considerations, data handling issues and required analytical developments. We also highlight early successes in mapping genetic traits using NGS.

    Conclusion: NGS opens the entire spectrum of genomic alterations for the genetic analysis of complex traits and there are early publications illustrating its power. Continuing development in analytical tools will allow the promise of NGS to be realized.

    European journal of clinical investigation 2011;41;5;561-7

  • Linkage analysis without defined pedigrees.

    Day-Williams AG, Blangero J, Dyer TD, Lange K and Sobel EM

    Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095-7088, USA.

    The need to collect accurate and complete pedigree information has been a drawback of family-based linkage and association studies. Even in case-control studies, investigators should be aware of, and condition on, familial relationships. In single nucleotide polymorphism (SNP) genome scans, relatedness can be directly inferred from the genetic data rather than determined through interviews. Various methods of estimating relatedness have previously been implemented, most notably in PLINK. We present new fast and accurate algorithms for estimating global and local kinship coefficients from dense SNP genotypes. These algorithms require only a single pass through the SNP genotype data. We also show that these estimates can be used to cluster individuals into pedigrees. With these estimates in hand, quantitative trait locus linkage analysis proceeds via traditional variance components methods without any prior relationship information. We demonstrate the success of our algorithms on simulated and real data sets. Our procedures make linkage analysis as easy as a typical genomewide association study.

    Funded by: NHGRI NIH HHS: R01 HG006139; NHLBI NIH HHS: P01 HL045522-18; NIGMS NIH HHS: GM053275, R01 GM053275, R01 GM053275-15; NIMH NIH HHS: MH059490, R37 MH059490-12

    Genetic epidemiology 2011;35;5;360-70

  • An evaluation of different target enrichment methods in pooled sequencing designs for complex disease association studies.

    Day-Williams AG, McLay K, Drury E, Edkins S, Coffey AJ, Palotie A and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.

    Funded by: Wellcome Trust: WT088885/Z/09/Z

    PloS one 2011;6;11;e26279

  • A variant in MCF2L is associated with osteoarthritis.

    Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, Keis A, Kerkhof HJ, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Loughlin J, McCaskie A, Ollier WE, Ralston SH, Spector TD, Wallis GA, Wilkinson JM, Aslam N, Birell F, Carluke I, Joseph J, Rai A, Reed M, Walker K, arcOGEN Consortium, Doherty SA, Jonsdottir I, Maciewicz RA, Muir KR, Metspalu A, Rivadeneira F, Stefansson K, Styrkarsdottir U, Uitterlinden AG, van Meurs JB, Zhang W, Valdes AM, Doherty M and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Osteoarthritis (OA) is a prevalent, heritable degenerative joint disease with a substantial public health impact. We used a 1000-Genomes-Project-based imputation in a genome-wide association scan for osteoarthritis (3177 OA cases and 4894 controls) to detect a previously unidentified risk locus. We discovered a small disease-associated set of variants on chromosome 13. Through large-scale replication, we establish a robust association with SNPs in MCF2L (rs11842874, combined odds ratio [95% confidence interval] 1.17 [1.11-1.23], p = 2.1 × 10⁻⁸) across a total of 19,041 OA cases and 24,504 controls of European descent. This risk locus represents the third established signal for OA overall. MCF2L regulates a nerve growth factor (NGF), and treatment with a humanized monoclonal antibody against NGF is associated with reduction in pain and improvement in function for knee OA patients.

    Funded by: Medical Research Council: G0100594, G0901461

    American journal of human genetics 2011;89;3;446-50

  • Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing.

    de Gruijter JM, Lao O, Vermeulen M, Xue Y, Woodwark C, Gillson CJ, Coffey AJ, Ayub Q, Mehdi SQ, Kayser M and Tyler-Smith C

    Department of Forensic Molecular Biology, Erasmus MC University Medical Center, PO Box 2040, Rotterdam, 3000 CA, The Netherlands. o.laogrueso@erasmusmc.nl.

    Background: Numerous genome-wide scans conducted by genotyping previously ascertained single-nucleotide polymorphisms (SNPs) have provided candidate signatures for positive selection in various regions of the human genome, including in genes involved in pigmentation traits. However, it is unclear how well the signatures discovered by such haplotype-based test statistics can be reproduced in tests based on full resequencing data. Four genes (oculocutaneous albinism II (OCA2), tyrosinase-related protein 1 (TYRP1), dopachrome tautomerase (DCT), and KIT ligand (KITLG)) implicated in human skin-color variation, have shown evidence for positive selection in Europeans and East Asians in previous SNP-scan data. In the current study, we resequenced 4.7 to 6.7 kb of DNA from each of these genes in Africans, Europeans, East Asians, and South Asians.

    Results: Applying all commonly used neutrality-test statistics for allele frequency distribution to the newly generated sequence data provided conflicting results regarding evidence for positive selection. Previous haplotype-based findings could not be clearly confirmed. Although some tests were marginally significant for some populations and genes, none of them were significant after multiple-testing correction. Combined P values for each gene-population pair did not improve these results. Applying an Approximate Bayesian Computation Markov chain Monte Carlo approach with a simple forward simulator to these sequence data revealed broad posterior distributions of the selective parameters for all four genes, providing no support for positive selection. However, when we applied this approach to published sequence data on SLC45A2, another human pigmentation candidate gene, we could readily confirm evidence for positive selection, as previously detected with sequence-based and some haplotype-based tests.

    Conclusions: Overall, our data indicate that even genes that are strong biological candidates for positive selection and show reproducible signatures of positive selection in SNP scans do not always show the same replicability of selection signals in other tests, which should be considered in future studies on detecting positive selection in genetic data.

    Investigative genetics 2011;2;1;24

  • Computational identification of insertional mutagenesis targets for cancer gene discovery.

    de Jong J, de Ridder J, van der Weyden L, Sun N, van Uitert M, Berns A, van Lohuizen M, Jonkers J, Adams DJ and Wessels LF

    Bioinformatics and Statistics, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, The Netherlands.

    Insertional mutagenesis is a potent forward genetic screening technique used to identify candidate cancer genes in mouse model systems. An important yet unresolved issue in the analysis of these screens is the identification of the genes affected by the insertions. To address this, we developed Kernel Convolved Rule Based Mapping (KC-RBM). KC-RBM exploits distance, orientation and insertion density across tumors to automatically map integration sites to target genes. We perform the first genome-wide evaluation of the association of insertion occurrences with aberrant gene expression of the predicted targets in both retroviral and transposon data sets. We demonstrate the efficiency of KC-RBM by showing its superior performance over existing approaches in recovering true positives from a list of independently, manually curated cancer genes. The results of this work will significantly enhance the accuracy and speed of cancer gene discovery in forward genetic screens. KC-RBM is available as an R package.

    Funded by: Cancer Research UK; Wellcome Trust

    Nucleic acids research 2011;39;15;e105

  • Cadm1 expression and function in the mouse lens.

    De Maria A, Shi Y, Luo X, Van Der Weyden L and Bassnett S

    Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, Missouri, USA.

    Purpose: The immunoglobulin superfamily member Cadm1 is a single-pass, type 1 membrane protein that mediates calcium-independent, cell-cell adhesion. Cadm1 has been implicated in tumor formation and synaptogenesis. A recent analysis of mouse lens cell membranes identified Cadm1 as a major constituent of the fiber cell membrane proteome. Here the authors examined the expression and function of Cadm1 in the mouse lens.

    Methods: Cadm1 expression was analyzed by Western blotting and immunofluorescence. The morphology of individual wild-type and Cadm1-null lens cells was visualized by confocal microscopy.

    Results: Cadm1 was present in epithelial and superficial fiber cells as a heavily glycosylated protein with an apparent molecular mass of ≈80 kDa. Analysis of proteins extracted from various strata of the lens indicated that Cadm1 was degraded during fiber cell differentiation, at approximately the same time as the lens organelles, an observation confirmed by confocal microscopy. In epithelial cells, Cadm1 was enriched in basolateral membranes, whereas, in fiber cells, expression was restricted to the lateral membranes. Lenses from Cadm1-null mice were of normal size and transparency. The three-dimensional morphology of the cells in the epithelial layer was unaltered in the absence of Cadm1. However, in contrast to wild-type lens fiber cells, Cadm1-null fiber cells had an irregular, highly undulating morphology.

    Conclusions: Cadm1 is an abundant component of the lens fiber cell membrane. Although not essential for lens transparency, Cadm1 has an indispensable role in establishing and maintaining the characteristic three-dimensional architecture of the lens fiber cell mass.

    Funded by: NEI NIH HHS: EY009852, P30EY002687, R01EY018185; Wellcome Trust

    Investigative ophthalmology & visual science 2011;52;5;2293-9

  • Genetic risk reclassification for type 2 diabetes by age below or above 50 years using 40 type 2 diabetes risk single nucleotide polymorphisms.

    de Miguel-Yanes JM, Shrader P, Pencina MJ, Fox CS, Manning AK, Grant RW, Dupuis J, Florez JC, D'Agostino RB, Cupples LA, Meigs JB, MAGIC Investigators and DIAGRAM+ Investigators

    General Medicine Division, Massachusetts General Hospital, Boston, Massachusetts, USA.

    Objective: To test if knowledge of type 2 diabetes genetic variants improves disease prediction.

    Research design and methods: We tested 40 single nucleotide polymorphisms (SNPs) associated with diabetes in 3,471 Framingham Offspring Study subjects followed over 34 years using pooled logistic regression models stratified by age (<50 years, diabetes cases = 144; or ≥50 years, diabetes cases = 302). Models included clinical risk factors and a 40-SNP weighted genetic risk score.

    Results: In people <50 years of age, the clinical risk factors model C-statistic was 0.908; the 40-SNP score increased it to 0.911 (P = 0.3; net reclassification improvement (NRI): 10.2%, P = 0.001). In people ≥50 years of age, the C-statistics without and with the score were 0.883 and 0.884 (P = 0.2; NRI: 0.4%). The risk per risk allele was higher in people <50 than ≥50 years of age (24 vs. 11%; P value for age interaction = 0.02).

    Conclusions: Knowledge of common genetic variation appropriately reclassifies younger people for type 2 diabetes risk beyond clinical risk factors but not older people.

    Funded by: Medical Research Council: MC_U106179474; NCRR NIH HHS: 1S10RR163736-01A1; NHLBI NIH HHS: N01-HC-25195; NIDDK NIH HHS: K23 DK65978, K24 DK080140, R01 DK078616, R21 DK084527, R21 DK084527-01

    Diabetes care 2011;34;1;121-5

  • Ethical issues in human genomics research in developing countries.

    de Vries J, Bull SJ, Doumbo O, Ibrahim M, Mercereau-Puijalon O, Kwiatkowski D and Parker M

    The Ethox Centre, Department of Public Health and Primary Care, University of Oxford, Old Road Campus, Headington, Oxford, OX3 7LF, UK. jantina.devries@ethox.ox.ac.uk

    Background: Genome-wide association studies (GWAS) provide a powerful means of identifying genetic variants that play a role in common diseases. Such studies present important ethical challenges. An increasing number of GWAS are taking place in lower income countries and there is a pressing need to identify the particular ethical challenges arising in such contexts. In this paper, we draw upon the experiences of the MalariaGEN Consortium to identify specific ethical issues raised by such research in Africa, Asia and Oceania.

    Discussion: We explore ethical issues in three key areas: protecting the interests of research participants, regulation of international collaborative genomics research and protecting the interests of scientists in low income countries. With regard to participants, important challenges are raised about community consultation and consent. Genomics research raises ethical and governance issues about sample export and ownership, about the use of archived samples and about the complexity of reviewing such large international projects. In the context of protecting the interests of researchers in low income countries, we discuss aspects of data sharing and capacity building that need to be considered for sustainable and mutually beneficial collaborations.

    Summary: Many ethical issues are raised when genomics research is conducted on populations that are characterised by lower average income and literacy levels, such as the populations included in MalariaGEN. It is important that such issues are appropriately addressed in such research. Our experience suggests that the ethical issues in genomics research can best be identified, analysed and addressed where ethics is embedded in the design and implementation of such research projects.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 077383/Z/05/Z, 087285/Z/08/Z, WT 083326/Z/07/Z

    BMC medical ethics 2011;12;5

  • Cell type-specific DNA methylation at intragenic CpG islands in the immune system.

    Deaton AM, Webb S, Kerr AR, Illingworth RS, Guy J, Andrews R and Bird A

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom.

    Human and mouse genomes contain a similar number of CpG islands (CGIs), which are discrete CpG-rich DNA sequences associated with transcription start sites. In both species, ∼50% of all CGIs are remote from annotated promoters but, nevertheless, often have promoter-like features. To determine the role of CGI methylation in cell differentiation, we analyzed DNA methylation at a comprehensive CGI set in cells of the mouse hematopoietic lineage. Using a method that potentially detects ∼33% of genomic CpGs in the methylated state, we found that large differences in gene expression were accompanied by surprisingly few DNA methylation changes. There were, however, many DNA methylation differences between hematopoietic cells and a distantly related tissue, brain. Altered DNA methylation in the immune system occurred predominantly at CGIs within gene bodies, which have the properties of cell type-restricted promoters, but infrequently at annotated gene promoters or CGI flanking sequences (CGI "shores"). Unexpectedly, elevated intragenic CGI methylation correlated with silencing of the associated gene. Differentially methylated intragenic CGIs tended to lack H3K4me3 and associate with a transcriptionally repressive environment regardless of methylation state. Our results indicate that DNA methylation changes play a relatively minor role in the late stages of differentiation and suggest that intragenic CGIs represent regulatory sites of differential gene expression during the early stages of lineage specification.

    Funded by: Medical Research Council; Wellcome Trust

    Genome research 2011;21;7;1074-86

  • Does a short breastfeeding period protect from FTO-induced adiposity in children?

    Dedoussis GV, Yannakoulia M, Timpson NJ, Manios Y, Kanoni S, Scott RA, Papoutsakis C, Deloukas P, Pitsiladis YP, Davey-Smith G, Hirschhorn JN and Lyon HN

    Department of Dietetics and Nutrition, Harokopio University, Athens, Greece. dedousi@hua.gr

    Context: A number of studies have reported replicable associations between common genetic loci and obesity indices. One of these loci is the fat mass and obesity associated locus (FTO). We aimed to assess whether breastfeeding mediated the known association between FTO and indices of body fatness.

    Methods: This study includes three independent pediatric cohorts, two of Greek origin (the Gene-Diet Attica Investigation: GENDAI, n=1 138 and the "Growth, Exercise and Nutrition Epidemiological Study In preschoolers": the GENESIS study, n=2 374) and one British (the Avon Longitudinal Study of Parents and Children: ALSPAC, n=4 325). Among other information, breastfeeding history was recorded. A DNA sample was obtained from either blood or saliva. Genotyping for FTO variants was performed for rs9939609 in GENDAI and ALSPAC, and for rs17817449 in GENESIS.

    Results: In all cohorts, multivariate analysis showed that the association between FTO:rs9939609 and measures of obesity was consistent across newly presented cohorts (GENDAI: Body mass index [BMI], β=0.43, p=0.009; Waist Circumference, β=1.067, p=0.019; triceps skinfold, β=0.972, p=0.003; subscapular skinfold, β=0.593, p=0.023; GENESIS: Waist Circumference, β=0.473, p=0.008 and subscapular skinfold, β=0.227, p=0.014). Inclusion of one month of breastfeeding as an interaction term effectively removed these associations with indices of obesity (BMI, Waist-Hip-Ratio and subscapular skinfold). No evidence of such interaction was observed for the independent cohort of British children.

    Conclusions: Our findings indicate that in two moderately sized Greek samples, breastfeeding may exert a modifying effect on the relationship between variants at the FTO locus and indices of adiposity. These findings were not replicated in a larger British collection.

    Funded by: Medical Research Council: G0600705, G9815508; NIDDK NIH HHS: K23 DK067288; Wellcome Trust

    International journal of pediatric obesity : IJPO : an official journal of the International Association for the Study of Obesity 2011;6;2-2;e326-35

  • Meta-analysis of genome-wide association studies in >80 000 subjects identifies multiple loci for C-reactive protein levels.

    Dehghan A, Dupuis J, Barbalic M, Bis JC, Eiriksdottir G, Lu C, Pellikka N, Wallaschofski H, Kettunen J, Henneman P, Baumert J, Strachan DP, Fuchsberger C, Vitart V, Wilson JF, Paré G, Naitza S, Rudock ME, Surakka I, de Geus EJ, Alizadeh BZ, Guralnik J, Shuldiner A, Tanaka T, Zee RY, Schnabel RB, Nambi V, Kavousi M, Ripatti S, Nauck M, Smith NL, Smith AV, Sundvall J, Scheet P, Liu Y, Ruokonen A, Rose LM, Larson MG, Hoogeveen RC, Freimer NB, Teumer A, Tracy RP, Launer LJ, Buring JE, Yamamoto JF, Folsom AR, Sijbrands EJ, Pankow J, Elliott P, Keaney JF, Sun W, Sarin AP, Fontes JD, Badola S, Astor BC, Hofman A, Pouta A, Werdan K, Greiser KH, Kuss O, Meyer zu Schwabedissen HE, Thiery J, Jamshidi Y, Nolte IM, Soranzo N, Spector TD, Völzke H, Parker AN, Aspelund T, Bates D, Young L, Tsui K, Siscovick DS, Guo X, Rotter JI, Uda M, Schlessinger D, Rudan I, Hicks AA, Penninx BW, Thorand B, Gieger C, Coresh J, Willemsen G, Harris TB, Uitterlinden AG, Järvelin MR, Rice K, Radke D, Salomaa V, Willems van Dijk K, Boerwinkle E, Vasan RS, Ferrucci L, Gibson QD, Bandinelli S, Snieder H, Boomsma DI, Xiao X, Campbell H, Hayward C, Pramstaller PP, van Duijn CM, Peltonen L, Psaty BM, Gudnason V, Ridker PM, Homuth G, Koenig W, Ballantyne CM, Witteman JC, Benjamin EJ, Perola M and Chasman DI

    Erasmus Medical Center, Dr Molewaterplein 50, Rotterdam, Netherlands.

    Background: C-reactive protein (CRP) is a heritable marker of chronic inflammation that is strongly associated with cardiovascular disease. We sought to identify genetic variants that are associated with CRP levels.

    Methods and results: We performed a genome-wide association analysis of CRP in 66 185 participants from 15 population-based studies. We sought replication for the genome-wide significant and suggestive loci in a replication panel comprising 16 540 individuals from 10 independent studies. We found 18 genome-wide significant loci, and we provided evidence of replication for 8 of them. Our results confirm 7 previously known loci and introduce 11 novel loci that are implicated in pathways related to the metabolic syndrome (APOC1, HNF1A, LEPR, GCKR, HNF4A, and PTPN2) or the immune system (CRP, IL6R, NLRP3, IL1F10, and IRF1) or that reside in regions previously not known to play a role in chronic inflammation (PPP1R3B, SALL1, PABPC4, ASCL1, RORA, and BCL7B). We found a significant interaction of body mass index with LEPR (P<2.9×10⁻⁶). A weighted genetic risk score that was developed to summarize the effect of risk alleles was strongly associated with CRP levels and explained ≈5% of the trait variance; however, there was no evidence for these genetic variants explaining the association of CRP with coronary heart disease.

    Conclusions: We identified 18 loci that were associated with CRP levels. Our study highlights immune response and metabolic regulatory pathways involved in the regulation of chronic inflammation.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: PG/05/117; Chief Scientist Office; Medical Research Council: G0000934, G0500539, G0600705; NCI NIH HHS: CA047988; NCRR NIH HHS: 1S10 RR163736-01A1, M01RR00425; NHLBI NIH HHS: 5R01 HL087679-02, HL043851, HL064753, HL076784, N01 HC-15103, N01 HC-55222, N01 HC-75150, N01-HC 25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N02-HL-64278, R01 HL087652, U01 HL072515, U01 HL080295; NIA NIH HHS: 1R01 AG032098-01A1, AG028321, N01 AG62103, N01 AG62106, N01-AG-12100, N01-AG-12109, Z01 AG000015-50; NIDDK NIH HHS: DK063491, P30 DK072488; NIMH NIH HHS: 1RL1 MH083268-01, 5R01 MH63706:02; Wellcome Trust: 068545/Z/02, GR069224

    Circulation 2011;123;7;731-8

  • The Y-chromosome landscape of the Philippines: extensive heterogeneity and varying genetic affinities of Negrito and non-Negrito groups.

    Delfin F, Salvador JM, Calacal GC, Perdigon HB, Tabbada KA, Villamor LP, Halos SC, Gunnarsdóttir E, Myles S, Hughes DA, Xu S, Jin L, Lao O, Kayser M, Hurles ME, Stoneking M and De Ungria MC

    DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines, Diliman, Quezon City, Philippines.

    The Philippines exhibits a rich diversity of people, languages, and culture, including so-called 'Negrito' groups that have long fascinated anthropologists, yet little is known about their genetic diversity. We report here a survey of Y-chromosome variation in 390 individuals from 16 Filipino ethnolinguistic groups, including six Negrito groups, from across the archipelago. We find extreme diversity in the Y-chromosome lineages of Filipino groups with heterogeneity seen in both Negrito and non-Negrito groups, which does not support a simple dichotomy of Filipino groups as Negrito vs non-Negrito. Filipino lineages of the non-recombining region of the human Y chromosome reflect a chronology that extends from after the initial colonization of the Asia-Pacific region, to the time frame of the Austronesian expansion. Filipino groups appear to have diverse genetic affinities with different populations in the Asia-Pacific region. In particular, some Negrito groups are associated with indigenous Australians, with a potential time for the association ranging from the initial colonization of the region to more recent (after colonization) times. Overall, our results indicate extensive heterogeneity contributing to a complex genetic history for Filipino groups, with varying roles for migrations from outside the Philippines, genetic drift, and admixture among neighboring groups.

    European journal of human genetics : EJHG 2011;19;2;224-30

  • Genetic architecture of circulating lipid levels.

    Demirkan A, Amin N, Isaacs A, Jarvelin MR, Whitfield JB, Wichmann HE, Kyvik KO, Rudan I, Gieger C, Hicks AA, Johansson Å, Hottenga JJ, Smith JJ, Wild SH, Pedersen NL, Willemsen G, Mangino M, Hayward C, Uitterlinden AG, Hofman A, Witteman J, Montgomery GW, Pietiläinen KH, Rantanen T, Kaprio J, Döring A, Pramstaller PP, Gyllensten U, de Geus EJ, Penninx BW, Wilson JF, Rivadeneira F, Magnusson PK, Boomsma DI, Spector T, Campbell H, Hoehne B, Martin NG, Oostra BA, McCarthy M, Peltonen-Palotie L, Aulchenko Y, Visscher PM, Ripatti S, Janssens AC, van Duijn CM and ENGAGE CONSORTIUM

    Genetic Epidemiology Unit, Department of Epidemiology and Clinical Genetics, Erasmus University Medical Center, Rotterdam, The Netherlands.

    Serum concentrations of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides (TGs) and total cholesterol (TC) are important heritable risk factors for cardiovascular disease. Although genome-wide association studies (GWASs) of circulating lipid levels have identified numerous loci, a substantial portion of the heritability of these traits remains unexplained. Evidence of unexplained genetic variance can be detected by combining multiple independent markers into additive genetic risk scores. Such polygenic scores, constructed using results from the ENGAGE Consortium GWAS on serum lipids, were applied to predict lipid levels in an independent population-based study, the Rotterdam Study-II (RS-II). We additionally tested for evidence of a shared genetic basis for different lipid phenotypes. Finally, the polygenic score approach was used to identify an alternative genome-wide significance threshold before pathway analysis and those results were compared with those based on the classical genome-wide significance threshold. Our study provides evidence suggesting that many loci influencing circulating lipid levels remain undiscovered. Cross-prediction models suggested a small overlap between the polygenic backgrounds involved in determining LDL-C, HDL-C and TG levels. Pathway analysis utilizing the best polygenic score for TC uncovered extra information compared with using only genome-wide significant loci. These results suggest that the genetic architecture of circulating lipids involves a number of undiscovered variants with very small effects, and that increasing GWAS sample sizes will enable the identification of novel variants that regulate lipid levels.

    Funded by: Chief Scientist Office; Wellcome Trust

    European journal of human genetics : EJHG 2011;19;7;813-9

  • Specific capture and whole-genome sequencing of viruses from clinical samples.

    Depledge DP, Palser AL, Watson SJ, Lai IY, Gray ER, Grant P, Kanda RK, Leproust E, Kellam P and Breuer J

    Division of Infection and Immunity, University College London, London, United Kingdom. d.depledge@ucl.ac.uk

    Whole genome sequencing of viruses directly from clinical samples is integral for understanding the genetics of host-virus interactions. Here, we report the use of sample sparing target enrichment (by hybridisation) for viral nucleic acid separation and deep-sequencing of herpesvirus genomes directly from a range of clinical samples including saliva, blood, virus vesicles, cerebrospinal fluid, and tumour cell lines. We demonstrate the effectiveness of the method by deep-sequencing 13 highly cell-associated human herpesvirus genomes and generating full length genome alignments at high read depth. Moreover, we show the specificity of the method enables the study of viral population structures and their diversity within a range of clinical sample types.

    Funded by: Department of Health; Medical Research Council: G07008, G0700814, G0900950; Wellcome Trust: 081703MA

    PloS one 2011;6;11;e27805

  • Glucocorticoid receptor (NR3C1) gene polymorphisms and onset of alcohol abuse in adolescents.

    Desrivières S, Lourdusamy A, Müller C, Ducci F, Wong CP, Kaakinen M, Pouta A, Hartikainen AL, Isohanni M, Charoen P, Peltonen L, Freimer N, Elliott P, Jarvelin MR and Schumann G

    MRC-SGDP Centre, Institute of Psychiatry, Kings College, UK.

    Onset of alcohol use at an early age increases the risk for later alcohol dependence. We investigated the role of the glucocorticoid receptor (GR) gene (NR3C1) in onset of alcohol use and abuse in 14-year-old adolescents (n=4534). Several NR3C1 polymorphisms were associated with onset of alcohol drinking or drunkenness at this age. Strongest associations were observed in females, with one marker (rs244465) remaining significant after correction for multiple testing (P(adj) = 0.0067; odds ratio = 1.7, for drunkenness). Our data provide the first evidence that GR modulates initiation of alcohol abuse and reveal a polymorphism that might contribute to susceptibility to addiction.

    Funded by: Department of Health; Medical Research Council: G0500539, G0600705; NHLBI NIH HHS: 1-R01HL087679-01; NIMH NIH HHS: 1RL1MH083268-01, 5RL1MH083268; Wellcome Trust: 069224, 089061, GR06922

    Addiction biology 2011;16;3;510-3

  • Host candidate gene polymorphisms and clearance of drug-resistant Plasmodium falciparum parasites.

    Diakite M, Achidi EA, Achonduh O, Craik R, Djimde AA, Evehe MS, Green A, Hubbart C, Ibrahim M, Jeffreys A, Khan BK, Kimani F, Kwiatkowski DP, Mbacham WF, Jezan SO, Ouedraogo JB, Rockett K, Rowlands K, Tagelsir N, Tekete MM, Zongo I and Ranford-Cartwright LC

    Malaria Research and Training Centre, Faculty of Medicine, Pharmacy and Odontostomatology, University of Bamako, Mali. mdiakite@icermali.org

    Background: Resistance to anti-malarial drugs is a widespread problem for control programmes for this devastating disease. Molecular tests are available for many anti-malarial drugs and are useful tools for the surveillance of drug resistance. However, the correlation of treatment outcome and molecular tests with particular parasite markers is not perfect, due in part to individuals who are able to clear genotypically drug-resistant parasites. This study aimed to identify molecular markers in the human genome that correlate with the clearance of malaria parasites after drug treatment, despite the drug resistance profile of the protozoan as predicted by molecular approaches.

    Methods: A total of 3721 samples from five African countries, all known to contain genotypically drug-resistant parasites, were analysed. These parasites were collected from patients who subsequently failed to clear their infection following drug treatment, as expected, but also from patients who successfully cleared their infections with drug-resistant parasites. A total of 67 human polymorphisms (SNPs) on 17 chromosomes were analysed using Sequenom's mass spectrometry iPLEX gold platform to identify regions of the human genome that contribute to enhanced clearance of drug-resistant parasites.

    Results: An analysis of all data from the five countries revealed significant associations between the phenotype of ability to clear drug-resistant Plasmodium falciparum infection and human immune response loci common to all populations. Overall, three SNPs showed a significant association with clearance of drug-resistant parasites with odds ratios of 0.76 for SNP rs2706384 (95% CI 0.71-0.92, P = 0.005), 0.66 for SNP rs1805015 (95% CI 0.45-0.97, P = 0.03), and 0.67 for SNP rs1128127 (95% CI 0.45-0.99, P = 0.05), after adjustment for possible confounding factors. The first two SNPs (rs2706384 and rs1805015) are within loci involved in pro-inflammatory (interferon-gamma) and anti-inflammatory (IL-4) cytokine responses. The third locus encodes a protein involved in the degradation of misfolded proteins within the endoplasmic reticulum, and its role, if any, in the clearance phenotype is unclear.

    Conclusions: The study showed significant association of three loci in the human genome with the ability of patients to clear drug-resistant P. falciparum in samples taken from five countries distributed across sub-Saharan Africa. Both SNP rs2706384 and SNP rs1805015 have previously been reported to be associated with risk of malaria infection in African populations. The loci are involved in the Th1/Th2 balance, and the association of SNPs within these genes suggests a key role for antibody in the clearance of drug-resistant parasites. It is possible that patients able to clear drug-resistant infections have an enhanced ability to control parasite growth.

    Funded by: Medical Research Council; Wellcome Trust: 075491/Z/04

    Malaria journal 2011;10;250

  • Live vaccines and their role in modern vaccinology

    Dougan G, Goulding D and Hall LJ

    Replicating Vaccines. 2011;Part 1;3-14

  • Immunity to salmonellosis.

    Dougan G, John V, Palmer S and Mastroeni P

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. gd1@sanger.ac.uk

    Salmonella enterica is a genetically broad species harboring isolates that display considerable antigenic heterogeneity and significant differences in virulence potential. Salmonella generally exhibit an invasive potential and they can survive for extended periods within cells of the immune system. They cause acute or chronic infections that can be local (e.g. gastroenteritis) or systemic (e.g. typhoid). In vivo Salmonella infections are complex with multiple arms of the immune system being engaged. Both humoral and cellular responses can be detected and characterized, but full protective immunity is not always induced, even following natural infection. The murine model has proven to be a fertile ground for exploring immune mechanisms and observations in the mouse have often, although not always, correlated with those in other infectable species, including humans. Host genetic studies have identified a number of mammalian genes that are central to controlling infection, operating both in innate and acquired immune pathways. Vaccines, both oral and parenteral, are available or under development, and these have been used with some success to explore immunity in both model systems and clinically in humans.

    Funded by: Wellcome Trust

    Immunological reviews 2011;240;1;196-210

  • Dalliance: interactive genome viewing on the web.

    Down TA, Piipari M and Hubbard TJ

    Wellcome Trust/CRUK Gurdon Institute, Cambridge CB2 1QN, UK. thomas@biodalliance.org

    Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.

    Dalliance runs entirely within your web browser, and relies on existing DAS server infrastructure. Browsers for several mammalian genomes are available at http://www.biodalliance.org/, and the use of DAS means you can add your own data to these browsers. In addition, the source code (Javascript) is available under the BSD license, and is straightforward to install on your own web server and embed within other documents.

    Funded by: Wellcome Trust: 077198, 083563

    Bioinformatics (Oxford, England) 2011;27;6;889-90

  • Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.

    Downing T, Imamura H, Decuypere S, Clark TG, Coombs GH, Cotton JA, Hilley JD, de Doncker S, Maes I, Mottram JC, Quail MA, Rijal S, Sanders M, Schönian G, Stark O, Sundar S, Vanaerschot M, Hertz-Fowler C, Dujardin JC and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.

    Funded by: Wellcome Trust: 076355, 085775/Z/08/Z

    Genome research 2011;21;12;2143-56

  • TTC12-ANKK1-DRD2 and CHRNA5-CHRNA3-CHRNB4 influence different pathways leading to smoking behavior from adolescence to mid-adulthood.

    Ducci F, Kaakinen M, Pouta A, Hartikainen AL, Veijola J, Isohanni M, Charoen P, Coin L, Hoggart C, Ekelund J, Peltonen L, Freimer N, Elliott P, Schumann G and Järvelin MR

    Medical Research Council-Social Genetics and Developmental Psychiatry Centre, Institute of Psychiatry, King's College, London, United Kingdom. Francesca.Ducci@kcl.ac.uk

    Background: CHRNA5-CHRNA3-CHRNB4 and TTC12-ANKK1-DRD2 gene-clusters influence smoking behavior. Our aim was to test developmental changes in their effects as well as the interplays between them and with nongenetic factors.

    Methods: Participants included 4762 subjects from a general population-based, prospective Northern Finland 1966 Birth Cohort (NFBC 1966). Smoking behavior was collected at age 14 and 31 years. Information on maternal smoking, socioeconomic status, and novelty seeking were also collected. Structural equation modeling was used to construct an integrative etiologic model including genetic and nongenetic factors.

    Results: Several single nucleotide polymorphisms in both gene-clusters were significantly associated with smoking. The most significant were in CHRNA3 (rs1051730, p = 1.1 × 10(-5)) and in TTC12 (rs10502172, p = 9.1 × 10(-6)). CHRNA3-rs1051730[A] was more common among heavy/regular smokers than nonsmokers with similar effect-sizes at age 14 years (odds ratio [95% CI]: 1.27 [1.06-1.52]) and 31 years (1.28 [1.13-1.44]). TTC12-rs10502172[G] was more common among smokers than nonsmokers with stronger association at 14 years (1.33 [1.11-1.60]) than 31 years (1.14 [1.02-1.28]). In adolescence, carriers of three to four risk alleles at either CHRNA3-rs1051730 or TTC12-rs10502172 had almost three times the odds of smoking regularly compared with subjects with no risk alleles. The effect of TTC12-rs10502172 on smoking in adulthood was mediated by its effect on smoking in adolescence and via novelty seeking. The effect of CHRNA3-rs1051730 on smoking in adulthood was direct.

    Conclusions: TTC12-ANKK1-DRD2 seemed to influence smoking behavior mainly in adolescence, and its effect is partially mediated by personality characteristics promoting drug-seeking behavior. In contrast, CHRNA5-CHRNA3-CHRNB4 is involved in the transition toward heavy smoking in mid-adulthood and in smoking persistence. Factors related to familial and social disadvantages were strong independent predictors of smoking.

    Funded by: Medical Research Council: 93558, G0500539; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01, RL1 MH083268-01

    Biological psychiatry 2011;69;7;650-60

  • Developing and implementing an institute-wide data sharing policy.

    Dyke SO and Hubbard TJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. th@sanger.ac.uk.

    The Wellcome Trust Sanger Institute has a strong reputation for prepublication data sharing as a result of its policy of rapid release of genome sequence data and particularly through its contribution to the Human Genome Project. The practicalities of broad data sharing remain largely uncharted, especially to cover the wide range of data types currently produced by genomic studies and to adequately address ethical issues. This paper describes the processes and challenges involved in implementing a data sharing policy on an institute-wide scale. This includes questions of governance, practical aspects of applying principles to diverse experimental contexts, building enabling systems and infrastructure, incentives and collaborative issues.

    Genome medicine 2011;3;9;60

  • Assemblathon 1: a competitive assessment of de novo short read assembly methods.

    Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HO, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I and Paten B

    Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.

    Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.

    Funded by: Howard Hughes Medical Institute; NCI NIH HHS: 1U24CA143858-01; NHGRI NIH HHS: HG00064, P41HG002371, U01HG004695, U41HG004568, U54HG004555

    Genome research 2011;21;12;2224-41

  • Retrospective application of transposon-directed insertion site sequencing to a library of signature-tagged mini-Tn5Km2 mutants of Escherichia coli O157:H7 screened in cattle.

    Eckert SE, Dziva F, Chaudhuri RR, Langridge GC, Turner DJ, Pickard DJ, Maskell DJ, Thomson NR and Stevens MP

    Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Bush Farm Road, Roslin, Midlothian EH25 9RG, United Kingdom.

    Massively parallel sequencing of transposon-flanking regions assigned the genotype and fitness score to 91% of Escherichia coli O157:H7 mutants previously screened in cattle by signature-tagged mutagenesis (STM). The method obviates the limitations of STM and markedly extended the functional annotation of the prototype E. coli O157:H7 genome without further animal use.

    Funded by: Biotechnology and Biological Sciences Research Council: D017556, D017947

    Journal of bacteriology 2011;193;7;1771-6

  • The human postsynaptic density shares conserved elements with proteomes of unicellular eukaryotes and prokaryotes.

    Emes RD and Grant SG

    School of Veterinary Medicine and Science, University of Nottingham Leicestershire, UK.

    The animal nervous system processes information from the environment and mediates learning and memory using molecular signaling pathways in the postsynaptic terminal of synapses. Postsynaptic neurotransmitter receptors assemble to form multiprotein complexes that drive signal transduction pathways to downstream cell biological processes. Studies of mouse and Drosophila postsynaptic proteins have identified key roles in synaptic physiology and behavior for a wide range of proteins including receptors, scaffolds, enzymes, structural, translational, and transcriptional regulators. Comparative proteomic and genomic studies identified components of the postsynaptic proteome conserved in eukaryotes and early metazoans. We extend these studies, and examine the conservation of genes and domains found in the human postsynaptic density with those across the three superkingdoms, archaea, bacteria, and eukaryota. A set of proteins essential for basic cellular functions was conserved across the three superkingdoms, whereas synaptic structural and many signaling molecules were specific to the eukaryote lineage. Genes involved with metabolism and environmental signaling in Escherichia coli including the chemotactic and ArcAB Two-Component signal transduction systems shared homologous genes in the mammalian postsynaptic proteome. These data suggest conservation between prokaryotes and mammalian synapses of signaling mechanisms from receptors to transcriptional responses, a process essential to learning and memory in vertebrates. A number of human postsynaptic proteins with homologs in prokaryotes are mutated in human genetic diseases with nervous system pathology. These data also indicate that structural and signaling proteins characteristic of postsynaptic complexes arose in the eukaryotic lineage and rapidly expanded following the emergence of the metazoa, and provide an insight into the early evolution of synaptic mechanisms and conserved mechanisms of learning and memory.

    Frontiers in neuroscience 2011;5;44

  • A user's guide to the encyclopedia of DNA elements (ENCODE).

    ENCODE Project Consortium

    HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America. rmyers@hudsonalpha.org

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    Funded by: NHGRI NIH HHS: R01 HG003143, R01 HG004037, RC2 HG005573; NIDDK NIH HHS: R01 DK054369, R01 DK065806; Wellcome Trust: 095908

    PLoS biology 2011;9;4;e1001046

  • Meta-analysis of genome-wide association studies confirms a susceptibility locus for knee osteoarthritis on chromosome 7q22.

    Evangelou E, Valdes AM, Kerkhof HJ, Styrkarsdottir U, Zhu Y, Meulenbelt I, Lories RJ, Karassa FB, Tylzanowski P, Bos SD, arcOGEN Consortium, Akune T, Arden NK, Carr A, Chapman K, Cupples LA, Dai J, Deloukas P, Doherty M, Doherty S, Engstrom G, Gonzalez A, Halldorsson BV, Hammond CL, Hart DJ, Helgadottir H, Hofman A, Ikegawa S, Ingvarsson T, Jiang Q, Jonsson H, Kaprio J, Kawaguchi H, Kisand K, Kloppenburg M, Kujala UM, Lohmander LS, Loughlin J, Luyten FP, Mabuchi A, McCaskie A, Nakajima M, Nilsson PM, Nishida N, Ollier WE, Panoutsopoulou K, van de Putte T, Ralston SH, Rivadeneira F, Saarela J, Schulte-Merker S, Shi D, Slagboom PE, Sudo A, Tamm A, Tamm A, Thorleifsson G, Thorsteinsdottir U, Tsezou A, Wallis GA, Wilkinson JM, Yoshimura N, Zeggini E, Zhai G, Zhang F, Jonsdottir I, Uitterlinden AG, Felson DT, van Meurs JB, Stefansson K, Ioannidis JP, Spector TD and Translation Research in Europe Applied Technologies for Osteoarthritis (TreatOA)

    Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece.

    Objectives: Osteoarthritis (OA) is the most prevalent form of arthritis and accounts for substantial morbidity and disability, particularly in older people. It is characterised by changes in joint structure, including degeneration of the articular cartilage, and its aetiology is multifactorial with a strong postulated genetic component.

    Methods: A meta-analysis was performed of four genome-wide association (GWA) studies of 2371 cases of knee OA and 35 909 controls in Caucasian populations. Replication of the top hits was attempted with data from 10 additional replication datasets.

    Results: With a cumulative sample size of 6709 cases and 44 439 controls, one genome-wide significant locus was identified on chromosome 7q22 for knee OA (rs4730250, p=9.2 × 10⁻⁹), thereby confirming its role as a susceptibility locus for OA.

    Conclusion: The associated signal is located within a large (500 kb) linkage disequilibrium block that contains six genes: PRKAR2B (protein kinase, cAMP-dependent, regulatory, type II, β), HBP1 (HMG-box transcription factor 1), COG5 (component of oligomeric golgi complex 5), GPR22 (G protein-coupled receptor 22), DUS4L (dihydrouridine synthase 4-like) and BCAP29 (B cell receptor-associated protein 29). Gene expression analyses of the six genes in primary cells derived from different joint tissues confirmed expression of all the genes in the joint environment.

    Funded by: Arthritis Research UK: 17489, 18030; Medical Research Council: G0000934, G0100594, G0901461; Wellcome Trust: 068545, 083948, 088785, WT079557MA, WT088885/Z/09/Z

    Annals of the rheumatic diseases 2011;70;2;349-55

  • Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility.

    Evans DM, Spencer CC, Pointon JJ, Su Z, Harvey D, Kochan G, Oppermann U, Opperman U, Dilthey A, Pirinen M, Stone MA, Appleton L, Moutsianas L, Moutsianis L, Leslie S, Wordsworth T, Kenna TJ, Karaderi T, Thomas GP, Ward MM, Weisman MH, Farrar C, Bradbury LA, Danoy P, Inman RD, Maksymowych W, Gladman D, Rahman P, Spondyloarthritis Research Consortium of Canada (SPARCC), Morgan A, Marzo-Ortega H, Bowness P, Gaffney K, Gaston JS, Smith M, Bruges-Armas J, Couto AR, Sorrentino R, Paladini F, Ferreira MA, Xu H, Liu Y, Jiang L, Lopez-Larrea C, Díaz-Peña R, López-Vázquez A, Zayats T, Band G, Bellenguez C, Blackburn H, Blackwell JM, Bramon E, Bumpstead SJ, Casas JP, Corvin A, Craddock N, Deloukas P, Dronov S, Duncanson A, Edkins S, Freeman C, Gillman M, Gray E, Gwilliam R, Hammond N, Hunt SE, Jankowski J, Jayakumar A, Langford C, Liddle J, Markus HS, Mathew CG, McCann OT, McCarthy MI, Palmer CN, Peltonen L, Plomin R, Potter SC, Rautanen A, Ravindrarajah R, Ricketts M, Samani N, Sawcer SJ, Strange A, Trembath RC, Viswanathan AC, Waller M, Weston P, Whittaker P, Widaa S, Wood NW, McVean G, Reveille JD, Wordsworth BP, Brown MA, Donnelly P, Australo-Anglo-American Spondyloarthritis Consortium (TASC) and Wellcome Trust Case Control Consortium 2 (WTCCC2)

    Medical Research Council (MRC) Centre for Causal Analyses in Translational Epidemiology, School of Social and Community Medicine, University of Bristol, Bristol, UK.

    Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10(-8) in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10(-6) overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.

    Funded by: Arthritis Research UK: 18797, 19536; Canadian Institutes of Health Research; Medical Research Council: G0000934; NCRR NIH HHS: MO1-RR00425, UL1RR024188; NIAMS NIH HHS: R01-AR046208; PHS HHS: P01-052915; Wellcome Trust: 068545/Z/02, 076113, 083948/Z/07/Z

    Nature genetics 2011;43;8;761-7

  • Differential protein expression throughout the life cycle of Trypanosoma congolense, a major parasite of cattle in Africa.

    Eyford BA, Sakurai T, Smith D, Loveless B, Hertz-Fowler C, Donelson JE, Inoue N and Pearson TW

    Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada.

    Trypanosoma congolense is an important pathogen of livestock in Africa. To study protein expression throughout the T. congolense life cycle, we used culture-derived parasites of each of the three main insect stages and bloodstream stage parasites isolated from infected mice, to perform differential protein expression analysis. Three complete biological replicates of all four life cycle stages were produced from T. congolense IL3000, a cloned parasite that is amenable to culture of major life cycle stages in vitro. Cellular proteins from each life cycle stage were trypsin digested and the resulting peptides were labeled with isobaric tags for relative and absolute quantification (iTRAQ). The peptides were then analyzed by tandem mass spectrometry (MS/MS). This method was used to identify and relatively quantify proteins from the different life cycle stages in the same experiment. A search of the Wellcome Trust's Sanger Institute's semi-annotated T. congolense database was performed using the MS/MS fragmentation data to identify the corresponding source proteins. A total of 2088 unique protein sequences were identified, representing 23% of the ∼9000 proteins predicted for the T. congolense proteome. The 1291 most confidently identified proteins were prioritized for further study. Of these, 784 yielded annotated hits while 501 were described as "hypothetical proteins". Six proteins showed no significant sequence similarity to any known proteins (from any species) and thus represent new, previously uncharacterized T. congolense proteins. Of particular interest among the remainder are several membrane molecules that showed drastic differential expression, including, not surprisingly, the well-studied variant surface glycoproteins (VSGs), invariant surface glycoproteins (ISGs) 65 and 75, congolense epimastigote specific protein (CESP), the surface protease GP63, an amino acid transporter, a pteridine transporter and a haptoglobin-hemoglobin receptor. Several of these surface disposed proteins are of functional interest as they are necessary for survival of the parasites.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Molecular and biochemical parasitology 2011;177;2;116-25

  • Examining the overlap between genome-wide rare variant association signals and linkage peaks in rheumatoid arthritis.

    Eyre S, Ke X, Lawrence R, Bowes J, Panoutsopoulou K, Barton A, Thomson W, Worthington J and Zeggini E

    University of Manchester, Manchester, UK.

    Objective: With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.

    Methods: The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.

    Results: The distribution of rare variant signals (association P≤10(-5)) was found to differ significantly between regions with and without linkage evidence (P=2×10(-17) by Fisher's exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA-DRB1 locus was accounted for.

    Conclusion: The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.

    Funded by: Arthritis Research UK: 17552, 18030; Wellcome Trust: 076113, 079557MA, 088885, WT088885/Z/09/Z

    Arthritis and rheumatism 2011;63;6;1522-6

  • Troponin T is essential for sarcomere assembly in zebrafish skeletal muscle.

    Ferrante MI, Kiff RM, Goulding DA and Stemple DL

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

    In striated muscle, the basic contractile unit is the sarcomere, which comprises myosin-rich thick filaments intercalated with thin filaments made of actin, tropomyosin and troponin. Troponin is required to regulate Ca(2+)-dependent contraction, and mutant forms of troponins are associated with muscle diseases. We have disrupted several genes simultaneously in zebrafish embryos and have followed the progression of muscle degeneration in the absence of troponin. Complete loss of troponin T activity leads to loss of sarcomere structure, in part owing to the destructive nature of deregulated actin-myosin activity. When troponin T and myosin activity are simultaneously disrupted, immature sarcomeres are rescued. However, tropomyosin fails to localise to sarcomeres, and intercalating thin filaments are missing from electron microscopic cross-sections, indicating that loss of troponin T affects thin filament composition. If troponin activity is only partially disrupted, myofibrils are formed but eventually disintegrate owing to deregulated actin-myosin activity. We conclude that the troponin complex has at least two distinct activities: regulation of actin-myosin activity and, independently, a role in the proper assembly of thin filaments. Our results also indicate that sarcomere assembly can occur in the absence of normal thin filaments.

    Funded by: Wellcome Trust: WT 077037/Z/05/Z, WT 077047/Z/05/Z

    Journal of cell science 2011;124;Pt 4;565-77

  • The Genomic Standards Consortium.

    Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glöckner FO, Hirschman L, Karsch-Mizrachi I, Klenk HP, Knight R, Kottmann R, Kyrpides N, Meyer F, San Gil I, Sansone SA, Schriml LM, Sterk P, Tatusova T, Ussery DW, White O and Wooley J

    Centre for Ecology & Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxfordshire, United Kingdom. dfield@ceh.ac.uk

    A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.

    PLoS biology 2011;9;6;e1001088

  • A call for papers for the second special issue of SIGS from the genomic standards consortium

    Field D, Sterk P and Kottmann R

    Standards in Genomic Sciences 2011;4;111-2

  • The Deciphering Developmental Disorders (DDD) study.

    Firth HV, Wright CF and DDD Study

    Department of Medical Genetics, Cambridge University Hospitals Foundation Trust, Cambridge, UK.

    Funded by: Wellcome Trust

    Developmental medicine and child neurology 2011;53;8;702-3

  • Germline fitness-based scoring of cancer mutations.

    Fischer A, Greenman C and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    A key goal in cancer research is to find the genomic alterations that underlie malignant cells. Genomics has proved successful in identifying somatic variants at a large scale. However, it has become evident that a typical cancer exhibits a heterogeneous mutation pattern across samples. Cases where the same alteration is observed repeatedly seem to be the exception rather than the norm. Thus, pinpointing the key alterations (driver mutations) from a background of variations with no direct causal link to cancer (passenger mutations) is difficult. Here we analyze somatic missense mutations from cancer samples and their healthy tissue counterparts (germline mutations) from the viewpoint of germline fitness. We calibrate a scoring system from protein domain alignments to score mutations and their target loci. We show first that this score predicts to a good degree the rate of polymorphism of the observed germline variation. The scoring is then applied to somatic mutations. We show that candidate cancer genes prone to copy number loss harbor mutations with germline fitness effects that are significantly more deleterious than expected by chance. This suggests that missense mutations play a driving role in tumor suppressor genes. Furthermore, these mutations fall preferably onto loci in sequence neighborhoods that are high scoring in terms of germline fitness. In contrast, for somatic mutations in candidate oncogenes we do not observe a statistically significant effect. These results help to inform how to exploit germline fitness predictions in discovering new genes and mutations responsible for cancer.

    Funded by: Wellcome Trust: 091747

    Genetics 2011;188;2;383-93

  • aCGH.Spline--an R package for aCGH dye bias normalization.

    Fitzgerald TW, Larcombe LD, Le Scouarnec S, Clayton S, Rajan D, Carter NP and Redon R

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. tf2@sanger.ac.uk

    Motivation: The careful normalization of array-based comparative genomic hybridization (aCGH) data is of critical importance for the accurate detection of copy number changes. The difference in labelling affinity between the two fluorophores used in aCGH (usually Cy5 and Cy3) can be observed as a bias within the intensity distributions. If left unchecked, this bias is likely to skew data interpretation during downstream analysis and lead to an increased number of false discoveries.

    Results: In this study, we have developed aCGH.Spline, a natural cubic spline interpolation method followed by linear interpolation of outlier values, which is able to remove a large portion of the dye bias from large aCGH datasets in a quick and efficient manner.

    Conclusions: We have shown that removing this bias and reducing the experimental noise has a strong positive impact on the ability to detect accurately both copy number variation (CNV) and copy number alterations (CNA).

    Funded by: Wellcome Trust: WT077008

    Bioinformatics (Oxford, England) 2011;27;9;1195-200

  • Ensembl 2011.

    Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P, Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Rios D, Ritchie GR, Ruffier M, Schuster M, Sobral D, Spudich G, Tang YA, Trevanion S, Vandrovcova J, Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, Vogel J and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. flicek@ebi.ac.uk

    The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 062023, 077198

    Nucleic acids research 2011;39;Database issue;D800-6

  • Salmonella bongori provides insights into the evolution of the Salmonellae.

    Fookes M, Schroeder GN, Langridge GC, Blondel CJ, Mammina C, Connor TR, Seth-Smith H, Vernikos GS, Robinson KS, Sanders M, Petty NK, Kingsley RA, Bäumler AJ, Nuccio SP, Contreras I, Santiviago CA, Maskell D, Barrow P, Humphrey T, Nastasi A, Roberts M, Frankel G, Parkhill J, Dougan G and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The genus Salmonella contains two species, S. bongori and S. enterica. Compared to the well-studied S. enterica there is a marked lack of information regarding the genetic makeup and diversity of S. bongori. S. bongori has been found predominantly associated with cold-blooded animals, but it can infect humans. To define the phylogeny of this species, and compare it to S. enterica, we have sequenced 28 isolates representing most of the known diversity of S. bongori. This cross-species analysis allowed us to confidently differentiate ancestral functions from those acquired following speciation, which include both metabolic and virulence-associated capacities. We show that, although S. bongori inherited a basic set of Salmonella common virulence functions, it has subsequently elaborated on this in a different direction to S. enterica. It is an established feature of S. enterica evolution that the acquisition of the type III secretion systems (T3SS-1 and T3SS-2) has been followed by the sequential acquisition of genes encoding secreted targets, termed effector proteins. We show that this is also true of S. bongori, which has acquired an array of novel effector proteins (sboA-L). All but two of these effectors have no significant S. enterica homologues and instead are highly similar to those found in enteropathogenic Escherichia coli (EPEC). Remarkably, SboH is found to be a chimeric effector protein, encoded by a fusion of the T3SS-1 effector gene sopA and a gene highly similar to the EPEC effector nleH from enteropathogenic E. coli. We demonstrate that representatives of these new effectors are translocated and that SboH, similarly to NleH, blocks intrinsic apoptotic pathways while being targeted to the mitochondria by the SopA part of the fusion. This work suggests that S. bongori has inherited the ancestral Salmonella virulence gene set, but has adapted by incorporating virulence determinants that resemble those employed by EPEC.

    Funded by: Medical Research Council; Wellcome Trust: 076964

    PLoS pathogens 2011;7;8;e1002191

  • COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

    Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridge, UK.

    COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136,000 coding mutations in almost 542,000 tumour samples; of the 18,490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867

    Nucleic acids research 2011;39;Database issue;D945-50

  • Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.

    Frampton D, Kerr J, Harrison TJ and Kellam P

    Department of Infection, Division of Infection and Immunity, University College London, London, United Kingdom.

    Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric out-performed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons with limited success. Although the metric correctly classified roughly two-thirds of both CFS and healthy samples, the level of misclassification was high. We conclude that many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.

    PloS one 2011;6;3;e16872

  • Endogenous ion channel complexes: the NMDA receptor.

    Frank RA

    Wellcome Trust Sanger Institute, Genome Campus, Cambridge U.K. rf3@sanger.ac.uk

    Ionotropic receptors, including the NMDAR (N-methyl-D-aspartate receptor), mediate fast neurotransmission, neurodevelopment, neuronal excitability and learning. In the present article, the structure and function of the NMDAR is reviewed with the aim to condense our current understanding and highlight frontiers where important questions regarding the biology of this receptor remain unanswered. In the second part of the present review, new biochemical and genetic approaches for the investigation of ion channel receptor complexes will be discussed.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    Biochemical Society transactions 2011;39;3;707-18

  • Clustered coding variants in the glutamate receptor complexes of individuals with schizophrenia and bipolar disorder.

    Frank RA, McRae AF, Pocklington AJ, van de Lagemaat LN, Navarro P, Croning MD, Komiyama NH, Bradley SJ, Challiss RA, Armstrong JD, Finn RD, Malloy MP, MacLean AW, Harris SE, Starr JM, Bhaskar SS, Howard EK, Hunt SE, Coffey AJ, Ranganath V, Deloukas P, Rogers J, Muir WJ, Deary IJ, Blackwood DH, Visscher PM and Grant SG

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Current models of schizophrenia and bipolar disorder implicate multiple genes; however, their biological relationships remain elusive. To test the genetic role of glutamate receptors and their interacting scaffold proteins, the exons of ten glutamatergic 'hub' genes in 1304 individuals were re-sequenced in case and control samples. No significant difference in the overall number of non-synonymous single nucleotide polymorphisms (nsSNPs) was observed between cases and controls. However, cluster analysis of nsSNPs identified two exons encoding the cysteine-rich domain and first transmembrane helix of GRM1 as a risk locus with five mutations highly enriched within these domains. A new splice variant lacking the transmembrane GPCR domain of GRM1 was discovered in the human brain and the GRM1 mutation cluster could perturb the regulation of this variant. The predicted effect on individuals harbouring multiple mutations distributed in their ten hub genes was also examined. Diseased individuals possessed an increased load of deleteriousness from multiple concurrent rare and common coding variants. Together, these data suggest a disease model in which the interplay of compound genetic coding variants, distributed among glutamate receptors and their interacting proteins, contributes to the pathogenesis of schizophrenia and bipolar disorders.

    Funded by: Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: MC_U127592696; Wellcome Trust

    PloS one 2011;6;4;e19011

  • Maternally inherited partial monosomy 9p (pter → p24.1) and partial trisomy 20p (pter → p12.1) characterized by microarray comparative genomic hybridization.

    Freitas ÉL, Gribble SM, Simioni M, Vieira TP, Silva-Grecco RL, Balarin MA, Prigmore E, Krepischi-Santos AC, Rosenberg C, Szuhai K, van Haeringen A, Carter NP and Gil-da-Silva-Lopes VL

    Faculty of Medical Sciences, Department of Medical Genetics, University of Campinas, Campinas, São Paulo, Brazil.

    We report on a 17-year-old patient with midline defects, ocular hypertelorism, neuropsychomotor development delay, neonatal macrosomy, and dental anomalies. DNA copy number investigations using a Whole Genome TilePath array consisting of 30K BAC/PAC clones showed a 6.36 Mb deletion in the 9p24.1-p24.3 region and a 14.83 Mb duplication in the 20p12.1-p13 region, which derived from a maternal balanced t(9;20)(p24.1;p12.1) as shown by FISH studies. Monosomy 9p is a well-delineated chromosomal syndrome with characteristic clinical features, while chromosome 20p duplication is a rare genetic condition. Only a handful of cases of monosomy 9/trisomy 20 have been previously described. In this report, we compare the phenotype of our patient with those already reported in the literature, and discuss the role of DMRT, DOCK8, FOXD4, VLDLR, RSPO4, AVP, RASSF2, PROKR2, BMP2, MKKS, and JAG1, all genes mapping to the deleted and duplicated regions.

    Funded by: Wellcome Trust: 077008, WT077008

    American journal of medical genetics. Part A 2011;155A;11;2754-61

  • Perilipin deficiency and autosomal dominant partial lipodystrophy.

    Gandotra S, Le Dour C, Bottomley W, Cervera P, Giral P, Reznik Y, Charpentier G, Auclair M, Delépine M, Barroso I, Semple RK, Lathrop M, Lascols O, Capeau J, O'Rahilly S, Magré J, Savage DB and Vigouroux C

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Perilipin is the most abundant adipocyte-specific protein that coats lipid droplets, and it is required for optimal lipid incorporation and release from the droplet. We identified two heterozygous frameshift mutations in the perilipin gene (PLIN1) in three families with partial lipodystrophy, severe dyslipidemia, and insulin-resistant diabetes. Subcutaneous fat from the patients was characterized by smaller-than-normal adipocytes, macrophage infiltration, and fibrosis. In contrast to wild-type perilipin, mutant forms of the protein failed to increase triglyceride accumulation when expressed heterologously in preadipocytes. These findings define a novel dominant form of inherited lipodystrophy and highlight the serious metabolic consequences of a primary defect in the formation of lipid droplets in adipose tissue.

    Funded by: Medical Research Council; Wellcome Trust: 077016, 077016/Z/05/Z, 091551

    The New England journal of medicine 2011;364;8;740-8

  • Meticillin-resistant Staphylococcus aureus with a novel mecA homologue in human and bovine populations in the UK and Denmark: a descriptive study.

    García-Álvarez L, Holden MT, Lindsay H, Webb CR, Brown DF, Curran MD, Walpole E, Brooks K, Pickard DJ, Teale C, Parkhill J, Bentley SD, Edwards GF, Girvan EK, Kearns AM, Pichon B, Hill RL, Larsen AR, Skov RL, Peacock SJ, Maskell DJ and Holmes MA

    Department of Veterinary Medicine, University of Cambridge, UK.

    Background: Animals can act as a reservoir and source for the emergence of novel meticillin-resistant Staphylococcus aureus (MRSA) clones in human beings. Here, we report the discovery of a strain of S aureus (LGA251) isolated from bulk milk that was phenotypically resistant to meticillin but tested negative for the mecA gene and a preliminary investigation of the extent to which such strains are present in bovine and human populations.

    Methods: Isolates of bovine MRSA were obtained from the Veterinary Laboratories Agency in the UK, and isolates of human MRSA were obtained from diagnostic or reference laboratories (two in the UK and one in Denmark). From these collections, we searched for mecA PCR-negative bovine and human S aureus isolates showing phenotypic meticillin resistance. We used whole-genome sequencing to establish the genetic basis for the observed antibiotic resistance.

    Findings: A divergent mecA homologue (mecA(LGA251)) was discovered in the LGA251 genome located in a novel staphylococcal cassette chromosome mec element, designated type-XI SCCmec. The mecA(LGA251) was 70% identical to S aureus mecA homologues and was initially detected in 15 S aureus isolates from dairy cattle in England. These isolates were from three different multilocus sequence type lineages (CC130, CC705, and ST425); spa type t843 (associated with CC130) was identified in 60% of bovine isolates. When human mecA-negative MRSA isolates were tested, the mecA(LGA251) homologue was identified in 12 of 16 isolates from Scotland, 15 of 26 from England, and 24 of 32 from Denmark. As in cows, t843 was the most common spa type detected in human beings.

    Interpretation: Although routine culture and antimicrobial susceptibility testing will identify S aureus isolates with this novel mecA homologue as meticillin resistant, present confirmatory methods will not identify them as MRSA. New diagnostic guidelines for the detection of MRSA should consider the inclusion of tests for mecA(LGA251).

    Funding: Department for Environment, Food and Rural Affairs, Higher Education Funding Council for England, Isaac Newton Trust (University of Cambridge), and the Wellcome Trust.

    Funded by: Wellcome Trust

    The Lancet. Infectious diseases 2011;11;8;595-603

  • Protein-based signatures of functional evolution in Plasmodium falciparum.

    Gardner KB, Sinha I, Bustamante LY, Day NP, White NJ and Woodrow CJ

    Wellcome Trust Mahidol University-Oxford Tropical Medicine Research Unit (MORU), 420/6 Rajwithi Road, Bangkok, 10400 Thailand.

    Background: It has been known for over a decade that Plasmodium falciparum proteins are enriched in non-globular domains of unknown function. The potential for these regions of protein sequence to undergo high levels of genetic drift provides a fundamental challenge to attempts to identify the molecular basis of adaptive change in malaria parasites. Results: Evolutionary comparisons were undertaken using a set of forty P. falciparum metabolic enzyme genes, both within the hominid malaria clade (P. reichenowi) and across the genus (P. chabaudi). All genes contained coding elements highly conserved across the genus, but there were also a large number of regions of weakly or non-aligning coding sequence. These displayed remarkable levels of non-synonymous fixed differences within the hominid malaria clade indicating near complete release from purifying selection (dN/dS ratio at residues non-aligning across genus: 0.64, dN/dS ratio at residues identical across genus: 0.03). Regions of low conservation also possessed high levels of hydrophilicity, a marker of non-globularity. The propensity for such regions to act as potent sources of non-synonymous genetic drift within extant P. falciparum isolates was confirmed at chromosomal regions containing genes known to mediate drug resistance in field isolates, where 150 of 153 amino acid variants were located in poorly conserved regions. In contrast, all 22 amino acid variants associated with drug resistance were restricted to highly conserved regions. Additional mutations associated with laboratory-selected drug resistance, such as those in PfATPase4 selected by spiroindolone, were similarly restricted while mutations in another calcium ATPase (PfSERCA, a gene proposed to mediate artemisinin resistance) that reach significant frequencies in field isolates were located exclusively in poorly conserved regions consistent with genetic drift. 
    Conclusion: Coding sequences of malaria parasites contain prospectively definable domains subject to neutral or nearly neutral evolution on a scale that appears unrivalled in biology. This distinct evolutionary landscape has potential to confound analytical methods developed for other genera. Against this tide of genetic drift, polymorphisms mediating functional change stand out to such an extent that evolutionary context provides a useful signal for identifying the molecular basis of drug resistance in malaria parasites, a finding that is of relevance to both genome-wide and candidate gene studies in this genus.

    Funded by: Wellcome Trust

    BMC evolutionary biology 2011;11;257

  • RNIE: genome-wide prediction of bacterial intrinsic terminators.

    Gardner PP, Barquist L, Bateman A, Nawrocki EP and Weinberg Z

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. pg5@sanger.ac.uk

    Bacterial Rho-independent terminators (RITs) are important genomic landmarks involved in gene regulation and terminating gene expression. In this investigation we present RNIE, a probabilistic approach for predicting RITs. The method is based upon covariance models which have been known for many years to be the most accurate computational tools for predicting homology in structural non-coding RNAs. We show that RNIE has superior performance in model species from a spectrum of bacterial phyla. Further analysis of species where a low number of RITs were predicted revealed a highly conserved structural sequence motif enriched near the genic termini of the pathogenic Actinobacteria, Mycobacterium tuberculosis. This motif, together with classical RITs, accounts for up to 90% of all the significantly structured regions from the termini of M. tuberculosis genic elements. The software, predictions and alignments described below are available from http://github.com/ppgardne/RNIE.

    Funded by: Howard Hughes Medical Institute

    Nucleic acids research 2011;39;14;5845-52

  • Rfam: Wikipedia, clans and the "decimal" release.

    Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, Finn RD, Nawrocki EP, Kolbe DL, Eddy SR and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. pg5@sanger.ac.uk

    The Rfam database aims to catalogue non-coding RNAs through the use of sequence alignments and statistical profile models known as covariance models. In this contribution, we discuss the pros and cons of using the online encyclopedia, Wikipedia, as a source of community-derived annotation. We discuss the addition of groupings of related RNA families into clans and new developments to the website. Rfam is available on the Web at http://rfam.sanger.ac.uk.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: 077044, WT077044/Z/05/Z

    Nucleic acids research 2011;39;Database issue;D141-5

  • Analysis of XMRV integration sites from human prostate cancer tissues suggests PCR contamination rather than genuine human infection.

    Garson JA, Kellam P and Towers GJ

    MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, 46 Cleveland St, London W1T 4JF, UK.

    XMRV is a gammaretrovirus associated in some studies with human prostate cancer and chronic fatigue syndrome. Central to the hypothesis of XMRV as a human pathogen is the description of integration sites in DNA from prostate tumour tissues. Here we demonstrate that 2 of 14 patient-derived sites are identical to sites cloned in the same laboratory from experimentally infected DU145 cells. Identical integration sites have never previously been described in any retrovirus infection. We propose that the patient-derived sites are the result of PCR contamination. This observation further undermines the notion that XMRV is a genuine human pathogen.

    Funded by: Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT076608, WT090940

    Retrovirology 2011;8;13

  • Towards BioDBcore: a community-defined information specification for biological databases.

    Gaudet P, Bairoch A, Field D, Sansone SA, Taylor C, Attwood TK, Bateman A, Blake JA, Bult CJ, Cherry JM, Chisholm RL, Cochrane G, Cook CE, Eppig JT, Galperin MY, Gentleman R, Goble CA, Gojobori T, Hancock JM, Howe DG, Imanishi T, Kelso J, Landsman D, Lewis SE, Mizrachi IK, Orchard S, Ouellette BF, Ranganathan S, Richardson L, Rocca-Serra P, Schofield PN, Smedley D, Southan C, Tan TW, Tatusova T, Whetzel PL, White O, Yamasaki C and BioDBCore Working Group

    The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

    Funded by: NIGMS NIH HHS: R01 GM064426-10, R01 GM087371-03

    Nucleic acids research 2011;39;Database issue;D7-10

  • New gene functions in megakaryopoiesis and platelet formation.

    Gieger C, Radhakrishnan A, Cvejic A, Tang W, Porcu E, Pistis G, Serbanovic-Canic J, Elling U, Goodall AH, Labrune Y, Lopez LM, Mägi R, Meacham S, Okada Y, Pirastu N, Sorice R, Teumer A, Voss K, Zhang W, Ramirez-Solis R, Bis JC, Ellinghaus D, Gögele M, Hottenga JJ, Langenberg C, Kovacs P, O'Reilly PF, Shin SY, Esko T, Hartiala J, Kanoni S, Murgia F, Parsa A, Stephens J, van der Harst P, Ellen van der Schoot C, Allayee H, Attwood A, Balkau B, Bastardot F, Basu S, Baumeister SE, Biino G, Bomba L, Bonnefond A, Cambien F, Chambers JC, Cucca F, D'Adamo P, Davies G, de Boer RA, de Geus EJ, Döring A, Elliott P, Erdmann J, Evans DM, Falchi M, Feng W, Folsom AR, Frazer IH, Gibson QD, Glazer NL, Hammond C, Hartikainen AL, Heckbert SR, Hengstenberg C, Hersch M, Illig T, Loos RJ, Jolley J, Khaw KT, Kühnel B, Kyrtsonis MC, Lagou V, Lloyd-Jones H, Lumley T, Mangino M, Maschio A, Mateo Leach I, McKnight B, Memari Y, Mitchell BD, Montgomery GW, Nakamura Y, Nauck M, Navis G, Nöthlings U, Nolte IM, Porteous DJ, Pouta A, Pramstaller PP, Pullat J, Ring SM, Rotter JI, Ruggiero D, Ruokonen A, Sala C, Samani NJ, Sambrook J, Schlessinger D, Schreiber S, Schunkert H, Scott J, Smith NL, Snieder H, Starr JM, Stumvoll M, Takahashi A, Tang WH, Taylor K, Tenesa A, Lay Thein S, Tönjes A, Uda M, Ulivi S, van Veldhuisen DJ, Visscher PM, Völker U, Wichmann HE, Wiggins KL, Willemsen G, Yang TP, Hua Zhao J, Zitting P, Bradley JR, Dedoussis GV, Gasparini P, Hazen SL, Metspalu A, Pirastu M, Shuldiner AR, Joost van Pelt L, Zwaginga JJ, Boomsma DI, Deary IJ, Franke A, Froguel P, Ganesh SK, Jarvelin MR, Martin NG, Meisinger C, Psaty BM, Spector TD, Wareham NJ, Akkerman JW, Ciullo M, Deloukas P, Greinacher A, Jupe S, Kamatani N, Khadake J, Kooner JS, Penninger J, Prokopenko I, Stemple D, Toniolo D, Wernisch L, Sanna S, Hicks AA, Rendon A, Ferreira MA, Ouwehand WH and Soranzo N

    Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstr 1, 85764 Neuherberg, Germany. christian.gieger@helmholtz-muenchen.de

    Platelets are the second most abundant cell type in blood and are essential for maintaining haemostasis. Their count and volume are tightly controlled within narrow physiological ranges, but there is only limited understanding of the molecular processes controlling both traits. Here we carried out a high-powered meta-analysis of genome-wide association studies (GWAS) in up to 66,867 individuals of European ancestry, followed by extensive biological and functional assessment. We identified 68 genomic loci reliably associated with platelet count and volume mapping to established and putative novel regulators of megakaryopoiesis and platelet formation. These genes show megakaryocyte-specific gene expression patterns and extensive network connectivity. Using gene silencing in Danio rerio and Drosophila melanogaster, we identified 11 of the genes as novel regulators of blood cell formation. Taken together, our findings advance understanding of novel gene functions controlling fate-determining events during megakaryopoiesis and platelet formation, providing a new example of successful translation of GWAS to function.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; British Heart Foundation: RG/09/012/28096; Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: G0601966, G0700704, G0700931, G0701120, G0701863, G0801056, G1000143, MC_U105260799, MC_U106179471, MC_U106188470; NCRR NIH HHS: K12 RR023250, K12 RR023250-05, M01 RR016500-08, U54 RR020278-06, UL1 RR025005, UL1 RR025005-05; NHGRI NIH HHS: P41 HG003751, T32 HG002536; NHLBI NIH HHS: N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC085079, P01 HL076491, P01 HL076491-09, P01 HL098055, P01 HL098055-03, R01 HL059367, R01 HL059367-11, R01 HL068986, R01 HL068986-06, R01 HL073410-08, R01 HL085251, R01 HL085251-04, R01 HL086694, R01 HL086694-05, R01 HL087641, R01 HL087641-03, R01 HL087679-03, R01 HL088119, R01 HL088119-04, R01 HL103866, R01 HL103866-03, R01 HL105756, U01 HL072515, U01 HL072515-06, U01 HL084756, U01 HL084756-03; NIA NIH HHS: R01 AG018728, R01 AG018728-05S1; NICHD NIH HHS: R01 HD042157-01A1; NIDDK NIH HHS: P30 DK072488, P30 DK072488-08; NIGMS NIH HHS: R01 GM053275, R01 GM053275-14, U01 GM074518, U01 GM074518-04; NIMH NIH HHS: RL1 MH083268, RL1 MH083268-05; Wellcome Trust: 092731, 098051, WT077037/Z/05/Z, WT077047/Z/05/Z, WT082597/Z/07/Z

    Nature 2011;480;7376;201-8

  • Common variants near ATM are associated with glycemic response to metformin in type 2 diabetes.

    GoDARTS and UKPDS Diabetes Pharmacogenetics Study Group, Wellcome Trust Case Control Consortium 2, Zhou K, Bellenguez C, Spencer CC, Bennett AJ, Coleman RL, Tavendale R, Hawley SA, Donnelly LA, Schofield C, Groves CJ, Burch L, Carr F, Strange A, Freeman C, Blackwell JM, Bramon E, Brown MA, Casas JP, Corvin A, Craddock N, Deloukas P, Dronov S, Duncanson A, Edkins S, Gray E, Hunt S, Jankowski J, Langford C, Markus HS, Mathew CG, Plomin R, Rautanen A, Sawcer SJ, Samani NJ, Trembath R, Viswanathan AC, Wood NW, MAGIC investigators, Harries LW, Hattersley AT, Doney AS, Colhoun H, Morris AD, Sutherland C, Hardie DG, Peltonen L, McCarthy MI, Holman RR, Palmer CN, Donnelly P and Pearson ER

    Biomedical Research Institute, University of Dundee, Dundee, UK.

    Metformin is the most commonly used pharmacological therapy for type 2 diabetes. We report a genome-wide association study for glycemic response to metformin in 1,024 Scottish individuals with type 2 diabetes with replication in two cohorts including 1,783 Scottish individuals and 1,113 individuals from the UK Prospective Diabetes Study. In a combined meta-analysis, we identified a SNP, rs11212617, associated with treatment success (n = 3,920, P = 2.9 × 10^-9, odds ratio = 1.35, 95% CI 1.22-1.49) at a locus containing ATM, the ataxia telangiectasia mutated gene. In a rat hepatoma cell line, inhibition of ATM with KU-55933 attenuated the phosphorylation and activation of AMP-activated protein kinase in response to metformin. We conclude that ATM, a gene known to be involved in DNA repair and cell cycle control, plays a role in the effect of metformin upstream of AMP-activated protein kinase, and variation in this gene alters glycemic response to metformin.

    Funded by: Chief Scientist Office; Department of Health: PDA/02/06/016; Medical Research Council: G0601261, G0901310, G19/2; Wellcome Trust: 084726, 084726/Z/08/Z, 085475/B/08/Z, 085475/Z/08/Z

    Nature genetics 2011;43;2;117-20

  • In situ thrombolysis for cerebral venous thrombosis complicating anti-leukemic therapy.

    Godfrey AL, Higgins JN, Beer PA, Craig JI and Vassiliou GS

    Department of Haematology, Addenbrooke's Hospital, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Cambridge, UK. alp52@cam.ac.uk

    Leukemia research 2011;35;8;1127-9

  • Transition of Plasmodium sporozoites into liver stage-like forms is regulated by the RNA binding protein Pumilio.

    Gomes-Santos CS, Braks J, Prudêncio M, Carret C, Gomes AR, Pain A, Feltwell T, Khan S, Waters A, Janse C, Mair GR and Mota MM

    Malaria Unit, Instituto de Medicina Molecular, Lisboa, Portugal.

    Many eukaryotic developmental and cell fate decisions that are effected post-transcriptionally involve RNA binding proteins as regulators of translation of key mRNAs. In malaria parasites (Plasmodium spp.), the development of round, non-motile and replicating exo-erythrocytic liver stage forms from slender, motile and cell-cycle arrested sporozoites is believed to depend on environmental changes experienced during the transmission of the parasite from the mosquito vector to the vertebrate host. Here we identify a Plasmodium member of the RNA binding protein family PUF as a key regulator of this transformation. In the absence of Pumilio-2 (Puf2) sporozoites initiate EEF development inside mosquito salivary glands independently of the normal transmission-associated environmental cues. Puf2- sporozoites exhibit genome-wide transcriptional changes that result in loss of gliding motility, cell traversal ability and reduction in infectivity, and, moreover, trigger metamorphosis typical of early Plasmodium intra-hepatic development. These data demonstrate that Puf2 is a key player in regulating sporozoite developmental control, and imply that transformation of salivary gland-resident sporozoites into liver stage-like parasites is regulated by a post-transcriptional mechanism.

    PLoS pathogens 2011;7;5;e1002046

  • No evidence of XMRV or related retroviruses in a London HIV-1-positive patient cohort.

    Gray ER, Garson JA, Breuer J, Edwards S, Kellam P, Pillay D and Towers GJ

    Department of Infection and Immunity, University College London, London, United Kingdom. e.gray@ucl.ac.uk

    Background: Several studies have implicated a recently discovered gammaretrovirus, XMRV (Xenotropic murine leukaemia virus-related virus), in chronic fatigue syndrome and prostate cancer, though whether as causative agent or opportunistic infection is unclear. It has also been suggested that the virus can be found circulating amongst the general population. The discovery has been controversial, with conflicting results from attempts to reproduce the original studies.

    We extracted peripheral blood DNA from a cohort of 540 HIV-1-positive patients (approximately 20% of whom have never been on anti-retroviral treatment) and determined the presence of XMRV and related viruses using TaqMan PCR. While we were able to amplify as few as 5 copies of positive control DNA, we did not find any positive samples in the patient cohort.

    In view of these negative findings in this highly susceptible group, we conclude that it is unlikely that XMRV or related viruses are circulating at a significant level, if at all, in HIV-1-positive patients in London or in the general population.

    Funded by: Department of Health; Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT090940

    PloS one 2011;6;3;e18096

  • Binding of more than one Tva800 molecule is required for ASLV-A entry.

    Gray ER, Illingworth CJ, Coffin JM and Stoye JP

    Division of Virology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.

    Background: Understanding the mechanism by which viruses enter their target cell is an essential part of understanding their infectious cycle. Previous studies have focussed on the multiplicity of viral envelope proteins that need to bind to their cognate receptor to initiate entry. Avian sarcoma and leukosis virus Envelope protein (ASLV Env) mediates entry via a receptor, Tva, which can be attached to the cell surface either by a phospholipid anchor (Tva800) or a transmembrane domain (Tva950). In these studies, we have now investigated the number of target receptors necessary for entry of ASLV Env-pseudotyped virions.

    Results: Using titration and modelling experiments we provide evidence that binding of more than one receptor, probably two, is needed for entry of virions via Tva800. However, binding of just one Tva950 receptor is sufficient for successful entry.

    Conclusions: The different modes of attachment of Tva800 and Tva950 to the cell membrane have important implications for the utilisation of these proteins as receptors for viral binding and/or uptake.

    Funded by: Medical Research Council: U117512710; NCI NIH HHS: R37 CA 089441, R37 CA089441-12; Wellcome Trust: 091747

    Retrovirology 2011;8;96

  • Targets downstream of Cdk8 in Dictyostelium development.

    Greene DM, Bloomfield G, Skelton J, Ivens A and Pears CJ

    Biochemistry Department, Oxford University, South Parks Road, Oxford OX1 3QU UK. catherine.pears@bioch.ox.ac.uk

    Background: Cdk8 is a component of the mediator complex which facilitates transcription by RNA polymerase II and has been shown to play an important role in development of Dictyostelium discoideum. This eukaryote feeds as single cells but starvation triggers the formation of a multicellular organism in response to extracellular pulses of cAMP and the eventual generation of spores. Strains in which the gene encoding Cdk8 has been disrupted fail to form multicellular aggregates unless supplied with exogenous pulses of cAMP and, later in development, cdk8- cells show a defect in spore production.

    Results: Microarray analysis revealed that the cdk8- strain previously described (cdk8-HL) contained genome duplications. Regeneration of the strain in a background lacking detectable gene duplication generated strains (cdk8-2) with identical defects in growth and early development, but a milder defect in spore generation, suggesting that the severity of this defect depends on the genetic background. The failure of cdk8- cells to aggregate unless rescued by exogenous pulses of cAMP is consistent with a failure to express the catalytic subunit of protein kinase A. However, overexpression of the gene encoding this protein was not sufficient to rescue the defect, suggesting that this is not the only important target for Cdk8 at this stage of development. Proteomic analysis revealed two potential targets for Cdk8 regulation, one regulated post-transcriptionally (4-hydroxyphenylpyruvate dioxygenase (HPD)) and one transcriptionally (short chain dehydrogenase/reductase (SDR1)).

    Conclusions: This analysis has confirmed the importance of Cdk8 at multiple stages of Dictyostelium development, although the severity of the defect in spore production depends on the genetic background. Potential targets of Cdk8-mediated gene regulation have been identified in Dictyostelium which will allow the mechanism of Cdk8 action and its role in development to be determined.

    BMC developmental biology 2011;11;2

  • LIF-independent JAK signalling to chromatin in embryonic stem cells uncovered from an adult stem cell disease.

    Griffiths DS, Li J, Dawson MA, Trotter MW, Cheng YH, Smith AM, Mansfield W, Liu P, Kouzarides T, Nichols J, Bannister AJ, Green AR and Göttgens B

    Department of Haematology and Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK. dsg29@cam.ac.uk

    Activating mutations in the tyrosine kinase Janus kinase 2 (JAK2) cause myeloproliferative neoplasms, clonal blood stem cell disorders with a propensity for leukaemic transformation. Leukaemia inhibitory factor (LIF) signalling through the JAK-signal transducer and activator of transcription (STAT) pathway enables self-renewal of embryonic stem (ES) cells. Here we show that mouse ES cells carrying the human JAK2V617F mutation were able to self-renew in chemically defined conditions without cytokines or small-molecule inhibitors, independently of JAK signalling through the STAT3 or phosphatidylinositol-3-OH kinase pathways. Phosphorylation of histone H3 tyrosine 41 (H3Y41) by JAK2 was recently shown to interfere with binding of heterochromatin protein 1α (HP1α). Levels of chromatin-bound HP1α were lower in JAK2V617F ES cells but increased following inhibition of JAK2, coincident with a global reduction in histone H3Y41 phosphorylation. JAK2 inhibition reduced levels of the pluripotency regulator Nanog, with a reduction in H3Y41 phosphorylation and concomitant increase in HP1α levels at the Nanog promoter. Furthermore, Nanog was required for factor independence of JAK2V617F ES cells. Taken together, these results uncover a previously unrecognized role for direct signalling to chromatin by JAK2 as an important mediator of ES cell self-renewal.

    Funded by: Cancer Research UK: A8043; Medical Research Council

    Nature cell biology 2011;13;1;13-21

  • BioMart Central Portal: an open database network for the biological community.

    Guberman JM, Ai J, Arnaiz O, Baran J, Blake A, Baldock R, Chelala C, Croft D, Cros A, Cutts RJ, Di Génova A, Forbes S, Fujisawa T, Gadaleta E, Goodstein DM, Gundem G, Haggarty B, Haider S, Hall M, Harris T, Haw R, Hu S, Hubbard S, Hsu J, Iyer V, Jones P, Katayama T, Kinsella R, Kong L, Lawson D, Liang Y, Lopez-Bigas N, Luo J, Lush M, Mason J, Moreews F, Ndegwa N, Oakley D, Perez-Llamas C, Primig M, Rivkin E, Rosanoff S, Shepherd R, Simon R, Skarnes B, Smedley D, Sperling L, Spooner W, Stevenson P, Stone K, Teague J, Wang J, Wang J, Whitty B, Wong DT, Wong-Erasmus M, Yao L, Youens-Clark K, Yung C, Zhang J and Kasprzyk A

    Ontario Institute for Cancer Research, Toronto, M5G 0A3, Canada.

    BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.

    Database : the journal of biological databases and curation 2011;2011;bar041

  • Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis.

    Guida A, Lindstädt C, Maguire SL, Ding C, Higgins DG, Corton NJ, Berriman M and Butler G

    School of Medicine and Medical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.

    Background: Candida parapsilosis is one of the most common causes of Candida infection worldwide. However, the genome sequence annotation was made without experimental validation and little is known about the transcriptional landscape. The transcriptional response of C. parapsilosis to hypoxic (low oxygen) conditions, such as those encountered in the host, is also relatively unexplored.

    Results: We used next generation sequencing (RNA-seq) to determine the transcriptional profile of C. parapsilosis growing in several conditions including different media, temperatures and oxygen concentrations. We identified 395 novel protein-coding sequences that had not previously been annotated. We removed > 300 unsupported gene models, and corrected approximately 900. We mapped the 5' and 3' UTR for thousands of genes. We also identified 422 introns, including two introns in the 3' UTR of one gene. This is the first report of 3' UTR introns in the Saccharomycotina. Comparing the introns in coding sequences with other species shows that small numbers have been gained and lost throughout evolution. Our analysis also identified a number of novel transcriptional active regions (nTARs). We used both RNA-seq and microarray analysis to determine the transcriptional profile of cells grown in normoxic and hypoxic conditions in rich media, and we showed that there was a high correlation between the approaches. We also generated a knockout of the UPC2 transcriptional regulator, and we found that similar to C. albicans, Upc2 is required for conferring resistance to azole drugs, and for regulation of expression of the ergosterol pathway in hypoxia.

    Conclusion: We provide the first detailed annotation of the C. parapsilosis genome, based on gene predictions and transcriptional analysis. We identified a number of novel ORFs and other transcribed regions, and detected transcripts from approximately 90% of the annotated protein coding genes. We found that the transcription factor Upc2 has a conserved role as a major regulator of the hypoxic response in C. parapsilosis and C. albicans.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    BMC genomics 2011;12;628

  • A PiggyBac-based recessive screening method to identify pluripotency regulators.

    Guo G, Huang Y, Humphreys P, Wang X and Smith A

    Wellcome Trust Centre for Stem Cell Research, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.

    Phenotype driven genetic screens allow unbiased exploration of the genome to discover new biological regulators. Bloom syndrome gene (Blm) deficient embryonic stem (ES) cells provide an opportunity for recessive screening due to frequent loss of heterozygosity. We describe a strategy for isolating regulators of mammalian pluripotency based on conversion to homozygosity of PiggyBac gene trap insertions combined with stringent selection for differentiation resistance. From a screen of 2000 mutants we obtained a disruptive integration in the Tcf3 gene. Homozygous Tcf3 mutants showed impaired differentiation and enhanced self-renewal. This phenotype was reverted in a dosage sensitive manner by excision of one or both copies of the gene trap. These results provide new evidence confirming that Tcf3 is a potent negative regulator of pluripotency and validate a forward screening methodology to identify modulators of pluripotent stem cell biology.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    PloS one 2011;6;4;e18189

  • Mapping a new spontaneous preterm birth susceptibility gene, IGF1R, using linkage, haplotype sharing, and association analysis.

    Haataja R, Karjalainen MK, Luukkonen A, Teramo K, Puttonen H, Ojaniemi M, Varilo T, Chaudhari BP, Plunkett J, Murray JC, McCarroll SA, Peltonen L, Muglia LJ, Palotie A and Hallman M

    Department of Pediatrics, Institute of Clinical Medicine, University of Oulu, Oulu, Finland.

    Preterm birth is the major cause of neonatal death and serious morbidity. Most preterm births are due to spontaneous onset of labor without a known cause or effective prevention. Both maternal and fetal genomes influence the predisposition to spontaneous preterm birth (SPTB), but the susceptibility loci remain to be defined. We utilized a combination of unique population structures, family-based linkage analysis, and subsequent case-control association to identify a susceptibility haplotype for SPTB. Clinically well-characterized SPTB families from northern Finland, a subisolate founded by a relatively small founder population that has subsequently experienced a number of bottlenecks, were selected for the initial discovery sample. Genome-wide linkage analysis using a high-density single-nucleotide polymorphism (SNP) array in seven large northern Finnish non-consanguineous families identified a locus on 15q26.3 (HLOD 4.68). This region contains the IGF1R gene, which encodes the type 1 insulin-like growth factor receptor IGF-1R. Haplotype segregation analysis revealed that a 55 kb 12-SNP core segment within the IGF1R gene was shared identical-by-state (IBS) in five families. A follow-up case-control study in an independent sample representing the more general Finnish population showed an association of a 6-SNP IGF1R haplotype with SPTB in the fetuses, providing further evidence for IGF1R as a SPTB predisposition gene (frequency in cases versus controls 0.11 versus 0.05, P = 0.001, odds ratio 2.3). This study demonstrates the identification of a predisposing, low-frequency haplotype in a multifactorial trait using a well-characterized population and a combination of family and case-control designs. Our findings support the identification of the novel susceptibility gene IGF1R for predisposition by the fetal genome to being born preterm.

    Funded by: NCRR NIH HHS: U54 RR020278; NICHD NIH HHS: R01 HD057192-03; Wellcome Trust: WTO89062

    PLoS genetics 2011;7;2;e1001293

  • Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon.

    Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA, Youhanna SC, Saade S, Soria-Hernanz DF, Royyuru A, Wells RS, Tyler-Smith C, Zalloua PA and Genographic Consortium

    The Lebanese American University, Chouran, Beirut, Lebanon.

    Cultural expansions, including of religions, frequently leave genetic traces of differentiation and in-migration. These expansions may be driven by complex doctrinal differentiation, together with major population migrations and gene flow. The aim of this study was to explore the genetic signature of the establishment of religious communities in a region where some of the most influential religions originated, using the Y chromosome as an informative male-lineage marker. A total of 3139 samples were analyzed, including 647 Lebanese and Iranian samples newly genotyped for 28 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y chromosome. Genetic organization was identified by geography and religion across Lebanon in the context of surrounding populations important in the expansions of the major sects of Lebanon, including Italy, Turkey, the Balkans, Syria, and Iran by employing principal component analysis, multidimensional scaling, and AMOVA. Timing of population differentiations was estimated using BATWING, in comparison with dates of historical religious events to determine if these differentiations could be caused by religious conversion, or rather, whether religious conversion was facilitated within already differentiated populations. Our analysis shows that the great religions in Lebanon were adopted within already distinguishable communities. Once religious affiliations were established, subsequent genetic signatures of the older differentiations were reinforced. Post-establishment differentiations are most plausibly explained by migrations of peoples seeking refuge to avoid the turmoil of major historical events.

    Funded by: Wellcome Trust

    European journal of human genetics : EJHG 2011;19;3;334-40

  • Y-chromosome R-M343 African lineages and sickle cell disease reveal structured assimilation in Lebanon.

    Haber M, Platt DE, Khoury S, Badro DA, Abboud M, Tyler-Smith C and Zalloua PA

    Medical School, The Lebanese American University, Beirut, Lebanon.

    We have sought to identify signals of assimilation of African male lines in Lebanon by exploring the association of sickle cell disease (SCD) in Lebanon with Y-chromosome haplogroups that are informative of the disease origin and its exclusivity to the Muslim community. A total of 732 samples were analyzed, including 33 SCD patients from Lebanon genotyped for 28 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y chromosome. Genetic organization was identified using populations known to have influenced the genetic structure of the Lebanese population, in addition to African populations with high incidence of SCD. Y-chromosome haplogroup R-M343 sub-lineages distinguish between sub-Saharan African and Lebanese Y chromosomes. We detected a limited penetration of SCD into Lebanese R-M343 carriers, restricted to Lebanese Muslims. We suggest that this penetration brought the sickle cell gene along with the African R-M343, probably with the Saharan caravan slave trade.

    Funded by: Wellcome Trust: 077009

    Journal of human genetics 2011;56;1;29-33

  • At-risk variant in TCF7L2 for type II diabetes increases risk of schizophrenia.

    Hansen T, Ingason A, Djurovic S, Melle I, Fenger M, Gustafsson O, Jakobsen KD, Rasmussen HB, Tosato S, Rietschel M, Frank J, Owen M, Bonetto C, Suvisaari J, Thygesen JH, Pétursson H, Lönnqvist J, Sigurdsson E, Giegling I, Craddock N, O'Donovan MC, Ruggeri M, Cichon S, Ophoff RA, Pietiläinen O, Peltonen L, Nöthen MM, Rujescu D, St Clair D, Collier DA, Andreassen OA and Werge T

    Mental Health Centre Sct. Hans, Copenhagen University Hospital, Research Institute of Biological Psychiatry, Roskilde, Denmark; Copenhagen University, Center for Pharmacogenomics, Copenhagen, Denmark.

    Background: Schizophrenia is associated with increased risk of type II diabetes and metabolic disorders. However, it is unclear whether this comorbidity reflects shared genetic risk factors, at-risk lifestyle, or side effects of antipsychotic medication.

    Methods: Eleven known risk variants of type II diabetes were genotyped in patients with schizophrenia in a sample of 410 Danish patients, each matched with two healthy control subjects on sex, birth year, and month. Replication was carried out in a large multinational European sample of 4089 patients with schizophrenia and 17,597 controls (SGENE+) using the Mantel-Haenszel test.

    Results: One type II diabetes at-risk allele located in TCF7L2, rs7903146 [T], was associated with schizophrenia in the discovery sample (p = .0052) and in the replication with an odds ratio of 1.07 (95% confidence interval 1.01-1.14, p = .033).

    Conclusion: The association reported here with a well-known diabetes variant suggests that the observed comorbidity is partially caused by genetic risk variants. This study also demonstrates how genetic studies can successfully examine an epidemiologically derived hypothesis of comorbidity.

    Biological psychiatry 2011;70;1;59-63

  • A worldwide analysis of beta-defensin copy number variation suggests recent selection of a high-expressing DEFB103 gene copy in East Asia.

    Hardwick RJ, Machado LR, Zuccherato LW, Antolinos S, Xue Y, Shawa N, Gilman RH, Cabrera L, Berg DE, Tyler-Smith C, Kelly P, Tarazona-Santos E and Hollox EJ

    Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.

    Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region.

    Funded by: Medical Research Council: G0801123, GO801123; Wellcome Trust: 067948, 077009, 087663

    Human mutation 2011;32;7;743-50

  • The transcriptional repressor Blimp1/Prdm1 regulates postnatal reprogramming of intestinal enterocytes.

    Harper J, Mould A, Andrews RM, Bikoff EK and Robertson EJ

    Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom.

    Female mammals produce milk to feed their newborn offspring before teeth develop and permit the consumption of solid food. Intestinal enterocytes dramatically alter their biochemical signature during the suckling-to-weaning transition. The transcriptional repressor Blimp1 is strongly expressed in immature enterocytes in utero, but these are gradually replaced by Blimp1(-) crypt-derived adult enterocytes. Here we used a conditional inactivation strategy to eliminate Blimp1 function in the developing intestinal epithelium. There was no noticeable effect on gross morphology or formation of mature cell types before birth. However, survival of mutant neonates was severely compromised. Transcriptional profiling experiments reveal global changes in gene expression patterns. Key components of the adult enterocyte biochemical signature were substantially and prematurely activated. In contrast, those required for processing maternal milk were markedly reduced. Thus, we conclude Blimp1 governs the developmental switch responsible for postnatal intestinal maturation.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;26;10585-90

  • Genomic Analysis of Hepatitis B Virus Reveals Antigen State and Genotype as Sources of Evolutionary Rate Variation.

    Harrison A, Lemey P, Hurles ME, Moyes C, Horn S, Pryor J, Malani J, Supuri M, Masta A, Teriboriki B, Toatu T, Penny D, Rambaut A and Shapiro B

    Viruses-Basel. 2011;3;83-101

  • EpiChIP: gene-by-gene quantification of epigenetic modification levels.

    Hebenstreit D, Gu M, Haider S, Turner DJ, Liò P and Teichmann SA

    MRC Laboratory of Molecular Biology, Hills Rd, CB2 0QH Cambridge, UK. danielh@mrc-lmb.cam.ac.uk

    The combination of chromatin immunoprecipitation with next-generation sequencing technology (ChIP-seq) is a powerful and increasingly popular method for mapping protein-DNA interactions in a genome-wide fashion. The conventional way of analyzing this data is to identify sequencing peaks along the chromosomes that are significantly higher than the read background. For histone modifications and other epigenetic marks, it is often preferable to find a characteristic region of enrichment in sequencing reads relative to gene annotations. For instance, many histone modifications are typically enriched around transcription start sites. Calculating the optimal window that describes this enrichment allows one to quantify modification levels for each individual gene. Using data sets for the H3K9/14ac histone modification in Th cells and an accompanying IgG control, we present an analysis strategy that alternates between single gene and global data distribution levels and allows a clear distinction between experimental background and signal. Curve fitting permits false discovery rate-based classification of genes as modified versus unmodified. We have developed a software package called EpiChIP that carries out this type of analysis, including integration with and visualization of gene expression data.

    Funded by: Medical Research Council: MC_U105161047

    Nucleic acids research 2011;39;5;e27

  • Genome sequence of Staphylococcus lugdunensis N920143 allows identification of putative colonization and virulence factors.

    Heilbronner S, Holden MT, van Tonder A, Geoghegan JA, Foster TJ, Parkhill J and Bentley SD

    Microbiology Department, Trinity College, Dublin, Ireland.

    Staphylococcus lugdunensis is an opportunistic pathogen related to Staphylococcus aureus and Staphylococcus epidermidis. The genome sequence of S. lugdunensis strain N920143 has been compared with other staphylococci, and genes were identified that could promote survival of S. lugdunensis on human skin and pathogenesis of infections. Staphylococcus lugdunensis lacks virulence factors that characterize S. aureus and harbours a smaller number of genes encoding surface proteins. It is the only staphylococcal species other than S. aureus that possesses a locus encoding iron-regulated surface determinant (Isd) proteins involved in iron acquisition from haemoglobin.

    Funded by: Wellcome Trust

    FEMS microbiology letters 2011;322;1;60-7

  • Exome sequencing identifies a missense mutation in Isl1 associated with low penetrance otitis media in dearisch mice.

    Hilton JM, Lewis MA, Grati M, Ingham N, Pearson S, Laskowski RA, Adams DJ and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Background: Inflammation of the middle ear (otitis media) is very common and can lead to serious complications if not resolved. Genetic studies suggest an inherited component, but few of the genes that contribute to this condition are known. Mouse mutants have contributed significantly to the identification of genes predisposing to otitis media.

    Results: The dearisch mouse mutant is an ENU-induced mutant detected by its impaired Preyer reflex (ear flick in response to sound). Auditory brainstem responses revealed raised thresholds from as early as three weeks old. Pedigree analysis suggested a dominant but partially penetrant mode of inheritance. The middle ear of dearisch mutants shows a thickened mucosa and cellular effusion suggesting chronic otitis media with effusion with superimposed acute infection. The inner ear, including the sensory hair cells, appears normal. Due to the low penetrance of the phenotype, normal backcross mapping of the mutation was not possible. Exome sequencing was therefore employed to identify a non-conservative tyrosine to cysteine (Y71C) missense mutation in the Islet1 gene, Isl1(Drsh). Isl1 is expressed in the normal middle ear mucosa. The findings suggest the Isl1(Drsh) mutation is likely to predispose carriers to otitis media.

    Conclusions: Dearisch, Isl1(Drsh), represents the first point mutation in the mouse Isl1 gene and suggests a previously unrecognized role for this gene. It is also the first recorded exome sequencing of the C3HeB/FeJ background relevant to many ENU-induced mutants. Most importantly, the power of exome resequencing to identify ENU-induced mutations without a mapped gene locus is illustrated.

    Funded by: Medical Research Council: G0300212, G0800024, MC_QA137918; Wellcome Trust: 077189

    Genome biology 2011;12;9;R90

  • Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease.

    Hollingworth P, Harold D, Sims R, Gerrish A, Lambert JC, Carrasquillo MM, Abraham R, Hamshere ML, Pahwa JS, Moskvina V, Dowzell K, Jones N, Stretton A, Thomas C, Richards A, Ivanov D, Widdowson C, Chapman J, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Beaumont H, Warden D, Wilcock G, Love S, Kehoe PG, Hooper NM, Vardy ER, Hardy J, Mead S, Fox NC, Rossor M, Collinge J, Maier W, Jessen F, Rüther E, Schürmann B, Heun R, Kölsch H, van den Bussche H, Heuser I, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Gallacher J, Hüll M, Rujescu D, Giegling I, Goate AM, Kauwe JS, Cruchaga C, Nowotny P, Morris JC, Mayo K, Sleegers K, Bettens K, Engelborghs S, De Deyn PP, Van Broeckhoven C, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel KH, Klopp N, Wichmann HE, Pankratz VS, Sando SB, Aasly JO, Barcikowska M, Wszolek ZK, Dickson DW, Graff-Radford NR, Petersen RC, Alzheimer's Disease Neuroimaging Initiative, van Duijn CM, Breteler MM, Ikram MA, DeStefano AL, Fitzpatrick AL, Lopez O, Launer LJ, Seshadri S, CHARGE consortium, Berr C, Campion D, Epelbaum J, Dartigues JF, Tzourio C, Alpérovitch A, Lathrop M, EADI1 consortium, Feulner TM, Friedrich P, Riehle C, Krawczak M, Schreiber S, Mayhaus M, Nicolhaus S, Wagenpfeil S, Steinberg S, Stefansson H, Stefansson K, Snaedal J, Björnsson S, Jonsson PV, Chouraki V, Genier-Boley B, Hiltunen M, Soininen H, Combarros O, Zelenika D, Delepine M, Bullido MJ, Pasquier F, Mateo I, Frank-Garcia A, Porcellini E, Hanon O, Coto E, Alvarez V, Bosco P, Siciliano G, Mancuso M, Panza F, Solfrizzi V, Nacmias B, Sorbi S, Bossù P, Piccardi P, Arosio B, Annoni G, Seripa D, Pilotto A, Scarpini E, Galimberti D, Brice A, Hannequin D, Licastro F, Jones L, Holmans PA, Jonsson T, Riemenschneider M, Morgan K, Younkin SG, Owen MJ, O'Donovan M, Amouyel P and Williams J

    Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Neurosciences and Mental Health Research Institute, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, UK.

    We sought to identify new susceptibility loci for Alzheimer's disease through a staged association study (GERAD+) and by testing suggestive loci reported by the Alzheimer's Disease Genetic Consortium (ADGC) in a companion paper. We undertook a combined analysis of four genome-wide association datasets (stage 1) and identified ten newly associated variants with P ≤ 1 × 10(-5). We tested these variants for association in an independent sample (stage 2). Three SNPs at two loci replicated and showed evidence for association in a further sample (stage 3). Meta-analyses of all data provided compelling evidence that ABCA7 (rs3764650, meta P = 4.5 × 10(-17); including ADGC data, meta P = 5.0 × 10(-21)) and the MS4A gene cluster (rs610932, meta P = 1.8 × 10(-14); including ADGC data, meta P = 1.2 × 10(-16)) are new Alzheimer's disease susceptibility loci. We also found independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance: CD2AP (GERAD+, P = 8.0 × 10(-4); including ADGC data, meta P = 8.6 × 10(-9)), CD33 (GERAD+, P = 2.2 × 10(-4); including ADGC data, meta P = 1.6 × 10(-9)) and EPHA1 (GERAD+, P = 3.4 × 10(-4); including ADGC data, meta P = 6.0 × 10(-10)).

    Funded by: Medical Research Council: G0300429(66813), G9810900(63319); Wellcome Trust: 082604

    Nature genetics 2011;43;5;429-35

  • Effect modification by population dietary folate on the association between MTHFR genotype, homocysteine, and stroke risk: a meta-analysis of genetic studies and randomised trials.

    Holmes MV, Newcombe P, Hubacek JA, Sofat R, Ricketts SL, Cooper J, Breteler MM, Bautista LE, Sharma P, Whittaker JC, Smeeth L, Fowkes FG, Algra A, Shmeleva V, Szolnoki Z, Roest M, Linnebank M, Zacho J, Nalls MA, Singleton AB, Ferrucci L, Hardy J, Worrall BB, Rich SS, Matarin M, Norman PE, Flicker L, Almeida OP, van Bockxmeer FM, Shimokata H, Khaw KT, Wareham NJ, Bobak M, Sterne JA, Smith GD, Talmud PJ, van Duijn C, Humphries SE, Price JF, Ebrahim S, Lawlor DA, Hankey GJ, Meschia JF, Sandhu MS, Hingorani AD and Casas JP

    Research Department of Epidemiology and Public Health, University College London, London, UK.

    Background: The MTHFR 677C→T polymorphism has been associated with raised homocysteine concentration and increased risk of stroke. A previous overview showed that the effects were greatest in regions with low dietary folate consumption, but differentiation between the effect of folate and small-study bias was difficult. A meta-analysis of randomised trials of homocysteine-lowering interventions showed no reduction in coronary heart disease events or stroke, but the trials were generally set in populations with high folate consumption. We aimed to reduce the effect of small-study bias and investigate whether folate status modifies the association between MTHFR 677C→T and stroke in a genetic analysis and meta-analysis of randomised controlled trials.

    Methods: We established a collaboration of genetic studies consisting of 237 datasets including 59,995 individuals with data for homocysteine and 20,885 stroke events. We compared the genetic findings with a meta-analysis of 13 randomised trials of homocysteine-lowering treatments and stroke risk (45,549 individuals, 2314 stroke events, 269 transient ischaemic attacks).

    Findings: The effect of the MTHFR 677C→T variant on homocysteine concentration was larger in low folate regions (Asia; difference between individuals with TT versus CC genotype, 3·12 μmol/L, 95% CI 2·23 to 4·01) than in areas with folate fortification (America, Australia, and New Zealand, high; 0·13 μmol/L, -0·85 to 1·11). The odds ratio (OR) for stroke was also higher in Asia (1·68, 95% CI 1·44 to 1·97) than in America, Australia, and New Zealand, high (1·03, 0·84 to 1·25). Most randomised trials took place in regions with high or increasing population folate concentrations. The summary relative risk (RR) of stroke in trials of homocysteine-lowering interventions (0·94, 95% CI 0·85 to 1·04) was similar to that predicted for the same extent of homocysteine reduction in large genetic studies in populations with similar folate status (predicted RR 1·00, 95% CI 0·90 to 1·11). Although the predicted effect of homocysteine reduction from large genetic studies in low folate regions (Asia) was larger (RR 0·78, 95% CI 0·68 to 0·90), no trial has evaluated the effect of lowering of homocysteine on stroke risk exclusively in a low folate region.

    Interpretation: In regions with increasing levels or established policies of population folate supplementation, evidence from genetic studies and randomised trials is concordant in suggesting an absence of benefit from lowering of homocysteine for prevention of stroke. Further large-scale genetic studies of the association between MTHFR 677C→T and stroke in low folate settings are needed to distinguish effect modification by folate from small-study bias. If future randomised trials of homocysteine-lowering interventions for stroke prevention are undertaken, they should take place in regions with low folate consumption.

    Funding: Full funding sources listed at end of paper (see Acknowledgments).

    Funded by: British Heart Foundation: FS/07/01, FS05/125; Department of Health; Medical Research Council: G0600580, G0600705, G0802432; NINDS NIH HHS: R01 NS39987, R01 NS42733; Wellcome Trust: 082178

    Lancet 2011;378;9791;584-94

  • A very early-branching Staphylococcus aureus lineage lacking the carotenoid pigment staphyloxanthin.

    Holt DC, Holden MT, Tong SY, Castillo-Ramirez S, Clarke L, Quail MA, Currie BJ, Parkhill J, Bentley SD, Feil EJ and Giffard PM

    Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia.

    Here we discuss the evolution of the northern Australian Staphylococcus aureus isolate MSHR1132 genome. MSHR1132 belongs to the divergent clonal complex 75 lineage. The average nucleotide divergence between orthologous genes in MSHR1132 and typical S. aureus is approximately sevenfold greater than the maximum divergence observed in this species to date. MSHR1132 has a small accessory genome, which includes the well-characterized genomic islands, νSAα and νSaβ, suggesting that these elements were acquired well before the expansion of the typical S. aureus population. Other mobile elements show mosaic structure (the prophage ϕSa3) or evidence of recent acquisition from a typical S. aureus lineage (SCCmec, ICE6013 and plasmid pMSHR1132). There are two differences in gene repertoire compared with typical S. aureus that may be significant clues as to the genetic basis underlying the successful emergence of S. aureus as a pathogen. First, MSHR1132 lacks the genes for production of staphyloxanthin, the carotenoid pigment that confers upon S. aureus its characteristic golden color and protects against oxidative stress. The lack of pigment was demonstrated in 126 of 126 CC75 isolates. Second, a mobile clustered regularly interspaced short palindromic repeat (CRISPR) element is inserted into orfX of MSHR1132. Although common in other staphylococcal species, these elements are very rare within S. aureus and may impact accessory genome acquisition. The CRISPR spacer sequences reveal a history of attempted invasion by known S. aureus mobile elements. There is a case for the creation of a new taxon to accommodate this and related isolates.

    Genome biology and evolution 2011;3;881-95

  • Temporal fluctuation of multidrug resistant Salmonella Typhi haplotypes in the Mekong River delta region of Vietnam.

    Holt KE, Dolecek C, Chau TT, Duy PT, La TT, Hoang NV, Nga TV, Campbell JI, Manh BH, Vinh Chau NV, Hien TT, Farrar J, Dougan G and Baker S

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom. kholt@unimelb.edu.au

    Background: Typhoid fever remains a public health problem in Vietnam, with a significant burden in the Mekong River delta region. Typhoid fever is caused by the bacterial pathogen Salmonella enterica serovar Typhi (S. Typhi), which is frequently multidrug resistant with reduced susceptibility to fluoroquinolone-based drugs, the first choice for the treatment of typhoid fever. We used a GoldenGate (Illumina) assay to type 1,500 single nucleotide polymorphisms (SNPs) and analyse the genetic variation of S. Typhi isolated from 267 typhoid fever patients in the Mekong delta region participating in a randomized trial conducted between 2004 and 2005.

    The population of S. Typhi circulating during the study was highly clonal, with 91% of isolates belonging to a single clonal complex of the S. Typhi H58 haplogroup. The patterns of disease were consistent with the presence of an endemic haplotype H58-C and a localised outbreak of S. Typhi haplotype H58-E2 in 2004. H58-E2-associated typhoid fever cases exhibited evidence of significant geo-spatial clustering along the Sông Hậu branch of the Mekong River. Multidrug resistance was common in the established clone H58-C but not in the outbreak clone H58-E2; however, all H58 S. Typhi were nalidixic acid resistant and carried a Ser83Phe amino acid substitution in the gyrA gene.

    Significance: The H58 haplogroup dominates S. Typhi populations in other endemic areas, but the population described here was more homogeneous than previously examined populations, and the dominant clonal complex (H58-C, -E1, -E2) observed in this study has not been detected outside Vietnam. IncHI1 plasmid-bearing S. Typhi H58-C was endemic during the study period whilst H58-E2, which rarely carried the plasmid, was only transient, suggesting a selective advantage for the plasmid. These data add insight into the outbreak dynamics and local molecular epidemiology of S. Typhi in southern Vietnam.

    Funded by: Wellcome Trust

    PLoS neglected tropical diseases 2011;5;1;e929

  • Emergence of a globally dominant IncHI1 plasmid type associated with multiple drug resistant typhoid.

    Holt KE, Phan MD, Baker S, Duy PT, Nga TV, Nair S, Turner AK, Walsh C, Fanning S, Farrell-Ward S, Dutta S, Kariuki S, Weill FX, Parkhill J, Dougan G and Wain J

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom. kholt@unimelb.edu.au

    Typhoid fever, caused by Salmonella enterica serovar Typhi (S. Typhi), remains a serious global health concern. Since their emergence in the mid-1970s multi-drug resistant (MDR) S. Typhi now dominate drug sensitive equivalents in many regions. MDR in S. Typhi is almost exclusively conferred by self-transmissible IncHI1 plasmids carrying a suite of antimicrobial resistance genes. We identified over 300 single nucleotide polymorphisms (SNPs) within conserved regions of the IncHI1 plasmid, and genotyped both plasmid and chromosomal SNPs in over 450 S. Typhi dating back to 1958. Prior to 1995, a variety of IncHI1 plasmid types were detected in distinct S. Typhi haplotypes. Highly similar plasmids were detected in co-circulating S. Typhi haplotypes, indicative of plasmid transfer. In contrast, from 1995 onwards, 98% of MDR S. Typhi were plasmid sequence type 6 (PST6) and S. Typhi haplotype H58, indicating recent global spread of a dominant MDR clone. To investigate whether PST6 conferred a selective advantage compared to other IncHI1 plasmids, we used a phenotyping array to compare the impact of IncHI1 PST6 and PST1 plasmids in a common S. Typhi host. The PST6 plasmid conferred the ability to grow in high salt medium (4.7% NaCl), which we demonstrate is due to the presence in PST6 of the Tn6062 transposon encoding BetU.

    Funded by: Wellcome Trust

    PLoS neglected tropical diseases 2011;5;7;e1245

  • A homozygous mutant embryonic stem cell bank applicable for phenotype-driven genetic screening.

    Horie K, Kokubu C, Yoshida J, Akagi K, Isotani A, Oshitani A, Yusa K, Ikeda R, Huang Y, Bradley A and Takeda J

    Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan. horie@mr-envi.med.osaka-u.ac.jp

    Genome-wide mutagenesis in mouse embryonic stem cells (ESCs) is a powerful tool, but the diploid nature of the mammalian genome hampers its application for recessive genetic screening. We have previously reported a method to induce homozygous mutant ESCs from heterozygous mutants by tetracycline-dependent transient disruption of the Bloom's syndrome gene. However, we could not purify homozygous mutants from a large population of heterozygous mutant cells, limiting the applications. Here we developed a strategy for rapid enrichment of homozygous mutant mouse ESCs and demonstrated its feasibility for cell-based phenotypic analysis. The method uses G418-plus-puromycin double selection to enrich for homozygotes and single-nucleotide polymorphism analysis for identification of homozygosity. We combined this simple approach with gene-trap mutagenesis to construct a homozygous mutant ESC bank with 138 mutant lines and demonstrate its use in phenotype-driven genetic screening.

    Nature methods 2011;8;12;1071-7

  • Large duplications at reciprocal translocation breakpoints that might be the counterpart of large deletions and could arise from stalled replication bubbles.

    Howarth KD, Pole JC, Beavis JC, Batty EM, Newman S, Bignell GR and Edwards PA

    Hutchison/MRC Research Centre and Department of Pathology, University of Cambridge, Cambridge, UK. kdh29@cam.ac.uk

    Reciprocal chromosome translocations are often not exactly reciprocal. Most familiar are deletions at the breakpoints, up to megabases in extent. We describe here the opposite phenomenon: duplication of tens or hundreds of kilobases at the breakpoint junction, so that the same sequence is present on both products of a translocation. When the products of the translocation are mapped on the genome, they overlap. We report several of these "overlapping-breakpoint" duplications in breast cancer cell lines HCC1187, HCC1806, and DU4475. These lines also had deletions and essentially balanced translocations. In HCC1187 and HCC1806, we identified five cases of duplication ranging between 46 kb and 200 kb, with the partner chromosome showing deletions between 29 bp and 31 Mb. DU4475 had a duplication of at least 200 kb. Breakpoints were mapped using array painting, i.e., hybridization of chromosomes isolated by flow cytometry to custom oligonucleotide microarrays. Duplications were verified by fluorescent in situ hybridization (FISH), PCR on isolated chromosomes, and cloning of breakpoints. We propose that these duplications are the counterpart of deletions and that they are produced at a replication bubble, comprising two replication forks with the duplicated sequence in between. Both copies of the duplicated sequence would go to one daughter cell, on different products of the translocation, while the other daughter cell would show deletion. These duplications may have been overlooked because they may be missed by FISH and array-CGH and may be interpreted as insertions by paired-end sequencing. Such duplications may therefore be quite frequent.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Genome research 2011;21;4;525-34

  • The genetic structure of the Swedish population.

    Humphreys K, Grankvist A, Leu M, Hall P, Liu J, Ripatti S, Rehnström K, Groop L, Klareskog L, Ding B, Grönberg H, Xu J, Pedersen NL, Lichtenstein P, Mattingsdal M, Andreassen OA, O'Dushlaine C, Purcell SM, Sklar P, Sullivan PF, Hultman CM, Palmgren J and Magnusson PK

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

    Patterns of genetic diversity have previously been shown to mirror geography on a global scale and within continents and individual countries. Using genome-wide SNP data on 5174 Swedes with extensive geographical coverage, we analyzed the genetic structure of the Swedish population. We observed strong differences between the far northern counties and the remaining counties. The population of Dalarna county, in north middle Sweden, which borders southern Norway, also appears to differ markedly from other counties, possibly due to this county having more individuals with remote Finnish or Norwegian ancestry than other counties. An analysis of genetic differentiation (based on pairwise F(st)) indicated that the population of Sweden's southernmost counties are genetically closer to the HapMap CEU samples of Northern European ancestry than to the populations of Sweden's northernmost counties. In a comparison of extended homozygous segments, we detected a clear divide between southern and northern Sweden with small differences between the southern counties and considerably more segments in northern Sweden. Both the increased degree of homozygosity in the north and the large genetic differences between the south and the north may have arisen due to a small population in the north and the vast geographical distances between towns and villages in the north, in contrast to the more densely settled southern parts of Sweden. Our findings have implications for future genome-wide association studies (GWAS) with respect to the matching of cases and controls and the need for within-county matching. We have shown that genetic differences within a single country may be substantial, even when viewed on a European scale. 
    Thus, population stratification needs to be accounted for, even within a country like Sweden, which is often perceived to be relatively homogenous and a favourable resource for genetic mapping, otherwise inferences based on genetic data may lead to false conclusions.

    Funded by: NCI NIH HHS: R01 CA58427; NIMH NIH HHS: MH077139

    PloS one 2011;6;8;e22547

  • An activating mutation of AKT2 and human hypoglycemia.

    Hussain K, Challis B, Rocha N, Payne F, Minic M, Thompson A, Daly A, Scott C, Harris J, Smillie BJ, Savage DB, Ramaswami U, De Lonlay P, O'Rahilly S, Barroso I and Semple RK

    Clinical and Molecular Genetics Unit, Developmental Endocrinology Research Group, Institute of Child Health, University College London, London WC1N 1EH, UK.

    Pathological fasting hypoglycemia in humans is usually explained by excessive circulating insulin or insulin-like molecules or by inborn errors of metabolism impairing liver glucose production. We studied three unrelated children with unexplained, recurrent, and severe fasting hypoglycemia and asymmetrical growth. All were found to carry the same de novo mutation, p.Glu17Lys, in the serine/threonine kinase AKT2, in two cases as heterozygotes and in one case in mosaic form. In heterologous cells, the mutant AKT2 was constitutively recruited to the plasma membrane, leading to insulin-independent activation of downstream signaling. Thus, systemic metabolic disease can result from constitutive, cell-autonomous activation of signaling pathways normally controlled by insulin.

    Funded by: Medical Research Council: G0502115; Wellcome Trust: 077016, 077016/Z/05/Z, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z, 091551, 091551/Z/10/Z, 095515

    Science (New York, N.Y.) 2011;334;6055;474

  • Large-scale gene-centric analysis identifies novel variants for coronary artery disease.

    IBC 50K CAD Consortium

    Coronary artery disease (CAD) has a significant genetic contribution that is incompletely characterized. To complement genome-wide association (GWA) studies, we conducted a large and systematic candidate gene study of CAD susceptibility, including analysis of many uncommon and functional variants. We examined 49,094 genetic variants in ∼2,100 genes of cardiovascular relevance, using a customised gene array in 15,596 CAD cases and 34,992 controls (11,202 cases and 30,733 controls of European descent; 4,394 cases and 4,259 controls of South Asian origin). We attempted to replicate putative novel associations in an additional 17,121 CAD cases and 40,473 controls. Potential mechanisms through which the novel variants could affect CAD risk were explored through association tests with vascular risk factors and gene expression. We confirmed associations of several previously known CAD susceptibility loci (eg, 9p21.3:p<10(-33); LPA:p<10(-19); 1p13.3:p<10(-17)) as well as three recently discovered loci (COL4A1/COL4A2, ZC3HC1, CYP17A1:p<5×10(-7)). However, we found essentially null results for most previously suggested CAD candidate genes. In our replication study of 24 promising common variants, we identified novel associations of variants in or near LIPA, IL5, TRIB1, and ABCG5/ABCG8, with per-allele odds ratios for CAD risk with each of the novel variants ranging from 1.06-1.09. Associations with variants at LIPA, TRIB1, and ABCG5/ABCG8 were supported by gene expression data or effects on lipid levels. Apart from the previously reported variants in LPA, none of the other ∼4,500 low frequency and functional variants showed a strong effect. Associations in South Asians did not differ appreciably from those in Europeans, except for 9p21.3 (per-allele odds ratio: 1.14 versus 1.27 respectively; P for heterogeneity = 0.003). 
    This large-scale gene-centric analysis has identified several novel genes for CAD that relate to diverse biochemical and cellular functions and clarified the literature with regard to many previously suggested genes.

    Funded by: British Heart Foundation: RG/08/014/24067, RG/09/12/28096; Medical Research Council: G0401527, G0601966, G0700931, G0701863, G0801056, G1000143, MC_U105260792, MC_U106179471; NHLBI NIH HHS: R01 HL087647; Wellcome Trust: 090532

    PLoS genetics 2011;7;9;e1002260

  • Distinguishing driver and passenger mutations in an evolutionary history categorized by interference.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    In many biological scenarios, from the development of drug resistance in pathogens to the progression of healthy cells toward cancer, quantifying the selection acting on observed mutations is a central question. One difficulty in answering this question is the complexity of the background upon which mutations can arise, with multiple potential interactions between genetic loci. We here present a method for discerning selection from a population history that accounts for interference between mutations. Given sequences sampled from multiple time points in the history of a population, we infer selection at each locus by maximizing a likelihood function derived from a multilocus evolution model. We apply the method to the question of distinguishing between loci where new mutations are under positive selection (drivers) and loci that emit neutral mutations (passengers) in a Wright-Fisher model of evolution. Relative to an otherwise equivalent method in which the genetic background of mutations was ignored, our method inferred selection coefficients more accurately for both driver mutations evolving under clonal interference and passenger mutations reaching fixation in the population through genetic drift or hitchhiking. In a population history recorded by 750 sets of sequences of 100 individuals taken at intervals of 100 generations, a set of 50 loci were divided into drivers and passengers with a mean accuracy of >0.95 across a range of numbers of driver loci. The potential application of our model, either in full or in part, to a range of biological systems, is discussed.
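
    As a rough illustration of the Wright-Fisher setting mentioned in the abstract above, the following minimal sketch tracks a single allele under selection and drift. It is not the authors' multilocus likelihood method: it ignores interference between loci, and the function name, parameter names, and the values used (population size 100, selection coefficient 0.05, starting frequency 0.1) are assumptions chosen only for the example.

```python
import random

def wright_fisher(pop_size, s, p0, generations, rng):
    """Simulate one allele's frequency in a haploid Wright-Fisher population.

    pop_size, s, p0 and generations are hypothetical parameter names:
    s > 0 models a driver (positively selected), s = 0 a neutral passenger.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Selection: deterministic shift in the expected frequency.
        expected = p * (1 + s) / (1 + p * s)
        # Drift: binomial resampling of pop_size individuals.
        count = sum(1 for _ in range(pop_size) if rng.random() < expected)
        p = count / pop_size
        freqs.append(p)
    return freqs

rng = random.Random(1)
driver = wright_fisher(100, 0.05, 0.1, 100, rng)      # illustrative values
passenger = wright_fisher(100, 0.0, 0.1, 100, rng)
print(driver[-1], passenger[-1])
```

    Individual runs vary widely because of drift, which is one reason why inferring selection from sampled frequencies calls for a likelihood-based approach rather than inspection of a single trajectory.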

    Funded by: Wellcome Trust: 091747

    Genetics 2011;189;3;989-1000

  • Maternally derived microduplications at 15q11-q13: implication of imprinted genes in psychotic illness.

    Ingason A, Kirov G, Giegling I, Hansen T, Isles AR, Jakobsen KD, Kristinsson KT, le Roux L, Gustafsson O, Craddock N, Möller HJ, McQuillin A, Muglia P, Cichon S, Rietschel M, Ophoff RA, Djurovic S, Andreassen OA, Pietiläinen OP, Peltonen L, Dempster E, Collier DA, St Clair D, Rasmussen HB, Glenthøj BY, Kiemeney LA, Franke B, Tosato S, Bonetto C, Saemundsen E, Hreidarsson SJ, GROUP Investigators, Nöthen MM, Gurling H, O'Donovan MC, Owen MJ, Sigurdsson E, Petursson H, Stefansson H, Rujescu D, Stefansson K and Werge T

    Research Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark.

    Objective: Rare copy number variants have been implicated in different neurodevelopmental disorders, with the same copy number variants often increasing risk of more than one of these phenotypes. In a discovery sample of 22 schizophrenia patients with an early onset of illness (10-15 years of age), the authors observed in one patient a maternally derived 15q11-q13 duplication overlapping the Prader-Willi/Angelman syndrome critical region. This prompted investigation of the role of 15q11-q13 duplications in psychotic illness. Method: The authors scanned 7,582 patients with schizophrenia or schizoaffective disorder and 41,370 comparison subjects without known psychiatric illness for copy number variants at 15q11-q13 and determined the parental origin of duplications using methylation-sensitive Southern hybridization analysis. Results: Duplications were found in four case patients and five comparison subjects. All four case patients had maternally derived duplications (0.05%), while only three of the five comparison duplications were maternally derived (0.007%), resulting in a significant excess of maternally derived duplications in case patients (odds ratio=7.3). This excess is compatible with earlier observations that risk for psychosis in people with Prader-Willi syndrome caused by maternal uniparental disomy is much higher than in those caused by deletion of the paternal chromosome. Conclusions: These findings suggest that the presence of two maternal copies of a fragment of chromosome 15q11.2-q13.1 that overlaps with the Prader-Willi/Angelman syndrome critical region may be a rare risk factor for schizophrenia and other psychoses. Given that maternal duplications of this region are among the most consistent cytogenetic observations in autism, the findings provide further support for a shared genetic etiology between autism and psychosis.
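
    The reported odds ratio can be reproduced approximately from the counts given in the abstract above. The sketch below assumes a plain, unadjusted 2×2 calculation (4 maternally derived duplications among 7,582 patients versus 3 among 41,370 comparison subjects); the authors' exact statistical procedure is not stated here, and the function name is hypothetical.

```python
def odds_ratio(case_exposed, case_total, control_exposed, control_total):
    """Unadjusted odds ratio from a 2x2 table (hypothetical helper)."""
    case_odds = case_exposed / (case_total - case_exposed)
    control_odds = control_exposed / (control_total - control_exposed)
    return case_odds / control_odds

# Counts of maternally derived 15q11-q13 duplications from the abstract.
cases_pct = 100 * 4 / 7582        # about 0.05%
controls_pct = 100 * 3 / 41370    # about 0.007%
print(round(cases_pct, 3), round(controls_pct, 4))
print(round(odds_ratio(4, 7582, 3, 41370), 1))
```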

    Funded by: Medical Research Council; NIMH NIH HHS: MH071425; Wellcome Trust: 089061

    The American journal of psychiatry 2011;168;4;408-17

  • Genotype Calling

    Inouye M and Teo YY

    Analysis of Complex Disease Association Studies 2011;Chapter 5;69-86

  • Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study.

    InterAct Consortium, Langenberg C, Sharp S, Forouhi NG, Franks PW, Schulze MB, Kerrison N, Ekelund U, Barroso I, Panico S, Tormo MJ, Spranger J, Griffin S, van der Schouw YT, Amiano P, Ardanaz E, Arriola L, Balkau B, Barricarte A, Beulens JW, Boeing H, Bueno-de-Mesquita HB, Buijsse B, Chirlaque Lopez MD, Clavel-Chapelon F, Crowe FL, de Lauzon-Guillan B, Deloukas P, Dorronsoro M, Drogan D, Froguel P, Gonzalez C, Grioni S, Groop L, Groves C, Hainaut P, Halkjaer J, Hallmans G, Hansen T, Huerta Castaño JM, Kaaks R, Key TJ, Khaw KT, Koulman A, Mattiello A, Navarro C, Nilsson P, Norat T, Overvad K, Palla L, Palli D, Pedersen O, Peeters PH, Quirós JR, Ramachandran A, Rodriguez-Suarez L, Rolandsson O, Romaguera D, Romieu I, Sacerdote C, Sánchez MJ, Sandbaek A, Slimani N, Sluijs I, Spijkerman AM, Teucher B, Tjonneland A, Tumino R, van der A DL, Verschuren WM, Tuomilehto J, Feskens E, McCarthy M, Riboli E and Wareham NJ

    Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital, Box 285, Cambridge CB2 0QQ, UK. claudia.langenberg@mrc-epid.cam.ac.uk

    Aims/hypothesis: Studying gene-lifestyle interaction may help to identify lifestyle factors that modify genetic susceptibility and uncover genetic loci exerting important subgroup effects. Adequately powered studies with prospective, unbiased, standardised assessment of key behavioural factors for gene-lifestyle studies are lacking. This case-cohort study aims to investigate how genetic and potentially modifiable lifestyle and behavioural factors, particularly diet and physical activity, interact in their influence on the risk of developing type 2 diabetes.

    Methods: Incident cases of type 2 diabetes occurring in European Prospective Investigation into Cancer and Nutrition (EPIC) cohorts between 1991 and 2007 from eight of the ten EPIC countries were ascertained and verified. Prentice-weighted Cox regression and random-effects meta-analyses were used to investigate differences in diabetes incidence by age and sex.

    Results: A total of 12,403 verified incident cases of type 2 diabetes occurred during 3.99 million person-years of follow-up of 340,234 EPIC participants eligible for InterAct. We defined a centre-stratified subcohort of 16,154 individuals for comparative analyses. Individuals with incident diabetes who were randomly selected into the subcohort (n = 778) were included as cases in the analyses. All prevalent diabetes cases were excluded from the study. InterAct cases were followed-up for an average of 6.9 years; 49.7% were men. Mean baseline age and age at diagnosis were 55.6 and 62.5 years, mean BMI and waist circumference values were 29.4 kg/m(2) and 102.7 cm in men, and 30.1 kg/m(2) and 92.8 cm in women, respectively. Risk of type 2 diabetes increased linearly with age, with an overall HR of 1.56 (95% CI 1.48-1.64) for a 10 year age difference, adjusted for sex. A male excess in the risk of incident diabetes was consistently observed across all countries, with a pooled HR of 1.51 (95% CI 1.39-1.64), adjusted for age.

    Conclusions/interpretation: InterAct is a large, well-powered, prospective study that will inform our understanding of the interplay between genes and lifestyle factors on the risk of type 2 diabetes development.

    Funded by: Canadian Institutes of Health Research: G0601261; Cancer Research UK: 11692; Medical Research Council: G0401527, G0601261, G1000143, MC_U106179471, MC_U106179473, MC_U106179474, MC_UP_A090_1006, MC_UP_A100_1003; Wellcome Trust: 083270/083270/z

    Diabetologia 2011;54;9;2272-82

  • Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.

    International Consortium for Blood Pressure Genome-Wide Association Studies, Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang SJ, Pihur V, Vollenweider P, O'Reilly PF, Amin N, Bragg-Gresham JL, Teumer A, Glazer NL, Launer L, Zhao JH, Aulchenko Y, Heath S, Sõber S, Parsa A, Luan J, Arora P, Dehghan A, Zhang F, Lucas G, Hicks AA, Jackson AU, Peden JF, Tanaka T, Wild SH, Rudan I, Igl W, Milaneschi Y, Parker AN, Fava C, Chambers JC, Fox ER, Kumari M, Go MJ, van der Harst P, Kao WH, Sjögren M, Vinay DG, Alexander M, Tabara Y, Shaw-Hawkins S, Whincup PH, Liu Y, Shi G, Kuusisto J, Tayo B, Seielstad M, Sim X, Nguyen KD, Lehtimäki T, Matullo G, Wu Y, Gaunt TR, Onland-Moret NC, Cooper MN, Platou CG, Org E, Hardy R, Dahgam S, Palmen J, Vitart V, Braund PS, Kuznetsova T, Uiterwaal CS, Adeyemo A, Palmas W, Campbell H, Ludwig B, Tomaszewski M, Tzoulaki I, Palmer ND, CARDIoGRAM consortium, CKDGen Consortium, KidneyGen Consortium, EchoGen consortium, CHARGE-HF consortium, Aspelund T, Garcia M, Chang YP, O'Connell JR, Steinle NI, Grobbee DE, Arking DE, Kardia SL, Morrison AC, Hernandez D, Najjar S, McArdle WL, Hadley D, Brown MJ, Connell JM, Hingorani AD, Day IN, Lawlor DA, Beilby JP, Lawrence RW, Clarke R, Hopewell JC, Ongen H, Dreisbach AW, Li Y, Young JH, Bis JC, Kähönen M, Viikari J, Adair LS, Lee NR, Chen MH, Olden M, Pattaro C, Bolton JA, Köttgen A, Bergmann S, Mooser V, Chaturvedi N, Frayling TM, Islam M, Jafar TH, Erdmann J, Kulkarni SR, Bornstein SR, Grässler J, Groop L, Voight BF, Kettunen J, Howard P, Taylor A, Guarrera S, Ricceri F, Emilsson V, Plump A, Barroso I, Khaw KT, Weder AB, Hunt SC, Sun YV, Bergman RN, Collins FS, Bonnycastle LL, Scott LJ, Stringham HM, Peltonen L, Perola M, Vartiainen E, Brand SM, Staessen JA, Wang TJ, Burton PR, Soler Artigas M, Dong Y, Snieder H, Wang X, Zhu H, Lohman KK, Rudock ME, Heckbert SR, Smith NL, Wiggins KL, Doumatey A, Shriner D, Veldre G, Viigimaa M, Kinra S, Prabhakaran D, 
    Tripathy V, Langefeld CD, Rosengren A, Thelle DS, Corsi AM, Singleton A, Forrester T, Hilton G, McKenzie CA, Salako T, Iwai N, Kita Y, Ogihara T, Ohkubo T, Okamura T, Ueshima H, Umemura S, Eyheramendy S, Meitinger T, Wichmann HE, Cho YS, Kim HL, Lee JY, Scott J, Sehmi JS, Zhang W, Hedblad B, Nilsson P, Smith GD, Wong A, Narisu N, Stančáková A, Raffel LJ, Yao J, Kathiresan S, O'Donnell CJ, Schwartz SM, Ikram MA, Longstreth WT, Mosley TH, Seshadri S, Shrine NR, Wain LV, Morken MA, Swift AJ, Laitinen J, Prokopenko I, Zitting P, Cooper JA, Humphries SE, Danesh J, Rasheed A, Goel A, Hamsten A, Watkins H, Bakker SJ, van Gilst WH, Janipalli CS, Mani KR, Yajnik CS, Hofman A, Mattace-Raso FU, Oostra BA, Demirkan A, Isaacs A, Rivadeneira F, Lakatta EG, Orru M, Scuteri A, Ala-Korpela M, Kangas AJ, Lyytikäinen LP, Soininen P, Tukiainen T, Würtz P, Ong RT, Dörr M, Kroemer HK, Völker U, Völzke H, Galan P, Hercberg S, Lathrop M, Zelenika D, Deloukas P, Mangino M, Spector TD, Zhai G, Meschia JF, Nalls MA, Sharma P, Terzic J, Kumar MV, Denniff M, Zukowska-Szczechowska E, Wagenknecht LE, Fowkes FG, Charchar FJ, Schwarz PE, Hayward C, Guo X, Rotimi C, Bots ML, Brand E, Samani NJ, Polasek O, Talmud PJ, Nyberg F, Kuh D, Laan M, Hveem K, Palmer LJ, van der Schouw YT, Casas JP, Mohlke KL, Vineis P, Raitakari O, Ganesh SK, Wong TY, Tai ES, Cooper RS, Laakso M, Rao DC, Harris TB, Morris RW, Dominiczak AF, Kivimaki M, Marmot MG, Miki T, Saleheen D, Chandak GR, Coresh J, Navis G, Salomaa V, Han BG, Zhu X, Kooner JS, Melander O, Ridker PM, Bandinelli S, Gyllensten UB, Wright AF, Wilson JF, Ferrucci L, Farrall M, Tuomilehto J, Pramstaller PP, Elosua R, Soranzo N, Sijbrands EJ, Altshuler D, Loos RJ, Shuldiner AR, Gieger C, Meneton P, Uitterlinden AG, Wareham NJ, Gudnason V, Rotter JI, Rettig R, Uda M, Strachan DP, Witteman JC, Hartikainen AL, Beckmann JS, Boerwinkle E, Vasan RS, Boehnke M, Larson MG, Järvelin MR, Psaty BM, Abecasis GR, Chakravarti A, Elliott P, van Duijn CM, Newton-Cheh C, Levy D, Caulfield MJ and Johnson T

    Blood pressure is a heritable trait influenced by several biological pathways and responsive to environmental stimuli. Over one billion people worldwide have hypertension (≥140 mm Hg systolic blood pressure or ≥90 mm Hg diastolic blood pressure). Even small increments in blood pressure are associated with an increased risk of cardiovascular events. This genome-wide association study of systolic and diastolic blood pressure, which used a multi-stage design in 200,000 individuals of European descent, identified sixteen novel loci: six of these loci contain genes previously known or suspected to regulate blood pressure (GUCY1A3-GUCY1B3, NPR3-C5orf23, ADM, FURIN-FES, GOSR2, GNAS-EDN3); the other ten provide new clues to blood pressure physiology. A genetic risk score based on 29 genome-wide significant variants was associated with hypertension, left ventricular wall thickness, stroke and coronary artery disease, but not kidney disease or kidney function. We also observed associations with blood pressure in East Asian, South Asian and African ancestry individuals. Our findings provide new insights into the genetics and biology of blood pressure, and suggest potential novel therapeutic pathways for cardiovascular disease prevention.

    Funded by: AHRQ HHS: HS06516; Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: CH/03/001, FS05/125, G0501942, PG/02/128, PG97012, PG97027, RG/07/005/23633, RG/08/008/25291, RG/08/013/25942, RG/08/014/24067, RG/98002, RG08/01, SP/04/002, SP/08/005/25115; Canadian Institutes of Health Research: MOP-82810, MOP172605, MOP77682; Chief Scientist Office: CZB/4/276, CZB/4/710; FIC NIH HHS: R03 TW007165, TW008288, TW05596; Howard Hughes Medical Institute: 55005617; Medical Research Council: G0000934, G0400874, G0401527, G0500539, G0501942, G0600331, G0600705, G0601966, G0700931, G0701863, G0801056, G0902037, G0902313, G1000143, G19/35, G9521010, G9521010D, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092723, MC_UP_A100_1003; NCI NIH HHS: 5U01CA086308, P01CA055075, P01CA087969; NCRR NIH HHS: 2M01RR010284, K12RR023250, M01 RR16500, M01-RR00425, RR-024156, RR20649, U54 RR020278, UL1RR025005; NHGRI NIH HHS: HG003054, HG005581, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004436, U01HG004438, U01HG004446, U01HG004726, U01HG004728, U01HG004729, U01HG004735, U01HG004738; NHLBI NIH HHS: 5R01HL086694-03, 5R01HL087679-02, 5R01HL08770002, HL 54512, HL-87660, HL043851, HL080025, HL084729, HL085144, HL086718, HL087647, HL098283, HL36310, HL45508, HL53353, HL54512, N01 HC-15103, N01 HC-55222, N01 HC-95159, N01 HC-95169, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N02-HL-6-4278, R01 HL073410, R01 HL085251, R01 HL086694, R01 HL086694-03, R01 HL086694-04A1, R01 HL086694-05, R01 HL087647, R01 HL087652, R01 HL088119, R01HL056931, R01HL060894, R01HL060919, 
R01HL06094, R01HL061019, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, R01HL086694, R01HL087641, R01HL089650-02, R01HL59367, R37HL051021, U01 HL054466, U01 HL054466-11, U01 HL054471, U01 HL054473, U01 HL054527, U01 HL072515-06, U01 HL080295, U01 HL084756, U10 HL054512, U10HL054512; NIA NIH HHS: 1R01AG032098-01A, AG13196, N01-AG-1-2109, N01-AG-12100, N01AG6210, N01AG62101, N01AG62103, R01 AG017644-09S1, R01 AG18728; NICHD NIH HHS: N01-HD-1-3107; NIDCR NIH HHS: U01DE018903, U01DE01899; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK075787, DK078150, DK56350, P30 DK072488, R01 DK072193, R01 DK078150, R01DK058845, R01DK066574, U01 DK062418; NIEHS NIH HHS: ES10126, P30 ES010126, P30ES007033; NIGMS NIH HHS: S06GM008016-320107, S06GM008016-380111, U01 GM074518-04; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164; NINDS NIH HHS: R01 NS39987, R01 NS42733, U01 NS069208, U01 NS069208-01; PHS HHS: 263-MA-410953, 33014, HHSN268200625226C, HHSN268200782096, HHSN268200782096C; Wellcome Trust: 068545/Z/02, 070191/Z/03/Z, 077016/Z/05/Z, 079895, 080747/Z/06/Z, 090532

    Nature 2011;478;7367;103-9

  • The genetic association of variants in CD6, TNFRSF1A and IRF8 to multiple sclerosis: a multicenter case-control study.

    International Multiple Sclerosis Genetics Consortium

    Background: In the recently published meta-analysis of multiple sclerosis genome-wide association studies, De Jager et al. identified three single nucleotide polymorphisms associated with MS: rs17824933 (CD6), rs1800693 (TNFRSF1A) and rs17445836 (61.5 kb from IRF8). To refine our understanding of these associations we sought to replicate these findings in a larger, more extensive independent sample set of 11 populations of European origin.

    We calculated individual and combined associations using a meta-analysis method by Kazeem and Farrall (2005). We confirmed the association of rs1800693 in TNFRSF1A (p = 4.19 × 10(-7), OR 1.12, 7,665 cases, 8,051 controls) and rs17445836 near IRF8 (p = 5.35 × 10(-10), OR 0.84, 6,895 cases, 7,580 controls and 596 case-parent trios). The SNP rs17824933 in CD6 also showed nominally significant evidence for association (p = 2.19 × 10(-5), OR 1.11, 8,047 cases, 9,174 controls, 604 case-parent trios).

    Conclusions: Variants in TNFRSF1A and in the vicinity of IRF8 were confirmed to be associated in these independent cohorts, which supports the role of these loci in the etiology of multiple sclerosis. The variant in CD6 reached genome-wide significance after combining the data with the original meta-analysis. Fine mapping is required to identify the predisposing variants in the loci and future functional studies will refine their molecular role in MS pathogenesis.

    Funded by: NINDS NIH HHS: R01 NS 43559, R01 NS049477, R01 NS067305; Wellcome Trust: 089061/Z/09/Z

    PloS one 2011;6;4;e18813

  • Imputation of sequence variants for identification of genetic risks for Parkinson's disease: a meta-analysis of genome-wide association studies.

    International Parkinson Disease Genomics Consortium, Nalls MA, Plagnol V, Hernandez DG, Sharma M, Sheerin UM, Saad M, Simón-Sánchez J, Schulte C, Lesage S, Sveinbjörnsdóttir S, Stefánsson K, Martinez M, Hardy J, Heutink P, Brice A, Gasser T, Singleton AB and Wood NW

    Background: Genome-wide association studies (GWAS) for Parkinson's disease have linked two loci (MAPT and SNCA) to risk of Parkinson's disease. We aimed to identify novel risk loci for Parkinson's disease.

    Methods: We did a meta-analysis of datasets from five Parkinson's disease GWAS from the USA and Europe to identify loci associated with Parkinson's disease (discovery phase). We then did replication analyses of significantly associated loci in an independent sample series. Estimates of population-attributable risk were calculated from estimates from the discovery and replication phases combined, and risk-profile estimates for loci identified in the discovery phase were calculated.

    Findings: The discovery phase consisted of 5333 case and 12 019 control samples, with genotyped and imputed data at 7 689 524 SNPs. The replication phase consisted of 7053 case and 9007 control samples. We identified 11 loci that surpassed the threshold for genome-wide significance (p<5×10(-8)). Six were previously identified loci (MAPT, SNCA, HLA-DRB5, BST1, GAK and LRRK2) and five were newly identified loci (ACMSD, STK39, MCCC1/LAMP3, SYT11, and CCDC62/HIP1R). The combined population-attributable risk was 60·3% (95% CI 43·7-69·3). In the risk-profile analysis, the odds ratio in the highest quintile of disease risk was 2·51 (95% CI 2·23-2·83) compared with 1·00 in the lowest quintile of disease risk.

    Interpretation: These data provide an insight into the genetics of Parkinson's disease and the molecular cause of the disease and could provide future targets for therapies.

    Funding: Wellcome Trust, National Institute on Aging, and US Department of Defense.

    Funded by: Medical Research Council: G0700943; NCRR NIH HHS: RR024992; NIA NIH HHS: Z01-AG000949-02; NIEHS NIH HHS: Z01-ES101986; NINDS NIH HHS: NS057105, R01 NS060722-04; Parkinson's UK: J-0804, J-0901; Wellcome Trust: 083948/Z/07/Z, WT089698, WT089698/Z/09/Z, WTCCC1, WTCCC2

    Lancet 2011;377;9766;641-9

  • A two-stage meta-analysis identifies several new loci for Parkinson's disease.

    International Parkinson's Disease Genomics Consortium (IPDGC) and Wellcome Trust Case Control Consortium 2 (WTCCC2)

    A previous genome-wide association (GWA) meta-analysis of 12,386 PD cases and 21,026 controls conducted by the International Parkinson's Disease Genomics Consortium (IPDGC) discovered or confirmed 11 Parkinson's disease (PD) loci. This first analysis of the two-stage IPDGC study focused on the set of loci that passed genome-wide significance in the first stage GWA scan. However, the second stage genotyping array, the ImmunoChip, included a larger set of 1,920 SNPs selected on the basis of the GWA analysis. Here, we analyzed this set of 1,920 SNPs, and we identified five additional PD risk loci at a combined p<5×10(-10): PARK16/1q32, STX1B/16p11, FGF20/8p22, STBD1/4q21, and GPNMB/7p15. Two of these five loci have been suggested by previous association studies (PARK16/1q32, FGF20/8p22), and this study provides further support for these findings. Using a dataset of post-mortem brain samples assayed for gene expression (n = 399) and methylation (n = 292), we identified methylation and expression changes associated with PD risk variants in PARK16/1q32, GPNMB/7p15, and STX1B/16p11 loci, hence suggesting potential molecular mechanisms and candidate genes at these risk loci.

    Funded by: Medical Research Council: G0700943, G0901254; NHGRI NIH HHS: ZIA HG200336-05; NIA NIH HHS: Z01 AG000949-02; NIEHS NIH HHS: Z01-ES101986; Parkinson's UK: J-0804, J-0901; Wellcome Trust: 085475/B/08/Z, 085475/Z/08/Z, WT089698

    PLoS genetics 2011;7;6;e1002142

  • A Salmonella Typhimurium-Typhi genomic chimera: a model to study Vi polysaccharide capsule function in vivo.

    Jansen AM, Hall LJ, Clare S, Goulding D, Holt KE, Grant AJ, Mastroeni P, Dougan G and Kingsley RA

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The Vi capsular polysaccharide is a virulence-associated factor expressed by Salmonella enterica serotype Typhi but absent from virtually all other Salmonella serotypes. In order to study this determinant in vivo, we characterised a Vi-positive S. Typhimurium (C5.507 Vi(+)), harbouring the Salmonella pathogenicity island (SPI)-7, which encodes the Vi locus. S. Typhimurium C5.507 Vi(+) colonised and persisted in mice at similar levels compared to the parent strain, S. Typhimurium C5. However, the innate immune response to infection with C5.507 Vi(+) and SGB1, an isogenic derivative not expressing Vi, differed markedly. Infection with C5.507 Vi(+) resulted in a significant reduction in cellular trafficking of innate immune cells, including PMN and NK cells, compared to SGB1 Vi(-) infected animals. C5.507 Vi(+) infection stimulated reduced numbers of TNF-α, MIP-2 and perforin producing cells compared to SGB1 Vi(-). The modulating effect associated with Vi was not observed in MyD88(-/-) and was reduced in TLR4(-/-) mice. The presence of the Vi capsule also correlated with induction of the anti-inflammatory cytokine IL-10 in vivo, a factor that impacted on chemotaxis and the activation of immune cells in vitro.

    Funded by: Wellcome Trust

    PLoS pathogens 2011;7;7;e1002131

  • The rise and fall of supervised machine learning techniques.

    Jensen LJ and Bateman A

    Bioinformatics (Oxford, England) 2011;27;24;3331-2

  • myKaryoView: a light-weight client for visualization of genomic data.

    Jimenez RC, Salazar GA, Gel B, Dopazo J, Mulder N and Corpas M

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The Distributed Annotation System (DAS) is a protocol for easy sharing and integration of biological annotations. In order to visualize feature annotations in a genomic context a client is required. Here we present myKaryoView, a simple light-weight DAS tool for visualization of genomic annotation. myKaryoView has been specifically configured to help analyse data derived from personal genomics, although it can also be used as a generic genome browser visualization. Several well-known data sources are provided to facilitate comparison of known genes and normal variation regions. The navigation experience is enhanced by simultaneous rendering of different levels of detail across chromosomes. A simple interface is provided to allow searches for any SNP, gene or chromosomal region. User-defined DAS data sources may also be added when querying the system. We demonstrate myKaryoView capabilities for adding user-defined sources with a set of genetic profiles of family-related individuals downloaded directly from 23andMe. myKaryoView is a web tool for visualization of genomic data specifically designed for direct-to-consumer genomic data that uses publicly available data distributed throughout the Internet. It does not require data to be held locally and it is capable of rendering any feature as long as it conforms to DAS specifications. Configuration and addition of sources to myKaryoView can be done through the interface. Here we show a proof of principle of myKaryoView's ability to display personal genomics data with 23andMe genome data sources. The tool is available at: http://mykaryoview.com.

    PloS one 2011;6;10;e26345

  • Blood pressure loci identified with a gene-centric array.

    Johnson T, Gaunt TR, Newhouse SJ, Padmanabhan S, Tomaszewski M, Kumari M, Morris RW, Tzoulaki I, O'Brien ET, Poulter NR, Sever P, Shields DC, Thom S, Wannamethee SG, Whincup PH, Brown MJ, Connell JM, Dobson RJ, Howard PJ, Mein CA, Onipinla A, Shaw-Hawkins S, Zhang Y, Davey Smith G, Day IN, Lawlor DA, Goodall AH, Cardiogenics Consortium, Fowkes FG, Abecasis GR, Elliott P, Gateva V, Global BPgen Consortium, Braund PS, Burton PR, Nelson CP, Tobin MD, van der Harst P, Glorioso N, Neuvrith H, Salvi E, Staessen JA, Stucchi A, Devos N, Jeunemaitre X, Plouin PF, Tichet J, Juhanson P, Org E, Putku M, Sõber S, Veldre G, Viigimaa M, Levinsson A, Rosengren A, Thelle DS, Hastie CE, Hedner T, Lee WK, Melander O, Wahlstrand B, Hardy R, Wong A, Cooper JA, Palmen J, Chen L, Stewart AF, Wells GA, Westra HJ, Wolfs MG, Clarke R, Franzosi MG, Goel A, Hamsten A, Lathrop M, Peden JF, Seedorf U, Watkins H, Ouwehand WH, Sambrook J, Stephens J, Casas JP, Drenos F, Holmes MV, Kivimaki M, Shah S, Shah T, Talmud PJ, Whittaker J, Wallace C, Delles C, Laan M, Kuh D, Humphries SE, Nyberg F, Cusi D, Roberts R, Newton-Cheh C, Franke L, Stanton AV, Dominiczak AF, Farrall M, Hingorani AD, Samani NJ, Caulfield MJ and Munroe PB

    Clinical Pharmacology and Barts and The London Genome Centre, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK. t.johnson@qmul.ac.uk

    Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a bespoke gene-centric array to genotype an independent discovery sample of 25,118 individuals that combined hypertensive case-control and general population samples. We followed up four SNPs associated with BP at our p < 8.56 × 10(-7) study-specific significance threshold and six suggestively associated SNPs in a further 59,349 individuals. We identified and replicated a SNP at LSP1/TNNT3, a SNP at MTHFR-NPPB independent (r(2) = 0.33) of previous reports, and replicated SNPs at AGT and ATP2B1 reported previously. An analysis of combined discovery and follow-up data identified SNPs significantly associated with BP at p < 8.56 × 10(-7) at four further loci (NPR3, HFE, NOS3, and SOX6). The high number of discoveries made with modest genotyping effort can be attributed to using a large-scale yet targeted genotyping array and to the development of a weighting scheme that maximized power when meta-analyzing results from samples ascertained with extreme phenotypes, in combination with results from nonascertained or population samples. Chromatin immunoprecipitation and transcript expression data highlight potential gene regulatory mechanisms at the MTHFR and NOS3 loci. These results provide candidates for further study to help dissect mechanisms affecting BP and highlight the utility of studying SNPs and samples that are independent of those studied previously even when the sample size is smaller than that in previous studies.

    Funded by: AHRQ HHS: HS06516; British Heart Foundation: CH/98001, FS05/125, PG/07/131/24254, PG/07/132/24256, PG/07/133/24260, PG/97012, RG/07/005/23633, RG/07/008/23674, RG/08/008, RG/08/008/25291, RG/08/013/25942, RG/2001004, SP/07/007/2367, SP/08/005/25115; Canadian Institutes of Health Research: MOP172605, MOP77682, MOP82810; Department of Health; Medical Research Council: G0400874, G0401527, G0501942, G0701863, G0801056, G0802432, G0902037, G1000143, G19/35, G8802774, G9521010, G9521010D, MC_U106179471, MC_U123092720, MC_U123092723, MC_UP_A100_1003; NIA NIH HHS: AG13196, R01 AG017644-09S1; Wellcome Trust: 070191/Z/03/A, 070191/Z/03/Z, 076113/C/04/Z, 090532, 093078/Z/10/Z

    American journal of human genetics 2011;89;6;688-700

  • Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry.

    Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, Whibley A, Becuwe M, Baxter SW, Ferguson L, Wilkinson PA, Salazar C, Davidson C, Clark R, Quail MA, Beasley H, Glithero R, Lloyd C, Sims S, Jones MC, Rogers J, Jiggins CD and ffrench-Constant RH

    CNRS UMR 7205, Muséum National d'Histoire Naturelle, CP50, 45 Rue Buffon, 75005 Paris, France. joron@mnhn.fr

    Supergenes are tight clusters of loci that facilitate the co-segregation of adaptive variation, providing integrated control of complex adaptive phenotypes. Polymorphic supergenes, in which specific combinations of traits are maintained within a single population, were first described for 'pin' and 'thrum' floral types in Primula and Fagopyrum, but classic examples are also found in insect mimicry and snail morphology. Understanding the evolutionary mechanisms that generate these co-adapted gene sets, as well as the mode of limiting the production of unfit recombinant forms, remains a substantial challenge. Here we show that individual wing-pattern morphs in the polymorphic mimetic butterfly Heliconius numata are associated with different genomic rearrangements at the supergene locus P. These rearrangements tighten the genetic linkage between at least two colour-pattern loci that are known to recombine in closely related species, with complete suppression of recombination being observed in experimental crosses across a 400-kilobase interval containing at least 18 genes. In natural populations, notable patterns of linkage disequilibrium (LD) are observed across the entire P region. The resulting divergent haplotype clades and inversion breakpoints are found in complete association with wing-pattern morphs. Our results indicate that allelic combinations at known wing-patterning loci have become locked together in a polymorphic rearrangement at the P locus, forming a supergene that acts as a simple switch between complex adaptive phenotypes found in sympatry. These findings highlight how genomic rearrangements can have a central role in the coexistence of adaptive phenotypes involving several genes acting in concert, by locally limiting recombination and gene flow.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0118451; Medical Research Council: G0900740; Wellcome Trust: 079643, 098051

    Nature 2011;477;7363;203-6

  • Genetic risk prediction in complex disease.

    Jostins L and Barrett JC

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambs, UK

    Attempting to classify patients into high or low risk for disease onset or outcomes is one of the cornerstones of epidemiology. For some (but by no means all) diseases, clinically usable risk prediction can be performed using classical risk factors such as body mass index, lipid levels, smoking status, family history and, under certain circumstances, genetics (e.g. BRCA1/2 in breast cancer). The advent of genome-wide association studies (GWAS) has led to the discovery of common risk loci for the majority of common diseases. These discoveries raise the possibility of using these variants for risk prediction in a clinical setting. We discuss the different ways in which the predictive accuracy of these loci can be measured, and survey the predictive accuracy of GWAS variants for 18 common diseases. We show that predictive accuracy from genetic models varies greatly across diseases, but that the range is similar to that of non-genetic risk-prediction models. We discuss what factors drive differences in predictive accuracy, and how much value these predictions add over classical predictive tests. We also review the uses and pitfalls of idealized models of risk prediction. Finally, we look forward towards possible future clinical implementation of genetic risk prediction, and discuss realistic expectations for future utility.

    Funded by: Wellcome Trust: WT089120/Z/09/Z

    Human molecular genetics 2011;20;R2;R182-8

  • Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

    Jostins L, Morley KI and Barrett JC

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will.

    Funded by: Wellcome Trust: WT089120/Z/09/Z

    European journal of human genetics : EJHG 2011;19;6;662-6

  • Candidate gene analysis of the human natural killer-1 carbohydrate pathway and perineuronal nets in schizophrenia: B3GAT2 is associated with disease risk and cortical surface area.

    Kähler AK, Djurovic S, Rimol LM, Brown AA, Athanasiu L, Jönsson EG, Hansen T, Gústafsson O, Hall H, Giegling I, Muglia P, Cichon S, Rietschel M, Pietiläinen OP, Peltonen L, Bramon E, Collier D, St Clair D, Sigurdsson E, Petursson H, Rujescu D, Melle I, Werge T, Steen VM, Dale AM, Matthews RT, Agartz I and Andreassen OA

    Institute of Psychiatry, University of Oslo, Oslo University Hospital-Ulleval, Norway. a.k.kahler@medisin.uio.no

    Background: The Human Natural Killer-1 carbohydrate (HNK-1) is involved in neurodevelopment and synaptic plasticity. Extracellular matrix structures called perineuronal nets, condensed around subsets of neurons and proximal dendrites during brain maturation, regulate synaptic transmission and plasticity.

    Methods: Ten genes of importance for HNK-1 biosynthesis (B3GAT1, B3GAT2, and CHST10) or for the formation of perineuronal nets (TNR, BCAN, NCAN, HAPLN1, HAPLN2, HAPLN3, and HAPLN4) were investigated for potential involvement in schizophrenia (SCZ) susceptibility, by genotyping 104 tagSNPs in the Scandinavian Collaboration on Psychiatric Etiology sample (849 cases; 1602 control subjects). Genome-wide association study imputation data from the European SGENE-plus sample (2663 cases; 13,498 control subjects) were used for comparison. The effect of SCZ risk alleles on brain structure was investigated in a Norwegian subset (98 cases; 177 control subjects) with structural magnetic resonance imaging data.

    Results: Five single nucleotide polymorphisms (SNPs), located in two adjacent estimated linkage disequilibrium blocks in the first intron of β-1,3-glucuronyltransferase 2 (B3GAT2), were nominally associated with SCZ (.004 ≤ P(empirical) ≤ .05). The SNP rs2460691 was significantly associated in the comparison sample and in the meta-analysis after correction for all 121 SNP/haplotype tests (P(raw) = 1 × 10(-4); P(corrected) = .018). Increased dosage of the rs2460691 SCZ risk allele was associated with decreased cortical area (p = .002) but not thickness or hippocampal volume. A second SNP (r(2) = .24 with rs10945275), which conferred the highest SCZ risk effect in the Norwegian subset, was also associated with cortical area.

    Conclusions: The present results suggest that effects on biosynthesis of the neuronal epitope HNK-1, through common B3GAT2 variation, could increase the risk of SCZ, possibly by decreasing cortical area.

    Biological psychiatry 2011;69;1;90-6

  • CEP152 is a genome maintenance protein disrupted in Seckel syndrome.

    Kalay E, Yigit G, Aslan Y, Brown KE, Pohl E, Bicknell LS, Kayserili H, Li Y, Tüysüz B, Nürnberg G, Kiess W, Koegl M, Baessmann I, Buruk K, Toraman B, Kayipmaz S, Kul S, Ikbal M, Turner DJ, Taylor MS, Aerts J, Scott C, Milstein K, Dollfus H, Wieczorek D, Brunner HG, Hurles M, Jackson AP, Rauch A, Nürnberg P, Karagüzel A and Wollnik B

    Department of Medical Biology, Faculty of Medicine, Karadeniz Technical University, Trabzon, Turkey. ersankalay@hotmail.com

    Functional impairment of DNA damage response pathways leads to increased genomic instability. Here we describe the centrosomal protein CEP152 as a new regulator of genomic integrity and cellular response to DNA damage. Using homozygosity mapping and exome sequencing, we identified CEP152 mutations in Seckel syndrome and showed that impaired CEP152 function leads to accumulation of genomic defects resulting from replicative stress through enhanced activation of ATM signaling and increased H2AX phosphorylation.

    Funded by: Medical Research Council: MC_U120081295, MC_U127580972, MC_U127597124; Wellcome Trust: 077014

    Nature genetics 2011;43;1;23-6

  • Total zinc intake may modify the glucose-raising effect of a zinc transporter (SLC30A8) variant: a 14-cohort meta-analysis.

    Kanoni S, Nettleton JA, Hivert MF, Ye Z, van Rooij FJ, Shungin D, Sonestedt E, Ngwa JS, Wojczynski MK, Lemaitre RN, Gustafsson S, Anderson JS, Tanaka T, Hindy G, Saylor G, Renstrom F, Bennett AJ, van Duijn CM, Florez JC, Fox CS, Hofman A, Hoogeveen RC, Houston DK, Hu FB, Jacques PF, Johansson I, Lind L, Liu Y, McKeown N, Ordovas J, Pankow JS, Sijbrands EJ, Syvänen AC, Uitterlinden AG, Yannakoulia M, Zillikens MC, MAGIC Investigators, Wareham NJ, Prokopenko I, Bandinelli S, Forouhi NG, Cupples LA, Loos RJ, Hallmans G, Dupuis J, Langenberg C, Ferrucci L, Kritchevsky SB, McCarthy MI, Ingelsson E, Borecki IB, Witteman JC, Orho-Melander M, Siscovick DS, Meigs JB, Franks PW and Dedoussis GV

    Department of Nutrition-Dietetics, Harokopio University, Athens, Greece. stavroula.kanoni@sanger.ac.uk

    Objective: Many genetic variants have been associated with glucose homeostasis and type 2 diabetes in genome-wide association studies. Zinc is an essential micronutrient that is important for β-cell function and glucose homeostasis. We tested the hypothesis that zinc intake could influence the glucose-raising effect of specific variants.

    Research design and methods: We conducted a 14-cohort meta-analysis to assess the interaction of 20 genetic variants known to be related to glycemic traits and zinc metabolism with dietary zinc intake (food sources) and a 5-cohort meta-analysis to assess the interaction with total zinc intake (food sources and supplements) on fasting glucose levels among individuals of European ancestry without diabetes.

    Results: We observed a significant association of total zinc intake with lower fasting glucose levels (β-coefficient ± SE per 1 mg/day of zinc intake: -0.0012 ± 0.0003 mmol/L, summary P value = 0.0003), while the association of dietary zinc intake was not significant. We identified a nominally significant interaction between total zinc intake and the SLC30A8 rs11558471 variant on fasting glucose levels (β-coefficient ± SE per A allele for 1 mg/day of greater total zinc intake: -0.0017 ± 0.0006 mmol/L, summary interaction P value = 0.005); this result suggests a stronger inverse association between total zinc intake and fasting glucose in individuals carrying the glucose-raising A allele compared with individuals who do not carry it. None of the other interaction tests were statistically significant.

    Conclusions: Our results suggest that higher total zinc intake may attenuate the glucose-raising effect of the rs11558471 SLC30A8 (zinc transporter) variant. Our findings also support evidence for the association of higher total zinc intake with lower fasting glucose levels.

    Funded by: Medical Research Council: G0701863, MC_U106179471, MC_U106188470, MC_UP_A100_1003; NIGMS NIH HHS: T32 GM074905; Wellcome Trust: 090532

    Diabetes 2011;60;9;2407-16

  • Salmonella Typhi sense host neuroendocrine stress hormones and release the toxin haemolysin E.

    Karavolos MH, Bulmer DM, Spencer H, Rampioni G, Schmalen I, Baker S, Pickard D, Gray J, Fookes M, Winzer K, Ivens A, Dougan G, Williams P and Khan CM

    Institute for Cell and Molecular Biosciences, The Medical School, Newcastle University, Newcastle NE2 4HH, UK.

    Salmonella enterica serovar Typhi (S. Typhi) causes typhoid fever. We show that exposure of S. Typhi to neuroendocrine stress hormones results in haemolysis, which is associated with the release of haemolysin E in membrane vesicles. This effect is attributed to increased expression of the small RNA micA and RNA chaperone Hfq, with concomitant downregulation of outer membrane protein A. Deletion of micA or the two-component signal-transduction system, CpxAR, abolishes the phenotype. The hormone response is inhibited by the β-blocker propranolol. We provide mechanistic insights into the basis of neuroendocrine hormone-mediated haemolysis by S. Typhi, increasing our understanding of inter-kingdom signalling.

    Funded by: Medical Research Council; Wellcome Trust

    EMBO reports 2011;12;3;252-8

  • In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma.

    Karreth FA, Tay Y, Perna D, Ala U, Tan SM, Rust AG, DeNicola G, Webster KA, Weiss D, Perez-Mancera PA, Krauthammer M, Halaban R, Provero P, Adams DJ, Tuveson DA and Pandolfi PP

    Cancer Genetics Program, Division of Genetics, Beth Israel Deaconess Cancer Center, Department of Medicine and Pathology, Harvard Medical School, Boston, MA 02215, USA.

    We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of significant enrichment of putative PTEN ceRNAs among genes whose loss accelerates tumorigenesis following Sleeping Beauty insertional mutagenesis in a mouse model of melanoma. We validated several putative PTEN ceRNAs and further characterized one, the ZEB2 transcript. We show that ZEB2 modulates PTEN protein levels in a microRNA-dependent, protein coding-independent manner. Attenuation of ZEB2 expression activates the PI3K/AKT pathway, enhances cell transformation, and commonly occurs in human melanomas and other cancers expressing low PTEN levels. Our study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAF(V600E) to promote melanomagenesis.

    Funded by: Cancer Research UK; NCI NIH HHS: 1P50 CA121974, P50 CA121974, P50 CA121974-01, R01 CA-82328-09, R01 CA082328, R01 CA082328-09; NCRR NIH HHS: UL1 RR025758, UL1 RR025758-04; Wellcome Trust

    Cell 2011;147;2;382-95

  • Regulation of bone mass by serotonin: molecular biology and therapeutic implications.

    Karsenty G and Yadav VK

    Department of Genetics & Development, Columbia University Medical Center, New York, New York 10032, USA. jpb2@columbia.edu

    The molecular elucidation of two human skeletal dysplasias revealed that they are caused by an increase or a decrease in the synthesis of serotonin by enterochromaffin cells of the gut. This observation revealed a novel and powerful endocrine means to regulate bone mass. Exploiting these findings in the pharmacological arena led to the demonstration that inhibiting synthesis of gut-derived serotonin could be an effective means to treat low-bone-mass diseases such as osteoporosis.

    Annual review of medicine 2011;62;323-31

  • Phylogenetic analysis of murine leukemia virus sequences from longitudinally sampled chronic fatigue syndrome patients suggests PCR contamination rather than viral evolution.

    Katzourakis A, Hué S, Kellam P and Towers GJ

    Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom.

    Xenotropic murine leukemia virus (MLV)-related virus (XMRV) has been amplified from human prostate cancer and chronic fatigue syndrome (CFS) patient samples. Other studies failed to replicate these findings and suggested PCR contamination with a prostate cancer cell line, 22Rv1, as a likely source. MLV-like sequences have also been detected in CFS patients in longitudinal samples 15 years apart. Here, we tested whether sequence data from these samples are consistent with viral evolution. Our phylogenetic analyses strongly reject a model of within-patient evolution and demonstrate that the sequences from the first and second time points represent distinct endogenous murine retroviruses, suggesting contamination.

    Funded by: Medical Research Council: G0801172, G9721629; Wellcome Trust: 090940, WT090940

    Journal of virology 2011;85;20;10909-13

  • Mouse genomic variation and its effect on phenotypes and gene regulation.

    Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellåker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assunção JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J and Adams DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F022697/1; Cancer Research UK: A6997; Medical Research Council: G0800024, MC_U127561112, MC_U137761446; NHLBI NIH HHS: K25 HL080079; NLM NIH HHS: 2T15LM007359; Wellcome Trust: 077192, 079912, 082356, 083573, 083573/Z/07/Z, 085906, 085906/Z/08/Z, 090532

    Nature 2011;477;7364;289-94

  • Functional analysis of conserved non-coding regions around the short stature hox gene (shox) in whole zebrafish embryos.

    Kenyon EJ, McEwen GK, Callaway H and Elgar G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, Cambridgeshire, United Kingdom. ek1@sanger.ac.uk

    Background: Mutations in the SHOX gene are responsible for Leri-Weill Dyschondrosteosis, a disorder characterised by mesomelic limb shortening. Recent investigations into regulatory elements surrounding SHOX have shown that deletions of conserved non-coding elements (CNEs) downstream of the SHOX gene produce a phenotype indistinguishable from Leri-Weill Dyschondrosteosis. As this gene is not found in rodents, we used zebrafish as a model to characterise the expression pattern of the shox gene across the whole embryo and characterise the enhancer domains of different CNEs associated with this gene.

    Expression of the shox gene in zebrafish was identified using in situ hybridization, with embryos showing expression in the blood, putative heart, hatching gland, brain pharyngeal arch, olfactory epithelium, and fin bud apical ectodermal ridge. By identifying sequences showing 65% identity over at least 40 nucleotides between Fugu, human, dog and opossum we uncovered 35 CNEs around the shox gene. These CNEs were compared with CNEs previously discovered by Sabherwal et al., resulting in the identification of smaller, more deeply conserved sub-sequences. Sabherwal et al.'s CNEs were assayed for regulatory function in whole zebrafish embryos, resulting in the identification of additional tissues under the regulatory control of these CNEs.

    Our results using whole zebrafish embryos have provided a more comprehensive picture of the expression pattern of the shox gene, and a better understanding of its regulation via deeply conserved noncoding elements. In particular, we identify additional tissues under the regulatory control of previously identified SHOX CNEs. We also demonstrate the importance of these CNEs in evolution by identifying duplicated shox CNEs and more deeply conserved sub-sequences within already identified CNEs.

    Funded by: Medical Research Council: G0401138

    PloS one 2011;6;6;e21498

  • High-throughput target-selected gene inactivation in zebrafish.

    Kettleborough RN, Bruijn Ed, Eeden Fv, Cuppen E and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    There is an increasing requirement for efficient reverse genetics in the zebrafish. Here we describe a method that takes advantage of conventional mutagenized libraries (identical to ones used in forward screens) and re-sequencing to identify ENU-induced mutations in genes of interest. The efficiency of TILLING (Targeting Induced Local Lesions IN Genomes) depends on the rate of mutagenesis in the library being screened, the number of base pairs screened, and the ability to effectively identify and retrieve mutations of interest. Here we show that by improving the mutagenesis protocol, using in silico methods to predict codon changes for target selection, efficient PCR and re-sequencing, and accurate mutation detection we can vastly improve current TILLING protocols. Importantly, it is also possible to use this method for screening for splice and mis-sense mutations, and with even a relatively small library, there is a high chance of identifying mutations across any given gene.

    Funded by: Medical Research Council; Wellcome Trust

    Methods in cell biology 2011;104;121-7

  • Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.

    Kikuchi T, Cotton JA, Dalzell JJ, Hasegawa K, Kanzaki N, McVeigh P, Takanashi T, Tsai IJ, Assefa SA, Cock PJ, Otto TD, Hunt M, Reid AJ, Sanchez-Flores A, Tsuchihara K, Yokoi T, Larsson MC, Miwa J, Maule AG, Sahashi N, Jones JT and Berriman M

    Forestry and Forest Products Research Institute, Tsukuba, Japan. kikuchit@affrc.go.jp

    Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    PLoS pathogens 2011;7;9;e1002219

  • Genetic variation near IRS1 associates with reduced adiposity and an impaired metabolic profile.

    Kilpeläinen TO, Zillikens MC, Stančákova A, Finucane FM, Ried JS, Langenberg C, Zhang W, Beckmann JS, Luan J, Vandenput L, Styrkarsdottir U, Zhou Y, Smith AV, Zhao JH, Amin N, Vedantam S, Shin SY, Haritunians T, Fu M, Feitosa MF, Kumari M, Halldorsson BV, Tikkanen E, Mangino M, Hayward C, Song C, Arnold AM, Aulchenko YS, Oostra BA, Campbell H, Cupples LA, Davis KE, Döring A, Eiriksdottir G, Estrada K, Fernández-Real JM, Garcia M, Gieger C, Glazer NL, Guiducci C, Hofman A, Humphries SE, Isomaa B, Jacobs LC, Jula A, Karasik D, Karlsson MK, Khaw KT, Kim LJ, Kivimäki M, Klopp N, Kühnel B, Kuusisto J, Liu Y, Ljunggren O, Lorentzon M, Luben RN, McKnight B, Mellström D, Mitchell BD, Mooser V, Moreno JM, Männistö S, O'Connell JR, Pascoe L, Peltonen L, Peral B, Perola M, Psaty BM, Salomaa V, Savage DB, Semple RK, Skaric-Juric T, Sigurdsson G, Song KS, Spector TD, Syvänen AC, Talmud PJ, Thorleifsson G, Thorsteinsdottir U, Uitterlinden AG, van Duijn CM, Vidal-Puig A, Wild SH, Wright AF, Clegg DJ, Schadt E, Wilson JF, Rudan I, Ripatti S, Borecki IB, Shuldiner AR, Ingelsson E, Jansson JO, Kaplan RC, Gudnason V, Harris TB, Groop L, Kiel DP, Rivadeneira F, Walker M, Barroso I, Vollenweider P, Waeber G, Chambers JC, Kooner JS, Soranzo N, Hirschhorn JN, Stefansson K, Wichmann HE, Ohlsson C, O'Rahilly S, Wareham NJ, Speliotes EK, Fox CS, Laakso M and Loos RJ

    Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK.

    Genome-wide association studies have identified 32 loci influencing body mass index, but this measure does not distinguish lean from fat mass. To identify adiposity loci, we meta-analyzed associations between ∼2.5 million SNPs and body fat percentage from 36,626 individuals and followed up the 14 most significant (P < 10⁻⁶) independent loci in 39,576 individuals. We confirmed a previously established adiposity locus in FTO (P = 3 × 10⁻²⁶) and identified two new loci associated with body fat percentage, one near IRS1 (P = 4 × 10⁻¹¹) and one near SPRY2 (P = 3 × 10⁻⁸). Both loci contain genes with potential links to adipocyte physiology. Notably, the body-fat-decreasing allele near IRS1 is associated with decreased IRS1 expression and with an impaired metabolic profile, including an increased visceral to subcutaneous fat ratio, insulin resistance, dyslipidemia, risk of diabetes and coronary artery disease and decreased adiponectin levels. Our findings provide new insights into adiposity and insulin resistance.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: PG/07/133/24260, RG/07/008/23674, RG/08/008, RG/08/008/25291, SP/04/002, SP/07/007/23671; Cancer Research UK; Chief Scientist Office: CZB/4/710; Department of Health; Medical Research Council: G0401527, G0601966, G0700931, G0701863, G0802051, G0902037, G1000143, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128; NCRR NIH HHS: M01 RR 16500, M01 RR000425-36, M01 RR016500-04, M01-RR00425; NHLBI NIH HHS: N01 HC015103, N01 HC025195, N01 HC045133, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085086, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55222, N01-HC75150, N01-HC85079-86, N01HC25195, N02 HL64278, R01 HL087652, R01 HL087652-03, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-04, R01-HL036310-20A2, R01-HL087652, R01-HL08770003, R01-HL088119, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084756, U01 HL084756-03, U01-HL080295, U01-HL72515, U01-HL84756; NIA NIH HHS: AG13196, N01-AG12100, N01-AG62101, N01-AG62103, N01-AG62106, N01AG12100, N1AG62101A, N1AG62103A, N1AG62106A, R01 AG018728, R01 AG018728-05S1, R01 AG032098, R01 AG032098-01A1, R01-AG031890-01, R01-AG032098-01A1, R01-AG18728, R01-AR/AG41398; NIAMS NIH HHS: R01 AR041398, R01 AR041398-19, R01 AR046838, R01 AR046838-05, R01-AR046838; NIDDK NIH HHS: DK063491, K23 DK080145, K23 DK080145-05, K23-DK080145, P30 DK063491-03, P30 DK072488, P30 DK072488-04S1, P30-DK072488, R01 DK068336, R01 DK068336-03, R01 DK075681, R01 DK075681-04, R01 DK075787, R01 DK075787-05, R01 DK089256, R01-DK06833603, R01-DK07568102, R01-DK075787; Wellcome Trust: 077016/Z/05/Z, 084723/Z/08/Z, 091551, 091746/Z/10/Z

    Nature genetics 2011;43;8;753-60

  • Conventional and Mendelian randomization analyses suggest no association between lipoprotein(a) and early atherosclerosis: the Young Finns Study.

    Kivimäki M, Magnussen CG, Juonala M, Kähönen M, Kettunen J, Loo BM, Lehtimäki T, Viikari J and Raitakari OT

    Department of Epidemiology, University College London, London, UK. m.kivimaki@ucl.ac.uk

    Background: Lipoprotein(a) [Lp(a)] is an established risk factor for coronary disease and stroke, but mechanisms underlying this association are unknown. We examined the association of Lp(a) with early atherosclerosis by using conventional epidemiologic analysis and a Mendelian randomization analysis. The latter utilized genetic variants that are associated with Lp(a) to estimate causal effect.

    Methods: A prospective population-based cohort study of 939 men and 1141 women was conducted. Lp(a) was measured repeatedly at mean ages 17 and 38 years. Measurements of carotid intima-media thickness (IMT) and brachial flow-mediated dilation (FMD) at mean ages 32 and 38 years were used to determine the level and 6-year progression of subclinical atherosclerosis. An Lp(a)-related genetic variant, rs783147, was identified by a genome-wide association analysis (P = 3.1 × 10⁻⁵⁸), and a genetic score was constructed based on 10 Lp(a)-related variants. The Mendelian randomization test was performed using a two-stage instrumental variables analysis.

    Results: rs783147 and the genetic score were strong instruments for nonconfounded Lp(a) levels (F-statistics 269.6 and 446.0 in the first-stage instrumental variable analysis). However, Lp(a) levels were not associated with the levels of or change in IMT or FMD in any of the conventional and instrumental variables tests. The null finding was observed both with rs783147 and the genetic score as instruments and remained unchanged after adjustment for clinical characteristics, such as age, sex, HDL and LDL cholesterol, ApoB, systolic and diastolic blood pressure, diabetes and smoking.

    Conclusions: Data from conventional and Mendelian randomization analyses provide no support for early atherogenic effects of increased Lp(a) levels.

    Funded by: NIA NIH HHS: AG034454

    International journal of epidemiology 2011;40;2;470-8

  • A Runx1-Smad6 rheostat controls Runx1 activity during embryonic hematopoiesis.

    Knezevic K, Bee T, Wilson NK, Janes ME, Kinston S, Polderdijk S, Kolb-Kokocinski A, Ottersbach K, Pencovich N, Groner Y, de Bruijn M, Göttgens B and Pimanda JE

    Lowy Cancer Research Centre and the Prince of Wales Clinical School, University of New South Wales, Sydney, New South Wales 2052, Australia.

    The oncogenic transcription factor Runx1 is required for the specification of definitive hematopoietic stem cells (HSC) in the developing embryo. The activity of this master regulator is tightly controlled during development. The transcription factors that upregulate the expression of Runx1 also upregulate the expression of Smad6, the inhibitory Smad, which controls Runx1 activity by targeting it to the proteasome. Here we show that Runx1, in conjunction with Fli1, Gata2, and Scl, directly regulates the expression of Smad6 in the aorta-gonad-mesonephros (AGM) region in the developing embryo, where HSCs originate. Runx1 regulates Smad6 activity via a novel upstream enhancer, and Runx1 null embryos show reduced Smad6 transcripts in the yolk-sac and c-Kit-positive fetal liver cells. By directly regulating the expression of Smad6, Runx1 sets up a functional rheostat to control its own activity. The perturbation of this rheostat, using a proteasomal inhibitor, results in an increase in Runx1 and Smad6 levels that can be directly attributed to increased Runx1 binding to tissue-specific regulatory elements of these genes. Taken together, we describe a scenario in which a key hematopoietic transcription factor controls its own expression levels by transcriptionally controlling its controller.

    Molecular and cellular biology 2011;31;14;2817-26

  • Glyburide is anti-inflammatory and associated with reduced mortality in melioidosis.

    Koh GC, Maude RR, Schreiber MF, Limmathurotsakul D, Wiersinga WJ, Wuthiekanun V, Lee SJ, Mahavanakul W, Chaowagul W, Chierakul W, White NJ, van der Poll T, Day NP, Dougan G and Peacock SJ

    Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK. gavin.koh@gmail.com

    Background: Patients with diabetes mellitus are more prone to bacterial sepsis, but there are conflicting data on whether outcomes are worse in diabetics after presentation with sepsis. Glyburide is an oral hypoglycemic agent used to treat diabetes mellitus. This K(ATP)-channel blocker and broad-spectrum ATP-binding cassette (ABC) transporter inhibitor has broad-ranging effects on the immune system, including inhibition of inflammasome assembly and would be predicted to influence the host response to infection.

    Methods: We studied a cohort of 1160 patients with gram-negative sepsis caused by a single pathogen (Burkholderia pseudomallei), 410 (35%) of whom were known to have diabetes. We subsequently conducted a prospective study of diabetics with B. pseudomallei infection (n = 20) to compare the gene expression profile of peripheral whole blood leukocytes in patients who were taking glyburide against those not taking any sulfonylurea.

    Results: Mortality was lower in diabetics than in nondiabetics (38% vs 45%, respectively, P = .04), but the survival benefit was confined to the patient group taking glyburide (adjusted odds ratio .47, 95% confidence interval .28-.74, P = .005). We identified differential expression of 63 immune-related genes (P = .001) in patients taking glyburide, the sum effect of which we predict to be anti-inflammatory in the glyburide group.

    Conclusions: We present observational evidence for a glyburide-associated benefit during human melioidosis and correlate this with an anti-inflammatory effect of glyburide on the immune system.

    Funded by: Wellcome Trust: 093956

    Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2011;52;6;717-25

  • Diabetes does not influence activation of coagulation, fibrinolysis or anticoagulant pathways in Gram-negative sepsis (melioidosis).

    Koh GC, Meijers JC, Maude RR, Limmathurotsakul D, Day NP, Peacock SJ, van der Poll T and Wiersinga WJ

    Center for Experimental and Molecular Medicine, Department of Infectious Diseases, Tropical Medicine & AIDS, Academic Medical Center, Amsterdam, The Netherlands. gavin.koh@gmail.com

    Diabetes is associated with a disturbance of the haemostatic balance and is an important risk factor for sepsis, but the influence of diabetes on the pathogenesis of sepsis remains unclear. Melioidosis (Burkholderia pseudomallei infection) is a common cause of community-acquired sepsis in Southeast Asia and northern Australia. We sought to investigate the impact of pre-existing diabetes on the coagulation and fibrinolytic systems during sepsis caused by B. pseudomallei. We recruited a cohort of 44 patients (34 with diabetes and 10 without diabetes) with culture-proven melioidosis. Diabetes was defined as a pre-admission diagnosis of diabetes or an HbA₁c > 7.8% at enrolment. Thirty healthy blood donors and 52 otherwise healthy diabetes patients served as controls. Citrated plasma was collected from all subjects; additionally in melioidosis patients follow-up specimens were collected seven and ≥ 28 days after enrolment where possible. Relative to uninfected healthy controls, diabetes per se (i.e. in the absence of infection) was characterised by a procoagulant effect. Melioidosis was associated with activation of coagulation (thrombin-antithrombin complexes (TAT), prothrombin fragment F₁+₂ and fibrinogen concentrations were elevated; PT and PTT prolonged), suppression of anti-coagulation (antithrombin, protein C, total and free protein S levels were depressed) and abnormalities of fibrinolysis (D-dimer and plasmin-antiplasmin complex [PAP] were elevated). Remarkably, none of these haemostatic alterations were influenced by pre-existing diabetes. In conclusion, although diabetes is associated with multiple abnormalities of coagulation, anticoagulation and fibrinolysis, these changes are not detectable when superimposed on the background of larger abnormalities attributable to B. pseudomallei sepsis.

    Funded by: Wellcome Trust

    Thrombosis and haemostasis 2011;106;6;1139-48

  • iCLIP--transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution.

    Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM and Ule J

    Laboratory of Molecular Biology, Medical Research Council - MRC.

    The unique composition and spatial arrangement of RNA-binding proteins (RBPs) on a transcript guide the diverse aspects of post-transcriptional regulation. Therefore, an essential step towards understanding transcript regulation at the molecular level is to gain positional information on the binding sites of RBPs. Protein-RNA interactions can be studied using biochemical methods, but these approaches do not address RNA binding in its native cellular context. Initial attempts to study protein-RNA complexes in their cellular environment employed affinity purification or immunoprecipitation combined with differential display or microarray analysis (RIP-CHIP). These approaches were prone to identifying indirect or non-physiological interactions. In order to increase the specificity and positional resolution, a strategy referred to as CLIP (UV cross-linking and immunoprecipitation) was introduced. CLIP combines UV cross-linking of proteins and RNA molecules with rigorous purification schemes including denaturing polyacrylamide gel electrophoresis. In combination with high-throughput sequencing technologies, CLIP has proven to be a powerful tool to study protein-RNA interactions on a genome-wide scale (referred to as HITS-CLIP or CLIP-seq). Recently, PAR-CLIP was introduced, which uses photoreactive ribonucleoside analogs for cross-linking. Despite the high specificity of the obtained data, CLIP experiments often generate cDNA libraries of limited sequence complexity. This is partly due to the restricted amount of co-purified RNA and the two inefficient RNA ligation reactions required for library preparation. In addition, primer extension assays indicated that many cDNAs truncate prematurely at the crosslinked nucleotide. Such truncated cDNAs are lost during the standard CLIP library preparation protocol. We recently developed iCLIP (individual-nucleotide resolution CLIP), which captures the truncated cDNAs by replacing one of the inefficient intermolecular RNA ligation steps with a more efficient intramolecular cDNA circularization (Figure 1). Importantly, sequencing the truncated cDNAs provides insights into the position of the cross-link site at nucleotide resolution. We successfully applied iCLIP to study hnRNP C particle organization on a genome-wide scale and assess its role in splicing regulation.

    Journal of visualized experiments : JoVE 2011;50

  • Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci.

    Kooner JS, Saleheen D, Sim X, Sehmi J, Zhang W, Frossard P, Been LF, Chia KS, Dimas AS, Hassanali N, Jafar T, Jowett JB, Li X, Radha V, Rees SD, Takeuchi F, Young R, Aung T, Basit A, Chidambaram M, Das D, Grundberg E, Hedman AK, Hydrie ZI, Islam M, Khor CC, Kowlessur S, Kristensen MM, Liju S, Lim WY, Matthews DR, Liu J, Morris AP, Nica AC, Pinidiyapathirage JM, Prokopenko I, Rasheed A, Samuel M, Shah N, Shera AS, Small KS, Suo C, Wickremasinghe AR, Wong TY, Yang M, Zhang F, DIAGRAM, MuTHER, Abecasis GR, Barnett AH, Caulfield M, Deloukas P, Frayling TM, Froguel P, Kato N, Katulanda P, Kelly MA, Liang J, Mohan V, Sanghera DK, Scott J, Seielstad M, Zimmet PZ, Elliott P, Teo YY, McCarthy MI, Danesh J, Tai ES and Chambers JC

    National Heart and Lung Institute (NHLI), Imperial College London, Hammersmith Hospital, London, UK. j.kooner@imperial.ac.uk

    We carried out a genome-wide association study of type-2 diabetes (T2D) in individuals of South Asian ancestry. Our discovery set included 5,561 individuals with T2D (cases) and 14,458 controls drawn from studies in London, Pakistan and Singapore. We identified 20 independent SNPs associated with T2D at P < 10⁻⁴ for testing in a replication sample of 13,170 cases and 25,398 controls, also all of South Asian ancestry. In the combined analysis, we identified common genetic variants at six loci (GRB14, ST6GAL1, VPS26A, HMG20A, AP3S2 and HNF4A) newly associated with T2D (P = 4.1 × 10⁻⁸ to P = 1.9 × 10⁻¹¹). SNPs at GRB14 were also associated with insulin sensitivity (P = 5.0 × 10⁻⁴), and SNPs at ST6GAL1 and HNF4A were also associated with pancreatic beta-cell function (P = 0.02 and P = 0.001, respectively). Our findings provide additional insight into mechanisms underlying T2D and show the potential for new discovery from genetic association studies in South Asians, a population with increased susceptibility to T2D.

    Funded by: British Heart Foundation: SP/04/002; FIC NIH HHS: KO1TW006087; Medical Research Council: G0700931; NIDDK NIH HHS: DK-25446, R01DK082766; Wellcome Trust: 070854/Z/03/Z, 080747/Z/06/Z, 083270/Z/07/Z, 084723/Z/08/Z

    Nature genetics 2011;43;10;984-9

  • High-throughput semiquantitative analysis of insertional mutations in heterogeneous tumors.

    Koudijs MJ, Klijn C, van der Weyden L, Kool J, ten Hoeve J, Sie D, Prasetyanti PR, Schut E, Kas S, Whipp T, Cuppen E, Wessels L, Adams DJ and Jonkers J

    Division of Molecular Biology and Cancer Systems Biology Center, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands.

    Retroviral and transposon-based insertional mutagenesis (IM) screens are widely used for cancer gene discovery in mice. Exploiting the full potential of IM screens requires methods for high-throughput sequencing and mapping of transposon and retroviral insertion sites. Current protocols are based on ligation-mediated PCR amplification of junction fragments from restriction endonuclease-digested genomic DNA, resulting in amplification biases due to uneven genomic distribution of restriction enzyme recognition sites. Consequently, sequence coverage cannot be used to assess the clonality of individual insertions. We have developed a novel method, called shear-splink, for the semiquantitative high-throughput analysis of insertional mutations. Shear-splink employs random fragmentation of genomic DNA, which reduces unwanted amplification biases. Additionally, shear-splink enables us to assess clonality of individual insertions by determining the number of unique ligation points (LPs) between the adapter and genomic DNA. This parameter serves as a semiquantitative measure of the relative clonality of individual insertions within heterogeneous tumors. Mixing experiments with clonal cell lines derived from mouse mammary tumor virus (MMTV)-induced tumors showed that shear-splink enables the semiquantitative assessment of the clonality of MMTV insertions. Further, shear-splink analysis of 16 MMTV- and 127 Sleeping Beauty (SB)-induced tumors showed enrichment for cancer-relevant insertions by exclusion of irrelevant background insertions marked by single LPs, thereby facilitating the discovery of candidate cancer genes. To fully exploit the use of the shear-splink method, we set up the Insertional Mutagenesis Database (iMDB), offering a publicly available web-based application to analyze both retroviral- and transposon-based insertional mutagenesis data.

    Funded by: Cancer Research UK; Wellcome Trust

    Genome research 2011;21;12;2181-9

  • FoSTeS, MMBIR and NAHR at the human proximal Xp region and the mechanisms of human Xq isochromosome formation.

    Koumbaris G, Hatzisevastou-Loukidou H, Alexandrou A, Ioannides M, Christodoulou C, Fitzgerald T, Rajan D, Clayton S, Kitsiou-Tzeli S, Vermeesch JR, Skordis N, Antoniou P, Kurg A, Georgiou I, Carter NP and Patsalis PC

    Department of Medical Genetics, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands.

    The recently described DNA replication-based mechanisms of fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR) were previously shown to catalyze complex exonic, genic and genomic rearrangements. By analyzing a large number of isochromosomes of the long arm of chromosome X (i(Xq)), using whole-genome tiling path array comparative genomic hybridization (aCGH), ultra-high resolution targeted aCGH and sequencing, we provide evidence that the FoSTeS and MMBIR mechanisms can generate large-scale gross chromosomal rearrangements leading to the deletion and duplication of entire chromosome arms, thus suggesting an important role for DNA replication-based mechanisms in both the development of genomic disorders and cancer. Furthermore, we elucidate the mechanisms of dicentric i(Xq) (idic(Xq)) formation and show that most idic(Xq) chromosomes result from non-allelic homologous recombination between palindromic low copy repeats and highly homologous palindromic LINE elements. We also show that non-recurrent-breakpoint idic(Xq) chromosomes have microhomology-associated breakpoint junctions and are likely catalyzed by microhomology-mediated replication-dependent recombination mechanisms such as FoSTeS and MMBIR. Finally, we stress the role of the proximal Xp region as a chromosomal rearrangement hotspot.

    Funded by: Wellcome Trust: 077008

    Human molecular genetics 2011;20;10;1925-36

  • 96-plex molecular barcoding for the Illumina Genome Analyzer.

    Kozarewa I and Turner DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Next-generation sequencing technologies have a massive throughput, which dramatically reduces the cost of sequencing per gigabase, compared to standard Sanger sequencing. To make the most efficient use of this throughput when sequencing small regions or genomes, we developed a barcoding method, which allows multiplexing of 96 or more samples per lane. The method employs 8 bp tags, incorporated into each sequencing library during the library preparation enrichment polymerase chain reaction (PCR), pooling bar-coded libraries in equimolar ratios based on quantitative PCR, and sequencing using the three-read Illumina method.

    Methods in molecular biology (Clifton, N.J.) 2011;733;279-98
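
    As a rough illustration of how reads from a pooled lane might be assigned back to samples by their 8 bp tags, the sketch below bins reads by exact tag match. It is not the published protocol's software: the tag sequences, sample names and function name are hypothetical, the tag is assumed to arrive as a separate index read, and exact matching with no error tolerance is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical 8 bp tags; a real design would use a validated set of 96.
TAG_TO_SAMPLE = {
    "ACGTACGT": "sample_01",
    "TGCATGCA": "sample_02",
    "GATCGATC": "sample_03",
}

TAG_LENGTH = 8


def demultiplex(read_pairs, tag_to_sample):
    """Group insert reads by sample using the tag found in the index read.

    `read_pairs` is an iterable of (index_read, insert_read) strings.
    Reads whose tag is not in `tag_to_sample` go to the "unassigned" bin.
    """
    bins = defaultdict(list)
    for index_read, insert_read in read_pairs:
        tag = index_read[:TAG_LENGTH]
        sample = tag_to_sample.get(tag, "unassigned")
        bins[sample].append(insert_read)
    return dict(bins)


# Hypothetical pooled reads from one lane.
pooled = [
    ("ACGTACGT", "TTGACCA"),
    ("TGCATGCA", "GGCATTA"),
    ("ACGTACGT", "CCATGGA"),
    ("NNNNNNNN", "AAAAAAA"),  # unreadable tag
]

by_sample = demultiplex(pooled, TAG_TO_SAMPLE)
```

    A production demultiplexer would usually tolerate a sequencing error in the tag; that is deliberately left out here to keep the example short.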

  • Amplification-free library preparation for paired-end Illumina sequencing.

    Kozarewa I and Turner DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    The library preparation step is of critical importance for the quality of next-generation sequencing data. The use of the polymerase chain reaction (PCR) as a part of the standard Illumina library preparation protocol causes an appreciable proportion of the obtained sequences to be duplicates, making the sequencing run less efficient. Also, amplification introduces biases, particularly for genomes with high or low GC content, which reduces the complexity of the resulting library. To overcome these difficulties, we developed an amplification-free library preparation. By the use of custom adapters, unamplified, ligated samples can hybridize directly to the oligonucleotides on the flowcell surface.

    Methods in molecular biology (Clifton, N.J.) 2011;733;257-66

  • A bivariate genome-wide approach to metabolic syndrome: STAMPEED consortium.

    Kraja AT, Vaidya D, Pankow JS, Goodarzi MO, Assimes TL, Kullo IJ, Sovio U, Mathias RA, Sun YV, Franceschini N, Absher D, Li G, Zhang Q, Feitosa MF, Glazer NL, Haritunians T, Hartikainen AL, Knowles JW, North KE, Iribarren C, Kral B, Yanek L, O'Reilly PF, McCarthy MI, Jaquish C, Couper DJ, Chakravarti A, Psaty BM, Becker LC, Province MA, Boerwinkle E, Quertermous T, Palotie L, Jarvelin MR, Becker DM, Kardia SL, Rotter JI, Chen YD and Borecki IB

    Division of Statistical Genomics, Washington University School of Medicine, Saint Louis, Missouri, USA. aldi@wustl.edu

    OBJECTIVE The metabolic syndrome (MetS) is defined as concomitant disorders of lipid and glucose metabolism, central obesity, and high blood pressure, with an increased risk of type 2 diabetes and cardiovascular disease. This study tests whether common genetic variants with pleiotropic effects account for some of the correlated architecture among five metabolic phenotypes that define MetS. RESEARCH DESIGN AND METHODS Seven studies of the STAMPEED consortium, comprising 22,161 participants of European ancestry, underwent genome-wide association analyses of metabolic traits using a panel of ∼2.5 million imputed single nucleotide polymorphisms (SNPs). Phenotypes were defined by the National Cholesterol Education Program (NCEP) criteria for MetS in pairwise combinations. Individuals exceeding the NCEP thresholds for both traits of a pair were considered affected. RESULTS Twenty-nine common variants were associated with MetS or a pair of traits. Variants in the genes LPL, CETP, APOA5 (and its cluster), GCKR (and its cluster), LIPC, TRIB1, LOC100128354/MTNR1B, ABCB11, and LOC100129150 were further tested for their association with individual qualitative and quantitative traits. None of the 16 top SNPs (one per gene) associated simultaneously with more than two individual traits. Of these, 11 variants showed nominal associations with MetS per se. The effects of the 16 top SNPs on the quantitative traits were relatively small, together explaining ∼9% of the variance in triglycerides, 5.8% of high-density lipoprotein cholesterol, 3.6% of fasting glucose, and 1.4% of systolic blood pressure. CONCLUSIONS Qualitative and quantitative pleiotropic tests on pairs of traits indicate that a small portion of the covariation in these traits can be explained by the reported common genetic variants.

    Funded by: NCRR NIH HHS: L1-RR-025005, M01RR00425; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: 5R01-HL-087647, 5R01-HL-087679-02, 5R01-HL-087698, HL-087652, HL-087660, HL-087700, HL-0877700, N01-HC-15103, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55022, N01-HC-55222, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, NC01-HC-55021, R01-HL-086694, R01-HL-087641, R01-HL-087652, R01-HL-59367, U01-HL-075572, U01-HL-080295; NIDDK NIH HHS: DK-063491; NIMH NIH HHS: 1RL1-MH-083268-01, 1RL1MH083268-01, 5R01-MH-63706:0

    Diabetes 2011;60;4;1329-39

  • miR-96 regulates the progression of differentiation in mammalian cochlear inner and outer hair cells.

    Kuhn S, Johnson SL, Furness DN, Chen J, Ingham N, Hilton JM, Steffes G, Lewis MA, Zampini V, Hackney CM, Masetto S, Holley MC, Steel KP and Marcotti W

    Department of Biomedical Science, University of Sheffield, Sheffield S10 2TN, United Kingdom.

    MicroRNAs (miRNAs) are small noncoding RNAs able to regulate a broad range of protein-coding genes involved in many biological processes. miR-96 is a sensory organ-specific miRNA expressed in the mammalian cochlea during development. Mutations in miR-96 cause nonsyndromic progressive hearing loss in humans and mice. The mouse mutant diminuendo has a single base change in the seed region of the Mir96 gene leading to widespread changes in the expression of many genes. We have used this mutant to explore the role of miR-96 in the maturation of the auditory organ. We found that the physiological development of mutant sensory hair cells is arrested at around the day of birth, before their biophysical differentiation into inner and outer hair cells. Moreover, maturation of the hair cell stereocilia bundle and remodelling of auditory nerve connections within the cochlea fail to occur in miR-96 mutants. We conclude that miR-96 regulates the progression of the physiological and morphological differentiation of cochlear hair cells and, as such, coordinates one of the most distinctive functional refinements of the mammalian auditory system.

    Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077189, 088719

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;6;2355-60

  • Characterization of a ViI-like phage specific to Escherichia coli O157:H7.

    Kutter EM, Skutt-Kakaria K, Blasdel B, El-Shibiny A, Castano A, Bryan D, Kropinski AM, Villegas A, Ackermann HW, Toribio AL, Pickard D, Anany H, Callaway T and Brabban AD

    The Evergreen State College, Olympia, WA, USA. kutterb@evergreen.edu

    Phage vB_EcoM_CBA120 (CBA120), isolated against Escherichia coli O157:H7 from a cattle feedlot, is morphologically very similar to the classic phage ViI of Salmonella enterica serovar Typhi. Until recently, little was known genetically or physiologically about the ViI-like phages, and none targeting E. coli have been described in the literature. The genome of CBA120 has been fully sequenced and is highly similar to those of both ViI and the Shigella phage AG3. The core set of structural and replication-related proteins of CBA120 are homologous to those from T-even phages, but generally are more closely related to those from T4-like phages of Vibrio, Aeromonas and cyanobacteria than those of the Enterobacteriaceae. The baseplate and method of adhesion to the host are, however, very different from those of either T4 or the cyanophages. None of the outer baseplate proteins are conserved. Instead of T4's long and short tail fibers, CBA120, like ViI, encodes tail spikes related to those normally seen on podoviruses. The 158 kb genome, like that of T4, is circularly permuted and terminally redundant, but unlike T4 CBA120 does not substitute hmdCyt for cytosine in its DNA. However, in contrast to other coliphages, CBA120 and related coliphages we have isolated cannot incorporate 3H-thymidine (3H-dThd) into their DNA. Protein sequence comparisons cluster the putative "thymidylate synthase" of CBA120, ViI and AG3 much more closely with those of Delftia phage φW-14, Bacillus subtilis phage SPO1, and Pseudomonas phage YuA, all known to produce and incorporate hydroxymethyluracil (hmdUra).

    Funded by: NIGMS NIH HHS: 2-R15GM063637-02, 2R15GM63637-3A1

    Virology journal 2011;8;430

  • Population-prevalent desmosomal mutations predisposing to arrhythmogenic right ventricular cardiomyopathy.

    Lahtinen AM, Lehtonen E, Marjamaa A, Kaartinen M, Heliö T, Porthan K, Oikarinen L, Toivonen L, Swan H, Jula A, Peltonen L, Palotie A, Salomaa V and Kontula K

    Research Program for Molecular Medicine, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland.

    Background: Arrhythmogenic right ventricular cardiomyopathy (ARVC) is a progressive myocardial disorder caused by mutations of desmosomal cell adhesion proteins. The prevalence of these variants in the general population is unknown.

    Objective: This study examined the spectrum and population prevalence of desmosomal mutations predisposing to ARVC in Finland.

    Methods: We screened 29 Finnish ARVC probands for mutations in the DSP, DSG2, and DSC2 genes. All Finnish-type ARVC-associated mutations, including those 3 previously identified in PKP2 in the same patient group, were analyzed in the population-based Health 2000 cohort of 6,334 individuals and tested for association with electrocardiographic variables.

    Results: We detected 2 novel mutations: DSG2 3059_3062delAGAG and DSP T1373A. DSG2 3059_3062delAGAG was present in a family with 5 mutation carriers. The endomyocardial samples of the DSG2 deletion carrier showed reduced immunoreactive signal for desmoglein-2, plakophilin-2, plakoglobin, and desmoplakin. DSP T1373A was found in 1 proband with typical right ventricular disease and exercise-related ventricular tachycardia. In the population sample, the collective prevalence of all 5 mutations identified in the 29 ARVC patients (PKP2 Q62K, Q59L, N613K, DSG2 3059_3062delAGAG, and DSP T1373A) was 31 of 6,334 individuals, or 0.5%. The apparent founder mutation PKP2 Q59L is present in 0.3% of Finns and was previously shown to have an approximately 20% disease penetrance.

    Conclusion: One of 200 Finns carries a desmosomal mutation that may predispose to ARVC and its clinical sequelae. ARVC-associated mutations may thus be more prevalent in the population than expected based on the published ARVC prevalence data.

    Heart rhythm : the official journal of the Heart Rhythm Society 2011;8;8;1214-21
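
    The prevalence figures quoted in the abstract above follow directly from the reported counts (31 carriers among 6,334 individuals). The short check below only reproduces that arithmetic; the variable names are illustrative.

```python
carriers = 31        # carriers of any of the 5 mutations (from the abstract)
cohort_size = 6334   # Health 2000 cohort size (from the abstract)

prevalence = carriers / cohort_size   # fraction of the cohort carrying a mutation
one_in = cohort_size / carriers       # "one in N" form of the same figure

print(f"Prevalence: {prevalence:.2%}; about 1 in {round(one_in)}")
```

    The result rounds to 0.5%, consistent with the abstract's summary of roughly one carrier per 200 individuals.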

  • X-box binding protein 1 induces the expression of the lytic cycle transactivator of Kaposi's sarcoma-associated herpesvirus but not Epstein-Barr virus in co-infected primary effusion lymphoma.

    Lai IY, Farrell PJ and Kellam P

    University College London, MRC Centre for Molecular Virology, Department of Infection, Division of Infection and Immunity, Windeyer Institute of Medical Science, 46 Cleveland Street, London W1T 4JF, UK.

    Cells of primary effusion lymphoma (PEL), a B-cell non-Hodgkin's lymphoma, are latently infected by Kaposi's sarcoma-associated herpesvirus (KSHV), with about 80 % of PEL also co-infected with Epstein-Barr virus (EBV). Both viruses can be reactivated into their lytic replication cycle in PEL by chemical inducers. However, simultaneous activation of both lytic cascades leads to mutual lytic cycle co-repression. The plasma cell-differentiation factor X-box binding protein 1 (XBP-1) transactivates the KSHV immediate-early promoter leading to the production of the replication and transcription activator protein (RTA), and reactivation of KSHV from latency. XBP-1 has been reported to act similarly on the EBV immediate-early promoter Zp, leading to the production of the lytic-cycle transactivator protein BZLF1. Here we show that activated B-cell terminal-differentiation transcription factor X-box binding protein 1 (XBP-1s) does not induce EBV BZLF1 and BRLF1 expression in PEL and BL cell lines, despite inducing lytic reactivation of KSHV in PEL. We show that XBP-1s transactivates the KSHV RTA promoter but does not transactivate the EBV BZLF1 promoter in non-B-cells by using a luciferase assay. Co-expression of activated protein kinase D, which can phosphorylate and inactivate class II histone deacetylases (HDACs), does not rescue XBP-1 activity on Zp nor does it induce BZLF1 and BRLF1 expression in PEL. Finally, chemical inducers of KSHV and EBV lytic replication in PEL, including HDAC inhibitors, do not lead to XBP-1 activation. We conclude that XBP-1 specifically reactivates the KSHV lytic cycle in dually infected PELs.

    Funded by: Cancer Research UK; Wellcome Trust

    The Journal of general virology 2011;92;Pt 2;421-31

  • Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.

    Laing R, Hunt M, Protasio AV, Saunders G, Mungall K, Laing S, Jackson F, Quail M, Beech R, Berriman M and Gilleard JS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, makes their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    PloS one 2011;6;8;e23216

  • Review article: faecal transplantation therapy for gastrointestinal disease.

    Landy J, Al-Hassi HO, McLaughlin SD, Walker AW, Ciclitira PJ, Nicholls RJ, Clark SK and Hart AL

    IBD Unit, St Mark's Hospital, Harrow, London, UK.

    Background: Evidence is emerging regarding the relationship between a dysbiosis of the human gut microbiota and a number of gastrointestinal diseases as well as diseases beyond the gut. Probiotics have been investigated in many gastrointestinal disease states, with variable and often modest outcomes. Faecal transplantation is an alternative approach to manipulate the gut microbiota.

    Aim: To review the use of faecal transplantation therapy for the management of gastrointestinal disorders.

    Methods: Available articles on faecal transplantation in the management of gastrointestinal disorders were identified using a Pubmed search and bibliographies of review articles on the subject were collated.

    Results: A total of 239 patients who had undergone faecal transplantation were reported. Seventeen of 22 studies of faecal transplantation were in fulminant or refractory Clostridium difficile. Studies of faecal transplantation are heterogeneous regarding the patients, donors, screening, methods of administration and definition of response. Faecal transplantation for C. difficile has been demonstrated to be effective in 145/166 (87%) patients. Small numbers of patients are reported to have undergone successful faecal transplantation for irritable bowel syndrome and inflammatory bowel disease.

    Conclusions: Faecal transplantation has been reported with good outcomes for fulminant and refractory C. difficile. No adverse effects of faecal transplantation have been reported. However, there are no level 1 data of faecal transplantation and reports to date may suffer from reporting bias of positive outcomes and under-reporting of adverse effects. This therapy holds great promise, where a dysbiosis of the gut microbiota is responsible for disease and further studies are necessary to explore this potential.

    Alimentary pharmacology & therapeutics 2011;34;4;409-15

  • Plant power: converting a kingdom.

    Langridge G

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. microbes@sanger.ac.uk

    This month, Genome Watch looks at the potential for bacterially derived enzymes to degrade lignocellulose from plant biomass and thus provide an efficient way of producing biofuels.

    Nature reviews. Microbiology 2011;9;5;316

  • Meta-analysis of Dense Genecentric Association Studies Reveals Common and Uncommon Variants Associated with Height.

    Lanktree MB, Guo Y, Murtaza M, Glessner JT, Bailey SD, Onland-Moret NC, Lettre G, Ongen H, Rajagopalan R, Johnson T, Shen H, Nelson CP, Klopp N, Baumert J, Padmanabhan S, Pankratz N, Pankow JS, Shah S, Taylor K, Barnard J, Peters BJ, Maloney CM, Lobmeyer MT, Stanton A, Zafarmand MH, Romaine SP, Mehta A, van Iperen EP, Gong Y, Price TS, Smith EN, Kim CE, Li YR, Asselbergs FW, Atwood LD, Bailey KM, Bhatt D, Bauer F, Behr ER, Bhangale T, Boer JM, Boehm BO, Bradfield JP, Brown M, Braund PS, Burton PR, Carty C, Chandrupatla HR, Chen W, Connell J, Dalgeorgou C, Boer A, Drenos F, Elbers CC, Fang JC, Fox CS, Frackelton EC, Fuchs B, Furlong CE, Gibson Q, Gieger C, Goel A, Grobbee DE, Hastie C, Howard PJ, Huang GH, Johnson WC, Li Q, Kleber ME, Klein BE, Klein R, Kooperberg C, Ky B, Lacroix A, Lanken P, Lathrop M, Li M, Marshall V, Melander O, Mentch FD, Meyer NJ, Monda KL, Montpetit A, Murugesan G, Nakayama K, Nondahl D, Onipinla A, Rafelt S, Newhouse SJ, Otieno FG, Patel SR, Putt ME, Rodriguez S, Safa RN, Sawyer DB, Schreiner PJ, Simpson C, Sivapalaratnam S, Srinivasan SR, Suver C, Swergold G, Sweitzer NK, Thomas KA, Thorand B, Timpson NJ, Tischfield S, Tobin M, Tomaszweski M, Verschuren WM, Wallace C, Winkelmann B, Zhang H, Zheng D, Zhang L, Zmuda JM, Clarke R, Balmforth AJ, Danesh J, Day IN, Schork NJ, de Bakker PI, Delles C, Duggan D, Hingorani AD, Hirschhorn JN, Hofker MH, Humphries SE, Kivimaki M, Lawlor DA, Kottke-Marchant K, Mega JL, Mitchell BD, Morrow DA, Palmen J, Redline S, Shields DC, Shuldiner AR, Sleiman PM, Smith GD, Farrall M, Jamshidi Y, Christiani DC, Casas JP, Hall AS, Doevendans PA, Christie JD, Berenson GS, Murray SS, Illig T, Dorn GW, Cappola TP, Boerwinkle E, Sever P, Rader DJ, Reilly MP, Caulfield M, Talmud PJ, Topol E, Engert JC, Wang K, Dominiczak A, Hamsten A, Curtis SP, Silverstein RL, Lange LA, Sabatine MS, Trip M, Saleheen D, Peden JF, Cruickshanks KJ, März W, O'Connell JR, Klungel OH, Wijmenga C, Maitland-van der Zee AH, Schadt EE, Johnson JA, Jarvik GP, Papanicolaou GJ, Hugh Watkins on behalf of PROCARDIS, Grant SF, Munroe PB, North KE, Samani NJ, Koenig W, Gaunt TR, Anand SS, van der Schouw YT, Meena Kumari on behalf of the Whitehall II Study and the WHII 50K Group, Soranzo N, Fitzgerald GA, Reiner A, Hegele RA, Hakonarson H and Keating BJ

    Department of Medicine and Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada.

    Height is a classic complex trait with common variants in a growing list of genes known to contribute to the phenotype. Using a genecentric genotyping array targeted toward cardiovascular-related loci, comprising 49,320 SNPs across approximately 2000 loci, we evaluated the association of common and uncommon SNPs with adult height in 114,223 individuals from 47 studies and six ethnicities. A total of 64 loci contained a SNP associated with height at array-wide significance (p < 2.4 × 10(-6)), with 42 loci surpassing the conventional genome-wide significance threshold (p < 5 × 10(-8)). Common variants with minor allele frequencies greater than 5% were observed to be associated with height in 37 previously reported loci. In individuals of European ancestry, uncommon SNPs in IL11 and SMAD3, which would not be genotyped with the use of standard genome-wide genotyping arrays, were strongly associated with height (p < 3 × 10(-11)). Conditional analysis within associated regions revealed five additional variants associated with height independent of lead SNPs within the locus, suggesting allelic heterogeneity. Although underpowered to replicate findings from individuals of European ancestry, the direction of effect of associated variants was largely consistent in African American, South Asian, and Hispanic populations. Overall, we show that dense coverage of genes for uncommon SNPs, coupled with large-scale meta-analysis, can successfully identify additional variants associated with a common complex trait.

    Funded by: Canadian Institutes of Health Research: 79533; NEI NIH HHS: U10 EY006594-24; NHLBI NIH HHS: K23 HL095661-02; NIA NIH HHS: R01 AG011099-07, R01 AG011099-08, R01 AG011099-09, R01 AG011099-10, R01 AG011099-11, R01 AG011099-12, R01 AG011099-13, R01 AG011099-14, R01 AG011099-15, R01 AG011099-15S1, R01 AG021917-08, R37 AG011099-17, R37 AG011099-19

    American journal of human genetics 2011;88;1;6-18

  • SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples.

    Le SQ and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

    Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then to combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of SNP candidates by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine a posterior probability of the SNP being real and also the most probable phased genotype call for each individual. We present experiments on both simulation data and real data from the 1000 Genomes Project to prove the applicability of the methods. We also explore the relative tradeoff between sequencing depth and the number of sequenced samples.

    Funded by: Wellcome Trust: WT089088/Z/09/Z

    Genome research 2011;21;6;952-60

  • Chromosomal instability confers intrinsic multidrug resistance.

    Lee AJ, Endesfelder D, Rowan AJ, Walther A, Birkbak NJ, Futreal PA, Downward J, Szallasi Z, Tomlinson IP, Howell M, Kschischo M and Swanton C

    Translational Cancer Therapeutics Laboratory, Cancer Research UK London Research Institute, London, United Kingdom.

    Aneuploidy is associated with poor prognosis in solid tumors. Spontaneous chromosome missegregation events in aneuploid cells promote chromosomal instability (CIN) that may contribute to the acquisition of multidrug resistance in vitro and heighten risk for tumor relapse in animal models. Identification of distinct therapeutic agents that target tumor karyotypic complexity has important clinical implications. To identify distinct therapeutic approaches to specifically limit the growth of CIN tumors, we focused on a panel of colorectal cancer (CRC) cell lines, previously classified as either chromosomally unstable (CIN(+)) or diploid/near-diploid (CIN(-)), and treated them individually with a library of kinase inhibitors targeting components of signal transduction, cell cycle, and transmembrane receptor signaling pathways. CIN(+) cell lines displayed significant intrinsic multidrug resistance compared with CIN(-) cancer cell lines, and this seemed to be independent of somatic mutation status and proliferation rate. Confirming the association of CIN rather than ploidy status with multidrug resistance, tetraploid isogenic cells that had arisen from diploid cell lines displayed lower drug sensitivity than their diploid parental cells only with increasing chromosomal heterogeneity and isogenic cell line models of CIN(+) displayed multidrug resistance relative to their CIN(-) parental cancer cell line derivatives. In a meta-analysis of CRC outcome following cytotoxic treatment, CIN(+) predicted worse progression-free or disease-free survival relative to patients with CIN(-) disease. Our results suggest that stratifying tumor responses according to CIN status should be considered within the context of clinical trials to minimize the confounding effects of tumor CIN status on drug sensitivity.

    Funded by: Cancer Research UK; Medical Research Council: G0701935(86985)

    Cancer research 2011;71;5;1858-70

  • Induction of stable drug resistance in human breast cancer cells using a combinatorial zinc finger transcription factor library.

    Lee J, Hirsh AS, Wittner BS, Maeder ML, Singavarapu R, Lang M, Janarthanan S, McDermott U, Yajnik V, Ramaswamy S, Joung JK and Sgroi DC

    Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America.

    Combinatorial libraries of artificial zinc-finger transcription factors (ZF-TFs) provide a robust tool for inducing and understanding various functional components of the cancer phenotype. Herein, we utilized combinatorial ZF-TF library technology to better understand how breast cancer cells acquire resistance to fulvestrant, a clinically important anti-endocrine therapeutic agent. From a diverse collection of nearly 400,000 different ZF-TFs, we isolated six ZF-TF library members capable of inducing stable, long-term anti-endocrine drug-resistance in two independent estrogen receptor-positive breast cancer cell lines. Comparative gene expression profile analysis of the six different ZF-TF-transduced breast cancer cell lines revealed five distinct clusters of differentially expressed genes. One cluster was shared among all 6 ZF-TF-transduced cell lines and therefore constituted a common fulvestrant-resistant gene expression signature. Pathway enrichment-analysis of this common fulvestrant resistant signature also revealed significant overlap with gene sets associated with an estrogen receptor-negative-like state and with gene sets associated with drug resistance to different classes of breast cancer anti-endocrine therapeutic agents. Enrichment-analysis of the four remaining unique gene clusters revealed overlap with myb-regulated genes. Finally, we also demonstrated that the common fulvestrant-resistant signature is associated with poor prognosis by interrogating five independent, publicly available human breast cancer gene expression datasets. Our results demonstrate that artificial ZF-TF libraries can be used successfully to induce stable drug-resistance in human cancer cell lines and to identify a gene expression signature that is associated with a clinically relevant drug-resistance phenotype.

    Funded by: NCI NIH HHS: R01-CA112021, T32 CA009216; NIGMS NIH HHS: R01 GM069906

    PloS one 2011;6;7;e21112

  • New IBD genetics: common pathways with other diseases.

    Lees CW, Barrett JC, Parkes M and Satsangi J

    Gastrointestinal Unit, Molecular Medicine Centre, University of Edinburgh, Edinburgh, UK. charlie.lees@ed.ac.uk

    Complex disease genetics has been revolutionised in recent years by the advent of genome-wide association (GWA) studies. The chronic inflammatory bowel diseases (IBDs), Crohn's disease and ulcerative colitis have seen notable successes culminating in the discovery of 99 published susceptibility loci/genes (71 Crohn's disease; 47 ulcerative colitis) to date. Approximately one-third of loci described confer susceptibility to both Crohn's disease and ulcerative colitis. Amongst these are multiple genes involved in IL23/Th17 signalling (IL23R, IL12B, JAK2, TYK2 and STAT3), IL10, IL1R2, REL, CARD9, NKX2.3, ICOSLG, PRDM1, SMAD3 and ORMDL3. The evolving genetic architecture of IBD has furthered our understanding of disease pathogenesis. For Crohn's disease, defective processing of intracellular bacteria has become a central theme, following gene discoveries in autophagy and innate immunity (associations with NOD2, IRGM, ATG16L1 are specific to Crohn's disease). Genetic evidence has also demonstrated the importance of barrier function to the development of ulcerative colitis (HNF4A, LAMB1, CDH1 and GNA12). However, when the data are analysed in more detail, deeper themes emerge including the shared susceptibility seen with other diseases. Many immune-mediated diseases overlap in this respect, paralleling the reported epidemiological evidence. However, in several cases the reported shared susceptibility appears at odds with the clinical picture. Examples include both type 1 and type 2 diabetes mellitus. In this review we will detail the presently available data on the genetic overlap between IBD and other diseases. The discussion will be informed by the epidemiological data in the published literature and the implications for pathogenesis and therapy will be outlined. This arena will move forwards very quickly in the next few years. Ultimately, we anticipate that these genetic insights will transform the landscape of common complex diseases such as IBD.

    Gut 2011;60;12;1739-53

  • ITFoM - The IT future of medicine.

    Lehrach H, Subrak R, Boyle P, Pasterk M, Zatloukal K, Muller H, Hubbard T, Brand A, Girolami M, Jameson D, Bruggeman FJ and Westerhoff HV

    Procedia Computer Science 2011;7;26-9

  • Q8IYL2 is a candidate gene for the familial epilepsy syndrome of Partial Epilepsy with Pericentral Spikes (PEPS).

    Leschziner GD, Coffey AJ, Andrew T, Gregorio SP, Dias-Neto E, Calafato M, Bentley DR, Kinton L, Sander JW and Johnson MR

    Division of Neuroscience, Imperial College London, UK; Wellcome Trust Sanger Institute, Cambridge, UK. guylesch@gmail.com

    Purpose: Partial Epilepsy with Pericentral Spikes (PEPS) is a novel Mendelian idiopathic epilepsy with evidence of linkage to Chromosome 4p15. Our aim was to identify the causative mutation in this epilepsy syndrome.

    Methods: We re-annotated all 42 genes in the linked chromosomal region and sequenced all genes within the linked interval. All exons, intron-exon boundaries and untranslated regions were sequenced in the original pedigree, and novel changes segregating correctly were subjected to bioinformatic analysis. Quantitative polymerase chain reaction was performed to examine for potential copy number variation (CNV).

    Results: 29 previously undescribed variants correctly segregating with the linked haplotype were identified. Bioinformatic analysis demonstrated that six variants were non-synonymous coding sequence polymorphisms, one of which, in Q8IYL2 (Gly400Ala), was found neither in Caucasian (n=243) or ancestry-matched Brazilian (n=180) control samples nor in subjects from the 1000 Genomes Project. No gene duplications or deletions were identified in the linked region.

    Discussion: We postulate that Q8IYL2 is a causative gene for PEPS, after exhaustive resequencing and bioinformatic analysis. The function of this gene is unknown, but it is expressed in brain tissue.

    Epilepsy research 2011;96;1-2;109-15

  • SET nuclear oncogene associates with microcephalin/MCPH1 and regulates chromosome condensation.

    Leung JW, Leitch A, Wood JL, Shaw-Smith C, Metcalfe K, Bicknell LS, Jackson AP and Chen J

    Department of Experimental Radiation Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, USA.

    Primary microcephaly is an autosomal recessive disorder characterized by marked reduction in human brain size. Microcephalin (MCPH1), one of the genes mutated in primary microcephaly, plays an important role in DNA damage checkpoint control and mitotic entry. Additionally, MCPH1 ensures the proper temporal activation of chromosome condensation during mitosis, by acting as a negative regulator of the condensin II complex. We previously found that deletion of the MCPH1 N terminus leads to the premature chromosome condensation (PCC) phenotype. In the present study, we unexpectedly observed that a truncated form of MCPH1 appears to be expressed in MCPH1(S25X/S25X) patient cells. This likely results from utilization of an alternative translational start codon, which would produce a mutant MCPH1 protein with a small deletion of its N-terminal BRCT domain. Furthermore, missense mutations in MCPH1 cluster at its N terminus, suggesting that intact function of this BRCT protein-interaction domain is required both for coordinating chromosome condensation and human brain development. Subsequently, we identified the SET nuclear oncogene as a direct binding partner of the MCPH1 N-terminal BRCT domain. Cells with SET knockdown exhibited abnormally condensed chromosomes similar to those observed in MCPH1-deficient mouse embryonic fibroblasts. Condensin II knockdown rescued the abnormal chromosome condensation phenotype in SET-depleted cells. In addition, MCPH1 V50G/I51V missense mutations impair binding to SET and fail to fully rescue the abnormal chromosome condensation phenotype in Mcph1(-/-) mouse embryonic fibroblasts. Collectively, our findings suggest that SET is an important regulator of chromosome condensation/decondensation and that disruption of the MCPH1-SET interaction might be important for the pathogenesis of primary microcephaly.

    Funded by: Medical Research Council; NCI NIH HHS: CA089239, CA092312

    The Journal of biological chemistry 2011;286;24;21393-400

  • Inference of human population history from individual whole-genome sequences.

    Li H and Durbin R

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.

    Funded by: Wellcome Trust: 077192, WT077192

    Nature 2011;475;7357;493-6

  • A combined analysis of genome-wide association studies in breast cancer.

    Li J, Humphreys K, Heikkinen T, Aittomäki K, Blomqvist C, Pharoah PD, Dunning AM, Ahmed S, Hooning MJ, Martens JW, van den Ouweland AM, Alfredsson L, Palotie A, Peltonen-Palotie L, Irwanto A, Low HQ, Teoh GH, Thalamuthu A, Easton DF, Nevanlinna H, Liu J, Czene K and Hall P

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, 17177 Stockholm, Sweden.

    In an attempt to identify common disease susceptibility alleles for breast cancer, we performed a combined analysis of three genome-wide association studies (GWAS), involving 2,702 women of European ancestry with invasive breast cancer and 5,726 controls. Tests for association were performed for 285,984 SNPs. Evidence for association with SNPs in genes in specific pathways was assessed using a permutation-based approach. We confirmed associations with loci reported by previous GWAS on 1p11.2, 2q35, 3p, 5p12, 8q24, 10q23.13, 14q24.1 and 16q. Six SNPs with the strongest signals of association with breast cancer, and which have not been reported previously, were typed in two further studies; however, none of the associations could be confirmed. Suggestive evidence for an excess of associations was found for genes involved in the regulation of actin cytoskeleton, glycan degradation, alpha-linolenic acid metabolism, circadian rhythm, hematopoietic cell lineage and drug metabolism. Androgen and oestrogen metabolism, a pathway previously found to be associated with the development of postmenopausal breast cancer, was marginally significant (P = 0.051 [unadjusted]). These results suggest that further analysis of SNPs in these pathways may identify associations that would be difficult to detect through agnostic single SNP analyses. More effort focused on these aspects of oncology can potentially open up promising avenues for the understanding of breast cancer and its prevention.

    Funded by: Cancer Research UK: A10119, A10124; NCI NIH HHS: R01 CA58427

    Breast cancer research and treatment 2011;126;3;717-27

  • Crafting rat genomes with zinc fingers.

    Li MA and Bradley A

    Funded by: Wellcome Trust: 077187

    Nature biotechnology 2011;29;1;39-41

  • Mobilization of giant piggyBac transposons in the mouse genome.

    Li MA, Turner DJ, Ning Z, Yusa K, Liang Q, Eckert S, Rad L, Fitzgerald TW, Craig NL and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.

    The development of technologies that allow the stable delivery of large genomic DNA fragments in mammalian systems is important for genetic studies as well as for applications in gene therapy. DNA transposons have emerged as flexible and efficient molecular vehicles to mediate stable cargo transfer. However, the ability to carry DNA fragments >10 kb is limited in most DNA transposons. Here, we show that the DNA transposon piggyBac can mobilize 100-kb DNA fragments in mouse embryonic stem (ES) cells, making it the only known transposon with such a large cargo capacity. The integrity of the cargo is maintained during transposition, the copy number can be controlled and the inserted giant transposons express the genomic cargo. Furthermore, these 100-kb transposons can also be excised from the genome without leaving a footprint. The development of piggyBac as a large cargo vector will facilitate a wider range of genetic and genomic applications.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: WT077187

    Nucleic acids research 2011;39;22;e148

  • Zebrafish Fukutin family proteins link the unfolded protein response with dystroglycanopathies.

    Lin YY, White RJ, Torelli S, Cirak S, Muntoni F and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Allelic mutations in putative glycosyltransferase genes, fukutin and fukutin-related protein (fkrp), lead to a wide range of muscular dystrophies associated with hypoglycosylation of α-dystroglycan, commonly referred to as dystroglycanopathies. Defective glycosylation affecting dystroglycan-ligand interactions is considered to underlie the disease pathogenesis. We have modelled dystroglycanopathies in zebrafish using a novel loss-of-function dystroglycan allele and by inhibition of Fukutin family protein activities. We show that muscle pathology in embryos lacking Fukutin or FKRP is different from loss of dystroglycan. In addition to hypoglycosylated α-dystroglycan, knockdown of Fukutin or FKRP leads to a notochord defect and a perturbation of laminin expression before muscle degeneration. These are a consequence of endoplasmic reticulum stress and activation of the unfolded protein response (UPR), preceding loss of dystroglycan-ligand interactions. Together, our results suggest that Fukutin family proteins may play important roles in protein secretion and that the UPR may contribute to the phenotypic spectrum of some dystroglycanopathies in humans.

    Funded by: Medical Research Council: G0502130, G0601943; Wellcome Trust: 077037/Z/05/Z, 077047/Z/05/Z

    Human molecular genetics 2011;20;9;1763-75

  • Knime4Bio: a set of custom nodes for the interpretation of next-generation sequencing data with KNIME.

    Lindenbaum P, Le Scouarnec S, Portero V and Redon R

    Institut du thorax, Inserm UMR 915, Centre Hospitalier Universitaire de Nantes, 44000 Nantes, France.

    SUMMARY: Analysing large amounts of data generated by next-generation sequencing (NGS) technologies is difficult for researchers or clinicians without computational skills. They are often compelled to delegate this task to computer biologists working with command line utilities. The availability of easy-to-use tools will become essential with the generalization of NGS in research and diagnosis. It will enable investigators to handle much more of the analysis. Here, we describe Knime4Bio, a set of custom nodes for the KNIME (The Konstanz Information Miner) interactive graphical workbench, for the interpretation of large biological datasets. We demonstrate that this tool can be utilized to quickly retrieve previously published scientific findings.

    Funded by: Wellcome Trust: WT077008

    Bioinformatics (Oxford, England) 2011;27;22;3200-1

  • Stella-Cre mice are highly efficient Cre deleters.

    Liu H, Wang W, Chew SK, Lee SC, Li J, Vassiliou GS, Green T, Futreal PA, Bradley A, Zhang S and Liu P

    College of Animal Science and Technology, Huazhong Agriculture University, Wuhan, China.

    Cre-loxP recombination is widely used for genetic manipulation of the mouse genome. Here, we report the generation and characterization of a new Cre line, Stella-Cre, in which a Cre expression cassette was targeted to the 3' UTR of the Stella locus. Stella is specifically expressed in preimplantation embryos and in the germline. Cre-loxP recombination efficiency in Stella-Cre mice was investigated at several genomic loci including Rosa26, Jak2, and Npm1. At all the loci examined, we observed 100% Cre-loxP recombination efficiency in the embryos and in the germline. Thus, Stella-Cre mice serve as a very efficient deleter line.

    Funded by: Wellcome Trust

    Genesis (New York, N.Y. : 2000) 2011;49;8;689-95

  • Comparative and demographic analysis of orang-utan genomes.

    Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M, Cook L, Delehaunty KD, Fronick C, Schmidt H, Fulton LA, Fulton RS, Nelson JO, Magrini V, Pohl C, Graves TA, Markovic C, Cree A, Dinh HH, Hume J, Kovar CL, Fowler GR, Lunter G, Meader S, Heger A, Ponting CP, Marques-Bonet T, Alkan C, Chen L, Cheng Z, Kidd JM, Eichler EE, White S, Searle S, Vilella AJ, Chen Y, Flicek P, Ma J, Raney B, Suh B, Burhans R, Herrero J, Haussler D, Faria R, Fernando O, Darré F, Farré D, Gazave E, Oliva M, Navarro A, Roberto R, Capozzi O, Archidiacono N, Della Valle G, Purgato S, Rocchi M, Konkel MK, Walker JA, Ullmer B, Batzer MA, Smit AF, Hubley R, Casola C, Schrider DR, Hahn MW, Quesada V, Puente XS, Ordoñez GR, López-Otín C, Vinar T, Brejova B, Ratan A, Harris RS, Miller W, Kosiol C, Lawson HA, Taliwal V, Martins AL, Siepel A, Roychoudhury A, Ma X, Degenhardt J, Bustamante CD, Gutenkunst RN, Mailund T, Dutheil JY, Hobolth A, Schierup MH, Ryder OA, Yoshinaga Y, de Jong PJ, Weinstock GM, Rogers J, Mardis ER, Gibbs RA and Wilson RK

    The Genome Center at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, Missouri 63108, USA. dlocke@wustl.edu

    'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.

    Funded by: Medical Research Council: G0501331, MC_U137761446; NHGRI NIH HHS: HG002238, HG002385, U54 HG003079, U54 HG003079-08, U54 HG003273; NHLBI NIH HHS: T32 HL091823; NIA NIH HHS: P01 AG022064; NIGMS NIH HHS: R01 GM059290, R01 GM59290

    Nature 2011;469;7331;529-33

  • ATMIN is required for maintenance of genomic stability and suppression of B cell lymphoma.

    Loizou JI, Sancho R, Kanu N, Bolland DJ, Yang F, Rada C, Corcoran AE and Behrens A

    Mammalian Genetics Lab, Cancer Research UK, London Research Institute, 44, Lincoln's Inn Fields, London WC2A 3LY, UK.

    Defective V(D)J rearrangement of immunoglobulin heavy or light chain (IgH or IgL) or class switch recombination (CSR) can initiate chromosomal translocations. The DNA-damage kinase ATM is required for the suppression of chromosomal translocations but ATM regulation is incompletely understood. Here, we show that mice lacking the ATM cofactor ATMIN in B cells (ATMIN(ΔB/ΔB)) have impaired ATM signaling and develop B cell lymphomas. Notably, ATMIN(ΔB/ΔB) cells exhibited defective peripheral V(D)J rearrangement and CSR, resulting in translocations involving the Igh and Igl loci, indicating that ATMIN is required for efficient repair of DNA breaks generated during somatic recombination. Thus, our results identify a role for ATMIN in regulating the maintenance of genomic stability and tumor suppression in B cells.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/0000C163; Cancer Research UK; Medical Research Council: MC_U105178806; Wellcome Trust

    Cancer cell 2011;19;5;587-600

  • PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing.

    Long Q, Jeffares DC, Zhang Q, Ye K, Nizhynska V, Ning Z, Tyler-Smith C and Nordborg M

    Gregor Mendel Institute, Vienna, Austria. quan.long@gmi.oeaw.ac.at

    With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, and those subtype frequencies must hence be inferred. In addition, investigators may occasionally want to artificially pool the sample of a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for the uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap is able to accurately estimate the proportions of haplotypes with less than 2% error for 34-strain mixtures with 2X total coverage Arabidopsis thaliana whole genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and user's manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/.

    Funded by: Wellcome Trust: 085775/Z/08/Z

    PloS one 2011;6;1;e15292

  • The Usher 1B protein, MYO7A, is required for normal localization and function of the visual retinoid cycle enzyme, RPE65.

    Lopes VS, Gibbs D, Libby RT, Aleman TS, Welch DL, Lillo C, Jacobson SG, Radu RA, Steel KP and Williams DS

    Jules Stein Eye Institute and Department of Neurobiology, UCLA School of Medicine, University of California-Los Angeles, 200 Stein Plaza, Los Angeles, CA 90095, USA.

    Mutations in the MYO7A gene cause a deaf-blindness disorder, known as Usher syndrome 1B. In the retina, the majority of MYO7A is in the retinal pigmented epithelium (RPE), where many of the reactions of the visual retinoid cycle take place. We have observed that the retinas of Myo7a-mutant mice are resistant to acute light damage. In exploring the basis of this resistance, we found that Myo7a-mutant mice have lower levels of RPE65, the RPE isomerase that has a key role in the retinoid cycle. We show for the first time that RPE65 normally undergoes a light-dependent translocation to become more concentrated in the central region of the RPE cells. This translocation requires MYO7A, so that, in Myo7a-mutant mice, RPE65 is partly mislocalized in the light. RPE65 is degraded more quickly in Myo7a-mutant mice, perhaps due to its mislocalization, providing a plausible explanation for its lower levels. Following a 50-60% photobleach, Myo7a-mutant retinas exhibited increased all-trans-retinyl ester levels during the initial stages of dark recovery, consistent with a deficiency in RPE65 activity. Lastly, MYO7A and RPE65 were co-immunoprecipitated from RPE cell lysate by antibodies against either of the proteins, and the two proteins were partly colocalized, suggesting a direct or indirect interaction. Together, the results support a role for MYO7A in the translocation of RPE65, illustrating the involvement of a molecular motor in the spatiotemporal organization of the retinoid cycle in vision.

    Funded by: Medical Research Council; NEI NIH HHS: EY00331, EY07042, EY13203, P30 EY000331-45, R01 EY007042, R01 EY007042-25; Wellcome Trust: 077189

    Human molecular genetics 2011;20;13;2560-70

  • ADAM-15 disintegrin-like domain structure and function.

    Lu D, Scully M, Kakkar V and Lu X

    Toxins 2011;2;2411-27

  • Whole genome association scan for genetic polymorphisms influencing information processing speed.

    Luciano M, Hansell NK, Lahti J, Davies G, Medland SE, Räikkönen K, Tenesa A, Widen E, McGhee KA, Palotie A, Liewald D, Porteous DJ, Starr JM, Montgomery GW, Martin NG, Eriksson JG, Wright MJ and Deary IJ

    Centre for Cognitive Aging and Cognitive Epidemiology, Department of Psychology, University of Edinburgh, Scotland, UK. michelle.luciano@ed.ac.uk

    Processing speed is an important cognitive function that is compromised in psychiatric illness (e.g., schizophrenia, depression) and old age; it shares genetic background with complex cognition (e.g., working memory, reasoning). To find genes influencing speed we performed a genome-wide association scan in up to three cohorts: Brisbane (mean age 16 years, N = 1659); LBC1936 (mean age 70 years, N = 992); LBC1921 (mean age 82 years, N = 307); and HBCS (mean age 64 years, N = 1080). Meta-analysis of the common measures highlighted various suggestively significant (p < 1.21 × 10⁻⁵) SNPs and plausible candidate genes (e.g., TRIB3). A biological pathways analysis of the speed factor identified two common pathways from the KEGG database (cell junction, focal adhesion) in two cohorts, while a pathway analysis linked to the GO database revealed common pathways across pairs of speed measures (e.g., receptor binding, cellular metabolic process). These highlighted genes and pathways will be able to inform future research, including results for psychiatric disease.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 089062, WT089062

    Biological psychology 2011;86;3;193-202

  • A large palindrome with interchromosomal gene duplications in the pericentromeric region of the D. melanogaster Y chromosome.

    Méndez-Lago M, Bergman CM, de Pablos B, Tracey A, Whitehead SL and Villasante A

    Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain.

    The non-recombining Y chromosome is expected to degenerate over evolutionary time; however, gene gain is a common feature of Y chromosomes of mammals and Drosophila. Here, we report that a large palindrome containing interchromosomal segmental duplications is located in the vicinity of the first amplicon detected in the Y chromosome of D. melanogaster. The recent appearance of such amplicons suggests that duplications to the Y chromosome, followed by the amplification of the segmental duplications, are a mechanism for the continuing evolution of Drosophila Y chromosomes.

    Funded by: Wellcome Trust

    Molecular biology and evolution 2011;28;7;1967-71

  • Knockdown of mental disorder susceptibility genes disrupts neuronal network physiology in vitro.

    MacLaren EJ, Charlesworth P, Coba MP and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Schizophrenia and bipolar disorder are common diseases caused by multiple genes that disrupt brain circuits. While great progress has been made in identifying schizophrenia susceptibility genes, these studies have left two major unanswered mechanistic questions: is there a core biochemical mechanism that these genes regulate, and what are the electrophysiological consequences of the altered gene expression? Because clinical studies implicate abnormalities in neuronal networks, we developed a system for studying the neurophysiology of neuronal networks in vitro where the role of candidate disease genes can be rapidly assayed. Using this system we focused on three postsynaptic proteins, DISC1, TNIK and PSD-93/DLG2, each of which is encoded by a schizophrenia susceptibility gene. We also examined the utility of this assay system in bipolar disorder (BD), which has a strong genetic overlap with schizophrenia, by examining the bipolar disorder susceptibility gene Dctn5. The global neuronal network firing behavior of primary cultures of mouse hippocampus neurons was examined on multi-electrode arrays (MEAs) and genes of interest were knocked down using RNA interference. Measurement of multiple neural network parameters demonstrated phenotypes for these genes compared with controls. Moreover, the different genes disrupted network properties and showed distinct and overlapping effects. These data show that multiple susceptibility genes for complex psychiatric disorders regulate neural network physiology, and they demonstrate a new assay system with wide application.

    Molecular and cellular neurosciences 2011;47;2;93-9

  • Clinical significance of SF3B1 mutations in myelodysplastic syndromes and myelodysplastic/myeloproliferative neoplasms.

    Malcovati L, Papaemmanuil E, Bowen DT, Boultwood J, Della Porta MG, Pascutto C, Travaglino E, Groves MJ, Godfrey AL, Ambaglio I, Gallì A, Da Vià MC, Conte S, Tauro S, Keenan N, Hyslop A, Hinton J, Mudie LJ, Wainscoat JS, Futreal PA, Stratton MR, Campbell PJ, Hellström-Lindberg E, Cazzola M and Chronic Myeloid Disorders Working Group of the International Cancer Genome Consortium and of the Associazione Italiana per la Ricerca sul Cancro Gruppo Italiano Malattie Mieloproliferative

    Department of Hematology Oncology, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico Policlinico San Matteo and University of Pavia, Pavia, Italy.

    In a previous study, we identified somatic mutations of SF3B1, a gene encoding a core component of RNA splicing machinery, in patients with myelodysplastic syndrome (MDS). Here, we define the clinical significance of these mutations in MDS and myelodysplastic/myeloproliferative neoplasms (MDS/MPN). The coding exons of SF3B1 were screened using massively parallel pyrosequencing in patients with MDS, MDS/MPN, or acute myeloid leukemia (AML) evolving from MDS. Somatic mutations of SF3B1 were found in 150 of 533 (28.1%) patients with MDS, 16 of 83 (19.3%) with MDS/MPN, and 2 of 38 (5.3%) with AML. There was a significant association of SF3B1 mutations with the presence of ring sideroblasts (P < .001) and of mutant allele burden with their proportion (P = .002). The mutant gene had a positive predictive value for ring sideroblasts of 97.7% (95% confidence interval, 93.5%-99.5%). In multivariate analysis including established risk factors, SF3B1 mutations were found to be independently associated with better overall survival (hazard ratio = 0.15, P = .025) and lower risk of evolution into AML (hazard ratio = 0.33, P = .049). The close association between SF3B1 mutations and disease phenotype with ring sideroblasts across MDS and MDS/MPN is consistent with a causal relationship. Furthermore, SF3B1 mutations are independent predictors of favorable clinical outcome, and their incorporation into stratification systems might improve risk assessment in MDS.

    Funded by: Wellcome Trust: 077012/Z/05/Z, WT088340MA

    Blood 2011;118;24;6239-46

  • A research agenda for malaria eradication: basic science and enabling technologies.

    malERA Consultative Group on Basic Science and Enabling Technologies

    Today's malaria control efforts are limited by our incomplete understanding of the biology of Plasmodium and of the complex relationships between human populations and the multiple species of mosquito and parasite. Research priorities include the development of in vitro culture systems for the complete life cycle of P. falciparum and P. vivax and the development of an appropriate liver culture system to study hepatic stages. In addition, genetic technologies for the manipulation of Plasmodium need to be improved, the entire parasite metabolome needs to be characterized to identify new druggable targets, and improved information systems for monitoring the changes in epidemiology, pathology, and host-parasite-vector interactions as a result of intensified control need to be established to bridge the gap between bench, preclinical, clinical, and population-based sciences.

    Funded by: Medical Research Council: G0501670

    PLoS medicine 2011;8;1;e1000399

  • Low-bias, strand-specific transcriptome Illumina sequencing by on-flowcell reverse transcription (FRT-seq).

    Mamanova L and Turner DJ

    The Wellcome Trust Sanger Institute, Cambridge, UK. lm4@sanger.ac.uk

    The unifying feature of second-generation sequencing technologies is that single template strands are amplified clonally onto a solid surface prior to the sequencing reaction. To convert template strands into a compatible state for attachment to this surface, a multistep library preparation is required, which typically culminates in amplification by the PCR. PCR is an inherently biased process, which decreases the efficiency of data acquisition. Flowcell reverse transcription sequencing is a method of transcriptome sequencing for Illumina sequencers in which the reverse transcription reaction is performed on the flowcell by using unamplified, adapter-ligated mRNA as a template. This approach removes PCR biases and duplicates, generates strand-specific paired-end data and is highly reproducible. The procedure can be performed quickly, taking 2 d to generate clusters from mRNA.

    Funded by: Wellcome Trust: WT079643

    Nature protocols 2011;6;11;1736-47

  • APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.

    Mansfeld J, Collin P, Collins MO, Choudhary JS and Pines J

    The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.

    Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint protein complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for the understanding of how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z

    Nature cell biology 2011;13;10;1234-43

  • Insertional mutagenesis identifies multiple networks of cooperating genes driving intestinal tumorigenesis.

    March HN, Rust AG, Wright NA, ten Hoeve J, de Ridder J, Eldridge M, van der Weyden L, Berns A, Gadiot J, Uren A, Kemp R, Arends MJ, Wessels LF, Winton DJ and Adams DJ

    Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, UK.

    The evolution of colorectal cancer suggests the involvement of many genes. To identify new drivers of intestinal cancer, we performed insertional mutagenesis using the Sleeping Beauty transposon system in mice carrying germline or somatic Apc mutations. By analyzing common insertion sites (CISs) isolated from 446 tumors, we identified many hundreds of candidate cancer drivers. Comparison to human data sets suggested that 234 CIS-targeted genes are also dysregulated in human colorectal cancers. In addition, we found 183 CIS-containing genes that are candidate Wnt targets and showed that 20 CIS-containing genes are newly discovered modifiers of canonical Wnt signaling. We also identified mutations associated with a subset of tumors containing an expanded number of Paneth cells, a hallmark of deregulated Wnt signaling, and genes associated with more severe dysplasia included those encoding members of the FGF signaling cascade. Some 70 genes had co-occurrence of CIS pairs, clustering into 38 sub-networks that may regulate tumor development.

    Funded by: Cancer Research UK: 13031, A6997; Wellcome Trust

    Nature genetics 2011;43;12;1202-9

  • Introducing the Human Brain Project.

    Markram H, Meier K, Lippert T, Grillner S, Frackowiak R, Dehaene S, Knoll A, Sompolinsky H, Verstreken K, DeFelipe J, Grant S, Changeux JP and Sariam A

    Procedia Computer Science 2011;7;39-42

  • Enhanced virulence of Salmonella enterica serovar typhimurium after passage through mice.

    Mastroeni P, Morgan FJ, McKinley TJ, Shawcroft E, Clare S, Maskell DJ and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, United Kingdom.

    The interaction between Salmonella enterica and the host immune system is complex. The outcome of an infection is the result of a balance between the in vivo environment where the bacteria survive and grow and the regulation of fitness genes at a level sufficient for the bacteria to retain their characteristic rate of growth in a given host. Using bacteriological counts from tissue homogenates and fluorescence microscopy to determine the spread, localization, and distribution of S. enterica in the tissues, we show that, during a systemic infection, S. enterica adapts to the in vivo environment. The adaptation becomes a measurable phenotype when bacteria that have resided in a donor animal are introduced into a recipient naïve animal. This adaptation does not confer increased resistance to early host killing mechanisms but can be detected as an enhancement in the bacterial net growth rate later in the infection. The enhanced growth rate is lost upon a single passage in vitro, and it is therefore transient and not due to selection of mutants. The adapted bacteria on average reach higher intracellular numbers in individual infected cells and therefore have patterns of organ spread different from those of nonadapted bacteria. These experiments help in developing an understanding of the influence of passage in a host on the fitness and virulence of S. enterica.

    Funded by: Medical Research Council: G0801161; Wellcome Trust

    Infection and immunity 2011;79;2;636-43

  • Comparative cytogenetic mapping of Sox2 and Sox14 in cichlid fishes and inferences on the genomic organization of both genes in vertebrates.

    Mazzuchelli J, Yang F, Kocher TD and Martins C

    Department of Morphology, Bioscience Institute, UNESP, São Paulo State University, 18618-000 Botucatu, São Paulo, Brazil.

    To better understand the genomic organization and evolution of Sox genes in vertebrates, we cytogenetically mapped Sox2 and Sox14 genes in cichlid fishes and performed comparative analyses of their orthologs in several vertebrate species. The genomic regions neighboring Sox2 and Sox14 have been conserved during vertebrate diversification. Although cichlids seem to have undergone high rates of genomic rearrangements, Sox2 and Sox14 are linked in the same chromosome in the Etroplinae Etroplus maculatus that represents the sister group of all remaining cichlids. However, these genes are located on different chromosomes in several species of the sister group Pseudocrenilabrinae. Similarly, the ancestral synteny of Sox2 and Sox14 has been maintained in several vertebrates, but this synteny has been broken independently in all major groups as a consequence of karyotype rearrangements that took place during the vertebrate evolution.

    Funded by: NICHD NIH HHS: R01 HD058635-04

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2011;19;5;657-67

  • HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans.

    McCormack M, Alfirevic A, Bourgeois S, Farrell JJ, Kasperavičiūtė D, Carrington M, Sills GJ, Marson T, Jia X, de Bakker PI, Chinthapalli K, Molokhia M, Johnson MR, O'Connor GD, Chaila E, Alhusaini S, Shianna KV, Radtke RA, Heinzen EL, Walley N, Pandolfo M, Pichler W, Park BK, Depondt C, Sisodiya SM, Goldstein DB, Deloukas P, Delanty N, Cavalleri GL and Pirmohamed M

    Molecular and Cellular Therapeutics, the Royal College of Surgeons in Ireland, Dublin, Ireland.

    Background: Carbamazepine causes various forms of hypersensitivity reactions, ranging from maculopapular exanthema to severe blistering reactions. The HLA-B*1502 allele has been shown to be strongly correlated with carbamazepine-induced Stevens-Johnson syndrome and toxic epidermal necrolysis (SJS-TEN) in the Han Chinese and other Asian populations but not in European populations.

    Methods: We performed a genomewide association study of samples obtained from 22 subjects with carbamazepine-induced hypersensitivity syndrome, 43 subjects with carbamazepine-induced maculopapular exanthema, and 3987 control subjects, all of European descent. We tested for an association between disease and HLA alleles through proxy single-nucleotide polymorphisms and imputation, confirming associations by high-resolution sequence-based HLA typing. We replicated the associations in samples from 145 subjects with carbamazepine-induced hypersensitivity reactions.

    Results: The HLA-A*3101 allele, which has a prevalence of 2 to 5% in Northern European populations, was significantly associated with the hypersensitivity syndrome (P=3.5×10⁻⁸). An independent genomewide association study of samples from subjects with maculopapular exanthema also showed an association with the HLA-A*3101 allele (P=1.1×10⁻⁶). Follow-up genotyping confirmed the variant as a risk factor for the hypersensitivity syndrome (odds ratio, 12.41; 95% confidence interval [CI], 1.27 to 121.03), maculopapular exanthema (odds ratio, 8.33; 95% CI, 3.59 to 19.36), and SJS-TEN (odds ratio, 25.93; 95% CI, 4.93 to 116.18).

    Conclusions: The presence of the HLA-A*3101 allele was associated with carbamazepine-induced hypersensitivity reactions among subjects of Northern European ancestry. The presence of the allele increased the risk from 5.0% to 26.0%, whereas its absence reduced the risk from 5.0% to 3.8%. (Funded by the U.K. Department of Health and others.)

    Funded by: Department of Health; Medical Research Council: G0400126; PHS HHS: HHS-N261200800001E, HHSN261200800001E; Wellcome Trust: 084730

    The New England journal of medicine 2011;364;12;1134-43

  • Genomics and the continuum of cancer care.

    McDermott U, Downing JR and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    The New England journal of medicine 2011;364;4;340-50

  • Reply: Ileal pouch microbial diversity.

    McLaughlin SD, Walker AW, Churcher C, Clark SK, Tekkis PP, Johnson MW, Parkhill J, Ciclitira PJ, Dougan G, Nicholls RJ and Petrovska L

    Annals of surgery 2011;254;669-70

  • Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis.

    Mells GF, Floyd JA, Morley KI, Cordell HJ, Franklin CS, Shin SY, Heneghan MA, Neuberger JM, Donaldson PT, Day DB, Ducker SJ, Muriithi AW, Wheater EF, Hammond CJ, Dawwas MF, UK PBC Consortium, Wellcome Trust Case Control Consortium 3, Jones DE, Peltonen L, Alexander GJ, Sandford RN and Anderson CA

    Academic Department of Medical Genetics, Cambridge University, Cambridge, UK; Department of Hepatology, Cambridge University Hospitals National Health Service (NHS) Foundation Trust, Cambridge, UK.

    In addition to the HLA locus, six genetic risk factors for primary biliary cirrhosis (PBC) have been identified in recent genome-wide association studies (GWAS). To identify additional loci, we carried out a GWAS using 1,840 cases from the UK PBC Consortium and 5,163 UK population controls as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3). We followed up 28 loci in an additional UK cohort of 620 PBC cases and 2,514 population controls. We identified 12 new susceptibility loci (at a genome-wide significance level of P < 5 × 10⁻⁸) and replicated all previously associated loci. We identified three further new loci in a meta-analysis of data from our study and previously published GWAS results. New candidate genes include STAT4, DENND1B, CD80, IL7R, CXCR5, TNFRSF1A, CLEC16A and NFKB1. This study has considerably expanded our knowledge of the genetic architecture of PBC.

    Funded by: Medical Research Council: G0500020, G0800460, G0802068; PHS HHS: 1R01LEY018246; Wellcome Trust: 085925/Z/08/Z, 091745, WT090355/B/09/Z, WT09355A/09/Z, WT91745/Z/10/Z

    Nature genetics 2011;43;4;329-32

  • Hostility in adolescents and adults: a genome-wide association study of the Young Finns.

    Merjonen P, Keltikangas-Järvinen L, Jokela M, Seppälä I, Lyytikäinen LP, Pulkki-Råback L, Kivimäki M, Elovainio M, Kettunen J, Ripatti S, Kähönen M, Viikari J, Palotie A, Peltonen L, Raitakari OT and Lehtimäki T

    IBS, Unit of Personality, Work and Health Psychology, University of Helsinki, Helsinki, Finland.

    Hostility is a multidimensional personality trait with changing expression over the life course. We performed a genome-wide association study (GWAS) of the components of hostility in a population-based sample of Finnish men and women for whom a total of 2.5 million single-nucleotide polymorphisms (SNPs) were available through direct or in silico genotyping. Hostility dimensions (anger, cynicism and paranoia) were assessed at four time points over a 15-year interval (age range 15-30 years at phase 1 and 30-45 years at phase 4) in 982-1780 participants depending on the hostility measure. A few promising areas were found to be suggestively related to paranoia, on chromosome 14 at 99 cM (top SNPs rs3783337, rs7158754, rs3783332, rs2181102, rs7159195, rs11160570, rs941898, P values <3.9 × 10⁻⁸ with nearest gene Enah/Vasp-like (EVL)), and to cynicism, on chromosome 7 at 86 cM (top SNPs rs802047, rs802028, rs802030, rs802026, rs802036, rs802025, rs802024, rs802032, rs802049, rs802051, P values <6.9 × 10⁻⁷ with nearest gene CROT (carnitine O-octanoyltransferase)). Some shared suggestive genetic influence for both paranoia and cynicism was also found from chromosome 17 at 2.8 cM (SNPs rs12936442, rs894664, rs6502671, rs7216028) and chromosome 22 at 43 cM (SNPs rs7510759, rs7510924, rs7290560), with nearest genes RAP1 GTPase activating protein 2 (RAP1GAP2) and KIAA1644, respectively. These suggestive associations did not replicate across all measurement times, which warrants further study on these SNPs in other populations.

    Funded by: NHLBI NIH HHS: R01HL36310; NIA NIH HHS: R01AG013196, R01AG034454

    Translational psychiatry 2011;1;e11

  • The genetic association between personality and major depression or bipolar disorder. A polygenic score analysis using genome-wide association data.

    Middeldorp CM, de Moor MH, McGrath LM, Gordon SD, Blackwood DH, Costa PT, Terracciano A, Krueger RF, de Geus EJ, Nyholt DR, Tanaka T, Esko T, Madden PA, Derringer J, Amin N, Willemsen G, Hottenga JJ, Distel MA, Uda M, Sanna S, Spinhoven P, Hartman CA, Ripke S, Sullivan PF, Realo A, Allik J, Heath AC, Pergadia ML, Agrawal A, Lin P, Grucza RA, Widen E, Cousminer DL, Eriksson JG, Palotie A, Barnett JH, Lee PH, Luciano M, Tenesa A, Davies G, Lopez LM, Hansell NK, Medland SE, Ferrucci L, Schlessinger D, Montgomery GW, Wright MJ, Aulchenko YS, Janssens AC, Oostra BA, Metspalu A, Abecasis GR, Deary IJ, Räikkönen K, Bierut LJ, Martin NG, Wray NR, van Duijn CM, Smoller JW, Penninx BW and Boomsma DI

    Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands. cm.middeldorp@psy.vu.nl

    The relationship between major depressive disorder (MDD) and bipolar disorder (BD) remains controversial. Previous research has reported differences and similarities in risk factors for MDD and BD, such as predisposing personality traits. For example, high neuroticism is related to both disorders, whereas openness to experience is specific for BD. This study examined the genetic association between personality and MDD and BD by applying polygenic scores for neuroticism, extraversion, openness to experience, agreeableness and conscientiousness to both disorders. Polygenic scores reflect the weighted sum of multiple single-nucleotide polymorphism alleles associated with the trait for an individual and were based on a meta-analysis of genome-wide association studies for personality traits including 13,835 subjects. Polygenic scores were tested for MDD in the combined Genetic Association Information Network (GAIN-MDD) and MDD2000+ samples (N=8921) and for BD in the combined Systematic Treatment Enhancement Program for Bipolar Disorder and Wellcome Trust Case-Control Consortium samples (N=6329) using logistic regression analyses. At the phenotypic level, personality dimensions were associated with MDD and BD. Polygenic neuroticism scores were significantly positively associated with MDD, whereas polygenic extraversion scores were significantly positively associated with BD. The explained variance of MDD and BD, ∼0.1%, was highly comparable to the variance explained by the polygenic personality scores in the corresponding personality traits themselves (between 0.1 and 0.4%). This indicates that the proportions of variance explained in mood disorders are at the upper limit of what could have been expected. This study suggests shared genetic risk factors for neuroticism and MDD on the one hand and for extraversion and BD on the other.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; NCI NIH HHS: P01 CA089392, P01CA89392; NHGRI NIH HHS: U01 HG004422, U01 HG004446, U01HG004438; NIA NIH HHS: N01-AG-1-2109; NIAAA NIH HHS: AA07535, AA07580, AA07728, AA10248, AA11998, AA13320, AA13321, AA13326, AA14041, U10 AA008401, U10AA008401; NIDA NIH HHS: DA019951, DA12854, R01 DA013423, R01 DA019963; NIMH NIH HHS: MH66206, R01 MH059160; PHS HHS: HHSN268200782096C, R01 079799; Wellcome Trust: 076113

    Translational psychiatry 2011;1;e50

  • Identifying schizophrenia and other psychoses with psychological scales in the general population.

    Miettunen J, Veijola J, Isohanni M, Paunio T, Freimer N, Jääskeläinen E, Taanila A, Ekelund J, Järvelin MR, Peltonen L, Joukamaa M and Lichtermann D

    Department of Psychiatry, Oulu University and Oulu University Hospital, Oulu, Finland. jouko.miettunen@oulu.fi

    We study the predictive power and associations of several psychopathology and temperament scales with respect to schizophrenia and other psychotic disorders. Measures of psychopathology (Physical and Social Anhedonia Scales, Perceptual Aberration Scale, Hypomanic Personality Scale, Bipolar II Scale, and Schizoidia Scale) and the Temperament and Character Inventory were included in the 31-year follow-up of the prospective Northern Finland 1966 birth cohort (N = 4926). The Perceptual Aberration Scale was the best scale for concurrent validity in psychoses, and also the best psychopathology scale in terms of discriminant validity. Participants scoring high in hypomanic personality were at the highest risk for developing psychosis during the 11-year follow-up. Harm avoidance was a dominant temperament dimension in individuals with psychosis compared with participants without psychiatric diagnoses. These scales are useful as vulnerability markers in studying psychoses.

    Funded by: NIMH NIH HHS: 5R01MH63706:02

    The Journal of nervous and mental disease 2011;199;4;230-8

  • Cop1 constitutively regulates c-Jun protein stability and functions as a tumor suppressor in mice.

    Migliorini D, Bogaerts S, Defever D, Vyas R, Denecker G, Radaelli E, Zwolinska A, Depaepe V, Hochepied T, Skarnes WC and Marine JC

    Laboratory for Molecular Cancer Biology, Department of Molecular and Developmental Genetics, VIB-K.U.Leuven, Leuven, Belgium.

    Biochemical studies have suggested conflicting roles for the E3 ubiquitin ligase constitutive photomorphogenesis protein 1 (Cop1; also known as Rfwd2) in tumorigenesis, providing evidence for both the oncoprotein c-Jun and the tumor suppressor p53 as its targets. Here we present what we believe to be the first in vivo investigation of the role of Cop1 in cancer etiology. Using an innovative genetic approach to generate an allelic series of Cop1, we found that Cop1 hypomorphic mice spontaneously developed malignancy at a high frequency in the first year of life and were highly susceptible to radiation-induced lymphomagenesis. Further analysis revealed that c-Jun was a key physiological target for Cop1 and that Cop1 constitutively kept c-Jun at low levels in vivo and thereby modulated c-Jun/AP-1 transcriptional activity. Importantly, Cop1 deficiency stimulated cell proliferation in a c-Jun-dependent manner. Focal deletions of COP1 were observed at significant frequency across several cancer types, and COP1 loss was determined to be one of the mechanisms leading to c-Jun upregulation in human cancer. We therefore conclude that Cop1 is a tumor suppressor that functions, at least in part, by antagonizing c-Jun oncogenic activity. In the absence of evidence for a genetic interaction between Cop1 and p53, our data strongly argue against the use of Cop1-inhibitory drugs for cancer therapy.

    The Journal of clinical investigation 2011;121;4;1329-43

  • The use of genome-wide eQTL associations in lymphoblastoid cell lines to identify novel genetic pathways involved in complex traits.

    Min JL, Taylor JM, Richards JB, Watts T, Pettersson FH, Broxholme J, Ahmadi KR, Surdulescu GL, Lowy E, Gieger C, Newton-Cheh C, Perola M, Soranzo N, Surakka I, Lindgren CM, Ragoussis J, Morris AP, Cardon LR, Spector TD and Zondervan KT

    Genetic and Genomic Epidemiology Unit, The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom. josine.min@well.ox.ac.uk

    The integrated analysis of genotypic and expression data for association with complex traits could identify novel genetic pathways involved in complex traits. We profiled 19,573 expression probes in Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs) from 299 twins and correlated these with 44 quantitative traits (QTs). For 939 expressed probes correlating with more than one QT, we investigated the presence of eQTL associations in three datasets of 57 CEU HapMap founders and 86 unrelated twins. Genome-wide association analysis of these probes with 2.2 million SNPs revealed 131 potential eQTLs (1,989 eQTL SNPs) overlapping between the HapMap datasets, five of which were in cis (58 eQTL SNPs). We then tested 535 SNPs tagging the eQTL SNPs for association with the relevant QT in 2,905 twins. We identified nine potential SNP-QT associations (P<0.01) but none significantly replicated in five large consortia of 1,097-16,129 subjects. We also failed to replicate previously reported eQTL associations with body mass index, plasma low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglyceride levels derived from lymphocytes, adipose and liver tissue. Our results and additional power calculations suggest that proponents may have been overoptimistic in the power of LCLs in eQTL approaches to elucidate regulatory genetic effects on complex traits using the small datasets generated to date. Nevertheless, larger tissue-specific expression data sets relevant to specific traits are becoming available, and should enable the adoption of similar integrated analyses in the near future.

    Funded by: Wellcome Trust: 085235

    PloS one 2011;6;7;e22070

  • Novel chromosomal rearrangements and break points at the t(6;9) in salivary adenoid cystic carcinoma: association with MYB-NFIB chimeric fusion, MYB expression, and clinical outcome.

    Mitani Y, Rao PH, Futreal PA, Roberts DB, Stephens PJ, Zhao YJ, Zhang L, Mitani M, Weber RS, Lippman SM, Caulin C and El-Naggar AK

    Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.

    Objective: To investigate the molecular genetic heterogeneity associated with the t(6;9) in adenoid cystic carcinoma (ACC) and correlate the findings with patient clinical outcome.

    Experimental Design: Multimolecular and genetic techniques complemented with massive pair-ended sequencing and single-nucleotide polymorphism array analyses were used on tumor specimens from 30 new and 52 previously analyzed fusion transcript-negative ACCs by reverse transcriptase PCR (RT-PCR). MYB mRNA expression level was determined by quantitative RT-PCR. The results of 102 tumors (30 new and 72 previously reported cases) were correlated with the clinicopathologic factors and patients' survival.

    Results: The FISH analysis showed 34 of 82 (41.5%) fusion-positive tumors and molecular techniques identified fusion transcripts in 21 of the 82 (25.6%) tumors. Detailed FISH analysis of 11 out of the 15 tumors with gene fusion without transcript formation showed translocation of NFIB sequences to proximal or distal sites of the MYB gene. Massive pair-end sequencing of a subset of tumors confirmed the proximal translocation to an NFIB sequence and led to the identification of a new fusion gene (NFIB-AIG1) in one of the tumors. Overall, the MYB-NFIB gene fusion rate by FISH was 52.9%, whereas the incidence of fusion transcript formation was 38.2%. A statistically significant association between 5' MYB transcript expression and patient survival was found.

    Conclusions: We conclude that: (i) t(6;9) results in complex genetic and molecular alterations in ACC, (ii) MYB-NFIB gene fusion may not always be associated with chimeric transcript formation, (iii) noncanonical MYB-NFIB gene fusions occur in a subset of tumors, (iv) high MYB expression correlates with worse patient survival.

    Funded by: NCI NIH HHS: CA-16672, P50 CA097007; NIDCR NIH HHS: U01DE019765; Wellcome Trust: 077012, 077012/Z/05/Z

    Clinical cancer research : an official journal of the American Association for Cancer Research 2011;17;22;7003-14

  • Beta-adrenergic receptor activation rescues theta frequency stimulation-induced LTP deficits in mice expressing C-terminally truncated NMDA receptor GluN2A subunits.

    Moody TD, Watabe AM, Indersmitten T, Komiyama NH, Grant SG and O'Dell TJ

    Interdepartmental PhD Program for Neuroscience, University of California Los Angeles, Los Angeles, California 90024, USA.

    Through protein interactions mediated by their cytoplasmic C termini the GluN2A and GluN2B subunits of NMDA receptors (NMDARs) have a key role in the formation of NMDAR signaling complexes at excitatory synapses. Although these signaling complexes are thought to have a crucial role in NMDAR-dependent forms of synaptic plasticity such as long-term potentiation (LTP), the role of the C terminus of GluN2A in coupling NMDARs to LTP enhancing and/or suppressing signaling pathways is unclear. To address this issue we examined the induction of LTP in the hippocampal CA1 region in mice lacking the C terminus of endogenous GluN2A subunits (GluN2AΔC/ΔC). Our results show that truncation of GluN2A subunits produces robust, but highly frequency-dependent, deficits in LTP and a reduction in basal levels of extracellular signal regulated kinase 2 (ERK2) activation and phosphorylation of AMPA receptor GluA1 subunits at a protein kinase A site (serine 845). Consistent with the notion that these signaling deficits contribute to the deficits in LTP in GluN2AΔC/ΔC mice, activating ERK2 and increasing GluA1 S845 phosphorylation through activation of β-adrenergic receptors rescued the induction of LTP in these mutants. Together, our results indicate that the capacity of excitatory synapses to undergo plasticity in response to different patterns of activity is dependent on the coupling of specific signaling pathways to the intracellular domains of the NMDARs and that abnormal plasticity resulting from mutations in NMDARs can be reduced by activation of key neuromodulatory transmitter receptors that engage converging signaling pathways.

    Funded by: NIMH NIH HHS: MH609197; Wellcome Trust

    Learning & memory (Cold Spring Harbor, N.Y.) 2011;18;2;118-27

  • The origins, evolution, and functional potential of alternative splicing in vertebrates.

    Mudge JM, Frankish A, Fernandez-Banet J, Alioto T, Derrien T, Howald C, Reymond A, Guigó R, Hubbard T and Harrow J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. jm12@sanger.ac.uk

    Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6 times excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data support a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

    Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT077198/Z/05/Z

    Molecular biology and evolution 2011;28;10;2949-59

  • Sequencing skippy: the genome sequence of an Australian kangaroo, Macropus eugenii.

    Murchison EP and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Sequencing of the tammar wallaby (Macropus eugenii) reveals insights into genome evolution, and mammalian reproduction and development.

    Genome biology 2011;12;8;123

  • Obstetric and perinatal outcomes in pregnancies complicated by Type 1 and Type 2 diabetes: influences of glycaemic control, obesity and social disadvantage.

    Murphy HR, Steel SA, Roland JM, Morris D, Ball V, Campbell PJ, Temple RC and East Anglia Study Group for Improving Pregnancy Outcomes in Women with Diabetes (EASIPOD)

    Metabolic Research Laboratories, Institute of Metabolic Science, University of Cambridge, Cambridge, UK. hm386@medschl.cam.ac.uk

    Aims: To compare obstetric and perinatal outcomes in women with Type 1 and Type 2 diabetes and relate these to maternal risk factors.

    Methods: Prospective cohort study of 682 consecutive diabetic pregnancies in East Anglia during 2006-2009. Relationships between congenital malformation, perinatal mortality and perinatal morbidity (large for gestational age, preterm delivery, neonatal care) with maternal age, parity, ethnicity, glycaemic control, obesity and social disadvantage were examined using bivariable and multivariate models.

    Results: There were 408 (59.8%) Type 1 and 274 (40.2%) Type 2 diabetes pregnancies. Women with Type 2 diabetes were older (P < 0.001), heavier (P < 0.0001), more frequently multiparous (P < 0.001), more ethnically diverse (P < 0.0001) and more socially disadvantaged (P = 0.0004). Although women with Type 2 diabetes had shorter duration of diabetes (P < 0.0001) and better pre-conception glycaemic control [HbA1c 52 mmol/mol (6.9%) Type 2 diabetes vs. 63 mmol/mol (7.9%) Type 1 diabetes; P < 0.0001], rates of congenital malformation and perinatal mortality were comparable. Women with Type 2 diabetes had fewer large-for-gestational-age infants (37.6 vs. 52.9%, P < 0.0008), fewer preterm deliveries (17.5 vs. 37.1%, P < 0.0001) and their offspring had fewer neonatal care admissions (29.8 vs. 43.2%, P = 0.001). Third trimester HbA1c (OR 1.35, 95% CI 1.09-1.67, P = 0.006) and social disadvantage (OR 0.80, 95% CI 0.67-0.98; P = 0.03) were risk factors for large for gestational age.

    Conclusions: Despite increased age, parity, obesity and social disadvantage, women with Type 2 diabetes had better glycaemic control, fewer large-for-gestational-age infants, fewer preterm deliveries and fewer neonatal care admissions. Better tools are needed to improve glycaemic control and reduce the rates of large for gestational age, particularly in Type 1 diabetes.

    Funded by: Wellcome Trust: 093867

    Diabetic medicine : a journal of the British Diabetic Association 2011;28;9;1060-7

  • Evidence for several waves of global transmission in the seventh cholera pandemic.

    Mutreja A, Kim DW, Thomson NR, Connor TR, Lee JH, Kariuki S, Croucher NJ, Choi SY, Harris SR, Lebens M, Niyogi SK, Kim EJ, Ramamurthy T, Chun J, Wood JL, Clemens JD, Czerkinsky C, Nair GB, Holmgren J, Parkhill J and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Vibrio cholerae is a globally important pathogen that is endemic in many areas of the world and causes 3-5 million reported cases of cholera every year. Historically, there have been seven acknowledged cholera pandemics; recent outbreaks in Zimbabwe and Haiti are included in the seventh and ongoing pandemic. Only isolates in serogroup O1 (consisting of two biotypes known as 'classical' and 'El Tor') and the derivative O139 can cause epidemic cholera. It is believed that the first six cholera pandemics were caused by the classical biotype, but El Tor has subsequently spread globally and replaced the classical biotype in the current pandemic. Detailed molecular epidemiological mapping of cholera has been compromised by a reliance on sub-genomic regions such as mobile elements to infer relationships, making El Tor isolates associated with the seventh pandemic seem superficially diverse. To understand the underlying phylogeny of the lineage responsible for the current pandemic, we identified high-resolution markers (single nucleotide polymorphisms; SNPs) in 154 whole-genome sequences of globally and temporally representative V. cholerae isolates. Using this phylogeny, we show here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and identify several transcontinental transmission events. Additionally, we show how the acquisition of the SXT family of antibiotic resistance elements has shaped pandemic spread, and show that this family was first acquired at least ten years before its discovery in V. cholerae.

    Funded by: Wellcome Trust: 076962, 076964

    Nature 2011;477;7365;462-5

  • Candidate gene association study of magnetic resonance imaging-based hip osteoarthritis (OA): evidence for COL9A2 gene as a common predisposing factor for hip OA and lumbar disc degeneration.

    Näkki A, Videman T, Kujala UM, Suhonen M, Männikkö M, Peltonen L, Battié MC, Kaprio J and Saarela J

    Institute for Molecular Medicine Finland (FIMM), Biomedicum Helsinki 2U, University of Helsinki, Helsinki, Finland.

    Objective: To study whether gene variants associated with lumbar disc degeneration (LDD) phenotypes are also associated with hip osteoarthritis (OA).

    Methods: Magnetic resonance imaging (MRI)-based hip OA changes for 345 twins were assessed and 99 single-nucleotide polymorphisms (SNPs) were analyzed.

    Results: Variants in the COL9A2 (rs7533552, p = 0.0025) and COL10A1 (rs568725, p = 0.002) genes showed association with hip OA.

    Conclusion: The associating G allele in COL9A2 changes a glutamine to arginine or to tryptophan and may predispose to both hip OA and LDD, making it a candidate for degenerative connective tissue diseases.

    The Journal of rheumatology 2011;38;4;747-52

  • Activation of K-RAS by co-mutation of codons 19 and 20 is transforming.

    Naguib A, Wilson CH, Adams DJ and Arends MJ

    Department of Pathology, University of Cambridge, Addenbrooke's Hospital, Cambridge, CB2 0QQ, UK. mja40@cam.ac.uk

    The K-RAS oncogene is widely mutated in human cancers. Activating mutations in K-RAS give rise to constitutive signalling through the MAPK/ERK and PI3K/AKT pathways promoting increased cell division, reduced apoptosis and transformation. The majority of activating mutations in K-RAS are located in codons 12 and 13. In a human colorectal cancer we identified a novel K-RAS co-mutation that altered codons 19 and 20 resulting in transitions at both codons (L19F/T20A) in the same allele. Using focus forming transformation assays in vitro, we showed that co-mutation of L19F/T20A in K-RAS demonstrated intermediate transforming ability that was greater than that of individual L19F and T20A mutants, but less than that of G12D and G12V K-RAS mutants. This demonstrated the synergistic effects of co-mutation of codons 19 and 20 and illustrated that co-mutation of these codons is functionally significant.

    Journal of molecular signaling 2011;6;2

  • Multiple loci are associated with white blood cell phenotypes.

    Nalls MA, Couper DJ, Tanaka T, van Rooij FJ, Chen MH, Smith AV, Toniolo D, Zakai NA, Yang Q, Greinacher A, Wood AR, Garcia M, Gasparini P, Liu Y, Lumley T, Folsom AR, Reiner AP, Gieger C, Lagou V, Felix JF, Völzke H, Gouskova NA, Biffi A, Döring A, Völker U, Chong S, Wiggins KL, Rendon A, Dehghan A, Moore M, Taylor K, Wilson JG, Lettre G, Hofman A, Bis JC, Pirastu N, Fox CS, Meisinger C, Sambrook J, Arepalli S, Nauck M, Prokisch H, Stephens J, Glazer NL, Cupples LA, Okada Y, Takahashi A, Kamatani Y, Matsuda K, Tsunoda T, Tanaka T, Kubo M, Nakamura Y, Yamamoto K, Kamatani N, Stumvoll M, Tönjes A, Prokopenko I, Illig T, Patel KV, Garner SF, Kuhnel B, Mangino M, Oostra BA, Thein SL, Coresh J, Wichmann HE, Menzel S, Lin J, Pistis G, Uitterlinden AG, Spector TD, Teumer A, Eiriksdottir G, Gudnason V, Bandinelli S, Frayling TM, Chakravarti A, van Duijn CM, Melzer D, Ouwehand WH, Levy D, Boerwinkle E, Singleton AB, Hernandez DG, Longo DL, Soranzo N, Witteman JC, Psaty BM, Ferrucci L, Harris TB, O'Donnell CJ and Ganesh SK

    Laboratory of Neurogenetics, Intramural Research Program, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, United States of America. nallsm@mail.nih.gov

    White blood cell (WBC) count is a common clinical measure from complete blood count assays, and it varies widely among healthy individuals. Total WBC count and its constituent subtypes have been shown to be moderately heritable, with the heritability estimates varying across cell types. We studied 19,509 subjects from seven cohorts in a discovery analysis, and 11,823 subjects from ten cohorts for replication analyses, to determine genetic factors influencing variability within the normal hematological range for total WBC count and five WBC subtype measures. Cohort specific data was supplied by the CHARGE, HaemGen, and INGI consortia, as well as independent collaborative studies. We identified and replicated ten associations with total WBC count and five WBC subtypes at seven different genomic loci (total WBC count-6p21 in the HLA region, 17q21 near ORMDL3, and CSF3; neutrophil count-17q21; basophil count-3p21 near RPN1 and C3orf27; lymphocyte count-6p21, 19p13 at EPS15L1; monocyte count-2q31 at ITGA4, 3q21, 8q24 an intergenic region, 9q31 near EDG2), including three previously reported associations and seven novel associations. To investigate functional relationships among variants contributing to variability in the six WBC traits, we utilized gene expression- and pathways-based analyses. We implemented gene-clustering algorithms to evaluate functional connectivity among implicated loci and showed functional relationships across cell types. Gene expression data from whole blood was utilized to show that significant biological consequences can be extracted from our genome-wide analyses, with effect estimates for significant loci from the meta-analyses being highly correlated with the proximal gene expression. In addition, collaborative efforts between the groups contributing to this study and related studies conducted by the COGENT and RIKEN groups allowed for the examination of effect homogeneity for genome-wide significant associations across populations of diverse ancestral backgrounds.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; NCRR NIH HHS: M01-RR000052, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL085251, HL087652, N01-HC-05187, N01-HC-25195, N01-HC-45134, N01-HC-45204, N01-HC-45205, N01-HC-48047, N01-HC-48048, N01-HC-48049, N01-HC-48050, N01-HC-55015, N01-HC-55016, N01-HC-55017, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-65226, N01-HC-95095, N01-HC-95100, N01-HC-95170, N01-HC-95171, N01-HC-95172, N02-HL-6-4278, R01 HL-071862, R01 HL087698-01, R01HL086694, R01HL087641, R01HL59367, R01HL86694, U01 HL72518; NIA NIH HHS: 1R01AG032098-01A1, AG000932-02, N01-AG-12100, N01-AG-821336, N01-AG-916413, N01AG62101, N01AG62103, N01AG62106, R01 AG032098-04, R01 AG24233-01; NINR NIH HHS: R01 NR08153; PHS HHS: 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32, 44221, HHSN268200625226C, HHSN268200782096C; WHI NIH HHS: N01WH22110; Wellcome Trust: 076113/C/04/Z, 091746/Z/10/Z

    PLoS genetics 2011;7;6;e1002113

  • High incidence of recurrent copy number variants in patients with isolated and syndromic Müllerian aplasia.

    Nik-Zainal S, Strick R, Storer M, Huang N, Rad R, Willatt L, Fitzgerald T, Martin V, Sandford R, Carter NP, Janecke AR, Renner SP, Oppelt PG, Oppelt P, Schulze C, Brucker S, Hurles M, Beckmann MW, Strissel PL and Shaw-Smith C

    Department of Obstetrics and Gynecology, University-Clinic Erlangen, Erlangen, Germany.

    Background: Congenital malformations involving the Müllerian ducts are observed in around 5% of infertile women. Complete aplasia of the uterus, cervix, and upper vagina, also termed Müllerian aplasia or Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome, occurs with an incidence of around 1 in 4500 female births, and occurs in both isolated and syndromic forms. Previous reports have suggested that a proportion of cases, especially syndromic cases, are caused by variation in copy number at different genomic loci.

    Methods: In order to obtain an overview of the contribution of copy number variation to both isolated and syndromic forms of Müllerian aplasia, copy number assays were performed in a series of 63 cases, of which 25 were syndromic and 38 isolated.

    Results: A high incidence (9/63, 14%) of recurrent copy number variants in this cohort is reported here. These comprised four cases of microdeletion at 16p11.2, an autism susceptibility locus not previously associated with Müllerian aplasia, four cases of microdeletion at 17q12, and one case of a distal 22q11.2 microdeletion. Microdeletions at 16p11.2 and 17q12 were found in 4/38 (10.5%) cases with isolated Müllerian aplasia, and at 16p11.2, 17q12 and 22q11.2 (distal) in 5/25 cases (20%) with syndromic Müllerian aplasia.

    Conclusion: The finding of microdeletion at 16p11.2 in 2/38 (5%) of isolated and 2/25 (8%) of syndromic cases suggests a significant contribution of this copy number variant alone to the pathogenesis of Müllerian aplasia. Overall, the high incidence of recurrent copy number variants in all forms of Müllerian aplasia has implications for the understanding of the aetiopathogenesis of the condition, and for genetic counselling in families affected by it.

    Funded by: Wellcome Trust: 077008, 077014, 079973

    Journal of medical genetics 2011;48;3;197-204
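
    The proportions quoted in the Results and Conclusion of the entry above can be re-derived from the stated counts. The following is only an arithmetic sketch; the dictionary keys and the helper name `percent` are illustrative labels, not terms from the paper.

```python
# Counts quoted in the abstract above: (cases with a recurrent CNV, group size).
counts = {
    "all cases": (9, 63),
    "isolated": (4, 38),
    "syndromic": (5, 25),
    "16p11.2, isolated": (2, 38),
    "16p11.2, syndromic": (2, 25),
}


def percent(hits, total):
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


percentages = {group: percent(*pair) for group, pair in counts.items()}

for group, value in percentages.items():
    print(f"{group}: {value}%")
```

    The isolated and syndromic subgroups (38 + 25) add up to the 63 cases in the series, and their carrier counts (4 + 5) add up to the 9 recurrent variants reported overall.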

  • Mechanisms mediating brain and cognitive reserve: experience-dependent neuroprotection and functional compensation in animal models of neurodegenerative diseases.

    Nithianantharajah J and Hannan AJ

    Wellcome Trust Sanger Institute, Cambridge, UK. jn3@sanger.ac.uk

    'Brain and cognitive reserve' (BCR) refers here to the accumulated neuroprotective reserve and capacity for functional compensation induced by the chronic enhancement of mental and physical activity. BCR is thought to protect against, and compensate for, a range of different neurodegenerative diseases, as well as other neurological and psychiatric disorders. In this review we will discuss BCR, and its potential mechanisms, in neurodegenerative disorders, with a focus on Huntington's disease (HD) and Alzheimer's disease (AD). Epidemiological studies of AD, and other forms of dementia, provided early evidence for BCR. The first evidence for the beneficial effects of enhanced mental and physical activity, and associated mechanistic insights, in an animal model of neurodegenerative disease was provided by experiments using HD transgenic mice. More recently, experiments on animal models of HD, AD and various other brain disorders have suggested potential molecular and cellular mechanisms underpinning BCR. We propose that sophisticated insight into the processes underlying BCR, and identification of key molecules mediating these beneficial effects, will pave the way for therapeutic advances targeting these currently incurable neurodegenerative diseases.

    Progress in neuro-psychopharmacology & biological psychiatry 2011;35;2;331-9

  • Impact of temperament on depression and anxiety symptoms and depressive disorder in a population-based birth cohort.

    Nyman E, Miettunen J, Freimer N, Joukamaa M, Mäki P, Ekelund J, Peltonen L, Järvelin MR, Veijola J and Paunio T

    Public Health Genomics Unit, Institute for Molecular Medicine Finland FIMM, University of Helsinki and National Institute for Health and Welfare, Helsinki, Finland.

    Background: The aim of this study was to characterize at the population level how innate features of temperament relate to experience of depressive mood and anxiety, and whether these symptoms have separable temperamental backgrounds.

    Methods: The study subjects were 4773 members of the population-based Northern Finland Birth Cohort 1966, a culturally and genetically homogeneous study sample. Temperament was measured at age 31 using the temperament items of the Temperament and Character Inventory and a separate Pessimism score. Depressive mood was assessed based on a previous diagnosis of depressive disorder or symptoms of depression according to the Hopkins Symptom Check List - 25. Anxiety was assessed analogously.

    Results: High levels of Harm avoidance and Pessimism were related to both depressive mood (effect sizes; d=0.84 and d=1.25, respectively) and depressive disorder (d=0.68 and d=0.68, respectively). Of the dimensions of Harm avoidance, Anticipatory worry and Fatigability had the strongest effects. Symptoms of depression and anxiety showed very similar underlying temperament patterns.

    Limitations: Although Harm avoidance and Pessimism appear to be important endophenotype candidates for depression and anxiety, their potential usefulness as endophenotypes, and whether they meet all the suggested criteria for endophenotypes, remain to be confirmed in future studies.

    Conclusions: Personality characteristics of Pessimism and Harm avoidance, in particular its dimensions Anticipatory worry and Fatigability, are strongly related to symptoms of depression and anxiety as well as to depressive disorder in this population. These temperamental features may be used as dimensional susceptibility factors in etiological studies of depression, which may aid in the development of improved clinical practice.

    Journal of affective disorders 2011;131;1-3;393-7

  • BioMart as an integration solution for the International Knockout Mouse Consortium.

    Oakley DJ, Iyer V, Skarnes WC and Smedley D

    The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH.

    In this article, we describe the use of the BioMart data management system to provide integrated access to International Knockout Mouse Consortium (IKMC) data and other related mouse resources. The IKMC is currently mutating all mouse protein-coding genes in embryonic stem (ES) cells using gene targeting and gene trapping approaches. The BioMart portal allows researchers to identify and obtain IKMC knockout vectors, ES cells and mice for genes of interest. Gene annotation, expression, phenotype and disease data is also integrated from external BioMarts, allowing selection of IKMC products by a wide variety of criteria. These products are invaluable for researchers involved in the elucidation of gene function and the role of individual genes in human disease. Here, we describe these datasets in more detail and illustrate the functionality of the portal using several examples.

    Database : the journal of biological databases and curation 2011;2011;bar028

  • A comprehensive evaluation of potential lung function associated genes in the SpiroMeta general population sample.

    Obeidat M, Wain LV, Shrine N, Kalsheker N, Soler Artigas M, Repapi E, Burton PR, Johnson T, Ramasamy A, Zhao JH, Zhai G, Huffman JE, Vitart V, Albrecht E, Igl W, Hartikainen AL, Pouta A, Cadby G, Hui J, Palmer LJ, Hadley D, McArdle WL, Rudnicka AR, Barroso I, Loos RJ, Wareham NJ, Mangino M, Soranzo N, Spector TD, Gläser S, Homuth G, Völzke H, Deloukas P, Granell R, Henderson J, Grkovic I, Jankovic S, Zgaga L, Polašek O, Rudan I, Wright AF, Campbell H, Wild SH, Wilson JF, Heinrich J, Imboden M, Probst-Hensch NM, Gyllensten U, Johansson Å, Zaboli G, Mustelin L, Rantanen T, Surakka I, Kaprio J, Jarvelin MR, Hayward C, Evans DM, Koch B, Musk AW, Elliott P, Strachan DP, Tobin MD, Sayers I, Hall IP and SpiroMeta Consortium

    Nottingham Respiratory Biomedical Research Unit, Division of Therapeutics and Molecular Medicine, University Hospital of Nottingham, Nottingham, United Kingdom.

    Rationale: Lung function measures are heritable traits that predict population morbidity and mortality and are essential for the diagnosis of chronic obstructive pulmonary disease (COPD). Variations in many genes have been reported to affect these traits, but attempts at replication have provided conflicting results. Recently, we undertook a meta-analysis of Genome Wide Association Study (GWAS) results for lung function measures in 20,288 individuals from the general population (the SpiroMeta consortium).

    Objectives: To comprehensively analyse previously reported genetic associations with lung function measures, and to investigate whether single nucleotide polymorphisms (SNPs) in these genomic regions are associated with lung function in a large population sample.

    Methods: We analysed association for SNPs tagging 130 genes and 48 intergenic regions (+/-10 kb), after conducting a systematic review of the literature in the PubMed database for genetic association studies reporting lung function associations.

    Results: The analysis included 16,936 genotyped and imputed SNPs. No loci showed overall significant association for FEV(1) or FEV(1)/FVC traits using a carefully defined significance threshold of 1.3×10(-5). The most significant loci associated with FEV(1) include SNPs tagging MACROD2 (P = 6.81×10(-5)), CNTN5 (P = 4.37×10(-4)), and TRPV4 (P = 1.58×10(-3)). Among ever-smokers, SERPINA1 showed the most significant association with FEV(1) (P = 8.41×10(-5)), followed by PDE4D (P = 1.22×10(-4)). The strongest association with FEV(1)/FVC ratio was observed with ABCC1 (P = 4.38×10(-4)), and ESR1 (P = 5.42×10(-4)) among ever-smokers.

    Conclusions: Polymorphisms spanning previously associated lung function genes did not show strong evidence for association with lung function measures in the SpiroMeta consortium population. Common SERPINA1 polymorphisms may affect FEV(1) among smokers in the general population.

    Funded by: Cancer Research UK; Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0401540, G0600705, G0701863, G0800582, G0801056, G0902125, G0902313, G9815508, G990146, MC_QA137934, MC_U106179471, MC_U106188470; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895, 092731

    PloS one 2011;6;5;e19382

  • Targeted disruption of py235ebp-1: invasion of erythrocytes by Plasmodium yoelii using an alternative Py235 erythrocyte binding protein.

    Ogun SA, Tewari R, Otto TD, Howell SA, Knuepfer E, Cunningham DA, Xu Z, Pain A and Holder AA

    Division of Parasitology, MRC National Institute for Medical Research, London, UK. sogun@nimr.mrc.ac.uk

    Plasmodium yoelii YM asexual blood stage parasites express multiple members of the py235 gene family, part of the super-family of genes including those coding for Plasmodium vivax reticulocyte binding proteins and Plasmodium falciparum RH proteins. We previously identified a Py235 erythrocyte binding protein (Py235EBP-1, encoded by the PY01365 gene) that is recognized by protective mAb 25.77. Proteins recognized by a second protective mAb 25.37 have been identified by mass spectrometry and are encoded by two genes, PY01185 and PY05995/PY03534. We deleted the PY01365 gene and examined the phenotype. The expression of the members of the py235 family in both the WT and gene deletion parasites was measured by quantitative RT-PCR and RNA-Seq. py235ebp-1 expression was undetectable in the knockout parasite, but transcription of other members of the family was essentially unaffected. The knockout parasites continued to react with mAb 25.77; and the 25.77-binding proteins in these parasites were the PY01185 and PY05995/PY03534 products. The PY01185 product was also identified as erythrocyte binding. There was no clear change in erythrocyte invasion profile suggesting that the PY01185 gene product (designated Py235EBP-2) is able to fulfill the role of EBP-1 by serving as an invasion ligand although the molecular details of its interaction with erythrocytes have not been examined. The PY01365, PY01185, and PY05995/PY03534 genes are part of a distinct subset of the py235 family. In P. falciparum, the RH protein genes are under epigenetic control and expression correlates with binding to distinct erythrocyte receptors and specific invasion pathways, whereas in P. yoelii YM all the genes are expressed and deletion of one does not result in upregulation of another. We propose that simultaneous expression of multiple Py235 ligands enables invasion of a wide range of host erythrocytes even in the presence of antibodies to one or more of the proteins and that this functional redundancy at the protein level gives the parasite phenotypic plasticity in the absence of differences in gene expression.

    Funded by: Medical Research Council: U117532067; Wellcome Trust

    PLoS pathogens 2011;7;2;e1001288

  • Genetic variants and blood pressure in a population-based cohort: the Cardiovascular Risk in Young Finns study.

    Oikonen M, Tikkanen E, Juhola J, Tuovinen T, Seppälä I, Juonala M, Taittonen L, Mikkilä V, Kähönen M, Ripatti S, Viikari J, Lehtimäki T, Havulinna AS, Kee F, Newton-Cheh C, Peltonen L, Schork NJ, Murray SS, Berenson GS, Chen W, Srinivasan SR, Salomaa V and Raitakari OT

    Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, PO Box 52, FI-20521 Turku, Finland. mervi.oikonen@utu.fi

    Clinical relevance of a genetic predisposition to elevated blood pressure was quantified during the transition from childhood to adulthood in a population-based Finnish cohort (N=2357). Blood pressure was measured at baseline in 1980 (age 3-18 years) and in follow-ups in 1983, 1986, 2001, and 2007. Thirteen single nucleotide polymorphisms associated with blood pressure were genotyped, and 3 genetic risk scores associated with systolic and diastolic blood pressures and their combination were derived for all of the participants. Effects of the genetic risk score were 0.47 mm Hg for systolic and 0.53 mm Hg for diastolic blood pressures (both P<0.01). The combination genetic risk score was associated with diastolic blood pressure from age 9 years onward (β=0.68 mm Hg; P=0.015). Replications in 1194 participants of the Bogalusa Heart Study showed essentially similar results. The participants in the highest quintile of the combination genetic risk score had a 1.82-fold risk of hypertension in adulthood (P<0.0001) compared with the lowest quintile, independent of a family history of premature hypertension. These findings show that genetic variants are associated with preclinical blood pressure traits in childhood; individuals with several susceptibility alleles have, on average, a 0.5-mm Hg higher blood pressure, and this trajectory continues from childhood to adulthood.

    Funded by: NIA NIH HHS: AG-16592; NICHD NIH HHS: HD-061437, HD-062783; Wellcome Trust

    Hypertension 2011;58;6;1079-85
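
    The entry above describes genetic risk scores derived from thirteen genotyped SNPs. As a rough illustration of what such a score can look like, the sketch below sums risk-allele counts per participant, optionally weighted by a per-SNP effect size. The abstract does not say how the study weighted its scores, so the function name, the weighting option and the example genotypes are all assumptions made for illustration.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts (0, 1 or 2 per SNP), optionally weighted.

    With weights=None this is a plain count of risk alleles; passing
    per-SNP effect sizes gives a weighted score. Both variants are
    generic constructions, not the study's documented method.
    """
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("one weight is required per SNP")
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


# Hypothetical participant genotyped at 13 SNPs (values invented for the example).
example_genotype = [0, 1, 2, 1, 0, 0, 1, 2, 1, 1, 0, 2, 1]
example_score = genetic_risk_score(example_genotype)
print(example_score)
```

    Participants could then be ranked by score and split into quintiles, which is the kind of comparison (highest versus lowest quintile) the abstract reports.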

  • Genotypic and phenotypic modifications of Neisseria meningitidis after an accidental human passage.

    Omer H, Rose G, Jolley KA, Frapy E, Zahar JR, Maiden MC, Bentley SD, Tinsley CR, Nassif X and Bille E

    INSERM U1002, Paris, France.

    A scientist in our laboratory was accidentally infected while working with Z5463, a Neisseria meningitidis serogroup A strain. She developed severe symptoms (fever, meningism, purpuric lesions) that fortunately evolved with antibiotic treatment to complete recovery. Pulsed-field gel electrophoresis confirmed that the isolate obtained from the blood culture (Z5463BC) was identical to Z5463, more precisely to a fourth subculture of this strain used the week before the contamination (Z5463PI). In order to get some insights into genomic modifications that can occur in vivo, we sequenced these three isolates. All the strains contained a mutated mutS allele and therefore displayed a hypermutator phenotype, consistent with the high number of mutations (SNPs, single nucleotide polymorphisms) detected in the three strains. By comparing the number of SNPs in all three isolates and knowing the number of passages between Z5463 and Z5463PI, we concluded that around 25 bacterial divisions occurred in the human body. As expected, the in vivo passage is responsible for several modifications of phase variable genes. This genomic study has been completed by transcriptomic and phenotypic studies, showing that the blood strain used a different haemoglobin-linked iron receptor (HpuA/B) than the parental strains (HmbR). Different pilin variants were found after the in vivo passage, which expressed different properties of adhesion. Furthermore the deletion of one gene involved in LOS biosynthesis (lgtB) results in Z5463BC expressing a different LOS than the L9 immunotype of Z2491. The in vivo passage, despite the small numbers of divisions, permits the selection of numerous genomic modifications that may account for the high capacity of the strain to disseminate.

    Funded by: Wellcome Trust: 087622

    PloS one 2011;6;2;e17145

  • Analysis of mitosis and antimitotic drug responses in tumors by in vivo microscopy and single-cell pharmacodynamics.

    Orth JD, Kohler RH, Foijer F, Sorger PK, Weissleder R and Mitchison TJ

    Department of Systems Biology, Harvard Medical School, Center for Systems Biology, Massachusetts General Hospital, Boston, Massachusetts 02115, USA.

    Cancer relies upon frequent or abnormal cell division, but how the tumor microenvironment affects mitotic processes in vivo remains unclear, largely due to the technical challenges of optical access, spatial resolution, and motion. We developed high-resolution in vivo microscopy methods to visualize mitosis in a murine xenograft model of human cancer. Using these methods, we determined whether the single-cell response to the antimitotic drug paclitaxel (Ptx) was the same in tumors as in cell culture, observed the impact of Ptx on the tumor response as a whole, and evaluated the single-cell pharmacodynamics (PD) of Ptx (by in vivo PD microscopy). Mitotic initiation was generally less frequent in tumors than in cell culture, but subsequently it proceeded normally. Ptx treatment caused spindle assembly defects and mitotic arrest, followed by slippage from mitotic arrest, multinucleation, and apoptosis. Compared with cell culture, the peak mitotic index in tumors exposed to Ptx was lower and the tumor cells survived longer after mitotic arrest, becoming multinucleated rather than dying directly from mitotic arrest. Thus, the tumor microenvironment was much less proapoptotic than cell culture. The morphologies associated with mitotic arrest were dose and time dependent, thereby providing a semiquantitative, single-cell measure of PD. Although many tumor cells did not progress through Ptx-induced mitotic arrest, tumor significantly regressed in the model. Our findings show that in vivo microscopy offers a useful tool to visualize mitosis during tumor progression, drug responses, and cell fate at the single-cell level.

    Funded by: NCI NIH HHS: CA084179, CA092782, CA139980, CA86355, P01 CA139980-01A1, R01 CA084179-11; NIGMS NIH HHS: R01 GM039565-25

    Cancer research 2011;71;13;4608-16

  • Real-time sequencing.

    Otto TD

    Nature reviews. Microbiology 2011;9;9;633

  • RATT: Rapid Annotation Transfer Tool.

    Otto TD, Dillon GP, Degrave WS and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Nucleic acids research 2011;39;9;e57

  • Coordinating cell cycle progression via cyclin specificity.

    Pagliuca FW, Collins MO and Choudhary JS

    Cell cycle (Georgetown, Tex.) 2011;10;24;4195-6

  • Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.

    Pagliuca FW, Collins MO, Lichawska A, Zegerman P, Choudhary JS and Pines J

    The Gurdon Institute, University of Cambridge, Cambridge, UK.

    Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.

    Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z

    Molecular cell 2011;43;3;406-17

  • Genome-wide association study identifies a locus at 7p15.2 associated with endometriosis.

    Painter JN, Anderson CA, Nyholt DR, Macgregor S, Lin J, Lee SH, Lambert A, Zhao ZZ, Roseman F, Guo Q, Gordon SD, Wallace L, Henders AK, Visscher PM, Kraft P, Martin NG, Morris AP, Treloar SA, Kennedy SH, Missmer SA, Montgomery GW and Zondervan KT

    Molecular Epidemiology, Queensland Institute of Medical Research, Herston, Queensland, Australia. jodie.painter@qimr.edu.au

    Endometriosis is a common gynecological disease associated with pelvic pain and subfertility. We conducted a genome-wide association study (GWAS) in 3,194 individuals with surgically confirmed endometriosis (cases) and 7,060 controls from Australia and the UK. Polygenic predictive modeling showed significantly increased genetic loading among 1,364 cases with moderate to severe endometriosis. The strongest association signal was on 7p15.2 (rs12700667) for 'all' endometriosis (P = 2.6 × 10⁻⁷, odds ratio (OR) = 1.22, 95% CI 1.13-1.32) and for moderate to severe disease (P = 1.5 × 10⁻⁹, OR = 1.38, 95% CI 1.24-1.53). We replicated rs12700667 in an independent cohort from the United States of 2,392 self-reported, surgically confirmed endometriosis cases and 2,271 controls (P = 1.2 × 10⁻³, OR = 1.17, 95% CI 1.06-1.28), resulting in a genome-wide significant P value of 1.4 × 10⁻⁹ (OR = 1.20, 95% CI 1.13-1.27) for 'all' endometriosis in our combined datasets of 5,586 cases and 9,331 controls. rs12700667 is located in an intergenic region upstream of the plausible candidate genes NFE2L3 and HOXA10.

    Funded by: Howard Hughes Medical Institute; NCI NIH HHS: P01 CA087969, R01 CA049449, R01 CA050385, R01 CA067262, U01 CA098233; NICHD NIH HHS: R01 HD052473, R01 HD057210; NIDDK NIH HHS: P01 DK070756; Wellcome Trust: 064890, 081682, 084766, 085235, WT084766/Z/08/Z, WT085235/Z/08/Z, WT91745/Z/10/Z

    Nature genetics 2011;43;1;51-4

  • Identity-by-descent-based phasing and imputation in founder populations using graphical models.

    Palin K, Campbell H, Wright AF, Wilson JF and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian network based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with very low error rate.

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: MC_U127561128; Wellcome Trust: 076113, 077192, 085475, WT077192

    Genetic epidemiology 2011;35;8;853-60

  • Insights into the genetic architecture of osteoarthritis from stage 1 of the arcOGEN study.

    Panoutsopoulou K, Southam L, Elliott KS, Wrayner N, Zhai G, Beazley C, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Doherty M, McCaskie A, Ollier WE, Ralston SH, Spector TD, Valdes AM, Wallis GA, Wilkinson JM, Arden E, Battley K, Blackburn H, Blanco FJ, Bumpstead S, Cupples LA, Day-Williams AG, Dixon K, Doherty SA, Esko T, Evangelou E, Felson D, Gomez-Reino JJ, Gonzalez A, Gordon A, Gwilliam R, Halldorsson BV, Hauksson VB, Hofman A, Hunt SE, Ioannidis JP, Ingvarsson T, Jonsdottir I, Jonsson H, Keen R, Kerkhof HJ, Kloppenburg MG, Koller N, Lakenberg N, Lane NE, Lee AT, Metspalu A, Meulenbelt I, Nevitt MC, O'Neill F, Parimi N, Potter SC, Rego-Perez I, Riancho JA, Sherburn K, Slagboom PE, Stefansson K, Styrkarsdottir U, Sumillera M, Swift D, Thorsteinsdottir U, Tsezou A, Uitterlinden AG, van Meurs JB, Watkins B, Wheeler M, Mitchell S, Zhu Y, Zmuda JM, arcOGEN Consortium, Zeggini E and Loughlin J

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Objectives: The genetic aetiology of osteoarthritis has not yet been elucidated. To enable a well-powered genome-wide association study (GWAS) for osteoarthritis, the authors have formed the arcOGEN Consortium, a UK-wide collaborative effort aiming to scan genome-wide over 7500 osteoarthritis cases in a two-stage genome-wide association scan. Here the authors report the findings of the stage 1 interim analysis.

    Methods: The authors have performed a genome-wide association scan for knee and hip osteoarthritis in 3177 cases and 4894 population-based controls from the UK. Replication of promising signals was carried out in silico in five further scans (44,449 individuals), and de novo in 14,534 independent samples, all of European descent.

    Results: None of the association signals the authors identified reach genome-wide levels of statistical significance, therefore stressing the need for corroboration in sample sets of a larger size. Application of analytical approaches to examine the allelic architecture of disease to the stage 1 genome-wide association scan data suggests that osteoarthritis is a highly polygenic disease with multiple risk variants conferring small effects.

    Conclusions: Identifying loci conferring susceptibility to osteoarthritis will require large-scale sample sizes and well-defined phenotypes to minimise heterogeneity.

    Funded by: Arthritis Research UK: 17489; Medical Research Council: G0901461; NIAMS NIH HHS: K24 AR048841, R01 AR052000

    Annals of the rheumatic diseases 2011;70;5;864-7

  • Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts.

    Papaemmanuil E, Cazzola M, Boultwood J, Malcovati L, Vyas P, Bowen D, Pellagatti A, Wainscoat JS, Hellstrom-Lindberg E, Gambacorti-Passerini C, Godfrey AL, Rapado I, Cvejic A, Rance R, McGee C, Ellis P, Mudie LJ, Stephens PJ, McLaren S, Massie CE, Tarpey PS, Varela I, Nik-Zainal S, Davies HR, Shlien A, Jones D, Raine K, Hinton J, Butler AP, Teague JW, Baxter EJ, Score J, Galli A, Della Porta MG, Travaglino E, Groves M, Tauro S, Munshi NC, Anderson KC, El-Naggar A, Fischer A, Mustonen V, Warren AJ, Cross NC, Green AR, Futreal PA, Stratton MR, Campbell PJ and Chronic Myeloid Disorders Working Group of the International Cancer Genome Consortium

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Background: Myelodysplastic syndromes are a diverse and common group of chronic hematologic cancers. The identification of new genetic lesions could facilitate new diagnostic and therapeutic strategies.

    Methods: We used massively parallel sequencing technology to identify somatically acquired point mutations across all protein-coding exons in the genome in 9 patients with low-grade myelodysplasia. Targeted resequencing of the gene encoding RNA splicing factor 3B, subunit 1 (SF3B1), was also performed in a cohort of 2087 patients with myeloid or other cancers.

    Results: We identified 64 point mutations in the 9 patients. Recurrent somatically acquired mutations were identified in SF3B1. Follow-up revealed SF3B1 mutations in 72 of 354 patients (20%) with myelodysplastic syndromes, with particularly high frequency among patients whose disease was characterized by ring sideroblasts (53 of 82 [65%]). The gene was also mutated in 1 to 5% of patients with a variety of other tumor types. The observed mutations were less deleterious than was expected on the basis of chance, suggesting that the mutated protein retains structural integrity with altered function. SF3B1 mutations were associated with down-regulation of key gene networks, including core mitochondrial pathways. Clinically, patients with SF3B1 mutations had fewer cytopenias and longer event-free survival than patients without SF3B1 mutations.

    Conclusions: Mutations in SF3B1 implicate abnormalities of messenger RNA splicing in the pathogenesis of myelodysplastic syndromes. (Funded by the Wellcome Trust and others.)

    Funded by: Medical Research Council: G0800784, G1000729, MC_U105161083, MR/L003368/1; NCI NIH HHS: P01 CA078378, P01 CA078378-10, R01 CA124929, R01 CA124929-05; PHS HHS: P01-155249, P01-78378, P50-100007, R01-124929; Wellcome Trust: 077012/Z/05/Z, 093867, WT088340MA

    The New England journal of medicine 2011;365;15;1384-95

  • Fetal-specific DNA methylation ratio permits noninvasive prenatal diagnosis of trisomy 21.

    Papageorgiou EA, Karagrigoriou A, Tsaliki E, Velissariou V, Carter NP and Patsalis PC

    Cytogenetics and Genomics Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus.

    The trials performed worldwide toward noninvasive prenatal diagnosis (NIPD) of Down's syndrome (or trisomy 21) have shown the commercial and medical potential of NIPD compared to the currently used invasive prenatal diagnostic procedures. Extensive investigation of methylation differences between the mother and the fetus has led to the identification of differentially methylated regions (DMRs). In this study, we present a strategy using the methylated DNA immunoprecipitation (MeDiP) methodology in combination with real-time quantitative PCR (qPCR) to achieve fetal chromosome dosage assessment, which can be performed noninvasively through the analysis of fetal-specific DMRs. We achieved noninvasive prenatal detection of trisomy 21 by determining the methylation ratio of normal and trisomy 21 cases for each tested fetal-specific DMR present in maternal peripheral blood, followed by further statistical analysis. The application of this fetal-specific methylation ratio approach provided correct diagnosis of 14 trisomy 21 and 26 normal cases.

    Funded by: Wellcome Trust: 079643

    Nature medicine 2011;17;4;510-3

  • Fetal-specific DNA methylation ratio permits noninvasive prenatal diagnosis of trisomy 21.

    Papageorgiou EA, Karagrigoriou A, Tsaliki E, Velissariou V, Carter NP and Patsalis PC

    Obstetrical and Gynecological Survey 2011;66;419

  • Bacterial epidemiology and biology--lessons from genome sequencing.

    Parkhill J and Wren BW

    The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Next-generation sequencing has ushered in a new era of microbial genomics, enabling the detailed historical and geographical tracing of bacteria. This is helping to shape our understanding of bacterial evolution.

    Funded by: Wellcome Trust

    Genome biology 2011;12;10;230

  • Identity crisis? The need for systematic gene IDs.

    Parsons M, Myler PJ, Berriman M, Roos DS and Stuart KD

    Recent years have seen an explosion in the availability of protozoan pathogen genome sequences. Although data regarding the underlying genome sequence remain relatively stable after the initial draft, understanding of specific gene function is increasing rapidly. This dichotomy is reflected in the relative stability of systematic gene identifiers (SysIDs) in genome sequence databases, as compared to evolving and/or conflicting gene and gene product names. GenBank/EMBL/DDBJ accession numbers are important, but most protozoan parasite researchers use organism-based databases such as EuPathDB or GeneDB as their immediate resource for gene-based information because they not only provide sequence information but also functional information and links to references. Reference to SysIDs therefore provides a valuable bridge to this repository of information.

    Funded by: NIAID NIH HHS: 5U01AI075641

    Trends in parasitology 2011;27;5;183-4

  • Revealing the genetic structure of a trait by sequencing a population under selection.

    Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, Moses A, Louis EJ, Durbin R and Liti G

    The Wellcome Trust Sanger Institute, Hinxton, United Kingdom. leopold.parts@sanger.ac.uk

    One approach to understanding the genetic basis of traits is to study their pattern of inheritance among offspring of phenotypically different parents. Previously, such analysis has been limited by low mapping resolution, high labor costs, and large sample size requirements for detecting modest effects. Here, we present a novel approach to map trait loci using artificial selection. First, we generated populations of 10-100 million haploid and diploid segregants by crossing two budding yeast strains of different heat tolerance for up to 12 generations. We then subjected these large segregant pools to heat stress for up to 12 d, enriching for beneficial alleles. Finally, we sequenced total DNA from the pools before and during selection to measure the changes in parental allele frequency. We mapped 21 intervals with significant changes in genetic background in response to selection, which is several times more than found with traditional linkage methods. Nine of these regions contained two or fewer genes, yielding much higher resolution than previous genomic linkage studies. Multiple members of the RAS/cAMP signaling pathway were implicated, along with genes previously not annotated with heat stress response function. Surprisingly, at most selected loci, allele frequencies stopped changing before the end of the selection experiment, but alleles did not become fixed. Furthermore, we were able to detect the same set of trait loci in a population of diploid individuals with similar power and resolution, and observed primarily additive effects, similar to what is seen for complex trait genetics in other diploid organisms such as humans.

    Funded by: Biotechnology and Biological Sciences Research Council: BBF0152161; Canadian Institutes of Health Research: 202372; Wellcome Trust: WT077192/Z/05/Z, WT084507MA

    Genome research 2011;21;7;1131-8

  • Joint genetic analysis of gene expression data with inferred cellular phenotypes.

    Parts L, Stegle O, Winn J and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom. leopold.parts@sanger.ac.uk

    Even within a defined cell type, the expression level of a gene differs in individual samples. The effects of genotype, measured factors such as environmental conditions, and their interactions have been explored in recent studies. Methods have also been developed to identify unmeasured intermediate factors that coherently influence transcript levels of multiple genes. Here, we show how to bring these two approaches together and analyse genetic effects in the context of inferred determinants of gene expression. We use a sparse factor analysis model to infer hidden factors, which we treat as intermediate cellular phenotypes that in turn affect gene expression in a yeast dataset. We find that the inferred phenotypes are associated with locus genotypes and environmental conditions and can explain genetic associations to genes in trans. For the first time, we consider and find interactions between genotype and intermediate phenotypes inferred from gene expression levels, complementing and extending established results.

    Funded by: Wellcome Trust: WT077192/Z/05/Z

    PLoS genetics 2011;7;1;e1001276

  • Maps of open chromatin guide the functional follow-up of genome-wide association signals: application to hematological traits.

    Paul DS, Nisbet JP, Yang TP, Meacham S, Rendon A, Hautaviita K, Tallila J, White J, Tijssen MR, Sivapalaratnam S, Basart H, Trip MD, Cardiogenics Consortium, MuTHER Consortium, Göttgens B, Soranzo N, Ouwehand WH and Deloukas P

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom. dp5@sanger.ac.uk

    Turning genetic discoveries identified in genome-wide association (GWA) studies into biological mechanisms is an important challenge in human genetics. Many GWA signals map outside exons, suggesting that the associated variants may lie within regulatory regions. We applied the formaldehyde-assisted isolation of regulatory elements (FAIRE) method in a megakaryocytic and an erythroblastoid cell line to map active regulatory elements at known loci associated with hematological quantitative traits, coronary artery disease, and myocardial infarction. We showed that the two cell types exhibit distinct patterns of open chromatin and that cell-specific open chromatin can guide the finding of functional variants. We identified an open chromatin region at chromosome 7q22.3 in megakaryocytes but not erythroblasts, which harbors the common non-coding sequence variant rs342293 known to be associated with platelet volume and function. Resequencing of this open chromatin region in 643 individuals provided strong evidence that rs342293 is the only putative causative variant in this region. We demonstrated that the C- and G-alleles differentially bind the transcription factor EVI1 affecting PIK3CG gene expression in platelets and macrophages. A protein-protein interaction network including up- and down-regulated genes in Pik3cg knockout mice indicated that PIK3CG is associated with gene pathways with an established role in platelet membrane biogenesis and thrombus formation. Thus, rs342293 is the functional common variant at this locus; to the best of our knowledge this is the first such variant to be elucidated among the known platelet quantitative trait loci (QTLs). Our data suggested a molecular mechanism by which a non-coding GWA index SNP modulates platelet phenotype.

    Funded by: British Heart Foundation: RG/09/12/28096; Medical Research Council: G0800784, G0900339, MC_U105260799; Wellcome Trust: 081917/Z/07/Z, 091746/Z/10/Z

    PLoS genetics 2011;7;6;e1002139

  • Acquired bleeding disorders.

    Perry DJ and Grove C

    Blood and Bone Marrow Pathology 2011;565-82

  • Citrobacter rodentium is an unstable pathogen showing evidence of significant genomic flux.

    Petty NK, Feltwell T, Pickard D, Clare S, Toribio AL, Fookes M, Roberts K, Monson R, Nair S, Kingsley RA, Bulgin R, Wiles S, Goulding D, Keane T, Corton C, Lennard N, Harris D, Willey D, Rance R, Yu L, Choudhary JS, Churcher C, Quail MA, Parkhill J, Frankel G, Dougan G, Salmond GP and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Citrobacter rodentium is a natural mouse pathogen that causes attaching and effacing (A/E) lesions. It shares a common virulence strategy with the clinically significant human A/E pathogens enteropathogenic E. coli (EPEC) and enterohaemorrhagic E. coli (EHEC) and is widely used to model this route of pathogenesis. We previously reported the complete genome sequence of C. rodentium ICC168, where we found that the genome displayed many characteristics of a newly evolved pathogen. In this study, through PFGE, sequencing of isolates showing variation, whole genome transcriptome analysis and examination of the mobile genetic elements, we found that, consistent with our previous hypothesis, the genome of C. rodentium is unstable as a result of repeat-mediated, large-scale genome recombination and because of active transposition of mobile genetic elements such as the prophages. We sequenced an additional C. rodentium strain, EX-33, to reveal that the reference strain ICC168 is representative of the species and that most of the inactivating mutations were common to both isolates and likely to have occurred early on in the evolution of this pathogen. We draw parallels with the evolution of other bacterial pathogens and conclude that C. rodentium is a recently evolved pathogen that may have emerged alongside the development of inbred mice as a model for human disease.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    PLoS pathogens 2011;7;4;e1002018

  • A scalable pipeline for highly effective genetic modification of a malaria parasite.

    Pfander C, Anar B, Schwach F, Otto TD, Brochet M, Volkmann K, Quail MA, Pain A, Rosen B, Skarnes W, Rayner JC and Billker O

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    In malaria parasites, the systematic experimental validation of drug and vaccine targets by reverse genetics is constrained by the inefficiency of homologous recombination and by the difficulty of manipulating adenine and thymine (A+T)-rich DNA of most Plasmodium species in Escherichia coli. We overcame these roadblocks by creating a high-integrity library of Plasmodium berghei genomic DNA (>77% A+T content) in a bacteriophage N15-based vector that can be modified efficiently using the lambda Red method of recombineering. We built a pipeline for generating P. berghei genetic modification vectors at genome scale in serial liquid cultures on 96-well plates. Vectors have long homology arms, which increase recombination frequency up to tenfold over conventional designs. The feasibility of efficient genetic modification at scale will stimulate collaborative, genome-wide knockout and tagging programs for P. berghei.

    Funded by: Medical Research Council: G0501670, G0501670(76331); Wellcome Trust: 089085, WT089085/Z/09/Z

    Nature methods 2011;8;12;1078-82

  • Mendelian randomization study of B-type natriuretic peptide and type 2 diabetes: evidence of causal association from population studies.

    Pfister R, Sharp S, Luben R, Welsh P, Barroso I, Salomaa V, Meirhaeghe A, Khaw KT, Sattar N, Langenberg C and Wareham NJ

    Medical Research Council Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom. rp415@mrc-epid.cam.ac.uk

    Background: Genetic and epidemiological evidence suggests an inverse association between B-type natriuretic peptide (BNP) levels in blood and risk of type 2 diabetes (T2D), but the prospective association of BNP with T2D is uncertain, and it is unclear whether the association is confounded.

    Methods and Findings: We analysed the association between levels of the N-terminal fragment of pro-BNP (NT-pro-BNP) in blood and risk of incident T2D in a prospective case-cohort study and genotyped the variant rs198389 within the BNP locus in three T2D case-control studies. We combined our results with existing data in a meta-analysis of 11 case-control studies. Using a Mendelian randomization approach, we compared the observed association between rs198389 and T2D to that expected from the NT-pro-BNP level to T2D association and the NT-pro-BNP difference per C allele of rs198389. In participants of our case-cohort study who were free of T2D and cardiovascular disease at baseline, we observed a 21% (95% CI 3%-36%) decreased risk of incident T2D per one standard deviation (SD) higher log-transformed NT-pro-BNP levels in analysis adjusted for age, sex, body mass index, systolic blood pressure, smoking, family history of T2D, history of hypertension, and levels of triglycerides, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol. The association between rs198389 and T2D observed in case-control studies (odds ratio = 0.94 per C allele, 95% CI 0.91-0.97) was similar to that expected (0.96, 0.93-0.98) based on the pooled estimate for the log-NT-pro-BNP level to T2D association derived from a meta-analysis of our study and published data (hazard ratio = 0.82 per SD, 0.74-0.90) and the difference in NT-pro-BNP levels (0.22 SD, 0.15-0.29) per C allele of rs198389. No significant associations were observed between the rs198389 genotype and potential confounders.

    Conclusions: Our results provide evidence for a potential causal role of the BNP system in the aetiology of T2D. Further studies are needed to investigate the mechanisms underlying this association and possibilities for preventive interventions.

    Funded by: British Heart Foundation: FS/10/005/28147; Medical Research Council: G0401527, G0601463, G1000143; Wellcome Trust: 077016/Z/05/Z

    PLoS medicine 2011;8;10;e1001112

  • Phenotype mining in CNV carriers from a population cohort.

    Pietiläinen OP, Rehnström K, Jakkula E, Service SK, Congdon E, Tilgmann C, Hartikainen AL, Taanila A, Heikura U, Paunio T, Ripatti S, Jarvelin MR, Isohanni M, Sabatti C, Palotie A, Freimer NB and Peltonen L

    Institute for Molecular Medicine Finland, and Department of Medical Genetics, University of Helsinki, Helsinki, Finland.

    Phenotype mining is a novel approach for elucidating the genetic basis of complex phenotypic variation. It involves a search of rich phenotype databases for measures correlated with genetic variation, as identified in genome-wide genotyping or sequencing studies. An initial implementation of phenotype mining in a prospective unselected population cohort, the Northern Finland 1966 Birth Cohort (NFBC1966), identifies neurodevelopment-related traits (intellectual deficits, poor school performance and hearing abnormalities) which are more frequent among individuals with large (>500 kb) deletions than among other cohort members. Observation of extensive shared single nucleotide polymorphism haplotypes around deletions suggests an opportunity to expand phenotype mining from cohort samples to the populations from which they derive.

    Funded by: Medical Research Council: G0500539; NHLBI NIH HHS: 5R01HL087679-02; NIMH NIH HHS: 1RL1MH083268-01; NINDS NIH HHS: P30NS062691, PL1NS062410; Wellcome Trust: WT089061, WT089062

    Human molecular genetics 2011;20;13;2686-95

  • Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants.

    Pinto D, Darvishi K, Shi X, Rajan D, Rigler D, Fitzgerald T, Lionel AC, Thiruvahindrapuram B, Macdonald JR, Mills R, Prasad A, Noonan K, Gribble S, Prigmore E, Donahoe PK, Smith RS, Park JH, Hurles ME, Carter NP, Lee C, Scherer SW and Feuk L

    The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada.

    We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.

    Funded by: Canadian Institutes of Health Research: 213997; NCI NIH HHS: CA111560, R01 CA111560-05; NHGRI NIH HHS: HG004221, HG005209, P41 HG004221-03, U01 HG005209-02; NICHD NIH HHS: HD007396, HD055150; NIGMS NIH HHS: T32 GM007748-33; Wellcome Trust: 077008, 077014, WT077008

    Nature biotechnology 2011;29;6;512-20

  • An evolutionary genomic approach to identify genes involved in human birth timing.

    Plunkett J, Doniger S, Orabona G, Morgan T, Haataja R, Hallman M, Puttonen H, Menon R, Kuczynski E, Norwitz E, Snegovskikh V, Palotie A, Peltonen L, Fellman V, DeFranco EA, Chaudhari BP, McGregor TL, McElroy JJ, Oetjens MT, Teramo K, Borecki I, Fay J and Muglia L

    Department of Pediatrics, Vanderbilt University School of Medicine and Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, Tennessee, United States of America.

    Coordination of fetal maturation with birth timing is essential for mammalian reproduction. In humans, preterm birth is a disorder of profound global health significance. The signals initiating parturition in humans have remained elusive, due to divergence in physiological mechanisms between humans and model organisms typically studied. Because of relatively large human head size and narrow birth canal cross-sectional area compared to other primates, we hypothesized that genes involved in parturition would display accelerated evolution along the human and/or higher primate phylogenetic lineages to decrease the length of gestation and promote delivery of a smaller fetus that transits the birth canal more readily. Further, we tested whether current variation in such accelerated genes contributes to preterm birth risk. Evidence from allometric scaling of gestational age suggests human gestation has been shortened relative to other primates. Consistent with our hypothesis, many genes involved in reproduction show human acceleration in their coding or adjacent noncoding regions. We screened >8,400 SNPs in 150 human accelerated genes in 165 Finnish preterm and 163 control mothers for association with preterm birth. In this cohort, the most significant association was in FSHR, and 8 of the 10 most significant SNPs were in this gene. Further evidence for association of a linkage disequilibrium block of SNPs in FSHR, rs11686474, rs11680730, rs12473870, and rs1247381 was found in African Americans. By considering human acceleration, we identified a novel gene that may be associated with preterm birth, FSHR. We anticipate other human accelerated genes will similarly be associated with preterm birth risk and elucidate essential pathways for human parturition.

    Funded by: NIGMS NIH HHS: T32 GM081739

    PLoS genetics 2011;7;4;e1001365

  • Jamb and jamc are essential for vertebrate myocyte fusion.

    Powell GT and Wright GJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Cellular fusion is required in the development of several tissues, including skeletal muscle. In vertebrates, this process is poorly understood and lacks an in vivo-validated cell surface heterophilic receptor pair that is necessary for fusion. Identification of essential cell surface interactions between fusing cells is an important step in elucidating the molecular mechanism of cellular fusion. We show here that the zebrafish orthologues of JAM-B and JAM-C receptors are essential for fusion of myocyte precursors to form syncytial muscle fibres. Both jamb and jamc are dynamically co-expressed in developing muscles and encode receptors that physically interact. Heritable mutations in either gene prevent myocyte fusion in vivo, resulting in an overabundance of mononuclear, but otherwise overtly normal, functional fast-twitch muscle fibres. Transplantation experiments show that the Jamb and Jamc receptors must interact between neighbouring cells (in trans) for fusion to occur. We also show that jamc is ectopically expressed in prdm1a mutant slow muscle precursors, which inappropriately fuse with other myocytes, suggesting that control of myocyte fusion through regulation of jamc expression has important implications for the growth and patterning of muscles. Our discovery of a receptor-ligand pair critical for fusion in vivo has important implications for understanding the molecular mechanisms responsible for myocyte fusion and its regulation in vertebrate myogenesis.

    Funded by: Wellcome Trust: 077047/Z/05/Z, 077108/Z/05/Z

    PLoS biology 2011;9;12;e1001216

  • A resource of vectors and ES cells for targeted deletion of microRNAs in mice.

    Prosser HM, Koike-Yusa H, Cooper JD, Law FC and Bradley A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. hmp@sanger.ac.uk

    The 21-23 nucleotide, single-stranded RNAs classified as microRNAs (miRNA) perform fundamental roles in diverse cellular and developmental processes. In contrast to the situation for protein-coding genes, no public resource of miRNA mouse mutant alleles exists. Here we describe a collection of 428 miRNA targeting vectors covering 476 of the miRNA genes annotated in the miRBase registry. Using these vectors, we generated a library of highly germline-transmissible C57BL/6N mouse embryonic stem (ES) cell clones harboring targeted deletions for 392 miRNA genes. For most of these targeted clones, chimerism and germline transmission can be scored through a coat color marker. The targeted alleles have been designed to be adaptable research tools that can be efficiently altered by recombinase-mediated cassette exchange to create reporter, conditional and other allelic variants. This miRNA knockout (mirKO) resource can be searched electronically and is available from ES cell repositories for distribution to the scientific community.

    Funded by: Wellcome Trust: 079643, WT079643

    Nature biotechnology 2011;29;9;840-5

  • Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia.

    Puente XS, Pinyol M, Quesada V, Conde L, Ordóñez GR, Villamor N, Escaramis G, Jares P, Beà S, González-Díaz M, Bassaganyas L, Baumann T, Juan M, López-Guerra M, Colomer D, Tubío JM, López C, Navarro A, Tornador C, Aymerich M, Rozman M, Hernández JM, Puente DA, Freije JM, Velasco G, Gutiérrez-Fernández A, Costa D, Carrió A, Guijarro S, Enjuanes A, Hernández L, Yagüe J, Nicolás P, Romeo-Casabona CM, Himmelbauer H, Castillo E, Dohm JC, de Sanjosé S, Piris MA, de Alava E, San Miguel J, Royo R, Gelpí JL, Torrents D, Orozco M, Pisano DG, Valencia A, Guigó R, Bayés M, Heath S, Gut M, Klatt P, Marshall J, Raine K, Stebbings LA, Futreal PA, Stratton MR, Campbell PJ, Gut I, López-Guillermo A, Estivill X, Montserrat E, López-Otín C and Campo E

    Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología, Universidad de Oviedo, 33006 Oviedo, Spain.

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.

    Funded by: Wellcome Trust: 093867

    Nature 2011;475;7354;101-5

  • Genomic libraries: I. Construction and screening of fosmid genomic libraries.

    Quail MA, Matthews L, Sims S, Lloyd C, Beasley H and Baxter SW

    Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.

    Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.

    Methods in molecular biology (Clifton, N.J.) 2011;772;37-58

  • Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Quail MA, Matthews L, Sims S, Lloyd C, Beasley H and Baxter SW

    Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

    Methods in molecular biology (Clifton, N.J.) 2011;772;59-81

  • Genetic and transcriptional analysis of phosphoinositide-specific phospholipase C in Plasmodium.

    Raabe A, Berry L, Sollelis L, Cerdan R, Tawk L, Vial HJ, Billker O and Wengelnik K

    UMR5235, CNRS-Université Montpellier 2, 34095 Montpellier, France.

    Phosphoinositide-specific phospholipase C (PI-PLC) is a major regulator of calcium-dependent signal transduction, which has been shown to be important in various processes of the malaria parasite Plasmodium. PI-PLC is generally implicated in calcium liberation from intracellular stores through the action of its product, inositol-(1,4,5)-trisphosphate, and is itself dependent on calcium for its activation. Here we describe the plc genes from Plasmodium species. The encoded proteins contain all domains typically found in PI-PLCs of the δ class but are almost twice as long as their orthologues in mammals. Transcriptional analysis by qRT-PCR of plc during the erythrocytic cycle of P. falciparum revealed steady expression levels that increased at the late schizont stages. Genetic analysis in the P. berghei model revealed that the plc locus was targetable but that plc gene knock-outs could not be obtained, thereby strongly indicating that the gene is essential during blood stage development. Alternatively, we attempted to modify plc expression through a promoter exchange approach but found the gene to be refractory to over-expression indicating that plc expression levels might additionally be tightly controlled.

    Funded by: Medical Research Council: G0501670; Wellcome Trust: WT089085/Z/09/Z

    Experimental parasitology 2011;129;1;75-80

  • Multiple roles for Plasmodium berghei phosphoinositide-specific phospholipase C in regulating gametocyte activation and differentiation.

    Raabe AC, Wengelnik K, Billker O and Vial HJ

    UMR5235, CNRS-Université Montpellier 2, Place Eugène Bataillon, Montpellier cedex 5, France.

    Critical events in the life cycle of malaria parasites are controlled by calcium-dependent signalling cascades, yet the molecular mechanisms of calcium release remain poorly understood. The synchronized development of Plasmodium berghei gametocytes relies on rapid calcium release from internal stores within 10 s of gametocytes being exposed to mosquito-derived xanthurenic acid (XA). Here we addressed the function of phosphoinositide-specific phospholipase C (PI-PLC) for regulating gametocyte activation. XA triggered the hydrolysis of PIP(2) and the production of the secondary messenger IP(3) in gametocytes. Both processes were selectively blocked by a PI-PLC inhibitor, which also reduced the early Ca(2+) signal. However, microgametocyte differentiation into microgametes was blocked even when the inhibitor was added up to 5 min after activation, suggesting a requirement for PI-PLC beyond the early mobilization of calcium. In contrast, inhibitors of calcium release through ryanodine receptor channels were active only during the first minute of gametocyte activation. Biochemical determination of PI-PLC activity was confirmed using transgenic parasites expressing a fluorescent PIP(2)/IP(3) probe that translocates from the parasite plasmalemma to the cytosol upon cell activation. Our study revealed a complex interdependency of Ca(2+) and PI-PLC activity, with PI-PLC being essential throughout gamete formation, possibly explaining the irreversibility of this process.

    Funded by: Medical Research Council: G0501670; Wellcome Trust: WT089085/Z/09/Z

    Cellular microbiology 2011;13;7;955-66

  • Early Diagnosis of Werner's Syndrome Using Exome-Wide Sequencing in a Single, Atypical Patient.

    Raffan E, Hurst LA, Turki SA, Carpenter G, Scott C, Daly A, Coffey A, Bhaskar S, Howard E, Khan N, Kingston H, Palotie A, Savage DB, O'Driscoll M, Smith C, O'Rahilly S, Barroso I and Semple RK

    Institute of Metabolic Science, University of Cambridge Metabolic Research Laboratories Cambridge, UK.

    Genetic diagnosis of inherited metabolic disease is conventionally achieved through syndrome recognition and targeted gene sequencing, but many patients receive no specific diagnosis. Next-generation sequencing allied to capture of expressed sequences from genomic DNA now offers a powerful new diagnostic approach. Barriers to routine diagnostic use include cost, and the complexity of interpreting results arising from simultaneous identification of large numbers of variants. We applied exome-wide sequencing to an individual, 16-year-old daughter of consanguineous parents with a novel syndrome of short stature, severe insulin resistance, ptosis, and microcephaly. Pulldown of expressed sequences from genomic DNA followed by massively parallel sequencing was undertaken. Single nucleotide variants were called using SAMtools prior to filtering based on sequence quality and existence in control genomes and exomes. Of 485 genetic variants predicted to alter protein sequence and absent from control data, 24 were homozygous in the patient. One mutation - the p.Arg732X mutation in the WRN gene - has previously been reported in Werner's syndrome (WS). On re-evaluation of the patient several early features of WS were detected including loss of fat from the extremities and frontal hair thinning. Lymphoblastoid cells from the proband exhibited a defective decatenation checkpoint, consistent with loss of WRN activity. We have thus diagnosed WS some 15 years earlier than average, permitting aggressive prophylactic therapy and screening for WS complications, illustrating the potential of exome-wide sequencing to achieve early diagnosis and change management of rare autosomal recessive disease, even in individual patients of consanguineous parentage with apparently novel syndromes.

    Funded by: Cancer Research UK: 8300; Medical Research Council: G0700733

    Frontiers in endocrinology 2011;2;8

  • Founder effect in the Horn of Africa for an insulin receptor mutation that may impair receptor recycling.

    Raffan E, Soos MA, Rocha N, Tuthill A, Thomsen AR, Hyden CS, Gregory JW, Hindmarsh P, Dattani M, Cochran E, Al Kaabi J, Gorden P, Barroso I, Morling N, O'Rahilly S and Semple RK

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, University of Cambridge, Addenbrooke's Hospital B289, Cambridge, CB2 0QR, UK.

    Aims: Genetic insulin receptoropathies are a rare cause of severe insulin resistance. We identified the Ile119Met missense mutation in the insulin receptor INSR gene, previously reported in a Yemeni kindred, in four unrelated patients with Somali ancestry. We aimed to investigate a possible genetic founder effect, and to study the mechanism of loss of function of the mutant receptor.

    Methods: Biochemical profiling and DNA haplotype analysis of affected patients were performed. Insulin receptor expression in lymphoblastoid cells from a homozygous p.Ile119Met INSR patient, and in cells heterologously expressing the mutant receptor, was examined. Insulin binding, insulin-stimulated receptor autophosphorylation, and cooperativity and pH dependency of insulin dissociation were also assessed.

    Results: All patients had biochemical profiles pathognomonic of insulin receptoropathy, while haplotype analysis revealed the putative shared region around the INSR mutant to be no larger than 28 kb. An increased insulin proreceptor to β subunit ratio was seen in patient-derived cells. Steady state insulin binding and insulin-stimulated autophosphorylation of the mutant receptor was normal; however it exhibited decreased insulin dissociation rates with preserved cooperativity, a difference accentuated at low pH.

    Conclusions: The p.Ile119Met INSR appears to have arisen around the Horn of Africa, and should be sought first in severely insulin resistant patients with ancestry from this region. Despite collectively compelling genetic, clinical and biochemical evidence for its pathogenicity, loss of function in conventional in vitro assays is subtle, suggesting mildly impaired receptor recycling only.

    Funded by: Medical Research Council; Wellcome Trust: 077016/Z/05/Z, 078986/Z/06/Z, 080952/Z/06/Z, 087678/Z/08/Z

    Diabetologia 2011;54;5;1057-65

  • Evidence that Cd101 is an autoimmune diabetes gene in nonobese diabetic mice.

    Rainbow DB, Moule C, Fraser HI, Clark J, Howlett SK, Burren O, Christensen M, Moody V, Steward CA, Mohammed JP, Fusakio ME, Masteller EL, Finger EB, Houchins JP, Naf D, Koentgen F, Ridgway WM, Todd JA, Bluestone JA, Peterson LB, Mattner J and Wicker LS

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom.

    We have previously proposed that sequence variation of the CD101 gene between NOD and C57BL/6 mice accounts for the protection from type 1 diabetes (T1D) provided by the insulin-dependent diabetes susceptibility region 10 (Idd10), a <1 Mb region on mouse chromosome 3. In this study, we provide further support for the hypothesis that Cd101 is Idd10 using haplotype and expression analyses of novel Idd10 congenic strains coupled to the development of a CD101 knockout mouse. Susceptibility to T1D was correlated with genotype-dependent CD101 expression on multiple cell subsets, including Foxp3(+) regulatory CD4(+) T cells, CD11c(+) dendritic cells, and Gr1(+) myeloid cells. The correlation of CD101 expression on immune cells from four independent Idd10 haplotypes with the development of T1D supports the identity of Cd101 as Idd10. Because CD101 has been associated with regulatory T and Ag presentation cell functions, our results provide a further link between immune regulation and susceptibility to T1D.

    Funded by: NIAID NIH HHS: AI 15416, N01 AI015416, P01 AI039671, P01 AI039671-16, P01AI039671; NIDDK NIH HHS: P30 DK078392, P30 DK078392-01, R01 DK084054, R01 DK084054-03, R01DK084054; Wellcome Trust: 079895, 091157

    Journal of immunology (Baltimore, Md. : 1950) 2011;187;1;325-36

  • Cutting edge: the membrane attack complex of complement is required for the development of murine experimental cerebral malaria.

    Ramos TN, Darley MM, Hu X, Billker O, Rayner JC, Ahras M, Wohler JE and Barnum SR

    Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

    Cerebral malaria is the most severe complication of Plasmodium falciparum infection and accounts for a large number of malaria fatalities worldwide. Recent studies demonstrated that C5(-/-) mice are resistant to experimental cerebral malaria (ECM) and suggested that protection was due to loss of C5a-induced inflammation. Surprisingly, we observed that C5aR(-/-) mice were fully susceptible to disease, indicating that C5a is not required for ECM. C3aR(-/-) and C3aR(-/-) × C5aR(-/-) mice were equally susceptible to ECM as were wild-type mice, indicating that neither complement anaphylatoxin receptor is critical for ECM development. In contrast, C9 deposition in the brains of mice with ECM suggested an important role for the terminal complement pathway. Treatment with anti-C9 Ab significantly increased survival time and reduced mortality in ECM. Our data indicate that protection from ECM in C5(-/-) mice is mediated through inhibition of membrane attack complex formation and not through C5a-induced inflammation.

    Funded by: Medical Research Council: G0501670; NIAID NIH HHS: AI08382, R03 AI083820, R03 AI083820-02, T32 AI007051, T32 AI007051-35, T32 AI07051

    Journal of immunology (Baltimore, Md. : 1950) 2011;186;12;6657-60

  • Kino: A generic document management system for biologists using SA-REST and faceted search

    Ranabahu A, Parikh P, Panahiazar M, Sheth A and Logan-Klumpler F

    Proceedings - 5th IEEE International Conference on Semantic Computing, ICSC 2011;205-8

  • Asparagine peptide lyases: a seventh catalytic type of proteolytic enzymes.

    Rawlings ND, Barrett AJ and Bateman A

    Wellcome Trust Genome Campus, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. ndr@sanger.ac.uk

    The terms "proteolytic enzyme" and "peptidase" have been treated as synonymous, and all proteolytic enzymes have been considered to be hydrolases (EC 3.4). However, the recent discovery of proteins that cleave themselves at asparagine residues indicates that not all peptide bond cleavage occurs by hydrolysis. These self-cleaving proteins include the Tsh protein precursor of Escherichia coli, in which the large C-terminal propeptide acts as an autotransporter; certain viral coat proteins; and proteins containing inteins. Proteolysis is the action of an amidine lyase (EC 4.3.2). These proteolytic enzymes are also the first in which the nucleophile is an asparagine, defining the seventh proteolytic catalytic type and the first to be discovered since 2004. We have assembled ten families based on sequence similarity in which cleavage is thought to be catalyzed by an asparagine.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    The Journal of biological chemistry 2011;286;44;38321-8

  • A plethora of Plasmodium species in wild apes: a source of human infection?

    Rayner JC, Liu W, Peeters M, Sharp PM and Hahn BH

    Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. julian.rayner@sanger.ac.uk

    Recent studies of captive and wild-living apes in Africa have uncovered evidence of numerous new Plasmodium species, one of which was identified as the immediate precursor of human Plasmodium falciparum. These findings raise the question whether wild apes could be a recurrent source of Plasmodium infections in humans. This question is not new, but was the subject of intense investigation by researchers in the first half of the last century. Re-examination of their work in the context of recent molecular findings provides a new framework to understand the diversity of Plasmodium species and to assess the risk of future cross-species transmissions to humans in the context of proposed malaria eradication programs.

    Funded by: NIAID NIH HHS: P30 AI 27767, R01 AI091595, R01 AI50529, R01 AI58715, R03 AI074778, R37 AI050529; Wellcome Trust

    Trends in parasitology 2011;27;5;222-9

  • Retrotransposon-induced heterochromatin spreading in the mouse revealed by insertional polymorphisms.

    Rebollo R, Karimi MM, Bilenky M, Gagnier L, Miceli-Royer K, Zhang Y, Goyal P, Keane TM, Jones S, Hirst M, Lorincz MC and Mager DL

    Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, Canada.

    The "arms race" relationship between transposable elements (TEs) and their host has promoted a series of epigenetic silencing mechanisms directed against TEs. Retrotransposons, a class of TEs, are often located in repressed regions and are thought to induce heterochromatin formation and spreading. However, direct evidence for TE-induced local heterochromatin in mammals is surprisingly scarce. To examine this phenomenon, we chose two mouse embryonic stem (ES) cell lines that possess insertionally polymorphic retrotransposons (IAP, ETn/MusD, and LINE elements) at specific loci in one cell line but not the other. Employing ChIP-seq data for these cell lines, we show that IAP elements robustly induce H3K9me3 and H4K20me3 marks in flanking genomic DNA. In contrast, such heterochromatin is not induced by LINE copies and only by a minority of polymorphic ETn/MusD copies. DNA methylation is independent of the presence of IAP copies, since it is present in flanking regions of both full and empty sites. Finally, such spreading into genes appears to be rare, since the transcriptional start sites of very few genes are less than one Kb from an IAP. However, the B3galtl gene is subject to transcriptional silencing via IAP-induced heterochromatin. Hence, although rare, IAP-induced local heterochromatin spreading into nearby genes may influence expression and, in turn, host fitness.

    Funded by: Canadian Institutes of Health Research: 10825, 92090, 92093; Medical Research Council; Wellcome Trust

    PLoS genetics 2011;7;9;e1002301

  • Determinants of R-loop formation at convergent bidirectionally transcribed trinucleotide repeats.

    Reddy K, Tam M, Bowater RP, Barber M, Tomlinson M, Nichol Edamura K, Wang YH and Pearson CE

    Program of Genetics & Genome Biology, The Hospital for Sick Children, 101 College Street, East Tower, 15-312 TMDT, Toronto, Ontario M5G 1L7, Canada.

    R-loops have been described at immunoglobulin class switch sequences, prokaryotic and mitochondrial replication origins, and disease-associated (CAG)n and (GAA)n trinucleotide repeats. The determinants of trinucleotide R-loop formation are unclear. Trinucleotide repeat expansions cause diseases including DM1 (CTG)n, SCA1 (CAG)n, FRAXA (CGG)n, FRAXE (CCG)n and FRDA (GAA)n. Bidirectional convergent transcription across these disease repeats can occur. We find R-loops formed when CTG or CGG and their complementary strands CAG or CCG were transcribed; GAA transcription, but not TTC, yielded R-loops. R-loop formation was sensitive to DNA supercoiling and repeat length, insensitive to repeat interruptions, and occurred by extension of RNA:DNA hybrids in the RNA polymerase. R-loops arose by transcription in one direction followed by transcription in the opposite direction, and during simultaneous convergent bidirectional transcription of the same repeat, forming double R-loop structures. That each transcribed disease repeat formed R-loops suggests they may have biological functions.

    Funded by: Canadian Institutes of Health Research: MOP-94966; NCI NIH HHS: R01 CA085826; Wellcome Trust

    Nucleic acids research 2011;39;5;1749-62

  • Genome sequencing gets func-y.

    Reid AJ

    Nature reviews. Microbiology 2011;9;6;401

  • Effects of chronic ascariasis and trichuriasis on cytokine production and gene expression in human blood: a cross-sectional study.

    Reina Ortiz M, Schreiber F, Benitez S, Broncano N, Chico ME, Vaca M, Alexander N, Lewis DJ, Dougan G and Cooper PJ

    Centro de Investigaciones, Fundación Ecuatoriana Para Investigación en Salud, Quinindé, Ecuador.

    Background: Chronic soil-transmitted helminth (STH) infections are associated with effects on systemic immune responses that could be caused by alterations in immune homeostasis. To investigate this, we measured the impact in children of STH infections on cytokine responses and gene expression in unstimulated blood.

    Sixty children were classified as having chronic, light, or no STH infections. Peripheral blood mononuclear cells were cultured in medium for 5 days to measure cytokine accumulation. RNA was isolated from peripheral blood and gene expression analysed using microarrays. Different infection groups were compared for the purpose of analysis: STH infection (combined chronic and light vs. uninfected groups) and chronic STH infection (chronic vs. combined light and uninfected groups). The chronic STH infection effect was associated with elevated production of GM-CSF (P=0.007), IL-2 (P=0.03), IL-5 (P=0.01), and IL-10 (P=0.01). Data reduction suggested that chronic infections were primarily associated with an immune phenotype characterized by elevated IL-5 and IL-10, typical of a modified Th2-like response. Chronic STH infections were associated with the up-regulation of genes associated with immune homeostasis (IDO, P=0.03; CCL23, P=0.008; HRK, P=0.005), down-regulation of microRNA hsa-let-7d (P=0.01) and differential regulation of several genes associated with granulocyte-mediated inflammation (IL-8, down-regulated, P=0.0002; RNASE2, up-regulated, P=0.009; RNASE3, up-regulated, P=0.03).

    Conclusions: Chronic STH infections were associated with a cytokine response indicative of a modified Th2 response. There was evidence that STH infections were associated with a pattern of gene expression suggestive of the induction of homeostatic mechanisms, the differential expression of several inflammatory genes and the down-regulation of microRNA hsa-let-7d. Effects on immune homeostasis and the development of a modified Th2 immune response during chronic STH infections could explain the systemic immunologic effects that have been associated with these infections such as impaired immune responses to vaccines and the suppression of inflammatory diseases.

    Funded by: Wellcome Trust: 074679/Z/04/Z

    PLoS neglected tropical diseases 2011;5;6;e1157

  • Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development.

    Renfree MB, Papenfuss AT, Deakin JE, Lindsay J, Heider T, Belov K, Rens W, Waters PD, Pharo EA, Shaw G, Wong ES, Lefèvre CM, Nicholas KR, Kuroki Y, Wakefield MJ, Zenger KR, Wang C, Ferguson-Smith M, Nicholas FW, Hickford D, Yu H, Short KR, Siddle HV, Frankenberg SR, Chew KY, Menzies BR, Stringer JM, Suzuki S, Hore TA, Delbridge ML, Mohammadi A, Schneider NY, Hu Y, O'Hara W, Al Nadaf S, Wu C, Feng ZP, Cocks BG, Wang J, Flicek P, Searle SM, Fairley S, Beal K, Herrero J, Carone DM, Suzuki Y, Sugano S, Toyoda A, Sakaki Y, Kondo S, Nishida Y, Tatsumoto S, Mandiou I, Hsu A, McColl KA, Lansdell B, Weinstock G, Kuczek E, McGrath A, Wilson P, Men A, Hazar-Rethinam M, Hall A, Davis J, Wood D, Williams S, Sundaravadanam Y, Muzny DM, Jhangiani SN, Lewis LR, Morgan MB, Okwuonu GO, Ruiz SJ, Santibanez J, Nazareth L, Cree A, Fowler G, Kovar CL, Dinh HH, Joshi V, Jing C, Lara F, Thornton R, Chen L, Deng J, Liu Y, Shen JY, Song XZ, Edson J, Troon C, Thomas D, Stephens A, Yapa L, Levchenko T, Gibbs RA, Cooper DW, Speed TP, Fujiyama A, Graves JA, O'Neill RJ, Pask AJ, Forrest SM and Worley KC

    The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia. m.renfree@unimelb.edu.au

    Background: We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development.

    Results: The genome has been sequenced to 2× coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements.

    Conclusions: Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.

    Funded by: NHGRI NIH HHS: U54-HG003273

    Genome biology 2011;12;8;R81

  • Genetic predisposition to long-term nondiabetic deteriorations in glucose homeostasis: Ten-year follow-up of the GLACIER study.

    Renström F, Shungin D, Johansson I, MAGIC Investigators, Florez JC, Hallmans G, Hu FB and Franks PW

    Department of Public Health and Clinical Medicine, Umeå University Hospital, Sweden.

    Objective: To assess whether recently discovered genetic loci associated with hyperglycemia also predict long-term changes in glycemic traits.

    Methods: Sixteen fasting glucose-raising loci were genotyped in middle-aged adults from the Gene x Lifestyle interactions And Complex traits Involved in Elevated disease Risk (GLACIER) Study, a population-based prospective cohort study from northern Sweden. Genotypes were tested for association with baseline fasting and 2-h postchallenge glycemia (N = 16,330), and for changes in these glycemic traits during a 10-year follow-up period (N = 4,059).

    Results: Cross-sectional directionally consistent replication with fasting glucose concentrations was achieved for 12 of 16 variants; 10 variants were also associated with impaired fasting glucose (IFG) and 7 were independently associated with 2-h postchallenge glucose concentrations. In prospective analyses, the effect alleles at four loci (GCK rs4607517, ADRA2A rs10885122, DGKB-TMEM195 rs2191349, and G6PC2 rs560887) were nominally associated with worsening fasting glucose concentrations during 10 years of follow-up. MTNR1B rs10830963, which was predictive of elevated fasting glucose concentrations in cross-sectional analyses, was associated with a protective effect on postchallenge glucose concentrations during follow-up; however, this was only when baseline fasting and 2-h glucoses were adjusted for. An additive effect of multiple risk alleles on glycemic traits was observed: a weighted genetic risk score (80th vs. 20th centiles) was associated with a 0.16 mmol/l (P = 2.4 × 10⁻⁶) greater elevation in fasting glucose and a 64% (95% CI: 33-201%) higher risk of developing IFG during 10 years of follow-up.

    Conclusions: Our findings imply that genetic profiling might facilitate the early detection of persons who are genetically susceptible to deteriorating glucose control; studies of incident type 2 diabetes and discrete cardiovascular end points will help establish whether the magnitude of these changes is clinically relevant.

    Diabetes 2011;60;1;345-54

  • Effect of using varying negative examples in transcription factor binding site predictions

    Rezwan F, Sun Y, Davey N, Adams R, Rust AG and Robinson M

    Lecture Notes in Computer Science 2011;6623 LNCS;1-12

  • Genome sequences of Salmonella enterica serovar Typhimurium, Choleraesuis, Dublin, and Gallinarum strains of well-defined virulence in food-producing animals.

    Richardson EJ, Limaye B, Inamdar H, Datta A, Manjari KS, Pullinger GD, Thomson NR, Joshi RR, Watson M and Stevens MP

    The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK.

    Salmonella enterica is an animal and zoonotic pathogen of worldwide importance and may be classified into serovars differing in virulence and host range. We sequenced and annotated the genomes of serovar Typhimurium, Choleraesuis, Dublin, and Gallinarum strains of defined virulence in each of three food-producing animal hosts. This provides valuable measures of intraserovar diversity and opportunities to formally link genotypes to phenotypes in target animals.

    Journal of bacteriology 2011;193;12;3162-3

  • Prospective study of insulin-like growth factor-I, insulin-like growth factor-binding protein 3, genetic variants in the IGF1 and IGFBP3 genes and risk of coronary artery disease.

    Ricketts SL, Rensing KL, Holly JM, Chen L, Young EH, Luben R, Ashford S, Song K, Yuan X, Dehghan A, Wright BJ, Waterworth DM, Mooser V, GEMS Investigators, Waeber G, Vollenweider P, Epstein SE, Burnett MS, Devaney JM, Hakonarson HH, Rader DJ, Reilly MP, Danesh J, Thompson SG, Dunning AM, van Duijn CM, Samani NJ, McPherson R, Wareham NJ, Khaw KT, Boekholdt SM and Sandhu MS

    Although experimental studies have suggested that insulin-like growth factor I (IGF-I) and its binding protein IGFBP-3 might have a role in the aetiology of coronary artery disease (CAD), the relevance of circulating IGFs and their binding proteins in the development of CAD in human populations is unclear. We conducted a nested case-control study, with a mean follow-up of six years, within the EPIC-Norfolk cohort to assess the association between circulating levels of IGF-I and IGFBP-3 and risk of CAD in up to 1,013 cases and 2,055 controls matched for age, sex and study enrolment date. After adjustment for cardiovascular risk factors, we found no association between circulating levels of IGF-I or IGFBP-3 and risk of CAD (odds ratio: 0.98 (95% CI 0.90-1.06) per 1 SD increase in circulating IGF-I; odds ratio: 1.02 (95% CI 0.94-1.12) for IGFBP-3). We examined associations between tagging single nucleotide polymorphisms (tSNPs) at the IGF1 and IGFBP3 loci and circulating IGF-I and IGFBP-3 levels in up to 1,133 cases and 2,223 controls and identified three tSNPs (rs1520220, rs3730204, rs2132571) that showed independent association with either circulating IGF-I or IGFBP-3 levels. In an assessment of 31 SNPs spanning the IGF1 or IGFBP3 loci, none were associated with risk of CAD in a meta-analysis that included EPIC-Norfolk and eight additional studies comprising up to 9,319 cases and 19,964 controls. Our results indicate that IGF-I and IGFBP-3 are unlikely to be importantly involved in the aetiology of CAD in human populations.

    International journal of molecular epidemiology and genetics 2011;2;3;261-85

  • The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium.

    Ringwald M, Iyer V, Mason JC, Stone KR, Tadepally HD, Kadin JA, Bult CJ, Eppig JT, Oakley DJ, Briois S, Stupka E, Maselli V, Smedley D, Liu S, Hansen J, Baldock R, Hicks GG and Skarnes WC

    The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA. ringwald@informatics.jax.org

    The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serves as the central public web site for IKMC data and facilitates the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.

    Funded by: Medical Research Council: MC_U127527203; NHGRI NIH HHS: HG004074

    Nucleic acids research 2011;39;Database issue;D849-55

  • Genome-wide association study identifies five new schizophrenia loci.

    Ripke S, Sanders AR, Kendler KS, Levinson DF, Sklar P, Holmans PA, Lin DY, Duan J, Ophoff RA, Andreassen OA, Scolnick E, Cichon S, St Clair D, Corvin A, Gurling H, Werge T, Rujescu D, Blackwood DH, Pato CN, Malhotra AK, Purcell S, Dudbridge F, Neale BM, Rossin L, Visscher PM, Posthuma D, Ruderfer DM, Fanous A, Stefansson H, Steinberg S, Mowry BJ, Golimbet V, De Hert M, Jönsson EG, Bitter I, Pietiläinen OP, Collier DA, Tosato S, Agartz I, Albus M, Alexander M, Amdur RL, Amin F, Bass N, Bergen SE, Black DW, Børglum AD, Brown MA, Bruggeman R, Buccola NG, Byerley WF, Cahn W, Cantor RM, Carr VJ, Catts SV, Choudhury K, Cloninger CR, Cormican P, Craddock N, Danoy PA, Datta S, de Haan L, Demontis D, Dikeos D, Djurovic S, Donnelly P, Donohoe G, Duong L, Dwyer S, Fink-Jensen A, Freedman R, Freimer NB, Friedl M, Georgieva L, Giegling I, Gill M, Glenthøj B, Godard S, Hamshere M, Hansen M, Hansen T, Hartmann AM, Henskens FA, Hougaard DM, Hultman CM, Ingason A, Jablensky AV, Jakobsen KD, Jay M, Jürgens G, Kahn RS, Keller MC, Kenis G, Kenny E, Kim Y, Kirov GK, Konnerth H, Konte B, Krabbendam L, Krasucki R, Lasseter VK, Laurent C, Lawrence J, Lencz T, Lerer FB, Liang KY, Lichtenstein P, Lieberman JA, Linszen DH, Lönnqvist J, Loughland CM, Maclean AW, Maher BS, Maier W, Mallet J, Malloy P, Mattheisen M, Mattingsdal M, McGhee KA, McGrath JJ, McIntosh A, McLean DE, McQuillin A, Melle I, Michie PT, Milanova V, Morris DW, Mors O, Mortensen PB, Moskvina V, Muglia P, Myin-Germeys I, Nertney DA, Nestadt G, Nielsen J, Nikolov I, Nordentoft M, Norton N, Nöthen MM, O'Dushlaine CT, Olincy A, Olsen L, O'Neill FA, Orntoft TF, Owen MJ, Pantelis C, Papadimitriou G, Pato MT, Peltonen L, Petursson H, Pickard B, Pimm J, Pulver AE, Puri V, Quested D, Quinn EM, Rasmussen HB, Réthelyi JM, Ribble R, Rietschel M, Riley BP, Ruggeri M, Schall U, Schulze TG, Schwab SG, Scott RJ, Shi J, Sigurdsson E, Silverman JM, Spencer CC, Stefansson K, Strange A, Strengman E, Stroup TS, Suvisaari J, Terenius L, Thirumalai S, Thygesen JH, Timm S, Toncheva D, van den Oord E, van Os J, van Winkel R, Veldink J, Walsh D, Wang AG, Wiersma D, Wildenauer DB, Williams HJ, Williams NM, Wormley B, Zammit S, Sullivan PF, O'Donovan MC, Daly MJ, Gejman PV and Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium

    Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA.

    We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (P = 1.6 × 10⁻¹¹) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, P = 7.0 × 10⁻⁹), ANK3 (rs10994359, P = 2.5 × 10⁻⁸) and the ITIH3-ITIH4 region (rs2239547, P = 7.8 × 10⁻⁹).

    Funded by: NCI NIH HHS: R01 CA082659-12; NIGMS NIH HHS: R37 GM047845-21; NIMH NIH HHS: K01 MH085812-03

    Nature genetics 2011;43;10;969-76

  • Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

    Robinson T, Campino SG, Auburn S, Assefa SA, Polley SD, Manske M, MacInnis B, Rockett KA, Maslen GL, Sanders M, Quail MA, Chiodini PL, Kwiatkowski DP, Clark TG and Sutherland CJ

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data was extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these are compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 077012/Z/05/Z, 090532

    PloS one 2011;6;8;e23204

  • Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania.

    Rogers MB, Hilley JD, Dickens NJ, Wilkes J, Bates PA, Depledge DP, Harris D, Her Y, Herzyk P, Imamura H, Otto TD, Sanders M, Seeger K, Dujardin JC, Berriman M, Smith DF, Hertz-Fowler C and Mottram JC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

    Leishmania parasites cause a spectrum of clinical pathology in humans ranging from disfiguring cutaneous lesions to fatal visceral leishmaniasis. We have generated a reference genome for Leishmania mexicana and refined the reference genomes for Leishmania major, Leishmania infantum, and Leishmania braziliensis. This has allowed the identification of a remarkably low number of genes or paralog groups (2, 14, 19, and 67, respectively) unique to one species. These were found to be conserved in additional isolates of the same species. We have predicted allelic variation and find that in these isolates, L. major and L. infantum have a surprisingly low number of predicted heterozygous SNPs compared with L. braziliensis and L. mexicana. We used short read coverage to infer ploidy and gene copy numbers, identifying large copy number variations between species, with 200 tandem gene arrays in L. major and 132 in L. mexicana. Chromosome copy number also varied significantly between species, with nine supernumerary chromosomes in L. infantum, four in L. mexicana, two in L. braziliensis, and one in L. major. A significant bias against gene arrays on supernumerary chromosomes was shown to exist, indicating that duplication events occur more frequently on disomic chromosomes. Taken together, our data demonstrate that there is little variation in unique gene content across Leishmania species, but large-scale genetic heterogeneity can result through gene amplification on disomic chromosomes and variation in chromosome number. Increased gene copy number due to chromosome amplification may contribute to alterations in gene expression in response to environmental conditions in the host, providing a genetic basis for disease tropism.

    Funded by: Wellcome Trust: 076355, 085775, 085822

    Genome research 2011;21;12;2129-42

  • Evaluation of the immunogenicity and biological activity of the Citrobacter freundii Vi-CRM197 conjugate as a vaccine for Salmonella enterica serovar Typhi.

    Rondini S, Micoli F, Lanzilao L, Hale C, Saul AJ and Martin LB

    Novartis Vaccines Institute for Global Health, Via Fiorentina 1, 53100 Siena, Italy. simona.rondini@novartis.com

    Typhoid fever remains a major health problem in developing countries. Young children are at high risk, and a vaccine effective for this age group is urgently needed. Purified capsular polysaccharide from Salmonella enterica serovar Typhi (Vi) is licensed as a vaccine, providing 50 to 70% protection in individuals older than 5 years. However, this vaccine is ineffective in infants. Vi conjugated to a carrier protein (i.e., an exoprotein A mutant from Pseudomonas aeruginosa [rEPA]) is highly immunogenic, provides long-term protection, and shows more than 90% protective efficacy in children 2 to 5 years old. Here, we describe an alternative glycoconjugate vaccine for S. Typhi, Vi-CRM197, where Vi was obtained from Citrobacter freundii WR7011 and CRM197, the mutant diphtheria toxin protein, was used as the carrier. We investigated the optimization of growth conditions for Vi production from C. freundii WR7011 and the immunogenicity of Vi-CRM197 conjugates in mice. The optimal saccharide/protein ratio of the glycoconjugates was identified for the best antibody production. We also demonstrated the ability of this new vaccine to protect mice against challenge with Vi-positive Salmonella enterica serovar Typhimurium.

    Funded by: Wellcome Trust

    Clinical and vaccine immunology : CVI 2011;18;3;460-8

  • Integrating genome-wide genetic variations and monocyte expression data reveals trans-regulated gene modules in humans.

    Rotival M, Zeller T, Wild PS, Maouche S, Szymczak S, Schillert A, Castagné R, Deiseroth A, Proust C, Brocheton J, Godefroy T, Perret C, Germain M, Eleftheriadis M, Sinning CR, Schnabel RB, Lubos E, Lackner KJ, Rossmann H, Münzel T, Rendon A, Cardiogenics Consortium, Erdmann J, Deloukas P, Hengstenberg C, Diemert P, Montalescot G, Ouwehand WH, Samani NJ, Schunkert H, Tregouet DA, Ziegler A, Goodall AH, Cambien F, Tiret L and Blankenberg S

    INSERM UMRS 937, Pierre and Marie Curie University (UPMC, Paris 6) and Medical School, Paris, France.

    One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 European unrelated subjects. We applied a method for extracting expression patterns, independent component analysis, to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated with a pattern strongly correlated with blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes. This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the mechanisms linking genome-wide association loci to disease.

    Funded by: British Heart Foundation

    PLoS genetics 2011;7;12;e1002367

  • The explosive-degrading cytochrome P450 XplA: biochemistry, structural features and prospects for bioremediation.

    Rylott EL, Jackson RG, Sabbadin F, Seth-Smith HM, Edwards J, Chong CS, Strand SE, Grogan G and Bruce NC

    CNAP, Department of Biology, University of York, PO Box 373, York YO10 5YW, UK.

    XplA is a cytochrome P450 that mediates the microbial metabolism of the military explosive hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX). It has an unusual structural organisation comprising a heme domain that is fused to its flavodoxin redox partner. XplA along with its partnering reductase XplB are plasmid encoded and the gene xplA has now been found in divergent genera across the globe with near sequence identity. Importantly, it has only been detected at explosives contaminated sites suggesting rapid dissemination of this novel catabolic activity, possibly within the 50-year period since the introduction of RDX into the environment. The X-ray structure of XplA-heme has been solved, providing fundamental information on the heme binding site. Interestingly, oxygen is not required for the degradation of RDX, but its presence determines the final degradation products, demonstrating that the degradation chemistry is flexible with both anaerobic and aerobic pathways resulting in the release of nitrite from the substrate. Transgenic plants expressing xplA are able to remove saturating levels of RDX from soil leachate and may provide a low cost sustainable remediation strategy for contaminated military sites.

    Funded by: Biotechnology and Biological Sciences Research Council

    Biochimica et biophysica acta 2011;1814;1;230-6

  • You cannot B. cereus.

    Salter SJ

    This month's Genome Watch looks at the different Bacillus species that can cause anthrax.

    Nature reviews. Microbiology 2011;9;2;83

  • A 'clap' for in silico studies.

    Sanchez-Flores A

    Nature reviews. Microbiology 2011;9;1;7

  • A new piece of the eukaryotic puzzle.

    Sanchez-Flores A

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. microbes@sanger.ac.uk

    Nature reviews. Microbiology 2011;9;11;769

  • Emergent neutrality in adaptive asexual evolution.

    Schiffels S, Szöllosi GJ, Mustonen V and Lässig M

    Institut für Theoretische Physik, Universität zu Köln, 50937 Köln, Germany.

    In nonrecombining genomes, genetic linkage can be an important evolutionary force. Linkage generates interference interactions, by which simultaneously occurring mutations affect each other's chance of fixation. Here, we develop a comprehensive model of adaptive evolution in linked genomes, which integrates interference interactions between multiple beneficial and deleterious mutations into a unified framework. By an approximate analytical solution, we predict the fixation rates of these mutations, as well as the probabilities of beneficial and deleterious alleles at fixed genomic sites. We find that interference interactions generate a regime of emergent neutrality: all genomic sites with selection coefficients smaller in magnitude than a characteristic threshold have nearly random fixed alleles, and both beneficial and deleterious mutations at these sites have nearly neutral fixation rates. We show that this dynamic limits not only the speed of adaptation, but also a population's degree of adaptation in its current environment. We apply the model to different scenarios: stationary adaptation in a time-dependent environment and approach to equilibrium in a fixed environment. In both cases, the analytical predictions are in good agreement with numerical simulations. Our results suggest that interference can severely compromise biological functions in an adapting population, which sets viability limits on adaptive evolution under linkage.

    Funded by: Wellcome Trust: 091747

    Genetics 2011;189;4;1361-75

  • The human transcriptome during nontyphoid Salmonella and HIV coinfection reveals attenuated NFkappaB-mediated inflammation and persistent cell cycle disruption.

    Schreiber F, Lynn DJ, Houston A, Peters J, Mwafulirwa G, Finlay BB, Brinkman FS, Hancock RE, Heyderman RS, Dougan G and Gordon MA

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Background: Invasive nontyphoid Salmonella (iNTS) disease is common and severe in adults with human immunodeficiency virus (HIV) infection in Africa. We previously observed that ex vivo macrophages from HIV-infected subjects challenged with Salmonella Typhimurium exhibit dysregulated proinflammatory cytokine responses.

    Methods: We studied the transcriptional response in whole blood from HIV-positive patients during acute and convalescent iNTS disease compared to other invasive bacterial diseases, and to HIV-positive and -negative controls.

    Results: During iNTS disease, there was a remarkable lack of a coordinated inflammatory or innate immune signaling response. Few interferon γ (IFNγ)-induced genes or Toll-like receptor/transcription factor nuclear factor κB (TLR/NFκB) gene pathways were upregulated in expression. Ex vivo lipopolysaccharide (LPS) or flagellin stimulation of whole blood, however, showed that convalescent iNTS subjects and controls were competent to mount prominent TLR/NFκB-associated patterns of mRNA expression. In contrast, HIV-positive patients with other invasive bacterial infections (Escherichia coli and Streptococcus pneumoniae) displayed a pronounced proinflammatory innate immune transcriptional response. There was also upregulated mRNA expression in cell cycle, DNA replication, translation and repair, and viral replication pathways during iNTS. These patterns persisted for up to 2 months into convalescence.

    Conclusions: Attenuation of NFκB-mediated inflammation and dysregulation of cell cycle and DNA-function gene pathway expression are key features of the interplay between iNTS and HIV.

    Funded by: Canadian Institutes of Health Research: 419; Wellcome Trust

    The Journal of infectious diseases 2011;204;8;1237-45

  • Genome-wide association and genetic functional studies identify autism susceptibility candidate 2 gene (AUTS2) in the regulation of alcohol consumption.

    Schumann G, Coin LJ, Lourdusamy A, Charoen P, Berger KH, Stacey D, Desrivières S, Aliev FA, Khan AA, Amin N, Aulchenko YS, Bakalkin G, Bakker SJ, Balkau B, Beulens JW, Bilbao A, de Boer RA, Beury D, Bots ML, Breetvelt EJ, Cauchi S, Cavalcanti-Proença C, Chambers JC, Clarke TK, Dahmen N, de Geus EJ, Dick D, Ducci F, Easton A, Edenberg HJ, Esko T, Esk T, Fernández-Medarde A, Foroud T, Freimer NB, Girault JA, Grobbee DE, Guarrera S, Gudbjartsson DF, Hartikainen AL, Heath AC, Hesselbrock V, Hofman A, Hottenga JJ, Isohanni MK, Kaprio J, Khaw KT, Kuehnel B, Laitinen J, Lobbens S, Luan J, Mangino M, Maroteaux M, Matullo G, McCarthy MI, Mueller C, Navis G, Numans ME, Núñez A, Nyholt DR, Onland-Moret CN, Oostra BA, O'Reilly PF, Palkovits M, Penninx BW, Polidoro S, Pouta A, Prokopenko I, Ricceri F, Santos E, Smit JH, Soranzo N, Song K, Sovio U, Stumvoll M, Surakk I, Thorgeirsson TE, Thorsteinsdottir U, Troakes C, Tyrfingsson T, Tönjes A, Uiterwaal CS, Uitterlinden AG, van der Harst P, van der Schouw YT, Staehlin O, Vogelzangs N, Vollenweider P, Waeber G, Wareham NJ, Waterworth DM, Whitfield JB, Wichmann EH, Willemsen G, Witteman JC, Yuan X, Zhai G, Zhao JH, Zhang W, Martin NG, Metspalu A, Doering A, Scott J, Spector TD, Loos RJ, Boomsma DI, Mooser V, Peltonen L, Stefansson K, van Duijn CM, Vineis P, Sommer WH, Kooner JS, Spanagel R, Heberlein UA, Jarvelin MR and Elliott P

    Medical Research Council-Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College, London SE5 8AF, United Kingdom. gunter.schumann@kcl.ac.uk

    Alcohol consumption is a moderately heritable trait, but the genetic basis in humans is largely unknown, despite its clinical and societal importance. We report a genome-wide association study meta-analysis of ∼2.5 million directly genotyped or imputed SNPs with alcohol consumption (gram per day per kilogram body weight) among 12 population-based samples of European ancestry, comprising 26,316 individuals, with replication genotyping in an additional 21,185 individuals. SNP rs6943555 in autism susceptibility candidate 2 gene (AUTS2) was associated with alcohol consumption at genome-wide significance (P = 4 × 10⁻⁸ to P = 4 × 10⁻⁹). We found a genotype-specific expression of AUTS2 in 96 human prefrontal cortex samples (P = 0.026) and significant (P < 0.017) differences in expression of AUTS2 in whole-brain extracts of mice selected for differences in voluntary alcohol consumption. Down-regulation of an AUTS2 homolog caused reduced alcohol sensitivity in Drosophila (P < 0.001). Our finding of a regulator of alcohol consumption adds knowledge to our understanding of genetic mechanisms influencing alcohol drinking behavior.

    Funded by: NIAAA NIH HHS: K05 AA017688-04

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;17;7119-24

  • Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease.

    Schunkert H, König IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, Preuss M, Stewart AF, Barbalic M, Gieger C, Absher D, Aherrahrou Z, Allayee H, Altshuler D, Anand SS, Andersen K, Anderson JL, Ardissino D, Ball SG, Balmforth AJ, Barnes TA, Becker DM, Becker LC, Berger K, Bis JC, Boekholdt SM, Boerwinkle E, Braund PS, Brown MJ, Burnett MS, Buysschaert I, Cardiogenics, Carlquist JF, Chen L, Cichon S, Codd V, Davies RW, Dedoussis G, Dehghan A, Demissie S, Devaney JM, Diemert P, Do R, Doering A, Eifert S, Mokhtari NE, Ellis SG, Elosua R, Engert JC, Epstein SE, de Faire U, Fischer M, Folsom AR, Freyer J, Gigante B, Girelli D, Gretarsdottir S, Gudnason V, Gulcher JR, Halperin E, Hammond N, Hazen SL, Hofman A, Horne BD, Illig T, Iribarren C, Jones GT, Jukema JW, Kaiser MA, Kaplan LM, Kastelein JJ, Khaw KT, Knowles JW, Kolovou G, Kong A, Laaksonen R, Lambrechts D, Leander K, Lettre G, Li M, Lieb W, Loley C, Lotery AJ, Mannucci PM, Maouche S, Martinelli N, McKeown PP, Meisinger C, Meitinger T, Melander O, Merlini PA, Mooser V, Morgan T, Mühleisen TW, Muhlestein JB, Münzel T, Musunuru K, Nahrstaedt J, Nelson CP, Nöthen MM, Olivieri O, Patel RS, Patterson CC, Peters A, Peyvandi F, Qu L, Quyyumi AA, Rader DJ, Rallidis LS, Rice C, Rosendaal FR, Rubin D, Salomaa V, Sampietro ML, Sandhu MS, Schadt E, Schäfer A, Schillert A, Schreiber S, Schrezenmeir J, Schwartz SM, Siscovick DS, Sivananthan M, Sivapalaratnam S, Smith A, Smith TB, Snoep JD, Soranzo N, Spertus JA, Stark K, Stirrups K, Stoll M, Tang WH, Tennstedt S, Thorgeirsson G, Thorleifsson G, Tomaszewski M, Uitterlinden AG, van Rij AM, Voight BF, Wareham NJ, Wells GA, Wichmann HE, Wild PS, Willenborg C, Witteman JC, Wright BJ, Ye S, Zeller T, Ziegler A, Cambien F, Goodall AH, Cupples LA, Quertermous T, März W, Hengstenberg C, Blankenberg S, Ouwehand WH, Hall AS, Deloukas P, Thompson JR, Stefansson K, Roberts R, Thorsteinsdottir U, O'Donnell CJ, McPherson R, Erdmann J, CARDIoGRAM Consortium and Samani NJ

    Universität zu Lübeck, Medizinische Klinik II, Lübeck, Germany.

    We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10⁻⁸ and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.

    Funded by: British Heart Foundation: PG/08/094/26019, RG/08/014/24067; Medical Research Council: G0401527, G0801566, G1000143, MC_U106179471; NHLBI NIH HHS: HL087647, N01 HC025195, R01 HL087647, R01HL089650-02

    Nature genetics 2011;43;4;333-8

  • A role for cohesin in T-cell-receptor rearrangement and thymocyte differentiation.

    Seitan VC, Hao B, Tachibana-Konwalski K, Lavagnolli T, Mira-Bontenbal H, Brown KE, Teng G, Carroll T, Terry A, Horan K, Marks H, Adams DJ, Schatz DG, Aragon L, Fisher AG, Krangel MS, Nasmyth K and Merkenschlager M

    Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, Du Cane Road, London W12 0NN, UK.

    Cohesin enables post-replicative DNA repair and chromosome segregation by holding sister chromatids together from the time of DNA replication in S phase until mitosis. There is growing evidence that cohesin also forms long-range chromosomal cis-interactions and may regulate gene expression in association with CTCF, mediator or tissue-specific transcription factors. Human cohesinopathies such as Cornelia de Lange syndrome are thought to result from impaired non-canonical cohesin functions, but a clear distinction between the cell-division-related and cell-division-independent functions of cohesin, as exemplified in Drosophila, has not been demonstrated in vertebrate systems. To address this, here we deleted the cohesin locus Rad21 in mouse thymocytes at a time in development when these cells stop cycling and rearrange their T-cell receptor (TCR) α locus (Tcra). Rad21-deficient thymocytes had a normal lifespan and retained the ability to differentiate, albeit with reduced efficiency. Loss of Rad21 led to defective chromatin architecture at the Tcra locus, where cohesin-binding sites flank the TEA promoter and the Eα enhancer, and demarcate Tcra from interspersed Tcrd elements and neighbouring housekeeping genes. Cohesin was required for long-range promoter-enhancer interactions, Tcra transcription, H3K4me3 histone modifications that recruit the recombination machinery and Tcra rearrangement. Provision of pre-rearranged TCR transgenes largely rescued thymocyte differentiation, demonstrating that among thousands of potential target genes across the genome, defective Tcra rearrangement was limiting for the differentiation of cohesin-deficient thymocytes. These findings firmly establish a cell-division-independent role for cohesin in Tcra locus rearrangement and provide a comprehensive account of the mechanisms by which cohesin enables cellular differentiation in a well-characterized mammalian system.

    Funded by: Cancer Research UK: 13031; Howard Hughes Medical Institute; Medical Research Council: MC_U120027516, MC_U120081295; NIAID NIH HHS: R37 AI032524, R37 AI032524-20; NIGMS NIH HHS: R37 GM041052, R37 GM041052-22; Wellcome Trust

    Nature 2011;476;7361;467-71

  • Silencing of RhoA nucleotide exchange factor, ARHGEF3, reveals its unexpected role in iron uptake.

    Serbanovic-Canic J, Cvejic A, Soranzo N, Stemple DL, Ouwehand WH and Freson K

    Department of Haematology, University of Cambridge and NHS Blood and Transplant, Cambridge, UK.

    Genomewide association meta-analysis studies have identified >100 independent genetic loci associated with blood cell indices, including volume and count of platelets and erythrocytes. Although several of these loci encode known regulators of hematopoiesis, the mechanism by which most sequence variants exert their effect on blood cell formation remains elusive. An example is the Rho guanine nucleotide exchange factor, ARHGEF3, which was previously implicated by genomewide association meta-analysis studies in bone cell biology. Here, we report on the unexpected role of ARHGEF3 in regulation of iron uptake and erythroid cell maturation. Although early erythroid differentiation progressed normally, silencing of arhgef3 in Danio rerio resulted in microcytic and hypochromic anemia. This was rescued by intracellular supplementation of iron, showing that arhgef3-depleted erythroid cells are fully capable of hemoglobinization. Disruption of the arhgef3 target, RhoA, also produced severe anemia, which was, again, corrected by iron injection. Moreover, silencing of ARHGEF3 in erythromyeloblastoid cells K562 showed that the uptake of transferrin was severely impaired. Taken together, this is the first study to provide evidence for ARHGEF3 being a regulator of transferrin uptake in erythroid cells, through activation of RHOA.

    Funded by: Wellcome Trust: WT 077037/Z/05/Z, WT077047/Z/05/Z, WT082597/Z/07/Z

    Blood 2011;118;18;4967-76

  • Pneu tricks.

    Seth-Smith H

    This month's Genome Watch looks at how recombination has provided Streptococcus pneumoniae with the adaptability to overcome challenges.

    Nature reviews. Microbiology 2011;9;4;230

  • Genome sequence of the zoonotic pathogen Chlamydophila psittaci.

    Seth-Smith HM, Harris SR, Rance R, West AP, Severin JA, Ossewaarde JM, Cutcliffe LT, Skilton RJ, Marsh P, Parkhill J, Clarke IN and Thomson NR

    Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    We present the first genome sequence of Chlamydophila psittaci, an intracellular pathogen of birds and a human zoonotic pathogen. A comparison with previously sequenced Chlamydophila genomes shows that, as in other chlamydiae, most of the genome diversity is restricted to the plasticity zone. The C. psittaci plasmid was also sequenced.

    Funded by: Wellcome Trust: WT076964

    Journal of bacteriology 2011;193;5;1282-3

  • Indian Siddis: African descendants with Indian admixture.

    Shah AM, Tamang R, Moorjani P, Rani DS, Govindaraj P, Kulkarni G, Bhattacharya T, Mustak MS, Bhaskar LV, Reddy AG, Gadhvi D, Gai PB, Chaubey G, Patterson N, Reich D, Tyler-Smith C, Singh L and Thangaraj K

    Centre for Cellular and Molecular Biology, Council of Scientific and Industrial Research, Hyderabad, India.

    The Siddis (Afro-Indians) are a tribal population whose members live in coastal Karnataka, Gujarat, and in some parts of Andhra Pradesh. Historical records indicate that the Portuguese brought the Siddis to India from Africa about 300-500 years ago; however, there is little information about their more precise ancestral origins. Here, we perform a genome-wide survey to understand the population history of the Siddis. Using hundreds of thousands of autosomal markers, we show that they have inherited ancestry from Africans, Indians, and possibly Europeans (Portuguese). Additionally, analyses of the uniparental (Y-chromosomal and mitochondrial DNA) markers indicate that the Siddis trace their ancestry to Bantu speakers from sub-Saharan Africa. We estimate that the admixture between the African ancestors of the Siddis and neighboring South Asian groups probably occurred in the past eight generations (∼200 years ago), consistent with historical records.

    American journal of human genetics 2011;89;1;154-61

  • Source of the human malaria parasite Plasmodium falciparum.

    Sharp PM, Liu W, Learn GH, Rayner JC, Peeters M and Hahn BH

    Funded by: NIAID NIH HHS: R37 AI050529-11

    Proceedings of the National Academy of Sciences of the United States of America 2011;108;38;E744-5

  • Data mining using the Catalogue of Somatic Mutations in Cancer BioMart.

    Shepherd R, Forbes SA, Beare D, Bamford S, Cole CG, Ward S, Bindal N, Gunasekaran P, Jia M, Kok CY, Leung K, Menzies A, Butler AP, Teague JW, Campbell PJ, Stratton MR and Futreal PA

    Cancer Genome Project, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic) is a publicly available resource providing information on somatic mutations implicated in human cancer. Release v51 (January 2011) includes data from just over 19,000 genes, 161,787 coding mutations and 5573 gene fusions, described in more than 577,000 tumour samples. COSMICMart (COSMIC BioMart) provides a flexible way to mine these data and combine somatic mutations with other biologically relevant data sets. This article describes the data available in COSMIC along with examples of how to successfully mine and integrate data sets using COSMICMart. DATABASE URL: http://www.sanger.ac.uk/genetics/CGP/cosmic/biomart/martview/.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Database : the journal of biological databases and curation 2011;2011;bar018

  • Common variants on 8p12 and 1q24.2 confer risk of schizophrenia.

    Shi Y, Li Z, Xu Q, Wang T, Li T, Shen J, Zhang F, Chen J, Zhou G, Ji W, Li B, Xu Y, Liu D, Wang P, Yang P, Liu B, Sun W, Wan C, Qin S, He G, Steinberg S, Cichon S, Werge T, Sigurdsson E, Tosato S, Palotie A, Nöthen MM, Rietschel M, Ophoff RA, Collier DA, Rujescu D, Clair DS, Stefansson H, Stefansson K, Ji J, Wang Q, Li W, Zheng L, Zhang H, Feng G and He L

    Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, China. shiyongyong@gmail.com

    Schizophrenia is a severe mental disorder affecting ∼1% of the world population, with heritability of up to 80%. To identify new common genetic risk factors, we performed a genome-wide association study (GWAS) in the Han Chinese population. The discovery sample set consisted of 3,750 individuals with schizophrenia and 6,468 healthy controls (1,578 cases and 1,592 controls from northern Han Chinese, 1,238 cases and 2,856 controls from central Han Chinese, and 934 cases and 2,020 controls from the southern Han Chinese). We further analyzed the strongest association signals in an additional independent cohort of 4,383 cases and 4,539 controls from the Han Chinese population. Meta-analysis identified common SNPs that associated with schizophrenia with genome-wide significance on 8p12 (rs16887244, P = 1.27 × 10⁻¹⁰) and 1q24.2 (rs10489202, P = 9.50 × 10⁻⁹). Our findings provide new insights into the pathogenesis of schizophrenia.

    Nature genetics 2011;43;12;1224-7

  • A comparative survey of the frequency and distribution of polymorphism in the genome of Xenopus tropicalis.

    Showell C, Carruthers S, Hall A, Pardo-Manuel de Villena F, Stemple D and Conlon FL

    UNC McAllister Heart Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.

    Naturally occurring DNA sequence variation within a species underlies evolutionary adaptation and can give rise to phenotypic changes that provide novel insight into biological questions. This variation exists in laboratory populations just as in wild populations and, in addition to being a source of useful alleles for genetic studies, can impact efforts to identify induced mutations in sequence-based genetic screens. The Western clawed frog Xenopus tropicalis (X. tropicalis) has been adopted as a model system for studying the genetic control of embryonic development and a variety of other areas of research. Its diploid genome has been extensively sequenced and efforts are underway to isolate mutants by phenotype- and genotype-based approaches. Here, we describe a study of genetic polymorphism in laboratory strains of X. tropicalis. Polymorphism was detected in the coding and non-coding regions of developmental genes distributed widely across the genome. Laboratory strains exhibit unexpectedly high frequencies of genetic polymorphism, with alleles carrying a variety of synonymous and non-synonymous codon substitutions and nucleotide insertions/deletions. Inter-strain comparisons of polymorphism uncover a high proportion of shared alleles between Nigerian and Ivory Coast strains, in spite of their distinct geographical origins. These observations will likely influence the design of future sequence-based mutation screens, particularly those using DNA mismatch-based detection methods which can be disrupted by the presence of naturally occurring sequence variants. The existence of a significant reservoir of alleles also suggests that existing laboratory stocks may be a useful source of novel alleles for mapping and functional studies.

    Funded by: NHLBI NIH HHS: HL089641; NICHD NIH HHS: HD054354; PHS HHS: DEO18825

    PloS one 2011;6;8;e22392

  • The tammar wallaby major histocompatibility complex shows evidence of past genomic instability.

    Siddle HV, Deakin JE, Coggill P, Whilming LG, Harrow J, Kaufman J, Beck S and Belov K

    Faculty of Veterinary Science, University of Sydney, NSW 2006, Australia.

    Background: The major histocompatibility complex (MHC) is a group of genes with a variety of roles in the innate and adaptive immune responses. MHC genes form a genetically linked cluster in eutherian mammals, an organization that is thought to confer functional and evolutionary advantages to the immune system. The tammar wallaby (Macropus eugenii), an Australian marsupial, provides a unique model for understanding MHC gene evolution, as many of its antigen presenting genes are not linked to the MHC, but are scattered around the genome.

    Results: Here we describe the 'core' tammar wallaby MHC region on chromosome 2q by ordering and sequencing 33 BAC clones, covering over 4.5 MB and containing 129 genes. When compared to the MHC region of the South American opossum, eutherian mammals and non-mammals, the wallaby MHC has a novel gene organization. The wallaby has undergone an expansion of MHC class II genes, which are separated into two clusters by the class III genes. The antigen processing genes have undergone duplication, resulting in two copies of TAP1 and three copies of TAP2. Notably, Kangaroo Endogenous Retroviral Elements are present within the region and may have contributed to the genomic instability.

    Conclusions: The wallaby MHC has been extensively remodeled since the American and Australian marsupials last shared a common ancestor. The instability is characterized by the movement of antigen presenting genes away from the core MHC, most likely via the presence and activity of retroviral elements. We propose that the movement of class II genes away from the ancestral class II region has allowed this gene family to expand and diversify in the wallaby. The duplication of TAP genes in the wallaby MHC makes this species a unique model organism for studying the relationship between MHC gene organization and function.

    Funded by: Wellcome Trust: 084071, 089305

    BMC genomics 2011;12;421

  • Identification of candidate genes linking systemic inflammation to atherosclerosis; results of a human in vivo LPS infusion study.

    Sivapalaratnam S, Farrugia R, Nieuwdorp M, Langford CF, van Beem RT, Maiwald S, Zwaginga JJ, Gusnanto A, Watkins NA, Trip MD and Ouwehand WH

    Department of Vascular Medicine, Academic Medical Center, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands. s.sivapalaratnam@amc.uva.nl

    Background: It is widely accepted that atherosclerosis and inflammation are intimately linked. Monocytes play a key role in both of these processes and we hypothesized that activation of inflammatory pathways in monocytes would lead to, among others, proatherogenic changes in the monocyte transcriptome. Such differentially expressed genes in circulating monocytes would be strong candidates for further investigation in disease association studies.

    Methods: Endotoxin, lipopolysaccharide (LPS), or saline control was infused in healthy volunteers. Monocyte RNA was isolated, processed and hybridized to Hver 2.1.1 spotted cDNA microarrays. Differential expression of key genes was confirmed by RT-PCR and results were compared to in vitro data obtained by our group to identify candidate genes.

    Results: All subjects who received LPS experienced the anticipated clinical response indicating successful stimulation. One hour after LPS infusion, 11 genes were identified as being differentially expressed; 1 down regulated and 10 up regulated. Four hours after LPS infusion, 28 genes were identified as being differentially expressed; 3 being down regulated and 25 up regulated. No genes were significantly differentially expressed following saline infusion. Comparison with results obtained in in vitro experiments led to the identification of 6 strong candidate genes (BATF, BID, C3aR1, IL1RN, SEC61B and SLC43A3).

    Conclusion: In vivo endotoxin exposure of healthy individuals resulted in the identification of several candidate genes through which systemic inflammation links to atherosclerosis.

    BMC medical genomics 2011;4;64