Sanger Institute - Publications 2009

Number of papers published in 2009: 145

  • Newly discovered breast cancer susceptibility loci on 3p24 and 17q23.2.

    Ahmed S, Thomas G, Ghoussaini M, Healey CS, Humphreys MK, Platte R, Morrison J, Maranian M, Pooley KA, Luben R, Eccles D, Evans DG, Fletcher O, Johnson N, dos Santos Silva I, Peto J, Stratton MR, Rahman N, Jacobs K, Prentice R, Anderson GL, Rajkovic A, Curb JD, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, Diver WR, Bojesen S, Nordestgaard BG, Flyger H, Dörk T, Schürmann P, Hillemanns P, Karstens JH, Bogdanova NV, Antonenkova NN, Zalutsky IV, Bermisheva M, Fedorova S, Khusnutdinova E, SEARCH, Kang D, Yoo KY, Noh DY, Ahn SH, Devilee P, van Asperen CJ, Tollenaar RA, Seynaeve C, Garcia-Closas M, Lissowska J, Brinton L, Peplonska B, Nevanlinna H, Heikkinen T, Aittomäki K, Blomqvist C, Hopper JL, Southey MC, Smith L, Spurdle AB, Schmidt MK, Broeks A, van Hien RR, Cornelissen S, Milne RL, Ribas G, González-Neira A, Benitez J, Schmutzler RK, Burwinkel B, Bartram CR, Meindl A, Brauch H, Justenhoven C, Hamann U, GENICA Consortium, Chang-Claude J, Hein R, Wang-Gohrke S, Lindblom A, Margolin S, Mannermaa A, Kosma VM, Kataja V, Olson JE, Wang X, Fredericksen Z, Giles GG, Severi G, Baglietto L, English DR, Hankinson SE, Cox DG, Kraft P, Vatten LJ, Hveem K, Kumle M, Sigurdson A, Doody M, Bhatti P, Alexander BH, Hooning MJ, van den Ouweland AM, Oldenburg RA, Schutte M, Hall P, Czene K, Liu J, Li Y, Cox A, Elliott G, Brock I, Reed MW, Shen CY, Yu JC, Hsu GC, Chen ST, Anton-Culver H, Ziogas A, Andrulis IL, Knight JA, kConFab, Australian Ovarian Cancer Study Group, Beesley J, Goode EL, Couch F, Chenevix-Trench G, Hoover RN, Ponder BA, Hunter DJ, Pharoah PD, Dunning AM, Chanock SJ and Easton DF

    Department of Oncology, University of Cambridge, UK.

    Genome-wide association studies (GWAS) have identified seven breast cancer susceptibility loci, but these explain only a small fraction of the familial risk of the disease. Five of these loci were identified through a two-stage GWAS involving 390 familial cases and 364 controls in the first stage, and 3,990 cases and 3,916 controls in the second stage. To identify additional loci, we tested over 800 promising associations from this GWAS in a further two stages involving 37,012 cases and 40,069 controls from 33 studies in the CGEMS collaboration and Breast Cancer Association Consortium. We found strong evidence for additional susceptibility loci on 3p (rs4973768: per-allele OR = 1.11, 95% CI = 1.08-1.13, P = 4.1 × 10^-23) and 17q (rs6504950: per-allele OR = 0.95, 95% CI = 0.92-0.97, P = 1.4 × 10^-8). Potential causative genes include SLC4A7 and NEK10 on 3p and COX11 on 17q.

    Funded by: Cancer Research UK: 10118, 11021, A10123, C1287/A10118, C1287/A5260, C1287/A7497, C490/A11021; Intramural NIH HHS; NCI NIH HHS: 5UO1CA098233, CA-06-503, CA-58860, CA-92044, CA-95-011, CA49449, CA50385, CA65725, CA67262, CA87969, P30 CA062203, P50 CA116201, R01 CA102740-01A2, R01 CA104021-04, R01 CA122340, U01 CA69398, U01 CA69417, U01 CA69446, U01 CA69467, U01 CA69631, U01 CA69638, UO1 CA098710, UO1 CA69467

    Nature genetics 2009;41;5;585-90

  • Genetic diversity amongst isolates of Neospora caninum, and the development of a multiplex assay for the detection of distinct strains.

    Al-Qassab S, Reichel MP, Ivens A and Ellis JT

    Department of Medical and Molecular Biosciences, University of Technology, Sydney, P.O. Box 123, Broadway, New South Wales 2007, Australia.

    Infection with Neospora caninum is regarded as a significant cause of abortion in cattle. Despite the economic impact of this infection, relatively little is known about the biology of this parasite. In this study, mini and microsatellite DNAs were detected in the genome of N. caninum and eight loci were identified that each contained repetitive DNA which was polymorphic among different isolates of this parasite. A multiplex PCR assay was developed for the detection of genetic variation within N. caninum based on length polymorphism associated with three different repetitive markers. The utility of the multiplex PCR was demonstrated in that it was able to distinguish amongst strains of N. caninum used as either vaccine or challenge strains in animal vaccination experiments and that it could genotype N. caninum associated with naturally acquired infections of animals. The multiplex PCR is simple, rapid, informative and sensitive and should provide a valuable tool for further studies on the epidemiology of N. caninum in different host species.

    Molecular and cellular probes 2009;23;3-4;132-9

  • SnoopCGH: software for visualizing comparative genomic hybridization data.

    Almagro-Garcia J, Manske M, Carret C, Campino S, Auburn S, Macinnis BL, Maslen G, Pain A, Newbold CI, Kwiatkowski DP and Clark TG

    Wellcome Trust Sanger Institute, Hinxton, The Weatherall Institute of Molecular Medicine and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of the array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data.

    Availability and implementation: SnoopCGH is written in Java and is available from

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;20;2732-3

  • Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome.

    Amid C, Rehaume LM, Brown KL, Gilbert JG, Dougan G, Hancock RE and Harrow JL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region.

    Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins.

    Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.

    Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198

    BMC genomics 2009;10;606

  • Testing for rare variant associations in complex diseases.

    Asimit J and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    The study of rare variants holds the promise of accounting for some of the missing heritability in complex traits. Next-generation sequencing technologies enable probing of variation across the full spectrum of allele frequencies. Multiple methods for the analysis of rare variants have been proposed and, recently, Ionita-Laza et al. have presented an approach with the theoretical capacity to detect risk and protective variants. The identification of rare risk variants could have major implications in understanding complex disease etiopathogenesis.

    Genome medicine 2009;1;11;24

  • ABACAS: algorithm-based automatic contiguation of assembled sequences.

    Assefa S, Keane TM, Otto TD, Newbold C and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

    Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated.

    Availability and implementation: ABACAS is implemented in Perl and is freely available for download from

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2009;25;15;1968-9
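    The ordering and orientation step described in the ABACAS abstract above can be illustrated with a minimal sketch. This is not ABACAS itself (which is implemented in Perl): the `Hit` record, its fields and the example contigs are hypothetical, and the sketch assumes each contig already has a single best alignment to the reference. It only shows the idea of sorting contigs by reference position and reverse-complementing those that align to the reverse strand.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best alignment of one contig to the reference (hypothetical record layout)."""
    contig_id: str
    ref_start: int  # 0-based start of the alignment on the reference
    strand: str     # "+" if the contig aligns forward, "-" if reverse
    sequence: str


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def contiguate(hits):
    """Order contigs by reference position and orient them to the forward strand."""
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [
        (h.contig_id, h.sequence if h.strand == "+" else reverse_complement(h.sequence))
        for h in ordered
    ]


# Hypothetical input: three contigs in arbitrary order, one on the reverse strand.
hits = [
    Hit("contig_b", 500, "-", "AAGT"),
    Hit("contig_a", 0, "+", "ACGG"),
    Hit("contig_c", 900, "+", "TTGA"),
]
layout = contiguate(hits)
```

    Gap positions between consecutive contigs in `layout` would be the natural targets for primer design, which this sketch does not attempt.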

  • A novel system of polymorphic and diverse NK cell receptors in primates.

    Averdam A, Petersen B, Rosner C, Neff J, Roos C, Eberle M, Aujard F, Münch C, Schempp W, Carrington M, Shiina T, Inoko H, Knaust F, Coggill P, Sehra H, Beck S, Abi-Rached L, Reinhardt R and Walter L

    Department of Primate Genetics, German Primate Centre, Göttingen, Germany.

    There are two main classes of natural killer (NK) cell receptors in mammals, the killer cell immunoglobulin-like receptors (KIR) and the structurally unrelated killer cell lectin-like receptors (KLR). While KIR represent the most diverse group of NK receptors in all primates studied to date, including humans, apes, and Old and New World monkeys, KLR represent the functional equivalent in rodents. Here, we report a first digression from this rule in lemurs, where the KLR (CD94/NKG2) rather than KIR constitute the most diverse group of NK cell receptors. We demonstrate that natural selection contributed to such diversification in lemurs and particularly targeted KLR residues interacting with the peptide presented by MHC class I ligands. We further show that lemurs lack a strict ortholog or functional equivalent of MHC-E, the ligands of non-polymorphic KLR in "higher" primates. Our data support the existence of a hitherto unknown system of polymorphic and diverse NK cell receptors in primates and of combinatorial diversity as a novel mechanism to increase NK cell receptor repertoire.

    Funded by: CCR NIH HHS: HHSN261200800001C; Intramural NIH HHS; NCI NIH HHS: HHSN261200800001E; NIAID NIH HHS: AI 31168, R01 AI031168; PHS HHS: HHSN261200800001E

    PLoS genetics 2009;5;10;e1000688

  • Gene body methylation of the dimethylarginine dimethylamino-hydrolase 2 (Ddah2) gene is an epigenetic biomarker for neural stem cell differentiation.

    Bäckdahl L, Herberth M, Wilson G, Tate P, Campos LS, Cortese R, Eckhardt F and Beck S

    UCL Cancer Institute, University College London, London WC1E 6BT, UK.

    DNA methylation is an important epigenetic mark that is involved in the regulation of many cellular processes such as gene expression, genomic imprinting and silencing of repetitive elements. Because of their ability to cause and capture phenotypic plasticity, epigenetic marks such as DNA methylation represent potential biomarkers to distinguish between different types of tissues and stages of differentiation. Here, we have identified differential DNA methylation in the gene body of the nitric oxide inhibitor Ddah2 that discriminates embryonic stem cells from neural stem cells and is positively correlated with differential gene expression.

    Funded by: Wellcome Trust: WT-084071

    Epigenetics 2009;4;4;248-54

  • Replication analysis identifies TYK2 as a multiple sclerosis susceptibility factor.

    Ban M, Goris A, Lorentzen AR, Baker A, Mihalova T, Ingram G, Booth DR, Heard RN, Stewart GJ, Bogaert E, Dubois B, Harbo HF, Celius EG, Spurkland A, Strange R, Hawkins C, Robertson NP, Dudbridge F, Wason J, De Jager PL, Hafler D, Rioux JD, Ivinson AJ, McCauley JL, Pericak-Vance M, Oksenberg JR, Hauser SL, Sexton D, Haines J, Sawcer S, Wellcome Trust Case-Control Consortium (WTCCC) and Compston A

    Department of Clinical Neuroscience, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.

    In a recent genome-wide association study (GWAS) based on 12,374 non-synonymous single nucleotide polymorphisms we identified a number of candidate multiple sclerosis susceptibility genes. Here, we describe the extended analysis of 17 of these loci undertaken using an additional 4234 patients, 2983 controls and 2053 trio families. In the final analysis combining all available data, we found that evidence for association was substantially increased for one of the 17 loci, rs34536443 from the tyrosine kinase 2 (TYK2) gene (P = 2.7 × 10^-6, odds ratio = 1.32 (1.17-1.47)). This single nucleotide polymorphism results in an amino acid substitution (proline to alanine) in the kinase domain of TYK2, which is predicted to influence the levels of phosphorylation and therefore activity of the protein and so is likely to have a functional role in multiple sclerosis.

    Funded by: Medical Research Council: G0000934, G0600329, G0700061, MC_U105292688; NINDS NIH HHS: NS 049477-01A1, R01 NS049477, R01 NS049477-01A1; Wellcome Trust: 061858, 068545/Z/02, 076113, 085475, 090532

    European journal of human genetics : EJHG 2009;17;10;1309-13

  • Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

    Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, Erlich HA, Julier C, Morahan G, Nerup J, Nierras C, Plagnol V, Pociot F, Schuilenburg H, Smyth DJ, Stevens H, Todd JA, Walker NM, Rich SS and Type 1 Diabetes Genetics Consortium

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.

    Type 1 diabetes (T1D) is a common autoimmune disorder that arises from the action of multiple genetic and environmental risk factors. We report the findings of a genome-wide association study of T1D, combined in a meta-analysis with two previously published studies. The total sample set included 7,514 cases and 9,045 reference samples. Forty-one distinct genomic locations provided evidence for association with T1D in the meta-analysis (P < 10^-6). After excluding previously reported associations, we further tested 27 regions in an independent set of 4,267 cases, 4,463 controls and 2,319 affected sib-pair (ASP) families. Of these, 18 regions were replicated (P < 0.01; overall P < 5 × 10^-8) and 4 additional regions provided nominal evidence of replication (P < 0.05). The many new candidate genes suggested by these results include IL10, IL19, IL20, GLIS3, CD69 and IL27.

    Funded by: Medical Research Council: G0000934; NIDDK NIH HHS: DK46635, K08 DK002876, K08 DK002876-06, R01 DK046635, R01 DK046635-15, U01 DK062418, U01 DK062418-06; NIMH NIH HHS: MH 63420, MH059565, MH059571, MH059588, MH060879, MH061675, MH067257, MH59566, MH59586, MH59587, MH60870, R01 MH059565, R01 MH059566, R01 MH059571, R01 MH059586, R01 MH059587, R01 MH059588, R01 MH060870, R01 MH060879, R01 MH061675, R01 MH063420, R01 MH067257; Wellcome Trust: 061858, 076113

    Nature genetics 2009;41;6;703-7

  • Neuroproteomics: understanding the molecular organization and complexity of the brain.

    Bayés A and Grant SG

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Advances in technology have equipped the field of neuroproteomics with refined tools for the study of the expression, interaction and function of proteins in the nervous system. In combination with bioinformatics, neuroproteomics can address the organization of dynamic, functional protein networks and macromolecular structures that underlie physiological, anatomical and behavioural processes. Furthermore, neuroproteomics is contributing to the elucidation of disease mechanisms and is a powerful tool for the identification of biomarkers.

    Funded by: Medical Research Council; Wellcome Trust

    Nature reviews. Neuroscience 2009;10;9;635-46

  • The genome of the blood fluke Schistosoma mansoni.

    Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG and El-Sayed NM

    Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK.

    Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

    Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03, D43 TW006580, D43 TW007012; NIAID NIH HHS: AI054711-01A2, AI48828, R01 AI054711, U01 AI048828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM060595, R01 GM083873, R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845, R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z

    Nature 2009;460;7253;352-8

  • Public health. The cholera crisis in Africa.

    Bhattacharya S, Black R, Bourgeois L, Clemens J, Cravioto A, Deen JL, Dougan G, Glass R, Grais RF, Greco M, Gust I, Holmgren J, Kariuki S, Lambert PH, Liu MA, Longini I, Nair GB, Norrby R, Nossal GJ, Ogra P, Sansonetti P, von Seidlein L, Songane F, Svennerholm AM, Steele D and Walker R

    Indian Council of Medical Research, Ansari Nagar, New Delhi, 110029, India.

    Science (New York, N.Y.) 2009;324;5929;885

  • Calcium-dependent signaling and kinases in apicomplexan parasites.

    Billker O, Lourido S and Sibley LD

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Calcium controls many critical events in the complex life cycles of apicomplexan parasites including protein secretion, motility, and development. Calcium levels are normally tightly regulated and rapid release of calcium into the cytosol activates a family of calcium-dependent protein kinases (CDPKs), which are normally characteristic of plants. CDPKs present in apicomplexans have acquired a number of unique domain structures likely reflecting their diverse functions. Calcium regulation in parasites is closely linked to signaling by cyclic nucleotides and their associated kinases. This Review summarizes the pivotal roles that calcium- and cyclic nucleotide-dependent kinases play in unique aspects of parasite biology.

    Funded by: Medical Research Council: G0501670; NIAID NIH HHS: AI34036, R01 AI034036, R01 AI034036-17, R01 AI082423, R01 AI082423-01, R01 AI094098, R21 AI067051

    Cell host & microbe 2009;5;6;612-22

  • Large, rare chromosomal deletions associated with severe early-onset obesity.

    Bochukova EG, Huang N, Keogh J, Henning E, Purmann C, Blaszczyk K, Saeed S, Hamilton-Shield J, Clayton-Smith J, O'Rahilly S, Hurles ME and Farooqi IS

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    Obesity is a highly heritable and genetically heterogeneous disorder. Here we investigated the contribution of copy number variation to obesity in 300 Caucasian patients with severe early-onset obesity, 143 of whom also had developmental delay. Large (>500 kilobases), rare (<1%) deletions were significantly enriched in patients compared to 7,366 controls (P < 0.001). We identified several rare copy number variants that were recurrent in patients but absent or at much lower prevalence in controls. We identified five patients with overlapping deletions on chromosome 16p11.2 that were found in 2 out of 7,366 controls (P < 5 × 10^-5). In three patients the deletion co-segregated with severe obesity. Two patients harboured a larger de novo 16p11.2 deletion, extending through a 593-kilobase region previously associated with autism and mental retardation; both of these patients had mild developmental delay in addition to severe obesity. In an independent sample of 1,062 patients with severe obesity alone, the smaller 16p11.2 deletion was found in an additional two patients. All 16p11.2 deletions encompass several genes but include SH2B1, which is known to be involved in leptin and insulin signalling. Deletion carriers exhibited hyperphagia and severe insulin resistance disproportionate for the degree of obesity. We show that copy number variation contributes significantly to the genetic architecture of human obesity.

    Funded by: Medical Research Council: G0900554; Wellcome Trust: 077014, 077014/Z/05/0Z, 082390, 082390/Z/07/Z, 085475

    Nature 2009;463;7281;666-70
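    The 16p11.2 counts reported in the abstract above (5 carriers among 300 patients versus 2 among 7,366 controls) can be checked against the stated bound with a short worked example. The abstract does not say which test the authors used; the sketch below assumes a one-sided Fisher's exact test (an upper-tail hypergeometric probability with fixed margins) purely for illustration, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_cases, n_controls, n_carriers):
    """P(X >= k), where X is the number of carriers falling among the cases
    when n_carriers carriers are distributed at random over all samples."""
    total = n_cases + n_controls
    denom = comb(total, n_carriers)
    upper = min(n_cases, n_carriers)
    num = sum(
        comb(n_cases, i) * comb(n_controls, n_carriers - i)
        for i in range(k, upper + 1)
    )
    return num / denom


# Counts taken from the abstract: 5 of 300 patients, 2 of 7,366 controls.
p = hypergeom_upper_tail(k=5, n_cases=300, n_controls=7366, n_carriers=5 + 2)
print(f"one-sided P = {p:.2e}")
```

    Under this assumed test the probability falls below the 5 × 10^-5 bound quoted in the abstract; a different test or a two-sided alternative would give a somewhat different value.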

  • IRS2 variants and syndromes of severe insulin resistance.

    Bottomley WE, Soos MA, Adams C, Guran T, Howlett TA, Mackie A, Miell J, Monson JP, Temple R, Tenenbaum-Rakover Y, Tymms J, Savage DB, Semple RK, O'Rahilly S and Barroso I

    Funded by: Wellcome Trust: 077016, 077016/Z/05/Z, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z

    Diabetologia 2009;52;6;1208-11

  • The genome sequence of taurine cattle: a window to ruminant biology and evolution.

    Bovine Genome Sequencing and Analysis Consortium, Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM, Weinstock GM, Adelson DL, Eichler EE, Elnitski L, Guigó R, Hamernik DL, Kappes SM, Lewin HA, Lynn DJ, Nicholas FW, Reymond A, Rijnkels M, Skow LC, Zdobnov EM, Schook L, Womack J, Alioto T, Antonarakis SE, Astashyn A, Chapple CE, Chen HC, Chrast J, Câmara F, Ermolaeva O, Henrichsen CN, Hlavina W, Kapustin Y, Kiryutin B, Kitts P, Kokocinski F, Landrum M, Maglott D, Pruitt K, Sapojnikov V, Searle SM, Solovyev V, Souvorov A, Ucla C, Wyss C, Anzola JM, Gerlach D, Elhaik E, Graur D, Reese JT, Edgar RC, McEwan JC, Payne GM, Raison JM, Junier T, Kriventseva EV, Eyras E, Plass M, Donthu R, Larkin DM, Reecy J, Yang MQ, Chen L, Cheng Z, Chitko-McKown CG, Liu GE, Matukumalli LK, Song J, Zhu B, Bradley DG, Brinkman FS, Lau LP, Whiteside MD, Walker A, Wheeler TT, Casey T, German JB, Lemay DG, Maqbool NJ, Molenaar AJ, Seo S, Stothard P, Baldwin CL, Baxter R, Brinkmeyer-Langford CL, Brown WC, Childers CP, Connelley T, Ellis SA, Fritz K, Glass EJ, Herzig CT, Iivanainen A, Lahmers KK, Bennett AK, Dickens CM, Gilbert JG, Hagen DE, Salih H, Aerts J, Caetano AR, Dalrymple B, Garcia JF, Gill CA, Hiendleder SG, Memili E, Spurlock D, Williams JL, Alexander L, Brownstein MJ, Guan L, Holt RA, Jones SJ, Marra MA, Moore R, Moore SS, Roberts A, Taniguchi M, Waterman RC, Chacko J, Chandrabose MM, Cree A, Dao MD, Dinh HH, Gabisi RA, Hines S, Hume J, Jhangiani SN, Joshi V, Kovar CL, Lewis LR, Liu YS, Lopez J, Morgan MB, Nguyen NB, Okwuonu GO, Ruiz SJ, Santibanez J, Wright RA, Buhay C, Ding Y, Dugan-Rocha S, Herdandez J, Holder M, Sabo A, Egan A, Goodell J, Wilczek-Boney K, Fowler GR, Hitchens ME, Lozado RJ, Moen C, Steffen D, Warren JT, Zhang J, Chiu R, Schein JE, Durbin KJ, Havlak P, Jiang H, Liu Y, Qin X, Ren Y, Shen Y, Song H, Bell SN, Davis C, Johnson AJ, Lee S, Nazareth LV, Patel BM, Pu LL, Vattathil S, Williams RL, Curry S, Hamilton C, Sodergren E, Wheeler DA, Barris W, Bennett GL, Eggen A, Green RD, Harhay GP, Hobbs M, Jann O, Keele JW, Kent MP, Lien S, McKay SD, McWilliam S, Ratnakumar A, Schnabel RD, Smith T, Snelling WM, Sonstegard TS, Stone RT, Sugimoto Y, Takasuga A, Taylor JF, Van Tassell CP, Macneil MD, Abatepaulo AR, Abbey CA, Ahola V, Almeida IG, Amadio AF, Anatriello E, Bahadue SM, Biase FH, Boldt CR, Carroll JA, Carvalho WA, Cervelatti EP, Chacko E, Chapin JE, Cheng Y, Choi J, Colley AJ, de Campos TA, De Donato M, Santos IK, de Oliveira CJ, Deobald H, Devinoy E, Donohue KE, Dovc P, Eberlein A, Fitzsimmons CJ, Franzin AM, Garcia GR, Genini S, Gladney CJ, Grant JR, Greaser ML, Green JA, Hadsell DL, Hakimov HA, Halgren R, Harrow JL, Hart EA, Hastings N, Hernandez M, Hu ZL, Ingham A, Iso-Touru T, Jamis C, Jensen K, Kapetis D, Kerr T, Khalil SS, Khatib H, Kolbehdari D, Kumar CG, Kumar D, Leach R, Lee JC, Li C, Logan KM, Malinverni R, Marques E, Martin WF, Martins NF, Maruyama SR, Mazza R, McLean KL, Medrano JF, Moreno BT, Moré DD, Muntean CT, Nandakumar HP, Nogueira MF, Olsaker I, Pant SD, Panzitta F, Pastor RC, Poli MA, Poslusny N, Rachagani S, Ranganathan S, Razpet A, Riggs PK, Rincon G, Rodriguez-Osorio N, Rodriguez-Zas SL, Romero NE, Rosenwald A, Sando L, Schmutz SM, Shen L, Sherman L, Southey BR, Lutzow YS, Sweedler JV, Tammen I, Telugu BP, Urbanski JM, Utsunomiya YT, Verschoor CP, Waardenberg AJ, Wang Z, Ward R, Weikard R, Welsh TH, White SN, Wilming LG, Wunderlich KR, Yang J and Zhao FQ

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D524040/2, BBS/B/13438, BBS/B/13446; NHGRI NIH HHS: U54 HG003273, U54 HG003273-04, U54 HG003273-04S1, U54 HG003273-05, U54 HG003273-05S1, U54 HG003273-05S2, U54 HG003273-06, U54 HG003273-06S1, U54 HG003273-06S2, U54 HG003273-07, U54 HG003273-08; NIDA NIH HHS: P30 DA018310; Wellcome Trust: 062023, 077198

    Science (New York, N.Y.) 2009;324;5926;522-8

  • Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds.

    Bovine HapMap Consortium, Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, Green RD, Hamernik DL, Kappes SM, Lien S, Matukumalli LK, McEwan JC, Nazareth LV, Schnabel RD, Weinstock GM, Wheeler DA, Ajmone-Marsan P, Boettcher PJ, Caetano AR, Garcia JF, Hanotte O, Mariani P, Skow LC, Sonstegard TS, Williams JL, Diallo B, Hailemariam L, Martinez ML, Morris CA, Silva LO, Spelman RJ, Mulatu W, Zhao K, Abbey CA, Agaba M, Araujo FR, Bunch RJ, Burton J, Gorni C, Olivier H, Harrison BE, Luff B, Machado MA, Mwakaya J, Plastow G, Sim W, Smith T, Thomas MB, Valentini A, Williams P, Womack J, Woolliams JA, Liu Y, Qin X, Worley KC, Gao C, Jiang H, Moore SS, Ren Y, Song XZ, Bustamante CD, Hernandez RD, Muzny DM, Patil S, San Lucas A, Fu Q, Kent MP, Vega R, Matukumalli A, McWilliam S, Sclep G, Bryc K, Choi J, Gao H, Grefenstette JJ, Murdoch B, Stella A, Villa-Angulo R, Wright M, Aerts J, Jann O, Negrini R, Goddard ME, Hayes BJ, Bradley DG, Barbosa da Silva M, Lau LP, Liu GE, Lynn DJ, Panzitta F and Dodds KG

    The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.

    Funded by: NHGRI NIH HHS: U54 HG003273; NIGMS NIH HHS: R01 GM083606, R01 GM083606-02

    Science (New York, N.Y.) 2009;324;5926;528-32

  • Accurate and sensitive peptide identification with Mascot Percolator.

    Brosch M, Yu L, Hubbard T and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well performing machine learning method for rescoring database search results, and demonstrate it to be amenable for both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand alone tool or integrated into existing data analysis pipelines.

    Funded by: Wellcome Trust: 077198

    Journal of proteome research 2009;8;6;3176-81

  • Functional diversity for REST (NRSF) is defined by in vivo binding affinity hierarchies at the DNA sequence level.

    Bruce AW, López-Contreras AJ, Flicek P, Down TA, Dhami P, Dillon SC, Koch CM, Langford CF, Dunham I, Andrews RM and Vetrie D

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    The molecular events that contribute to, and result from, the in vivo binding of transcription factors to their cognate DNA sequence motifs in mammalian genomes are poorly understood. We demonstrate that variations within the DNA sequence motifs that bind the transcriptional repressor REST (NRSF) encode in vivo DNA binding affinity hierarchies that contribute to regulatory function during lineage-specific and developmental programs in fundamental ways. First, canonical sequence motifs for REST facilitate strong REST binding and control functional classes of REST targets that are common to all cell types, whilst atypical motifs participate in weak interactions and control those targets, which are cell- or tissue-specific. Second, variations in REST binding relate directly to variations in expression and chromatin configurations of REST's target genes. Third, REST clearance from its binding sites is also associated with variations in the RE1 motif. Finally, and most surprisingly, weak REST binding sites reside in DNA sequences that show the highest levels of constraint through evolution, thus facilitating their roles in maintaining tissue-specific functions. These relationships have never been reported in mammalian systems for any transcription factor.

    Funded by: Wellcome Trust

    Genome research 2009;19;6;994-1005

  • Genome-wide microarray-based comparative genomic hybridization analysis of lymphoplasmacytic lymphomas reveals heterogeneous aberrations.

    Buckley PG, Walsh SH, Laurell A, Sundström C, Roos G, Langford CF, Dumanski JP and Rosenquist R

    Department of Cancer Genetics, Royal College of Surgeons in Ireland, Dublin, Ireland.

    Lymphoplasmacytic lymphoma (LPL) is not a sharply delineated lymphoma entity, either morphologically, phenotypically, or clinically. The diagnosis is often made by excluding other small cell lymphomas with plasmacytic differentiation, thus a genetic diagnostic marker would be of great benefit. Conventional cytogenetic techniques have previously demonstrated a deletion of 6q in a proportion of cases, varying from 7 to 55%. In this report, we apply array-based comparative genomic hybridization on 11 LPL samples. Genomic aberrations were detected in 9 of 11 cases, and included gains and losses. In general, the number of genetic aberrations was relatively low (two to three abnormalities per case). Recurrent aberrations detected were deletion of 6q (two cases), deletion of chromosome 17 (two cases), gain of 3q (two cases), and gain of chromosome 7 (two cases). This report not only confirms the reported loss of 6q in a proportion of cases but also highlights the genetic heterogeneity of LPL, in accordance with the known immunophenotypical, morphological, and clinical diversity of the disease.

    Funded by: Wellcome Trust

    Leukemia & lymphoma 2009;50;9;1528-34

  • The T3SS effector EspT defines a new category of invasive enteropathogenic E. coli (EPEC) which form intracellular actin pedestals.

    Bulgin R, Arbeloa A, Goulding D, Dougan G, Crepin VF, Raymond B and Frankel G

    Centre for Molecular Microbiology and Infection, Division of Cell and Molecular Biology, Imperial College London, London, United Kingdom.

    Enteropathogenic Escherichia coli (EPEC) strains are defined as extracellular pathogens which nucleate actin-rich pedestal-like membrane extensions on intestinal enterocytes to which they intimately adhere. EPEC infection is mediated by type III secretion system effectors, which modulate host cell signaling. Recently we have shown that the WxxxE effector EspT activates Rac1 and Cdc42 leading to formation of membrane ruffles and lamellipodia. Here we report that EspT-induced membrane ruffles facilitate EPEC invasion into non-phagocytic cells in a process involving Rac1 and Wave2. Internalized EPEC resides within a vacuole and Tir is localized to the vacuolar membrane, resulting in actin polymerization and formation of intracellular pedestals. To the best of our knowledge this is the first time a pathogen has been shown to induce formation of actin comets across a vacuole membrane. Moreover, our data break the dogma of EPEC as an extracellular pathogen and define a new category of invasive EPEC.

    Funded by: Medical Research Council: G0700823; Wellcome Trust

    PLoS pathogens 2009;5;12;e1000683

  • Evolution of pathogenicity and sexual reproduction in eight Candida genomes.

    Butler G, Rasmussen MD, Lin MF, Santos MA, Sakthikumar S, Munro CA, Rheinbay E, Grabherr M, Forche A, Reedy JL, Agrafioti I, Arnaud MB, Bates S, Brown AJ, Brunke S, Costanzo MC, Fitzpatrick DA, de Groot PW, Harris D, Hoyer LL, Hube B, Klis FM, Kodira C, Lennard N, Logue ME, Martin R, Neiman AM, Nikolaou E, Quail MA, Quinn J, Santos MC, Schmitzberger FF, Sherlock G, Shah P, Silverstein KA, Skrzypek MS, Soll D, Staggs R, Stansfield I, Stumpf MP, Sudbery PE, Srikantha T, Zeng Q, Berman J, Berriman M, Heitman J, Gow NA, Lorenz MC, Birren BW, Kellis M and Cuomo CA

    UCD School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.

    Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F00513X/1, BB/F013566/1; Medical Research Council: G0400284; NHGRI NIH HHS: R01 HG004037, R01 HG004037-02, U54 HG003067, U54 HG003067-06; NIAID NIH HHS: HHSN266200400001C, R01 AI050113, R01 AI075096; NIDCR NIH HHS: R01 DE015873; Wellcome Trust

    Nature 2009;459;7247;657-62

  • Somatic and germline genetics at the JAK2 locus.

    Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Myeloproliferative neoplasms are hematological malignancies frequently associated with somatically acquired mutation of the JAK2 gene. A new study shows that these mutations are preferentially found within a particular inherited JAK2 haplotype, implying the existence of a strong, but uncharacterized, interaction between somatic and germline genetics at this locus.

    Funded by: Wellcome Trust: 088340

    Nature genetics 2009;41;4;385-6

  • TLR9 polymorphisms in African populations: no association with severe malaria, but evidence of cis-variants acting on gene expression.

    Campino S, Forton J, Auburn S, Fry A, Diakite M, Richardson A, Hull J, Jallow M, Sisay-Joof F, Pinder M, Molyneux ME, Taylor TE, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Background: During malaria infection the Toll-like receptor 9 (TLR9) is activated through induction with Plasmodium DNA or another malaria motif not yet identified. Although TLR9 activation by malaria parasites is well reported, its implication for susceptibility to severe malaria is not clear. The aim of this study was to assess the contribution of genetic variation at TLR9 to severe malaria.

    Methods: This study explores the contribution of TLR9 genetic variants to severe malaria using two approaches. First, an association study of four common single nucleotide polymorphisms was performed on both family- and population-based studies from Malawian and Gambian populations (n>6000 individuals). Subsequently, it was assessed whether TLR9 expression is affected by cis-acting variants and if these variants could be mapped. For this work, an allele specific expression (ASE) assay on a panel of HapMap cell lines was carried out.

    Results: No convincing association was found with polymorphisms in TLR9 for malaria severity, in either Gambian or Malawian populations, using both case-control and family-based study designs. Using an allele specific expression assay it was observed that TLR9 expression is affected by cis-acting variants; these results were replicated in a second experiment using biological replicates.

    Conclusion: By using the largest cohorts analysed to date, as well as a standardized phenotype definition and study design, no association of TLR9 genetic variants with severe malaria was found. This analysis considered all common variants in the region, but it remains possible that there are rare variants with association signals. This report also shows that TLR9 expression is potentially modulated through cis-regulatory variants, which may lead to differential inflammatory responses to infection between individuals.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust

    Malaria journal 2009;8;44

  • Genome watch: What a scorcher!

    Cerdeño-Tárraga AM

    This month's Genome Watch looks at the publication of four hyperthermophilic archaeal genomes, three of which belong to the Crenarchaeota phylum and one of which belongs to the newly defined Nanoarchaeota phylum.

    Nature reviews. Microbiology 2009;7;6;408-9

  • Induction of antibody responses to African horse sickness virus (AHSV) in ponies after vaccination with recombinant modified vaccinia Ankara (MVA).

    Chiam R, Sharp E, Maan S, Rao S, Mertens P, Blacklaws B, Davis-Poynter N, Wood J and Castillo-Olivares J

    Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk, United Kingdom.

    Background: African horse sickness virus (AHSV) causes a non-contagious, infectious disease in equids, with mortality rates that can exceed 90% in susceptible horse populations. AHSV vaccines play a crucial role in the control of the disease; however, there are concerns over the use of polyvalent live attenuated vaccines particularly in areas where AHSV is not endemic. Therefore, it is important to consider alternative approaches for AHSV vaccine development. We have carried out a pilot study to investigate the ability of recombinant modified vaccinia Ankara (MVA) vaccines expressing VP2, VP7 or NS3 genes of AHSV to stimulate immune responses against AHSV antigens in the horse.

    Methodology/principal findings: VP2, VP7 and NS3 genes from AHSV-4/Madrid87 were cloned into the vaccinia transfer vector pSC11 and recombinant MVA viruses generated. Antigen expression or transcription of the AHSV genes from cells infected with the recombinant viruses was confirmed. Pairs of ponies were vaccinated with MVAVP2, MVAVP7 or MVANS3 and both MVA vector and AHSV antigen-specific antibody responses were analysed. Vaccination with MVAVP2 induced a strong AHSV neutralising antibody response (VN titre up to a value of 2). MVAVP7 also induced AHSV antigen-specific responses, detected by western blotting. NS3 specific antibody responses were not detected.

    Conclusions: This pilot study demonstrates the immunogenicity of recombinant MVA vectored AHSV vaccines, in particular MVAVP2, and indicates that further work to investigate whether these vaccines would confer protection from lethal AHSV challenge in the horse is justifiable.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/00654

    PloS one 2009;4;6;e5997

  • Lineage-specific biology revealed by a finished genome assembly of the mouse.

    Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, She X, Bult CJ, Agarwala R, Cherry JL, DiCuccio M, Hlavina W, Kapustin Y, Meric P, Maglott D, Birtle Z, Marques AC, Graves T, Zhou S, Teague B, Potamousis K, Churas C, Place M, Herschleb J, Runnheim R, Forrest D, Amos-Landgraf J, Schwartz DC, Cheng Z, Lindblad-Toh K, Eichler EE, Ponting CP and Mouse Genome Sequencing Consortium

    National Center for Biotechnology Information, Bethesda, Maryland, United States of America.

    The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

    Funded by: Medical Research Council: MC_U127561112, MC_U137761446, MC_U142684174; NHGRI NIH HHS: HG002385, R01 HG002385

    PLoS biology 2009;7;5;e1000112

  • Tumor necrosis factor and lymphotoxin-alpha polymorphisms and severe malaria in African populations.

    Clark TG, Diakite M, Auburn S, Campino S, Fry AE, Green A, Richardson A, Small K, Teo YY, Wilson J, Jallow M, Sisay-Joof F, Pinder M, Griffiths MJ, Peshu N, Williams TN, Marsh K, Molyneux ME, Taylor TE, Rockett KA and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Nuffield Department of Medicine, John Radcliffe Hospital, Oxford, United Kingdom.

    The tumor necrosis factor gene (TNF) and lymphotoxin-alpha gene (LTA) have long attracted attention as candidate genes for susceptibility traits for malaria, and several of their polymorphisms have been found to be associated with severe malaria (SM) phenotypes. In a large study involving >10,000 individuals and encompassing 3 African populations, we found evidence to support the reported associations between the TNF -238 polymorphism and SM in The Gambia. However, no TNF/LTA polymorphisms were found to be associated with SM in cohorts in Kenya and Malawi. It has been suggested that the causal polymorphisms regulating the TNF and LTA responses may be located some distance from the genes. Therefore, more-detailed mapping of variants across TNF/LTA genes and their flanking regions in the Gambian and allied populations may need to be undertaken to find any causal polymorphisms.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 076934

    The Journal of infectious diseases 2009;199;4;569-75

  • Neurotransmitters drive combinatorial multistate postsynaptic density networks.

    Coba MP, Pocklington AJ, Collins MO, Kopanitsa MV, Uren RT, Swamy S, Croning MD, Choudhary JS and Grant SG

    Genes to Cognition, Wellcome Trust Sanger Institute, Cambridgeshire, UK.

    The mammalian postsynaptic density (PSD) comprises a complex collection of approximately 1100 proteins. Despite extensive knowledge of individual proteins, the overall organization of the PSD is poorly understood. Here, we define maps of molecular circuitry within the PSD based on phosphorylation of postsynaptic proteins. Activation of a single neurotransmitter receptor, the N-methyl-D-aspartate receptor (NMDAR), changed the phosphorylation status of 127 proteins. Stimulation of ionotropic and metabotropic glutamate receptors and dopamine receptors activated overlapping networks with distinct combinatorial phosphorylation signatures. Using peptide array technology, we identified specific phosphorylation motifs and switching mechanisms responsible for the integration of neurotransmitter receptor pathways and their coordination of multiple substrates in these networks. These combinatorial networks confer high information-processing capacity and functional diversity on synapses, and their elucidation may provide new insights into disease mechanisms and new opportunities for drug discovery.

    Funded by: Medical Research Council: G0801418, G90/93; Wellcome Trust: 066717

    Science signaling 2009;2;68;ra19

  • Origins and functional impact of copy number variation in the human genome.

    Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK.

    Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

    Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: HG004221, P41 HG004221; NIGMS NIH HHS: GM081533, R01 GM081533; Wellcome Trust: 077006/Z/05/Z, 077008, 077009, 077014, 088340

    Nature 2009;464;7289;704-12

  • Large scale association analysis of novel genetic loci for coronary artery disease.

    Coronary Artery Disease Consortium, Samani NJ, Deloukas P, Erdmann J, Hengstenberg C, Kuulasmaa K, McGinnis R, Schunkert H, Soranzo N, Thompson J, Tiret L and Ziegler A

    Background: Combined analysis of 2 genome-wide association studies in cases enriched for family history recently identified 7 loci (on 1p13.3, 1q41, 2q36.3, 6q25.1, 9p21, 10q11.21, and 15q22.33) that may affect risk of coronary artery disease (CAD). Apart from the 9p21 locus, the other loci await substantive replication. Furthermore, the effect of these loci on CAD risk in a broader range of individuals remains to be determined.

    Methods and results: We undertook association analysis of single nucleotide polymorphisms at each locus with CAD risk in 11,550 cases and 11,205 controls from 9 European studies. The 9p21.3 locus showed unequivocal association (rs1333049, combined odds ratio [OR]=1.20, 95% CI [1.16 to 1.25], probability value=2.81 x 10(-21)). We also confirmed association signals at 1p13.3 (rs599839, OR=1.13 [1.08 to 1.19], P=1.44 x 10(-7)), 1q41 (rs3008621, OR=1.10 [1.04 to 1.17], P=1.02 x 10(-3)), and 10q11.21 (rs501120, OR=1.11 [1.05 to 1.18], P=4.34 x 10(-4)). The associations with 6q25.1 (rs6922269, P=0.020) and 2q36.3 (rs2943634, P=0.032) were borderline and not statistically significant after correction for multiple testing. The 15q22.33 locus did not replicate. The 10q11.21 locus showed a possible sex interaction (P=0.015), with a significant effect in women (OR=1.29 [1.15 to 1.45], P=1.86 x 10(-5)) but not men (OR=1.03 [0.96 to 1.11], P=0.387). There were no other strong interactions of any of the loci with other traditional risk factors. The loci at 9p21, 1p13.3, 2q36.3, and 10q11.21 acted independently and cumulatively increased CAD risk by 15% (12% to 18%), per additional risk allele.

    Conclusions: The findings provide strong evidence for association between at least 4 genetic loci and CAD risk. Cumulatively, these novel loci have a significant impact on risk of CAD at least in European populations.

    Funded by: British Heart Foundation: CH/03/001/15569, RG/08/014/24067; Medical Research Council: G0401527, G0701863, MC_U106179471; Wellcome Trust: 077011, 082371, 091746

    Arteriosclerosis, thrombosis, and vascular biology 2009;29;5;774-80

  • From small reads do mighty genomes grow.

    Croucher NJ

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    This month's Genome Watch discusses the use of next-generation sequencing technologies to assemble draft genomes for two pseudomonad species.

    Nature reviews. Microbiology 2009;7;9;621

  • X-box binding protein 1 contributes to induction of the Kaposi's sarcoma-associated herpesvirus lytic cycle under hypoxic conditions.

    Dalton-Griffin L, Wilson SJ and Kellam P

    Department of Infection, UCL, London, United Kingdom.

    Kaposi's sarcoma-associated herpesvirus (KSHV), like other herpesviruses, has two stages to its life cycle: latency and lytic replication. KSHV is required for development of Kaposi's sarcoma, a tumor of endothelial origin, and is associated with the B-cell tumor primary effusion lymphoma (PEL) and the plasmablastic variant of multicentric Castleman's disease, all of which are characterized by predominantly latent KSHV infection. Recently, we and others have shown that the activated form of transcription factor X-box binding protein 1 (XBP-1) is a physiological trigger of KSHV lytic reactivation in PEL. Here, we show that XBP-1s transactivates the ORF50/RTA promoter through an ACGT core containing the XBP-1 response element, an element previously identified as a weakly active hypoxia response element (HRE). Hypoxia induces the KSHV lytic cycle, and active HREs that respond to hypoxia-inducible factor 1alpha are present in the ORF50/RTA promoter. Hypoxia also induces active XBP-1s, and here, we show that both transcription factors contribute to the induction of RTA expression, leading to the production of infectious KSHV under hypoxic conditions.

    Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust

    Journal of virology 2009;83;14;7202-9

  • A truncation mutation in TBC1D4 in a family with acanthosis nigricans and postprandial hyperinsulinemia.

    Dash S, Sano H, Rochford JJ, Semple RK, Yeo G, Hyden CS, Soos MA, Clark J, Rodin A, Langenberg C, Druet C, Fawcett KA, Tung YC, Wareham NJ, Barroso I, Lienhard GE, O'Rahilly S and Savage DB

    Departments of Medicine and Clinical Biochemistry, University of Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Tre-2, BUB2, CDC16, 1 domain family member 4 (TBC1D4) (AS160) is a Rab-GTPase activating protein implicated in insulin-stimulated glucose transporter 4 (GLUT4) translocation in adipocytes and myotubes. To determine whether loss-of-function mutations in TBC1D4 might impair GLUT4 translocation and cause insulin resistance in humans, we screened the coding regions of this gene in 156 severely insulin-resistant patients. A female presenting at age 11 years with acanthosis nigricans and extreme postprandial hyperinsulinemia was heterozygous for a premature stop mutation (R363X) in TBC1D4. After demonstrating reduced expression of wild-type TBC1D4 protein and expression of the truncated protein in lymphocytes from the proband, we further characterized the biological effects of the truncated protein in 3T3L1 adipocytes. Prematurely truncated TBC1D4 protein tended to increase basal cell membrane GLUT4 levels (P = 0.053) and significantly reduced insulin-stimulated GLUT4 cell membrane translocation (P < 0.05). When coexpressed with wild-type TBC1D4, the truncated protein dimerized with full-length TBC1D4, suggesting that the heterozygous truncated variant might interfere with its wild-type counterpart in a dominant negative fashion. Two overweight family members with the mutation also manifested normal fasting glucose and insulin levels but disproportionately elevated insulin levels following an oral glucose challenge. This family provides unique genetic evidence of TBC1D4 involvement in human insulin action.

    Funded by: British Heart Foundation; Medical Research Council: G0600414; NCI NIH HHS: P30 CA023108; NIDDK NIH HHS: DK25336, R01 DK025336, R56 DK025336; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;23;9350-5

  • Common regulatory variation impacts gene expression in a cell type-dependent manner.

    Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, Gagnebin M, Nisbett J, Deloukas P, Dermitzakis ET and Antonarakis SE

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1HH, Cambridge, UK.

    Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.

    Funded by: Wellcome Trust: 077011, 077046

    Science (New York, N.Y.) 2009;325;5945;1246-50

  • Ectopic recombination of a malaria var gene during mitosis associated with an altered var switch rate.

    Duffy MF, Byrne TJ, Carret C, Ivens A and Brown GV

    Department of Medicine at RMH, University of Melbourne, Parkville 3050, Australia.

    The Plasmodium falciparum var multigene family encodes P. falciparum erythrocyte membrane protein 1, which is responsible for the pathogenic traits of antigenic variation and adhesion of infected erythrocytes to host receptors during malaria infection. Clonal antigenic variation of P. falciparum erythrocyte membrane protein 1 is controlled by the switching between exclusively transcribed var genes. The tremendous diversity of the var gene repertoire both within and between parasite strains is critical for the parasite's strategy of immune evasion. We show that ectopic recombination between var genes occurs during mitosis, providing P. falciparum with opportunities to diversify its var repertoire, even during the course of a single infection. We show that the regulation of the recombined var gene has been disrupted, resulting in its persistent activation although the regulation of most other var genes is unaffected. The var promoter and intron of the recombined var gene are not responsible for its atypically persistent activity, and we conclude that altered subtelomeric cis sequence is the most likely cause of the persistent activity of the recombined var gene.

    Journal of molecular biology 2009;389;3;453-69

  • Traces of sub-Saharan and Middle Eastern lineages in Indian Muslim populations.

    Eaaswarkhanth M, Haque I, Ravesh Z, Romero IG, Meganathan PR, Dubey B, Khan FA, Chaubey G, Kivisild T, Tyler-Smith C, Singh L and Thangaraj K

    National DNA Analysis Centre, Central Forensic Science Laboratory, Kolkata, India.

    Islam is the second most practiced religion in India, next to Hinduism. It is still unclear whether the spread of Islam in India has been only a cultural transformation or is associated with detectable levels of gene flow. To estimate the contribution of West Asian and Arabian admixture to Indian Muslims, we assessed genetic variation in mtDNA, Y-chromosomal and LCT/MCM6 markers in 472, 431 and 476 samples, respectively, representing six Muslim communities from different geographical regions of India. We found that most of the Indian Muslim populations received their major genetic input from geographically close non-Muslim populations. However, low levels of likely sub-Saharan African, Arabian and West Asian admixture were also observed among Indian Muslims in the form of L0a2a2 mtDNA and E1b1b1a and J(*)(xJ2) Y-chromosomal lineages. The distinction between Iranian and Arabian sources was difficult to make with mtDNA and the Y chromosome, as the estimates were highly correlated because of similar gene pool compositions in the sources. In contrast, the LCT/MCM6 locus, which shows a clear distinction between the two sources, enabled us to rule out significant gene flow from Arabia. Overall, our results support a model according to which the spread of Islam in India was predominantly cultural conversion associated with minor but still detectable levels of gene flow from outside, primarily from Iran and Central Asia, rather than directly from the Arabian Peninsula.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2009;18;3;354-63

  • A high-throughput pharmaceutical screen identifies compounds with specific toxicity against BRCA2-deficient tumors.

    Evers B, Schut E, van der Burg E, Braumuller TM, Egan DA, Holstege H, Edser P, Adams DJ, Wade-Martins R, Bouwman P and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, the Netherlands.

    Purpose: Hereditary breast cancer is partly explained by germline mutations in BRCA1 and BRCA2. Although patients carry heterozygous mutations, their tumors have typically lost the remaining wild-type allele. Selectively targeting BRCA deficiency may therefore constitute an important therapeutic approach. Clinical trials applying this principle are underway, but it is unknown whether the compounds tested are optimal. It is therefore important to identify alternative compounds that specifically target BRCA deficiency and to test new combination therapies to establish optimal treatment strategies.

    Experimental design: We did a high-throughput pharmaceutical screen on BRCA2-deficient mouse mammary tumor cells and isogenic controls with restored BRCA2 function. Subsequently, we validated positive hits in vitro and in vivo using mice carrying BRCA2-deficient mammary tumors.

    Results: Three alkylators (chlorambucil, melphalan, and nimustine) displayed strong and specific toxicity against BRCA2-deficient cells. In vivo, these showed heterogeneous but generally strong antitumor activity against BRCA2-deficient tumors, with melphalan and nimustine doing better than cisplatin and the poly-(ADP-ribose)-polymerase inhibitor olaparib (AZD2281) in this small study. In vitro drug combination experiments showed synergistic interactions between the alkylators and olaparib. Tumor intervention studies combining nimustine and olaparib resulted in recurrence-free survival exceeding 330 days in 3 of 5 animals tested.

    Conclusions: We generated and validated a platform for identification of compounds with specific activity against BRCA2-deficient cells that translates well to the preclinical setting. Our data call for the re-evaluation of alkylators, especially melphalan and nimustine, alone or in combination with the poly-(ADP-ribose)-polymerase inhibitors, for the treatment of breast cancers with a defective BRCA pathway.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D012910/1; Cancer Research UK; Wellcome Trust

    Clinical cancer research : an official journal of the American Association for Cancer Research 2009;16;1;99-108

  • Genome-wide association study identifies variants at 9p21 and 22q13 associated with development of cutaneous nevi.

    Falchi M, Bataille V, Hayward NK, Duffy DL, Bishop JA, Pastinen T, Cervino A, Zhao ZZ, Deloukas P, Soranzo N, Elder DE, Barrett JH, Martin NG, Bishop DT, Montgomery GW and Spector TD

    Department of Twin Research & Genetic Epidemiology, King's College London, St. Thomas' Hospital Campus, London, UK.

    A high melanocytic nevi count is the strongest known risk factor for cutaneous melanoma. We conducted a genome-wide association study for nevus count using 297,108 SNPs in 1,524 twins, with validation in an independent cohort of 4,107 individuals. We identified strongly associated variants in MTAP, a gene adjacent to the familial melanoma susceptibility locus CDKN2A on 9p21 (rs4636294, combined P = 3.4 x 10(-15)), as well as in PLA2G6 on 22q13.1 (rs2284063, combined P = 3.4 x 10(-8)). In addition, variants in these two loci showed association with melanoma risk in 3,131 melanoma cases from two independent studies, including rs10757257 at 9p21, combined P = 3.4 x 10(-8), OR = 1.23 (95% CI = 1.15-1.30) and rs132985 at 22q13.1, combined P = 2.6 x 10(-7), OR = 1.23 (95% CI = 1.15-1.30). This provides the first report of common variants associated to nevus number and demonstrates association of these variants with melanoma susceptibility.

    Funded by: Cancer Research UK: 10589, C588/A4994; Department of Health; NCI NIH HHS: CA88363, R01 CA083115, R01 CA083115-08, R01 CA83115; Wellcome Trust: 077011, 091746

    Nature genetics 2009;41;8;915-9

  • Detailed investigation of the role of common and low-frequency WFS1 variants in type 2 diabetes risk.

    Fawcett KA, Wheeler E, Morris AP, Ricketts SL, Hallmans G, Rolandsson O, Daly A, Wasson J, Permutt A, Hattersley AT, Glaser B, Franks PW, McCarthy MI, Wareham NJ, Sandhu MS and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Objective: Wolfram syndrome 1 (WFS1) single nucleotide polymorphisms (SNPs) are associated with risk of type 2 diabetes. In this study we aimed to refine this association and investigate the role of low-frequency WFS1 variants in type 2 diabetes risk.

    Research design and methods: For fine-mapping, we sequenced WFS1 exons, splice junctions, and conserved noncoding sequences in samples from 24 type 2 diabetic case and 68 control subjects, selected tagging SNPs, and genotyped these in 959 U.K. type 2 diabetic case and 1,386 control subjects. The same genomic regions were sequenced in samples from 1,235 type 2 diabetic case and 1,668 control subjects to compare the frequency of rarer variants between case and control subjects.

    Results: Of 31 tagging SNPs, the strongest associated was the previously untested 3' untranslated region rs1046320 (P = 0.008); odds ratio 0.84 and P = 6.59 x 10(-7) on further replication in 3,753 case and 4,198 control subjects. High correlation between rs1046320 and the original strongest SNP (rs10010131) (r2 = 0.92) meant that we could not differentiate between their effects in our samples. There was no difference in the cumulative frequency of 82 rare (minor allele frequency [MAF] <0.01) nonsynonymous variants between type 2 diabetic case and control subjects (P = 0.79). Two intermediate frequency (MAF 0.01-0.05) nonsynonymous changes also showed no statistical association with type 2 diabetes.

    Conclusions: We identified six highly correlated SNPs that show strong and comparable associations with risk of type 2 diabetes, but further refinement of these associations will require large sample sizes (>100,000) or studies in ethnically diverse populations. Low frequency variants in WFS1 are unlikely to have a large impact on type 2 diabetes risk in white U.K. populations, highlighting the complexities of undertaking association studies with low-frequency variants identified by resequencing.

    Funded by: British Heart Foundation; Medical Research Council: MC_U106179471; Wellcome Trust: 064890, 077016, 077016/Z/05/Z, 081682

    Diabetes 2009;59;3;741-6

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.

    The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.

    Funded by: Wellcome Trust

    Molecular systems biology 2009;5;269

  • The Pfam protein families database.

    Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK, the USA and Sweden.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Howard Hughes Medical Institute; Medical Research Council: MC_U137761446; Wellcome Trust: 087656, WT077044/Z/05/Z

    Nucleic acids research 2009;38;Database issue;D211-22

  • DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources.

    Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, Van Vooren S, Moreau Y, Pettett RM and Carter NP

    Cambridge University Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK.

    Many patients suffering from developmental disorders harbor submicroscopic deletions or duplications that, by affecting the copy number of dosage-sensitive genes or disrupting normal gene expression, lead to disease. However, many aberrations are novel or extremely rare, making clinical interpretation problematic and genotype-phenotype correlations uncertain. Identification of patients sharing a genomic rearrangement and having phenotypic features in common leads to greater certainty in the pathogenic nature of the rearrangement and enables new syndromes to be defined. To facilitate the analysis of these rare events, we have developed an interactive web-based database called DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations. DECIPHER catalogs common copy-number changes in normal populations and thus, by exclusion, enables changes that are novel and potentially pathogenic to be identified. DECIPHER enhances genetic counseling by retrieving relevant information from a variety of bioinformatics resources. Known and predicted genes within an aberration are listed in the DECIPHER patient report, and genes of recognized clinical importance are highlighted and prioritized. DECIPHER enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community through display in the genome browser Ensembl. By sharing cases worldwide, clusters of rare cases having phenotype and structural rearrangement in common can be identified, leading to the delineation of new syndromes and furthering understanding of gene function.

    Funded by: Wellcome Trust: WT077008

    American journal of human genetics 2009;84;4;524-33

  • Ensembl's 10th year.

    Flicek P, Aken BL, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Gräf S, Haider S, Hammond M, Howe K, Jenkinson A, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Koscielny G, Kulesha E, Lawson D, Longden I, Massingham T, McLaren W, Megy K, Overduin B, Pritchard B, Rios D, Ruffier M, Schuster M, Slater G, Smedley D, Spudich G, Tang YA, Trevanion S, Vilella A, Vogel J, White S, Wilder SP, Zadissa A, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, Smith J and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Ensembl integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E010768/1, BB/E011640/1, BBE0116401, BBS/B/13438, BBS/B/13462; Wellcome Trust: 062023, 077198

    Nucleic acids research 2009;38;Database issue;D557-62

  • COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer.

    Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, Menzies A, Teague JW, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The Catalogue of Somatic Mutations in Cancer (COSMIC) is the largest public resource for information on somatically acquired mutations in human cancer and is available freely without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5-million experiments performed through 13,423 genes in almost 370,000 tumours, describing over 90,000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world's literature on point mutations in human cancer has now been curated into COSMIC and while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer and COSMIC is now displaying details of these analyses also. Examination of COSMIC's data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Nucleic acids research 2009;38;Database issue;D652-7

  • Reduced TFAP2A function causes variable optic fissure closure and retinal defects and sensitizes eye development to mutations in other morphogenetic regulators.

    Gestri G, Osborne RJ, Wyatt AW, Gerrelli D, Gribble S, Stewart H, Fryer A, Bunyan DJ, Prescott K, Collin JR, Fitzgerald T, Robinson D, Carter NP, Wilson SW and Ragge NK

    Department of Cell and Developmental Biology, UCL, London, UK.

    Mutations in the transcription factor encoding TFAP2A gene underlie branchio-oculo-facial syndrome (BOFS), a rare dominant disorder characterized by distinctive craniofacial, ocular, ectodermal and renal anomalies. To elucidate the range of ocular phenotypes caused by mutations in TFAP2A, we took three approaches. First, we screened a cohort of 37 highly selected individuals with severe ocular anomalies plus variable defects associated with BOFS for mutations or deletions in TFAP2A. We identified one individual with a de novo TFAP2A four amino acid deletion, a second individual with two non-synonymous variations in an alternative splice isoform TFAP2A2, and a sibling-pair with a paternally inherited whole gene deletion with variable phenotypic expression. Second, we determined that TFAP2A is expressed in the lens, neural retina, nasal process, and epithelial lining of the oral cavity and palatal shelves of human and mouse embryos--sites consistent with the phenotype observed in patients with BOFS. Third, we used zebrafish to examine how partial abrogation of the fish ortholog of TFAP2A affects the penetrance and expressivity of ocular phenotypes due to mutations in genes encoding bmp4 or tcf7l1a. In both cases, we observed synthetic, enhanced ocular phenotypes including coloboma and anophthalmia when tfap2a is knocked down in embryos with bmp4 or tcf7l1a mutations. These results reveal that mutations in TFAP2A are associated with a wide range of eye phenotypes and that hypomorphic tfap2a mutations can increase the risk of developmental defects arising from mutations at other loci.

    Funded by: Medical Research Council: G0501487, G0700089; Wellcome Trust: 074376, 078047, WT077008

    Human genetics 2009;126;6;791-803

  • Neonates harbour highly active gammadelta T cells with selective impairments in preterm infants.

    Gibbons DL, Haque SF, Silberzahn T, Hamilton K, Langford C, Ellis P, Carr R and Hayday AC

    Peter Gorer Department of Immunobiology, London, UK.

    Acknowledgement of the breadth of T-cell pleiotropy has provoked increasing interest in the degree to which functional responsiveness is elicited by environmental cues versus differentiation. This is particularly relevant for young animals requiring rapid responses to acute environmental exposure. In young mice, gammadelta T cells are disproportionately important for immuno-protection. To examine the situation in humans, we compared populations and clones of T cells from term and preterm babies, and adults. By comparison with alphabeta T cells, neonate-derived gammadelta cells show stronger, pleiotropic functional responsiveness, and lack signatory deficits in IFN-gamma production. Emphasising the acquisition of functional competence in utero, IFN-gamma was produced by gammadelta cells sampled from premature births, and, although one month's post-partum environmental exposure invariably increased their TNF-alpha production, it had no consistent effect on IFN-gamma or IL-2. In sum, gammadelta cells seem well positioned at birth to contribute to immuno-protection and immuno-regulation, possibly compensating for selective immaturity in the alphabeta compartment. With regard to the susceptibilities of preterm babies to viral infection, gammadelta cells from preterm neonates were commonly impaired in Toll-like receptor-3 and -7 expression and compared with cells from term babies failed to optimise cytokine production in response to coincident TCR and TLR agonists.

    Funded by: PHS HHS: R0161799; Wellcome Trust: 071534

    European journal of immunology 2009;39;7;1794-806

  • A general basis for cognition in the evolution of synapse signaling complexes.

    Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.

    Beneath the complexity of the human brain are molecular principles shaped by evolution explaining the origins of the behavioral repertoire. The role of the nervous system is to provide a repertoire of behaviors allowing the animal to respond and adapt to changing environments during the course of its life. Multiprotein complexes in the postsynaptic terminal of synapses control adaptive and cognitive processes in metazoan nervous systems. These multiprotein complexes are organized into molecular networks that detect and respond to patterns of neural activity. Combinations of proteins are used to build different complexes and pathways producing great diversity. These complexes evolved from an ancestral core set of proteins controlling adaptive behaviors in unicellular organisms known as the protosynapse. Later expansion in numbers and interactions resulted in more complex synapses in invertebrates and vertebrates. The resultant combinatorial complexity has contributed to the neuroanatomical, neurophysiological, and behavioral diversity in these species. Mutations in genes encoding the complexes result in many human diseases of the nervous system. This general mechanism of cognition provides a useful template for studying evolution of behavior in all animals.

    Funded by: Wellcome Trust

    Cold Spring Harbor symposia on quantitative biology 2009;74;249-57

  • Genetic utility of broadly defined bipolar schizoaffective disorder as a diagnostic concept.

    Hamshere ML, Green EK, Jones IR, Jones L, Moskvina V, Kirov G, Grozeva D, Nikolov I, Vukcevic D, Caesar S, Gordon-Smith K, Fraser C, Russell E, Breen G, St Clair D, Collier DA, Young AH, Ferrier IN, Farmer A, McGuffin P, Wellcome Trust Case Control Consortium, Holmans PA, Owen MJ, O'Donovan MC and Craddock N

    Biostatistics and Bioinformatics Unit and Department of Psychological Medicine, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK.

    Background: Psychiatric phenotypes are currently defined according to sets of descriptive criteria. Although many of these phenotypes are heritable, it would be useful to know whether any of the various diagnostic categories in current use identify cases that are particularly helpful for biological-genetic research.

    Aims: To use genome-wide genetic association data to explore the relative genetic utility of seven different descriptive operational diagnostic categories relevant to bipolar illness within a large UK case-control bipolar disorder sample.

    Method: We analysed our previously published Wellcome Trust Case Control Consortium (WTCCC) bipolar disorder genome-wide association data-set, comprising 1868 individuals with bipolar disorder and 2938 controls genotyped for 276 122 single nucleotide polymorphisms (SNPs) that met stringent criteria for genotype quality. For each SNP we performed a test of association (bipolar disorder group v. control group) and used the number of associated independent SNPs statistically significant at P<0.00001 as a metric for the overall genetic signal in the sample. We next compared this metric with that obtained using each of seven diagnostic subsets of the group with bipolar disorder: Research Diagnostic Criteria (RDC): bipolar I disorder; manic disorder; bipolar II disorder; schizoaffective disorder, bipolar type; DSM-IV: bipolar I disorder; bipolar II disorder; schizoaffective disorder, bipolar type.

    Results: The RDC schizoaffective disorder, bipolar type (v. controls) stood out from the other diagnostic subsets as having a significant excess of independent association signals (P<0.003) compared with that expected in samples of the same size selected randomly from the total bipolar disorder group data-set. The strongest association in this subset of participants with bipolar disorder was at rs4818065 (P = 2.42 x 10(-7)). Biological systems implicated included gamma aminobutyric acid (GABA)(A) receptors. Genes having at least one associated polymorphism at P<10(-4) included B3GALTS, A2BP1, GABRB1, AUTS2, BSN, PTPRG, GIRK2 and CDH12.

    Conclusions: Our findings show that individuals with broadly defined bipolar schizoaffective features either have a particularly strong genetic contribution or, as a group, are genetically more homogeneous than the other phenotypes tested. The results point to the importance of using diagnostic approaches that recognise this group of individuals. Our approach can be applied to similar data-sets for other psychiatric and non-psychiatric phenotypes.

    Funded by: Medical Research Council: G0000647, G0000934, G0701003, G0801418; Wellcome Trust: 060620

    The British journal of psychiatry : the journal of mental science 2009;195;1;23-9

  • Identification of MAMDC1 as a candidate susceptibility gene for systemic lupus erythematosus (SLE).

    Hellquist A, Zucchelli M, Lindgren CM, Saarialho-Kere U, Järvinen TM, Koskenmies S, Julkunen H, Onkamo P, Skoog T, Panelius J, Räisänen-Sokolowski A, Hasan T, Widen E, Gunnarson I, Svenungsson E, Padyukov L, Assadi G, Berglind L, Mäkelä VV, Kivinen K, Wong A, Cunningham Graham DS, Vyse TJ, D'Amato M and Kere J

    Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden.

    Background: Systemic lupus erythematosus (SLE) is a complex autoimmune disorder with multiple susceptibility genes. We have previously reported suggestive linkage to the chromosomal region 14q21-q23 in Finnish SLE families.

    Principal findings: Genetic fine mapping of this region in the same family material, together with a large collection of parent-affected trios from the UK and two independent case-control cohorts from Finland and Sweden, indicated that a novel uncharacterized gene, MAMDC1 (MAM domain containing glycosylphosphatidylinositol anchor 2, also known as MDGA2, MIM 611128), represents a putative susceptibility gene for SLE. In a combined analysis of the whole dataset, significant evidence of association was detected for the MAMDC1 intronic single nucleotide polymorphisms (SNPs) rs961616 (P-value = 0.001, Odds Ratio (OR) = 1.292, 95% CI 1.103-1.513) and rs2297926 (P-value = 0.003, OR = 1.349, 95% CI 1.109-1.640). By Northern blot, real-time PCR (qRT-PCR) and immunohistochemical (IHC) analyses, we show that MAMDC1 is expressed in several tissues and cell types, and that the corresponding mRNA is up-regulated by the pro-inflammatory cytokines tumour necrosis factor alpha (TNF-alpha) and interferon gamma (IFN-gamma) in THP-1 monocytes. Based on its homology to known proteins with similar structure, MAMDC1 appears to be a novel member of the adhesion molecules of the immunoglobulin superfamily (IgCAM), which is involved in cell adhesion, migration, and recruitment to inflammatory sites. Remarkably, some IgCAMs have been shown to interact with ITGAM, the product of another SLE susceptibility gene recently discovered in two independent genome wide association (GWA) scans.

    Significance: Further studies focused on MAMDC1 and other molecules involved in these pathways might thus provide new insight into the pathogenesis of SLE.

    PloS one 2009;4;12;e8037

  • Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis.

    Holden MT, Hauser H, Sanders M, Ngo TH, Cherevach I, Cronin A, Goodhead I, Mungall K, Quail MA, Price C, Rabbinowitsch E, Sharp S, Croucher NJ, Chieu TB, Mai NT, Diep TS, Chinh NT, Kehoe M, Leigh JA, Ward PN, Dowson CG, Whatmore AM, Chanter N, Iversen P, Gottschalk M, Slater JD, Smith HE, Spratt BG, Xu J, Ye C, Bentley S, Barrell BG, Schultsz C, Maskell DJ and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Background: Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in humans in Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood.

    Methodology/principal findings: The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, approximately 40% of the approximately 2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three approximately 90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors.

    Conclusions/significance: The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G019274/1; Wellcome Trust: 089472

    PloS one 2009;4;7;e6072

  • Genomic evidence for the evolution of Streptococcus equi: host restriction, increased virulence, and genetic exchange with human pathogens.

    Holden MT, Heather Z, Paillot R, Steward KF, Webb K, Ainslie F, Jourdan T, Bason NC, Holroyd NE, Mungall K, Quail MA, Sanders M, Simmonds M, Willey D, Brooks K, Aanensen DM, Spratt BG, Jolley KA, Maiden MC, Kehoe M, Chanter N, Bentley SD, Robinson C, Maskell DJ, Parkhill J and Waller AS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The continued evolution of bacterial pathogens has major implications for both human and animal disease, but the exchange of genetic material between host-restricted pathogens is rarely considered. Streptococcus equi subspecies equi (S. equi) is a host-restricted pathogen of horses that has evolved from the zoonotic pathogen Streptococcus equi subspecies zooepidemicus (S. zooepidemicus). These pathogens share approximately 80% genome sequence identity with the important human pathogen Streptococcus pyogenes. We sequenced and compared the genomes of S. equi 4047 and S. zooepidemicus H70 and screened S. equi and S. zooepidemicus strains from around the world to uncover evidence of the genetic events that have shaped the evolution of the S. equi genome and led to its emergence as a host-restricted pathogen. Our analysis provides evidence of functional loss due to mutation and deletion, coupled with pathogenic specialization through the acquisition of bacteriophage encoding a phospholipase A(2) toxin, and four superantigens, and an integrative conjugative element carrying a novel iron acquisition system with similarity to the high pathogenicity island of Yersinia pestis. We also highlight that S. equi, S. zooepidemicus, and S. pyogenes share a common phage pool that enhances cross-species pathogen evolution. We conclude that the complex interplay of functional loss, pathogenic specialization, and genetic exchange between S. equi, S. zooepidemicus, and S. pyogenes continues to influence the evolution of these important streptococci.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G019274/1; Wellcome Trust: 047072, 087622, 089472

    PLoS pathogens 2009;5;3;e1000346

  • Genome sequence of a recently emerged, highly transmissible, multi-antibiotic- and antiseptic-resistant variant of methicillin-resistant Staphylococcus aureus, sequence type 239 (TW).

    Holden MT, Lindsay JA, Corton C, Quail MA, Cockfield JD, Pathak S, Batra R, Parkhill J, Bentley SD and Edgeworth JD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

    The 3.1-Mb genome of an outbreak methicillin-resistant Staphylococcus aureus (MRSA) strain (TW20) contains evidence of recently acquired DNA, including two large regions (635 kb and 127 kb). The strain is resistant to a wide range of antibiotics, antiseptics, and heavy metals due to resistance genes encoded on mobile genetic elements and also mutations in housekeeping genes.

    Funded by: Wellcome Trust

    Journal of bacteriology 2009;192;3;888-92

  • Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder.

    Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, Sklar P, Wellcome Trust Case-Control Consortium, Owen MJ, O'Donovan MC and Craddock N

    MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Heath Park, CF23 6BQ Cardiff, UK.

    We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis.

    Funded by: Medical Research Council: G0801418; Wellcome Trust

    American journal of human genetics 2009;85;1;13-24

  • Epilepsy and mental retardation limited to females with PCDH19 mutations can present de novo or in single generation families.

    Hynes K, Tarpey P, Dibbens LM, Bayly MA, Berkovic SF, Smith R, Raisi ZA, Turner SJ, Brown NJ, Desai TD, Haan E, Turner G, Christodoulou J, Leonard H, Gill D, Stratton MR, Gecz J and Scheffer IE

    SA Pathology, Women's and Children's Hospital, 72 King William Road, North Adelaide, SA 5006, Australia.

    Background: Epilepsy and mental retardation limited to females (EFMR) is an intriguing X-linked disorder affecting heterozygous females and sparing hemizygous males. Mutations in the protocadherin 19 (PCDH19) gene have been identified in seven unrelated families with EFMR.

    Methods and results: Here, we assessed the frequency of PCDH19 mutations in individuals with clinical features which overlap those of EFMR. We analysed 185 females from three cohorts: 42 with Rett syndrome who were negative for MECP2 and CDKL5 mutations, 57 with autism spectrum disorders, and 86 with epilepsy with or without intellectual disability. No mutations were identified in the Rett syndrome and autism spectrum disorders cohorts suggesting that despite sharing similar clinical characteristics with EFMR, PCDH19 mutations are not generally associated with these disorders. Among the 86 females with epilepsy (of whom 51 had seizure onset before 3 years), with or without intellectual disability, we identified two (2.3%) missense changes. One (c.1671C>G, p.N557K), reported previously without clinical data, was found in two affected sisters, the first EFMR family without a multigenerational family history of affected females. The second, reported here, is a novel de novo missense change identified in a sporadic female. The change, p.S276P, is predicted to result in functional disturbance of PCDH19 as it affects a highly conserved residue adjacent to the adhesion interface of EC3 of PCDH19.

    Conclusions: This de novo PCDH19 mutation in a sporadic female highlights that mutational analysis should be considered in isolated instances of girls with infantile onset seizures and developmental delay, in addition to those with the characteristic family history of EFMR.

    Funded by: Wellcome Trust

    Journal of medical genetics 2009;47;3;211-6

  • A genome-wide perspective of genetic variation in human metabolism.

    Illig T, Gieger C, Zhai G, Römisch-Margl W, Wang-Sattler R, Prehn C, Altmaier E, Kastenmüller G, Kato BS, Mewes HW, Meitinger T, de Angelis MH, Kronenberg F, Soranzo N, Wichmann HE, Spector TD, Adamski J and Suhre K

    Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.

    Serum metabolite concentrations provide a direct readout of biological processes in the human body, and they are associated with disorders such as cardiovascular and metabolic diseases. We present a genome-wide association study (GWAS) of 163 metabolic traits measured in human blood from 1,809 participants from the KORA population, with replication in 422 participants of the TwinsUK cohort. For eight out of nine replicated loci (FADS1, ELOVL2, ACADS, ACADM, ACADL, SPTLC3, ETFDH and SLC16A9), the genetic variant is located in or near genes encoding enzymes or solute carriers whose functions match the associating metabolic traits. In our study, the use of metabolite concentration ratios as proxies for enzymatic reaction rates reduced the variance and yielded robust statistical associations with P values ranging from 3 x 10(-24) to 6.5 x 10(-179). These loci explained 5.6%-36.3% of the observed variance in metabolite concentrations. For several loci, associations with clinically relevant parameters have been reported previously.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust: 091746

    Nature genetics 2009;42;2;137-41

  • Common variants at five new loci associated with early-onset inflammatory bowel disease.

    Imielinski M, Baldassano RN, Griffiths A, Russell RK, Annese V, Dubinsky M, Kugathasan S, Bradfield JP, Walters TD, Sleiman P, Kim CE, Muise A, Wang K, Glessner JT, Saeed S, Zhang H, Frackelton EC, Hou C, Flory JH, Otieno G, Chiavacci RM, Grundmeier R, Castro M, Latiano A, Dallapiccola B, Stempak J, Abrams DJ, Taylor K, McGovern D, Western Regional Alliance for Pediatric IBD, Silber G, Wrobel I, Quiros A, International IBD Genetics Consortium, Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, Brant SR, Silverberg MS, Taylor KD, Barmuda MM, Bitton A, Dassopoulos T, Datta LW, Green T, Griffiths AM, Kistner EO, Murtha MT, Regueiro MD, Rotter JI, Schumm LP, Steinhart AH, Targan SR, Xavier RJ, NIDDK IBD Genetics Consortium, Libioulle C, Sandor C, Lathrop M, Belaiche J, Dewit O, Gut I, Heath S, Laukens D, Mni M, Rutgeerts P, Van Gossum A, Zelenika D, Franchimont D, Hugot JP, de Vos M, Vermeire S, Louis E, Belgian-French IBD Consortium, Wellcome Trust Case Control Consortium, Cardon LR, Anderson CA, Drummond H, Nimmo E, Ahmad T, Prescott NJ, Onnie CM, Fisher SA, Marchini J, Ghori J, Bumpstead S, Gwillam R, Tremelling M, Delukas P, Mansfield J, Jewell D, Satsangi J, Mathew CG, Parkes M, Georges M, Daly MJ, Heyman MB, Ferry GD, Kirschner B, Lee J, Essers J, Grand R, Stephens M, Levine A, Piccoli D, Van Limbergen J, Cucchiara S, Monos DS, Guthery SL, Denson L, Wilson DC, Grant SF, Daly M, Silverberg MS, Satsangi J and Hakonarson H

    Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

    The inflammatory bowel diseases (IBD) Crohn's disease and ulcerative colitis are common causes of morbidity in children and young adults in the western world. Here we report the results of a genome-wide association study in early-onset IBD involving 3,426 affected individuals and 11,963 genetically matched controls recruited through international collaborations in Europe and North America, thereby extending the results from a previous study of 1,011 individuals with early-onset IBD. We have identified five new regions associated with early-onset IBD susceptibility, including 16p11 near the cytokine gene IL27 (rs8049439, P = 2.41 x 10(-9)), 22q12 (rs2412973, P = 1.55 x 10(-9)), 10q22 (rs1250550, P = 5.63 x 10(-9)), 2q37 (rs4676410, P = 3.64 x 10(-8)) and 19q13.11 (rs10500264, P = 4.26 x 10(-10)). Our scan also detected associations at 23 of 32 loci previously implicated in adult-onset Crohn's disease and at 8 of 17 loci implicated in adult-onset ulcerative colitis, highlighting the close pathogenetic relationship between early- and adult-onset IBD.

    Funded by: Canadian Institutes of Health Research; Chief Scientist Office: CZB/4/540; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: C06 RR011234, M01 RR000064, M01 RR002172, M01 RR002172-26, M01-RR00064; NIDDK NIH HHS: DK062423, DK069513, K23 DK069513, K24 DK060617, K24 DK060617-07, P30 DK040561, P30 DK040561-14, P30 DK043351, T32 DK007477, U01 DK062413, U01 DK062420, U01 DK062420-08, U01 DK062423; Wellcome Trust: 072789/Z/03/Z

    Nature genetics 2009;41;12;1335-40

  • Transposon-mediated genome manipulation in vertebrates.

    Ivics Z, Li MA, Mátés L, Boeke JD, Nagy A, Bradley A and Izsvák Z

    Max Delbrück Center for Molecular Medicine, Berlin, Germany.

    Transposable elements are DNA segments with the unique ability to move about in the genome. This inherent feature can be exploited to harness these elements as gene vectors for genome manipulation. Transposon-based genetic strategies have been established in vertebrate species over the last decade, and current progress in this field suggests that transposable elements will serve as indispensable tools. In particular, transposons can be applied as vectors for somatic and germline transgenesis, and as insertional mutagens in both loss-of-function and gain-of-function forward mutagenesis screens. In addition, transposons will gain importance in future cell-based clinical applications, including nonviral gene transfer into stem cells and the rapidly developing field of induced pluripotent stem cells. Here we provide an overview of transposon-based methods used in vertebrate model organisms with an emphasis on the mouse system and highlight the most important considerations concerning genetic applications of the transposon systems.

    Funded by: NCI NIH HHS: P01 CA016519, P01 CA016519-340010; NIGMS NIH HHS: R01 GM036481

    Nature methods 2009;6;6;415-22

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10(-7) to P = 4 × 10(-14), with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800675, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 072064, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062, 090532

    Nature genetics 2009;41;6;657-65

  • Effects of calcium signaling on Plasmodium falciparum erythrocyte invasion and post-translational modification of gliding-associated protein 45 (PfGAP45).

    Jones ML, Cottingham C and Rayner JC

    Department of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

    Plasmodium falciparum erythrocyte invasion is powered by an actin/myosin motor complex that is linked both to the tight junction and to the merozoite cytoskeleton through the Inner Membrane Complex (IMC). The IMC association of the myosin motor, PfMyoA, is maintained by its association with three proteins: PfMTIP, a myosin light chain, PfGAP45, an IMC peripheral membrane protein, and PfGAP50, an integral membrane protein of the IMC. This protein complex is referred to as the glideosome, and given its central role in erythrocyte invasion, this complex is likely the target of several specific regulatory effectors that ensure it is properly localized, assembled, and activated as the merozoite prepares to invade its target cell. However, little is known about how erythrocyte invasion as a whole is regulated, or about how or whether that regulation impacts the glideosome. Here we show that P. falciparum erythrocyte invasion is regulated by the release of intracellular calcium via the cyclic-ADP Ribose (cADPR) pathway, but that inhibition of cADPR-mediated calcium release does not affect PfGAP45 phosphorylation or glideosome association. By contrast, the serine/threonine kinase inhibitor, staurosporine, affects both PfGAP45 isoform distribution and the integrity of the glideosome complex. This data identifies specific regulatory elements involved in controlling P. falciparum erythrocyte invasion and reveals that the assembly status of the merozoite glideosome, which is central to erythrocyte invasion, is surprisingly dynamic.

    Funded by: NIAID NIH HHS: T32 AI055438, T32 AI055438-05

    Molecular and biochemical parasitology 2009;168;1;55-62

  • Typhoid Fever

    Kingsley RA and Dougan G

    Vaccines for Biodefense and Emerging and Neglected Diseases 2009;Chapter 57;1147–1161

  • Support for the involvement of large copy number variants in the pathogenesis of schizophrenia.

    Kirov G, Grozeva D, Norton N, Ivanov D, Mantripragada KK, Holmans P, International Schizophrenia Consortium, Wellcome Trust Case Control Consortium, Craddock N, Owen MJ and O'Donovan MC

    Department of Psychological Medicine, Cardiff University, Heath Park, Cardiff, UK.

    We investigated the involvement of rare (<1%) copy number variants (CNVs) in 471 cases of schizophrenia and 2792 controls that had been genotyped using the Affymetrix GeneChip 500K Mapping Array. Large CNVs >1 Mb were 2.26 times more common in cases (P = 0.00027), with the effect coming mostly from deletions (odds ratio, OR = 4.53, P = 0.00013) although duplications were also more common (OR = 1.71, P = 0.04). Two large deletions were found in two cases each, but in no controls: a deletion at 22q11.2 known to be a susceptibility factor for schizophrenia and a deletion on 17p12, at 14.0-15.4 Mb. The latter is known to cause hereditary neuropathy with liability to pressure palsies. The same deletion was found in 6 of 4618 (0.13%) cases and 6 of 36 092 (0.017%) controls in the re-analysed data of two recent large CNV studies of schizophrenia (OR = 7.82, P = 0.001), with the combined significance level for all three studies achieving P = 5 x 10(-5). One large duplication on 16p13.1, which has been previously implicated as a susceptibility factor for autism, was found in three cases and six controls (0.6% versus 0.2%, OR = 2.98, P = 0.13). We also provide the first support for a recently reported association between deletions at 15q11.2 and schizophrenia (P = 0.026). This study confirms the involvement of rare CNVs in the pathogenesis of schizophrenia and contributes to the growing list of specific CNVs that are implicated.

    Funded by: Medical Research Council: G0801418; NIMH NIH HHS: 2 P50 MH066392-05A1; Wellcome Trust: 076113

    Human molecular genetics 2009;18;8;1497-503
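
    The odds ratio quoted above for the 17p12 deletion in the re-analysed data can be reproduced from the carrier counts given in the abstract (6 of 4618 cases, 6 of 36 092 controls). The following is a minimal sketch of that arithmetic only; the function name is hypothetical, and it does not reproduce the significance test used in the paper.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio from a 2x2 table of carriers vs non-carriers."""
    case_non_carriers = case_total - case_carriers
    control_non_carriers = control_total - control_carriers
    # (odds of carrying the CNV in cases) / (odds of carrying it in controls)
    return (case_carriers * control_non_carriers) / (
        case_non_carriers * control_carriers
    )


# Counts for the 17p12 deletion as reported in the abstract.
or_17p12 = odds_ratio(6, 4618, 6, 36092)
case_freq = 100 * 6 / 4618        # carrier frequency in cases, percent
control_freq = 100 * 6 / 36092    # carrier frequency in controls, percent

print(round(or_17p12, 2))   # 7.82
print(round(case_freq, 2), round(control_freq, 3))   # 0.13 0.017
```

    With equal carrier counts in both groups, the odds ratio reduces to the ratio of non-carriers, which is why a rare deletion seen in only six cases can still yield a large effect estimate.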

  • Meta-analysis of 28,141 individuals identifies common variants within five new loci that influence uric acid concentrations.

    Kolz M, Johnson T, Sanna S, Teumer A, Vitart V, Perola M, Mangino M, Albrecht E, Wallace C, Farrall M, Johansson A, Nyholt DR, Aulchenko Y, Beckmann JS, Bergmann S, Bochud M, Brown M, Campbell H, EUROSPAN Consortium, Connell J, Dominiczak A, Homuth G, Lamina C, McCarthy MI, ENGAGE Consortium, Meitinger T, Mooser V, Munroe P, Nauck M, Peden J, Prokisch H, Salo P, Salomaa V, Samani NJ, Schlessinger D, Uda M, Völker U, Waeber G, Waterworth D, Wang-Sattler R, Wright AF, Adamski J, Whitfield JB, Gyllensten U, Wilson JF, Rudan I, Pramstaller P, Watkins H, PROCARDIS Consortium, Doering A, Wichmann HE, KORA Study, Spector TD, Peltonen L, Völzke H, Nagaraja R, Vollenweider P, Caulfield M, WTCCC, Illig T and Gieger C

    Institute of Epidemiology, Helmholtz Zentrum München, National Research Center for Environment and Health, Neuherberg, Germany.

    Elevated serum uric acid levels cause gout and are a risk factor for cardiovascular disease and diabetes. To investigate the polygenetic basis of serum uric acid levels, we conducted a meta-analysis of genome-wide association scans from 14 studies totalling 28,141 participants of European descent, resulting in identification of 954 SNPs distributed across nine loci that exceeded the threshold of genome-wide significance, five of which are novel. Overall, the common variants associated with serum uric acid levels fall in the following nine regions: SLC2A9 (p = 5.2x10(-201)), ABCG2 (p = 3.1x10(-26)), SLC17A1 (p = 3.0x10(-14)), SLC22A11 (p = 6.7x10(-14)), SLC22A12 (p = 2.0x10(-9)), SLC16A9 (p = 1.1x10(-8)), GCKR (p = 1.4x10(-9)), LRRC16A (p = 8.5x10(-9)), and near PDZK1 (p = 2.7x10(-9)). Identified variants were analyzed for gender differences. We found that the minor allele for rs734553 in SLC2A9 has greater influence in lowering uric acid levels in women and the minor allele of rs2231142 in ABCG2 elevates uric acid levels more strongly in men compared to women. To further characterize the identified variants, we analyzed their association with a panel of metabolites. rs12356193 within SLC16A9 was associated with DL-carnitine (p = 4.0x10(-26)) and propionyl-L-carnitine (p = 5.0x10(-8)) concentrations, which in turn were associated with serum UA levels (p = 1.4x10(-57) and p = 8.1x10(-54), respectively), forming a triangle between SNP, metabolites, and UA levels. Taken together, these associations highlight additional pathways that are important in the regulation of serum uric acid levels and point toward novel potential targets for pharmacological intervention to prevent or treat hyperuricemia. In addition, these findings strongly support the hypothesis that transport proteins are key in regulating serum uric acid levels.

    Funded by: Arthritis Research UK; British Heart Foundation: FS/05/061/19501, PG02/128; Chief Scientist Office: CZB/4/710; Medical Research Council: G0400874, G9521010, G9521010D, MC_U127561128; NIA NIH HHS: N01-AG-1-2109; NIAAA NIH HHS: AA007535, R01 AA007535; Wellcome Trust: 076113/B/04/Z

    PLoS genetics 2009;5;6;e1000504

  • Parental origin of sequence variants associated with complex diseases.

    Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, Besenbacher S, Jonasdottir A, Sigurdsson A, Kristinsson KT, Jonasdottir A, Frigge ML, Gylfason A, Olason PI, Gudjonsson SA, Sverrisson S, Stacey SN, Sigurgeirsson B, Benediktsdottir KR, Sigurdsson H, Jonsson T, Benediktsson R, Olafsson JH, Johannsson OT, Hreidarsson AB, Sigurdsson G, DIAGRAM Consortium, Ferguson-Smith AC, Gudbjartsson DF, Thorsteinsdottir U and Stefansson K

    deCODE genetics, Sturlugata 8, 101 Reykjavík, Iceland.

    Effects of susceptibility variants may depend on from which parent they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.

    Funded by: Medical Research Council: G9723500, MC_U106179471, MC_U106179474, MC_U127592696; NIAMS NIH HHS: K08 AR055688; NIDDK NIH HHS: R01 DK029867; Wellcome Trust: 077016

    Nature 2009;462;7275;868-74

  • Common genetic variation in the melatonin receptor 1B gene (MTNR1B) is associated with decreased early-phase insulin response.

    Langenberg C, Pascoe L, Mari A, Tura A, Laakso M, Frayling TM, Barroso I, Loos RJ, Wareham NJ, Walker M and RISC Consortium

    MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    Aims/hypothesis: We investigated whether variation in MTNR1B, which was recently identified as a common genetic determinant of fasting glucose levels in healthy, diabetes-free individuals, is associated with measures of beta cell function and whole-body insulin sensitivity.

    Methods: We studied 1,276 healthy individuals of European ancestry at 19 centres of the Relationship between Insulin Sensitivity and Cardiovascular disease (RISC) study. Whole-body insulin sensitivity was assessed by euglycaemic-hyperinsulinaemic clamp and indices of beta cell function were derived from a 75 g oral glucose tolerance test (including 30 min insulin response and glucose sensitivity). We studied rs10830963 in MTNR1B using additive genetic models, adjusting for age, sex and recruitment centre.

    Results: The minor (G) allele of rs10830963 in MTNR1B (frequency 0.30 in HapMap Centre d'Etude du Polymorphisme [Utah residents with northern and western European ancestry] [CEU]; 0.29 in RISC participants) was associated with higher levels of fasting plasma glucose (standardised beta [95% CI] 0.17 [0.085, 0.25] per G allele, p = 5.8 x 10(-5)), consistent with recent observations. In addition, the G-allele was significantly associated with lower early insulin response (-0.19 [-0.28, -0.10], p = 1.7 x 10(-5)), as well as with decreased beta cell glucose sensitivity (-0.11 [-0.20, -0.027], p = 0.010). No associations were observed with clamp-assessed insulin sensitivity (p = 0.15) or different measures of body size (p > 0.7 for all).

    Conclusions/interpretation: Genetic variation in MTNR1B is associated with defective early insulin response and decreased beta cell glucose sensitivity, which may contribute to the higher glucose levels of non-diabetic individuals carrying the minor G allele of rs10830963 in MTNR1B.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0701863, MC_U106188470; Wellcome Trust: 077016, 077016/Z/05/Z

    Diabetologia 2009;52;8;1537-42

  • Testing the water: marine metagenomics.

    Langridge G

    Nature reviews. Microbiology 2009;7;8;552

  • Antibiotic treatment of Clostridium difficile carrier mice triggers a supershedder state, spore-mediated transmission, and severe disease in immunocompromised hosts.

    Lawley TD, Clare S, Walker AW, Goulding D, Stabler RA, Croucher N, Mastroeni P, Scott P, Raisen C, Mottram L, Fairweather NF, Wren BW, Parkhill J and Dougan G

    Microbial Pathogenesis Laboratory and Pathogen Genomics, Hinxton, United Kingdom.

    Clostridium difficile persists in hospitals by exploiting an infection cycle that is dependent on humans shedding highly resistant and infectious spores. Here we show that human virulent C. difficile can asymptomatically colonize the intestines of immunocompetent mice, establishing a carrier state that persists for many months. C. difficile carrier mice consistently shed low levels of spores but, surprisingly, do not transmit infection to cohabiting mice. However, antibiotic treatment of carriers triggers a highly contagious supershedder state, characterized by a dramatic reduction in the intestinal microbiota species diversity, C. difficile overgrowth, and excretion of high levels of spores. Stopping antibiotic treatment normally leads to recovery of the intestinal microbiota species diversity and suppresses C. difficile levels, although some mice persist in the supershedding state for extended periods. Spore-mediated transmission to immunocompetent mice treated with antibiotics results in self-limiting mucosal inflammation of the large intestine. In contrast, transmission to mice whose innate immune responses are compromised (Myd88(-/-)) leads to a severe intestinal disease that is often fatal. Thus, mice can be used to investigate distinct stages of the C. difficile infection cycle and can serve as a valuable surrogate for studying the spore-mediated transmission and interactions between C. difficile and the host and its microbiota, and the results obtained should guide infection control measures.

    Funded by: Wellcome Trust

    Infection and immunity 2009;77;9;3661-9

  • Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.

    Lawley TD, Croucher NJ, Yu L, Clare S, Sebaihia M, Goulding D, Pickard DJ, Parkhill J, Choudhary J and Dougan G

    Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm(2) reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.

    Funded by: Wellcome Trust

    Journal of bacteriology 2009;191;17;5377-86

  • GLIDERS--a web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs.

    Lawrence R, Day-Williams AG, Mott R, Broxholme J, Cardon LR and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Background: A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r² ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers.

    Description: GLIDERS is an easy to use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range). The intuitive web interface handles both manual entry of SNP IDs as well as allowing users to upload files of SNP IDs. The user can limit the resulting inter SNP associations with easy to use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r² (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab delimited text file.

    Conclusion: GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.

    Funded by: Wellcome Trust: 079557, 079557MA, 088885/Z/09/Z

    BMC bioinformatics 2009;10;367
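
    The pairwise r² statistic on which the GLIDERS threshold is defined can be illustrated with a short calculation from phased haplotypes. This is a minimal sketch of the standard definition, r² = D² / (pA(1-pA)pB(1-pB)) with D = pAB - pA·pB; the function name and the toy haplotypes are hypothetical and are not taken from GLIDERS or HapMap.

```python
def r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp_a, allele_at_snp_b) tuples,
    with alleles coded 0 or 1, one tuple per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Toy data: alleles always travel together (perfect LD) vs. independently.
perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

for name, haps in [("perfect", perfect), ("independent", independent)]:
    value = r_squared(haps)
    # 0.3 is the minimum r^2 that the abstract says GLIDERS reports.
    print(name, value, value >= 0.3)
```

    Because r² depends only on allele and haplotype frequencies, not on physical distance, the same calculation applies to nearby and long-range SNP pairs alike.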

  • An ENU-induced mutation of miR-96 associated with progressive hearing loss in mice.

    Lewis MA, Quint E, Glazier AM, Fuchs H, De Angelis MH, Langford C, van Dongen S, Abreu-Goodger C, Piipari M, Redshaw N, Dalmay T, Moreno-Pelayo MA, Enright AJ and Steel KP

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Progressive hearing loss is common in the human population, but little is known about the molecular basis. We report a new N-ethyl-N-nitrosourea (ENU)-induced mouse mutant, diminuendo, with a single base change in the seed region of Mirn96. Heterozygotes show progressive loss of hearing and hair cell anomalies, whereas homozygotes have no cochlear responses. Most microRNAs are believed to downregulate target genes by binding to specific sites on their mRNAs, so mutation of the seed should lead to target gene upregulation. Microarray analysis revealed 96 transcripts with significantly altered expression in homozygotes; notably, Slc26a5, Ocm, Gfi1, Ptprq and Pitpnm1 were downregulated. Hypergeometric P-value analysis showed that hundreds of genes were upregulated in mutants. Different genes, with target sites complementary to the mutant seed, were downregulated. This is the first microRNA found associated with deafness, and diminuendo represents a model for understanding and potentially moderating progressive hair cell degeneration in hearing loss more generally.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077189, 077198

    Nature genetics 2009;41;5;614-8

  • The Sequence Alignment/Map format and SAMtools.

    Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R and 1000 Genome Project Data Processing Subgroup

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA.

    Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

    Funded by: NHGRI NIH HHS: R01 HG004719, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, U54 HG002750, U54HG002750; Wellcome Trust: 077192/Z/05/Z

    Bioinformatics (Oxford, England) 2009;25;16;2078-9
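
    As an illustration of the format described in the abstract above, the following sketch parses the eleven mandatory tab-separated fields of a single SAM alignment line. The field names follow the current SAM specification rather than the abstract, the helper name `parse_sam_line` is hypothetical, and the example record is made up. Header lines (which start with `@`) and optional tags are not handled.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate ("=" means same as rname)
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2], pos=int(fields[3]),
        mapq=int(fields[4]), cigar=fields[5], rnext=fields[6], pnext=int(fields[7]),
        tlen=int(fields[8]), seq=fields[9], qual=fields[10],
    )


# Made-up record for illustration only.
example = "read1\t99\tchr1\t100\t60\t8M\t=\t150\t58\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
is_reverse = bool(record.flag & 0x10)  # flag bit 0x10: read maps to the reverse strand
print(record.rname, record.pos, record.cigar, is_reverse)
```

    For real data an established parser such as SAMtools itself is the safer choice; the sketch only shows how little structure a single record requires.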

  • Chromosomal mobilization and reintegration of Sleeping Beauty and PiggyBac transposons.

    Liang Q, Kong J, Stalker J and Bradley A

    The Sleeping Beauty and PiggyBac DNA transposon systems have recently been developed as tools for insertional mutagenesis. We have compared the chromosomal mobilization efficiency and insertion site preference of the two transposons mobilized from the same donor site in mouse embryonic stem (ES) cells under conditions in which there were no selective constraints on the transposons' insertion sites. Compared with Sleeping Beauty, PiggyBac exhibits higher transposition efficiencies, no evidence for local hopping and a significant bias toward reintegration in intragenic regions, which demonstrate its utility for insertional mutagenesis. Although Sleeping Beauty had no detectable genomic bias with respect to insertions in genes or intergenic regions, both Sleeping Beauty and PiggyBac transposons displayed preferential integration into actively transcribed loci.

    Genesis (New York, N.Y. : 2000) 2009;47;6;404-8

  • Genome-wide association scan meta-analysis identifies three loci influencing adiposity and fat distribution.

    Lindgren CM, Heid IM, Randall JC, Lamina C, Steinthorsdottir V, Qi L, Speliotes EK, Thorleifsson G, Willer CJ, Herrera BM, Jackson AU, Lim N, Scheet P, Soranzo N, Amin N, Aulchenko YS, Chambers JC, Drong A, Luan J, Lyon HN, Rivadeneira F, Sanna S, Timpson NJ, Zillikens MC, Zhao JH, Almgren P, Bandinelli S, Bennett AJ, Bergman RN, Bonnycastle LL, Bumpstead SJ, Chanock SJ, Cherkas L, Chines P, Coin L, Cooper C, Crawford G, Doering A, Dominiczak A, Doney AS, Ebrahim S, Elliott P, Erdos MR, Estrada K, Ferrucci L, Fischer G, Forouhi NG, Gieger C, Grallert H, Groves CJ, Grundy S, Guiducci C, Hadley D, Hamsten A, Havulinna AS, Hofman A, Holle R, Holloway JW, Illig T, Isomaa B, Jacobs LC, Jameson K, Jousilahti P, Karpe F, Kuusisto J, Laitinen J, Lathrop GM, Lawlor DA, Mangino M, McArdle WL, Meitinger T, Morken MA, Morris AP, Munroe P, Narisu N, Nordström A, Nordström P, Oostra BA, Palmer CN, Payne F, Peden JF, Prokopenko I, Renström F, Ruokonen A, Salomaa V, Sandhu MS, Scott LJ, Scuteri A, Silander K, Song K, Yuan X, Stringham HM, Swift AJ, Tuomi T, Uda M, Vollenweider P, Waeber G, Wallace C, Walters GB, Weedon MN, Wellcome Trust Case Control Consortium, Witteman JC, Zhang C, Zhang W, Caulfield MJ, Collins FS, Davey Smith G, Day IN, Franks PW, Hattersley AT, Hu FB, Jarvelin MR, Kong A, Kooner JS, Laakso M, Lakatta E, Mooser V, Morris AD, Peltonen L, Samani NJ, Spector TD, Strachan DP, Tanaka T, Tuomilehto J, Uitterlinden AG, van Duijn CM, Wareham NJ, Hugh Watkins, Procardis Consortia, Waterworth DM, Boehnke M, Deloukas P, Groop L, Hunter DJ, Thorsteinsdottir U, Schlessinger D, Wichmann HE, Frayling TM, Abecasis GR, Hirschhorn JN, Loos RJ, Stefansson K, Mohlke KL, Barroso I, McCarthy MI and Giant Consortium

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9x10(-11)) and MSRA (WC, P = 8.9x10(-9)). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6x10(-8)). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.

    Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation; Intramural NIH HHS: Z01 HG000024; Medical Research Council: 0600705, G0000649, G0000934, G0500539, G0600705, G0601261, G0701863, G0801056, G9521010, G9521010D, MC_QA137934, MC_U106188470, MC_UP_A620_1014; NHGRI NIH HHS: N01HG65403, R01 HG002651; NHLBI NIH HHS: HL084729, HL087679, R01 HL087679, U01 HL084729; NIDDK NIH HHS: DK062370, DK067288, DK07191, DK072193, DK075787, DK079466, DK080145, F32 DK079466, F32 DK079466-01, K23 DK067288, K23 DK080145, K23 DK080145-01, R01 DK029867, R01 DK062370, R01 DK072193, R01 DK075787, R56 DK062370, T32 DK007191, U01 DK062370; PHS HHS: G02651; Wellcome Trust: 064890, 068545/Z/02, 081682, 086596/Z/08/Z, 090532, GR069224, GR072960, GR076113

    PLoS genetics 2009;5;6;e1000508

  • HI: haplotype improver using paired-end short reads.

    Long Q, MacArthur D, Ning Z and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs, UK.

    Summary: We present a program to improve haplotype reconstruction by incorporating information from paired-end reads, and demonstrate its utility on simulated data. We find that given a fixed coverage, longer reads (implying fewer of them) are preferable.

    Availability: The executable and user manual can be freely downloaded from

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;18;2436-7
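
    The trade-off noted in the abstract above (at fixed coverage, longer reads imply fewer of them) follows from the usual definition coverage = number of reads x read length / genome size. A minimal sketch, with a hypothetical helper name and made-up genome size and read lengths:

```python
import math


def reads_for_coverage(coverage: float, genome_size: int, read_length: int) -> int:
    """Smallest number of reads with reads * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


# Illustrative numbers only: 20x coverage of a 3 Mb region.
for read_length in (36, 75, 150):
    print(read_length, reads_for_coverage(20, 3_000_000, read_length))
```

    Doubling the read length from 75 to 150 bases halves the number of reads needed for the same coverage.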

  • Biology of Genomes: making sense of sequence.

    Macarthur DG

    Human Evolution, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    A report on the Biology of Genomes meeting held at Cold Spring Harbor Laboratory, NY, USA, 5-9 May 2009.

    Genome medicine 2009;1;6;61

  • LookSeq: a browser-based viewer for deep sequencing data.

    Manske HM and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Genome research 2009;19;11;2125-32

  • SNP-o-matic.

    Manske HM and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Motivation: High-throughput sequencing technologies generate large numbers of short reads. Mapping these to a reference sequence consumes large amounts of processing time and memory, and read mapping errors can lead to noisy or incorrect alignments. SNP-o-matic is a fast, memory-efficient and stringent read mapping tool offering a variety of analytical output functions, with an emphasis on genotyping.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;18;2434-5

  • Donor-recipient mismatch for common gene deletion polymorphisms in graft-versus-host disease.

    McCarroll SA, Bradner JE, Turpeinen H, Volin L, Martin PJ, Chilewski SD, Antin JH, Lee SJ, Ruutu T, Storer B, Warren EH, Zhang B, Zhao LP, Ginsburg D, Soiffer RJ, Partanen J, Hansen JA, Ritz J, Palotie A and Altshuler D

    Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA.

    Transplantation and pregnancy, in which two diploid genomes reside in one body, can each lead to diseases in which immune cells from one individual target antigens encoded in the other's genome. One such disease, graft-versus-host disease (GVHD) after hematopoietic stem cell transplantation (HSCT, or bone marrow transplant), is common even after transplants between HLA-identical siblings, indicating that cryptic histocompatibility loci exist outside the HLA locus. The immune system of an individual whose genome is homozygous for a gene deletion could recognize epitopes encoded by that gene as alloantigens. Analyzing common gene deletions in three HSCT cohorts (1,345 HLA-identical sibling donor-recipient pairs), we found that risk of acute GVHD was greater (odds ratio (OR) = 2.5; 95% confidence interval (CI) 1.4-4.6) when donor and recipient were mismatched for homozygous deletion of UGT2B17, a gene expressed in GVHD-affected tissues and giving rise to multiple histocompatibility antigens. Human genome structural variation merits investigation as a potential mechanism in diseases of alloimmunity.

    Funded by: NCI NIH HHS: CA18029, P01 CA018029, P01 CA018029-270048, P01 CA018029-349016; NHLBI NIH HHS: HL087690, P01 HL070149, P01 HL070149-05, R01 HL087690, R01 HL087690-03; NIAID NIH HHS: AI29530, AI33484, P01 AI029530, P01 AI029530-130007, P01 AI033484, P01 AI033484-13, U19 AI029530; PHS HHS: HA070149

    Nature genetics 2009;41;12;1341-4

  • Microduplications of 16p11.2 are associated with schizophrenia.

    McCarthy SE, Makarov V, Kirov G, Addington AM, McClellan J, Yoon S, Perkins DO, Dickel DE, Kusenda M, Krastoshevsky O, Krause V, Kumar RA, Grozeva D, Malhotra D, Walsh T, Zackai EH, Kaplan P, Ganesh J, Krantz ID, Spinner NB, Roccanova P, Bhandari A, Pavon K, Lakshmi B, Leotta A, Kendall J, Lee YH, Vacic V, Gary S, Iakoucheva LM, Crow TJ, Christian SL, Lieberman JA, Stroup TS, Lehtimäki T, Puura K, Haldeman-Englert C, Pearl J, Goodell M, Willour VL, Derosse P, Steele J, Kassem L, Wolff J, Chitkara N, McMahon FJ, Malhotra AK, Potash JB, Schulze TG, Nöthen MM, Cichon S, Rietschel M, Leibenluft E, Kustanovich V, Lajonchere CM, Sutcliffe JS, Skuse D, Gill M, Gallagher L, Mendell NR, Wellcome Trust Case Control Consortium, Craddock N, Owen MJ, O'Donovan MC, Shaikh TH, Susser E, Delisi LE, Sullivan PF, Deutsch CK, Rapoport J, Levy DL, King MC and Sebat J

    Recurrent microdeletions and microduplications of a 600-kb genomic region of chromosome 16p11.2 have been implicated in childhood-onset developmental disorders. We report the association of 16p11.2 microduplications with schizophrenia in two large cohorts. The microduplication was detected in 12/1,906 (0.63%) cases and 1/3,971 (0.03%) controls (P = 1.2 x 10(-5), OR = 25.8) from the initial cohort, and in 9/2,645 (0.34%) cases and 1/2,420 (0.04%) controls (P = 0.022, OR = 8.3) of the replication cohort. The 16p11.2 microduplication was associated with a 14.5-fold increased risk of schizophrenia (95% CI (3.3, 62)) in the combined sample. A meta-analysis of datasets for multiple psychiatric disorders showed a significant association of the microduplication with schizophrenia (P = 4.8 x 10(-7)), bipolar disorder (P = 0.017) and autism (P = 1.9 x 10(-7)). In contrast, the reciprocal microdeletion was associated only with autism and developmental disorders (P = 2.3 x 10(-13)). Head circumference was larger in patients with the microdeletion than in patients with the microduplication (P = 0.0007).

    Funded by: Intramural NIH HHS: ZIA MH002581-19; Medical Research Council: G0800509; NCRR NIH HHS: M01 RR000037, RR000037; NICHD NIH HHS: HD04147, P30 HD004147; NIDCR NIH HHS: DE016442, R41 DE016442, R42 DE016442; NIGMS NIH HHS: GM081519, R01 GM081519; NIMH NIH HHS: 1U24MH081810, K99 MH086756, K99 MH086756-01, K99 MH086756-02, MH061009, MH071523, MH074027, MH076431, MH077139, MH081810, MH083989, MH31340, MH44245, N01 MH90001, R00 MH086756, R00 MH086756-03, R01 MH031340, R01 MH061009, R01 MH071523, R01 MH074027, R01 MH076431, R01 MH077139, R01 MH083989, R01 MH091350, U24 MH081810; PHS HHS: HF004222; Wellcome Trust: 076113

    Nature genetics 2009;41;11;1223-7
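
    The carrier counts quoted in the abstract above are enough to compute a crude (unadjusted) odds ratio for each cohort. The sketch below assumes a plain 2x2 cross-product estimate; the function name is hypothetical, and the authors' own estimator may differ, which may be why the crude value for the initial cohort is close to, but not identical with, the reported OR of 25.8.

```python
def crude_odds_ratio(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int) -> float:
    """Cross-product odds ratio from a 2x2 table of carriers and non-carriers."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Counts as quoted in the abstract: carriers / total for cases and controls.
initial = crude_odds_ratio(12, 1906, 1, 3971)
replication = crude_odds_ratio(9, 2645, 1, 2420)

print(f"initial cohort: crude OR = {initial:.1f}")          # 25.2
print(f"replication cohort: crude OR = {replication:.1f}")  # 8.3
```

    The replication-cohort value rounds to the reported 8.3. With only one carrier among the controls in each cohort, such estimates are very sensitive to that single count.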

  • Regulation of the Epstein-Barr virus Zp promoter in B lymphocytes during reactivation from latency.

    McDonald C, Karstegl CE, Kellam P and Farrell PJ

    Department of Virology, Imperial College Faculty of Medicine, St Mary's Campus, London W2 1PG, UK.

    Ten novel mutations were introduced into the Zp promoter to test the role of sequences outside the established transcription factor-binding sites in Epstein-Barr virus (EBV) reactivation. Most of these had only small effects, but mutations in the ZID site were shown to reduce Zp activity strongly at early times after induction by anti-immunoglobulin (anti-Ig). The binding of MEF2 transcription factor to ZID was characterized in detail and linked functionally to Zp promoter activity. The presence of XBP-1s, the active form of XBP-1, after administration of anti-Ig to Akata Burkitt's lymphoma cells is consistent with a role for this factor in reactivation of the EBV lytic cycle, although signalling through MEF2D was quantitatively much more significant in activation of Zp. Silencing of Zp during latency is thought to be primarily a consequence of a repressive chromatin structure on Zp, and this aspect of Zp regulation can be observed in the Akata genome through protection of Zp from activation by BZLF1 in the absence of signalling from the B-cell receptor.

    The Journal of general virology 2009;91;Pt 3;622-9

  • Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss.

    Mencía A, Modamio-Høybjør S, Redshaw N, Morín M, Mayo-Merino F, Olavarrieta L, Aguirre LA, del Castillo I, Steel KP, Dalmay T, Moreno F and Moreno-Pelayo MA

    Unidad de Genética Molecular, Hospital Ramón y Cajal, Madrid, Spain.

    MicroRNAs (miRNAs) bind to complementary sites in their target mRNAs to mediate post-transcriptional repression, with the specificity of target recognition being crucially dependent on the miRNA seed region. Impaired miRNA target binding resulting from SNPs within mRNA target sites has been shown to lead to pathologies associated with dysregulated gene expression. However, no pathogenic mutations within the mature sequence of a miRNA have been reported so far. Here we show that point mutations in the seed region of miR-96, a miRNA expressed in hair cells of the inner ear, result in autosomal dominant, progressive hearing loss. This is the first study implicating a miRNA in a mendelian disorder. The identified mutations have a strong impact on miR-96 biogenesis and result in a significant reduction of mRNA targeting. We propose that these mutations alter the regulatory role of miR-96 in maintaining gene expression profiles in hair cells required for their normal function.

    Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212; Wellcome Trust

    Nature genetics 2009;41;5;609-13

  • Genetic structure of nomadic Bedouin from Kuwait.

    Mohammad T, Xue Y, Evison M and Tyler-Smith C

    Division of Genomic Medicine, University of Sheffield, Sheffield, UK.

    Bedouin are traditionally nomadic inhabitants of the Persian Gulf who claim descent from two male lineages: Adnani and Qahtani. We have investigated whether or not this tradition is reflected in the current genetic structure of a sample of 153 Bedouin males from six Kuwaiti tribes, including three tribes from each traditional lineage. Volunteers were genotyped using a panel of autosomal and Y-STRs, and Y-SNPs. The samples clustered with their geographical neighbours in both the autosomal and Y-chromosomal analyses, and showed strong evidence of genetic isolation and drift. Although there was no evidence of segregation into the two male lineages, other aspects of genetic structure were in accord with tradition.

    Funded by: Wellcome Trust: 077009

    Heredity 2009;103;5;425-33

  • Novel genes in cell cycle control and lipid metabolism with dynamically regulated binding sites for sterol regulatory element-binding protein 1 and RNA polymerase II in HepG2 cells detected by chromatin immunoprecipitation with microarray detection.

    Motallebipour M, Enroth S, Punga T, Ameur A, Koch C, Dunham I, Komorowski J, Ericsson J and Wadelius C

    Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.

    Sterol regulatory element-binding proteins 1 and 2 (SREBP-1 and SREBP-2) are important regulators of genes involved in cholesterol and fatty acid metabolism, but have also been implicated in the regulation of the cell cycle and have been associated with the pathogenesis of type 2 diabetes, atherosclerosis and obesity, among others. In this study, we aimed to characterize the binding sites of SREBP-1 and RNA polymerase II through chromatin immunoprecipitation and microarray analysis in 1% of the human genome, as defined by the Encyclopaedia of DNA Elements consortium, in a hepatocellular carcinoma cell line (HepG2). Our data identified novel binding sites for SREBP-1 in genes directly or indirectly involved in cholesterol metabolism, e.g. apolipoprotein C-III (APOC3). The most interesting biological findings were the binding sites for SREBP-1 in genes for host cell factor C1 (HCFC1), involved in cell cycle regulation, and for filamin A (FLNA). For RNA polymerase II, we found binding sites at classical promoters, but also in intergenic and intragenic regions. Furthermore, we found evidence of sterol-regulated binding of SREBP-1 and RNA polymerase II to HCFC1 and FLNA. From the results of this work, we infer that SREBP-1 may be involved in processes other than lipid metabolism.

    The FEBS journal 2009;276;7;1878-90

  • Abnormal behavior in a chromosome-engineered mouse model for human 15q11-13 duplication seen in autism.

    Nakatani J, Tamada K, Hatanaka F, Ise S, Ohta H, Inoue K, Tomonaga S, Watanabe Y, Chung YJ, Banerjee R, Iwamoto K, Kato T, Okazawa M, Yamauchi K, Tanda K, Takao K, Miyakawa T, Bradley A and Takumi T

    Osaka Bioscience Institute, Suita, Osaka 565-0874, Japan.

    Substantial evidence suggests that chromosomal abnormalities contribute to the risk of autism. The duplication of human chromosome 15q11-13 is known to be the most frequent cytogenetic abnormality in autism. We have modeled this genetic change in mice by using chromosome engineering to generate a 6.3 Mb duplication of the conserved linkage group on mouse chromosome 7. Mice with a paternal duplication display poor social interaction, behavioral inflexibility, abnormal ultrasonic vocalizations, and correlates of anxiety. An increased MBII52 snoRNA within the duplicated region, affecting the serotonin 2c receptor (5-HT2cR), correlates with altered intracellular Ca(2+) responses elicited by a 5-HT2cR agonist in neurons of mice with a paternal duplication. This chromosome-engineered mouse model for autism seems to replicate various aspects of human autistic phenotypes and validates the relevance of the human chromosome abnormality. This model will facilitate forward genetics of developmental brain disorders and serve as an invaluable tool for therapeutic development.

    Cell 2009;137;7;1235-46

  • Genome-wide association study identifies eight loci associated with blood pressure.

    Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, Najjar SS, Zhao JH, Heath SC, Eyheramendy S, Papadakis K, Voight BF, Scott LJ, Zhang F, Farrall M, Tanaka T, Wallace C, Chambers JC, Khaw KT, Nilsson P, van der Harst P, Polidoro S, Grobbee DE, Onland-Moret NC, Bots ML, Wain LV, Elliott KS, Teumer A, Luan J, Lucas G, Kuusisto J, Burton PR, Hadley D, McArdle WL, Wellcome Trust Case Control Consortium, Brown M, Dominiczak A, Newhouse SJ, Samani NJ, Webster J, Zeggini E, Beckmann JS, Bergmann S, Lim N, Song K, Vollenweider P, Waeber G, Waterworth DM, Yuan X, Groop L, Orho-Melander M, Allione A, Di Gregorio A, Guarrera S, Panico S, Ricceri F, Romanazzi V, Sacerdote C, Vineis P, Barroso I, Sandhu MS, Luben RN, Crawford GJ, Jousilahti P, Perola M, Boehnke M, Bonnycastle LL, Collins FS, Jackson AU, Mohlke KL, Stringham HM, Valle TT, Willer CJ, Bergman RN, Morken MA, Döring A, Gieger C, Illig T, Meitinger T, Org E, Pfeufer A, Wichmann HE, Kathiresan S, Marrugat J, O'Donnell CJ, Schwartz SM, Siscovick DS, Subirana I, Freimer NB, Hartikainen AL, McCarthy MI, O'Reilly PF, Peltonen L, Pouta A, de Jong PE, Snieder H, van Gilst WH, Clarke R, Goel A, Hamsten A, Peden JF, Seedorf U, Syvänen AC, Tognoni G, Lakatta EG, Sanna S, Scheet P, Schlessinger D, Scuteri A, Dörr M, Ernst F, Felix SB, Homuth G, Lorbeer R, Reffelmann T, Rettig R, Völker U, Galan P, Gut IG, Hercberg S, Lathrop GM, Zelenika D, Deloukas P, Soranzo N, Williams FM, Zhai G, Salomaa V, Laakso M, Elosua R, Forouhi NG, Völzke H, Uiterwaal CS, van der Schouw YT, Numans ME, Matullo G, Navis G, Berglund G, Bingham SA, Kooner JS, Connell JM, Bandinelli S, Ferrucci L, Watkins H, Spector TD, Tuomilehto J, Altshuler D, Strachan DP, Laan M, Meneton P, Wareham NJ, Uda M, Jarvelin MR, Mooser V, Melander O, Loos RJ, Elliott P, Abecasis GR, Caulfield M and Munroe PB

    Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA.

    Elevated blood pressure is a common, heritable cause of cardiovascular disease worldwide. To date, identification of common genetic variants influencing blood pressure has proven challenging. We tested 2.5 million genotyped and imputed SNPs for association with systolic and diastolic blood pressure in 34,433 subjects of European ancestry from the Global BPgen consortium and followed up findings with direct genotyping (N ≤ 71,225 European ancestry, N ≤ 12,889 Indian Asian ancestry) and in silico comparison (CHARGE consortium, N = 29,136). We identified association between systolic or diastolic blood pressure and common variants in eight regions near the CYP17A1 (P = 7 × 10(-24)), CYP1A2 (P = 1 × 10(-23)), FGF5 (P = 1 × 10(-21)), SH2B3 (P = 3 × 10(-18)), MTHFR (P = 2 × 10(-13)), c10orf107 (P = 1 × 10(-9)), ZNF652 (P = 5 × 10(-9)) and PLCD3 (P = 1 × 10(-8)) genes. All variants associated with continuous blood pressure were associated with dichotomous hypertension. These associations between common variants and blood pressure and hypertension offer mechanistic insights into the regulation of blood pressure and may point to novel targets for interventions to prevent cardiovascular disease.

    Funded by: British Heart Foundation: FS/05/061/19501, PG02/128, SP/04/002; Cancer Research UK: 10589; Chief Scientist Office: CZB/4/540; Intramural NIH HHS: Z01 HG000024; Medical Research Council: 85374, G0000934, G0400874, G0401527, G0501942, G0600329, G0701863, G0800675, G0800759, G0801056, G9521010, G9521010D, MC_QA137934, MC_U105630924, MC_U106188470, MC_U137686857; NCRR NIH HHS: U54 RR020278, U54RR020278; NHGRI NIH HHS: 1Z01HG000024; NHLBI NIH HHS: K23 HL080025, K23 HL080025-04, K23 HL083102, K23HL083102, K23HL80025, R01 HL056931-02, R01 HL056931-03, R01 HL056931-04, R01 HL087676, R01 HL087679, R01HL056931, R01HL087676, R01HL087679; NIA NIH HHS: N01-AG-1-2109, N01AG-821336, N01AG-916413; NICHD NIH HHS: N01-HD-1-3107; NIDA NIH HHS: U54 DA021519, U54DA021519; NIDDK NIH HHS: DK062370, DK072193, R01 DK062370, R01 DK072193, R56 DK062370, U01 DK062370, U01 DK062418, U01DK062418; NIEHS NIH HHS: P30 ES007033, P30ES007033; NIMH NIH HHS: RL1 MH083268, RL1MH083268; NIMHD NIH HHS: 263MD821336, 263MD916413; PHS HHS: 263-MA-410953; Wellcome Trust: 061858, 068545/Z/02, 070191/Z/03/Z, 076113, 076113/B/04/Z, 077011, 077016, 077016/Z/05/Z, 079557, 079895, 088885, 089061, 090532, WT088885/Z/09/Z

    Nature genetics 2009;41;6;666-76

  • Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies.

    Nolte IM, Wallace C, Newhouse SJ, Waggott D, Fu J, Soranzo N, Gwilliam R, Deloukas P, Savelieva I, Zheng D, Dalageorgou C, Farrall M, Samani NJ, Connell J, Brown M, Dominiczak A, Lathrop M, Zeggini E, Wain LV, Wellcome Trust Case Control Consortium, DCCT/EDIC Research Group, Newton-Cheh C, Eijgelsheim M, Rice K, de Bakker PI, QTGEN consortium, Pfeufer A, Sanna S, Arking DE, QTSCD consortium, Asselbergs FW, Spector TD, Carter ND, Jeffery S, Tobin M, Caulfield M, Snieder H, Paterson AD, Munroe PB and Jamshidi Y

    Unit of Genetic Epidemiology and Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands.

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and BRIGHT cohorts in the UK and the DCCT/EDIC cohort from North America. Five loci were significantly associated with QT interval at P<1x10(-6). To validate these findings we performed an in silico comparison with data from two QT consortia: QTSCD (n = 15,842) and QTGEN (n = 13,685). Analysis confirmed the association between common variants near NOS1AP (P = 1.4x10(-83)) and the phospholamban (PLN) gene (P = 1.9x10(-29)). The most associated SNP near NOS1AP (rs12143842) explains 0.82% variance; the SNP near PLN (rs11153730) explains 0.74% variance of QT interval duration. We found no evidence for interaction between these two SNPs (P = 0.99). PLN is a key regulator of cardiac diastolic function and is involved in regulating intracellular calcium cycling; it has only recently been identified as a susceptibility locus for QT interval. These data offer further mechanistic insights into genetic influence on the QT interval which may predispose to life-threatening arrhythmias and sudden cardiac death.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: 06/094, FS/05/061/19501, PG02/128, SP/02/001; Department of Health; Medical Research Council: G0400874, G0501942, G9521010, G9521010D; NCRR NIH HHS: UL1 RR025005, UL1RR025005; NHGRI NIH HHS: U01 HG004402, U01HG004402; NHLBI NIH HHS: HL054512, HL86694, K23 HL080025, K23-HL-080025, N01 HC-55222, N01 HC015103, N01 HC035129, N01 HC045133, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01HC25195, N01HC55015, N01HC55016, N01HC55018, N01HC55019, N01HC55020, N01HC55021, N01HC55022, N01HC55222, N01HC75150, N01HC85079, N01HC85086, N02-HL-6-4278, R01 HL059367, R01 HL086694, R01 HL087641, R01 HL087652, R01HL086694, R01HL087641, R01HL59367, U01 HL054512, U01 HL080295, U10 HL054512; NIA NIH HHS: N01-AG-1-2109; NIDDK NIH HHS: N01-DK-6-2204, R01 DK077510, R01-DK-077510; PHS HHS: 263-MA-410953, HHSN268200625226C; Wellcome Trust: WT088885/Z/09/Z

    PloS one 2009;4;7;e6138

  • Functional genomics in zebrafish permits rapid characterization of novel platelet membrane proteins.

    O'Connor MN, Salles II, Cvejic A, Watkins NA, Walker A, Garner SF, Jones CI, Macaulay IC, Steward M, Zwaginga JJ, Bray SL, Dudbridge F, de Bono B, Goodall AH, Deckmyn H, Stemple DL, Ouwehand WH and Bloodomics Consortium

    Department of Haematology, University of Cambridge, Cambridge, United Kingdom.

    In this study, we demonstrate the suitability of the vertebrate Danio rerio (zebrafish) for functional screening of novel platelet genes in vivo by reverse genetics. Comparative transcript analysis of platelets and their precursor cell, the megakaryocyte, together with nucleated blood cell elements, endothelial cells, and erythroblasts, identified novel platelet membrane proteins with hitherto unknown roles in thrombus formation. We determined the phenotype induced by antisense morpholino oligonucleotide (MO)-based knockdown of 5 of these genes in a laser-induced arterial thrombosis model. To validate the model, the genes for platelet glycoprotein (GP) IIb and the coagulation protein factor VIII were targeted. MO-injected fish showed normal thrombus initiation but severely impaired thrombus growth, consistent with the mouse knockout phenotypes, and concomitant knockdown of both resulted in spontaneous bleeding. Knockdown of 4 of the 5 novel platelet proteins altered arterial thrombosis, as demonstrated by modified kinetics of thrombus initiation and/or development. We identified a putative role for BAMBI and LRRC32 in promotion and DCBLD2 and ESAM in inhibition of thrombus formation. We conclude that phenotypic analysis of MO-injected zebrafish is a fast and powerful method for initial screening of novel platelet proteins for function in thrombosis.

    Funded by: Wellcome Trust: WT077037/Z/05/Z, WT082597/Z/07/Z

    Blood 2009;113;19;4754-62

  • Somatic mutation databases as tools for molecular epidemiology and molecular pathology of cancer: proposed guidelines for improving data collection, distribution, and integration.

    Olivier M, Petitjean A, Teague J, Forbes S, Dunnick JK, den Dunnen JT, Langerød A, Wilkinson JM, Vihinen M, Cotton RG, Hainaut P, IARC and EC FP6

    Group of Molecular Carcinogenesis and Biomarkers, International Agency for Research on Cancer, World Health Organization, Lyon, France.

    There are currently fewer than 40 locus-specific databases (LSDBs) and one large general database that curate data on somatic mutations in human cancer genes. These databases have different scope and use different annotation standards and database systems, resulting in duplicated efforts in data curation, and making it difficult for users to find clear and consistent information. As data related to somatic mutations are generated at an increasing pace, it is urgent to create a framework for improving the collection of this information and making it more accessible to clinicians, scientists, and epidemiologists to facilitate research on biomarkers. Here we propose a data flow for improving the connectivity between existing databases and we provide practical guidelines for data reporting, database contents, and annotation standards. These proposals are based on common standards recommended by the Human Genome Variation Society (HGVS) with additions related to specific requirements of somatic mutations in cancer. Indeed, somatic mutations may be used in molecular pathology and clinical studies to characterize tumor types, help treatment choice, predict response to treatment and patient outcome, or in epidemiological studies as markers for tumor etiology or exposure assessment. Thus, specific annotations are required to cover these diverse research topics. This initiative is meant to promote collaboration and discussion on these issues and the development of adequate resources that would avoid the loss of extremely valuable information generated by years of basic and clinical research.

    Human mutation 2009;30;3;275-82

  • Genetic variation in LIN28B is associated with the timing of puberty.

    Ong KK, Elks CE, Li S, Zhao JH, Luan J, Andersen LB, Bingham SA, Brage S, Smith GD, Ekelund U, Gillson CJ, Glaser B, Golding J, Hardy R, Khaw KT, Kuh D, Luben R, Marcus M, McGeehin MA, Ness AR, Northstone K, Ring SM, Rubin C, Sims MA, Song K, Strachan DP, Vollenweider P, Waeber G, Waterworth DM, Wong A, Deloukas P, Barroso I, Mooser V, Loos RJ and Wareham NJ

    Medical Research Council (MRC) Epidemiology Unit, Addenbrooke's Hospital, Cambridge, UK.

    The timing of puberty is highly variable. We carried out a genome-wide association study for age at menarche in 4,714 women and report an association in LIN28B on chromosome 6 (rs314276, minor allele frequency (MAF) = 0.33, P = 1.5 × 10(-8)). In independent replication studies in 16,373 women, each major allele was associated with 0.12 years earlier menarche (95% CI = 0.08-0.16; P = 2.8 × 10(-10); combined P = 3.6 × 10(-16)). This allele was also associated with earlier breast development in girls (P = 0.001; N = 4,271); earlier voice breaking (P = 0.006, N = 1,026) and more advanced pubic hair development in boys (P = 0.01; N = 4,588); a faster tempo of height growth in girls (P = 0.00008; N = 4,271) and boys (P = 0.03; N = 4,588); and shorter adult height in women (P = 3.6 × 10(-7); N = 17,274) and men (P = 0.006; N = 9,840) in keeping with earlier growth cessation. These studies identify variation in LIN28B, a potent and specific regulator of microRNA processing, as the first genetic determinant regulating the timing of human pubertal growth and development.

    Funded by: Cancer Research UK; Medical Research Council: 73437, G0000934, G0401527, G0401527(74922), G0701863, G9815508, MC_U105630924, MC_U106179471, MC_U106179472, MC_U106179473, MC_U106188470, MC_U123092720, MC_U123092721, U.1061.00.001 (79471), U.1061.00.004(79472); Wellcome Trust: 068049, 068545/Z/02, 076467/Z/05/Z, 077011, 077016, 077016/Z/05/Z, 079996

    Nature genetics 2009;41;6;729-33

  • Combined effects of three independent SNPs greatly increase the risk estimate for RA at 6q23.

    Orozco G, Hinks A, Eyre S, Ke X, Gibbons LJ, Bowes J, Flynn E, Martin P, Wellcome Trust Case Control Consortium, YEAR consortium, Wilson AG, Bax DE, Morgan AW, Emery P, Steer S, Hocking L, Reid DM, Wordsworth P, Harrison P, Thomson W, Barton A and Worthington J

    arc-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester M13 9PT, UK.

    The most consistent finding derived from the WTCCC GWAS for rheumatoid arthritis (RA) was association to a SNP at 6q23. We performed a fine-mapping of the region in order to search the 6q23 region for additional disease variants. 3962 RA patients and 3531 healthy controls were included in the study. We found 18 SNPs associated with RA. The SNP showing the strongest association was rs6920220 [P = 2.6 x 10(-6), OR (95% CI) 1.22 (1.13-1.33)]. The next most strongly associated SNP was rs13207033 [P = 0.0001, OR (95% CI) 0.86 (0.8-0.93)] which was perfectly correlated with rs10499194, a SNP previously associated with RA in a US/European series. Additionally, we found a number of new potential RA markers, including rs5029937, located in the intron 2 of TNFAIP3. Of the 18 associated SNPs, three polymorphisms, rs6920220, rs13207033 and rs5029937, remained significant after conditional logistic regression analysis. The combination of the carriage of both risk alleles of rs6920220 and rs5029937 together with the absence of the protective allele of rs13207033 was strongly associated with RA when compared with carriage of none [OR of 1.86 (95% CI) (1.51-2.29)]. This equates to an effect size of 1.50 (95% CI 1.21-1.85) compared with controls and is higher than that obtained for any SNP individually. This is the first study to show that the confirmed loci from the GWA studies, that confer only a modest effect size, could harbour a significantly greater effect once the effect of additional risk variants is accounted for.

    Funded by: Arthritis Research UK: 17552, 18475; Medical Research Council: G0000934, G0600329; Versus Arthritis: 18475; Wellcome Trust: 061858, 068545/Z/02, 090532

    Human molecular genetics 2009;18;14;2693-9

  • Ethical data release in genome-wide association studies in developing countries.

    Parker M, Bull SJ, de Vries J, Agbenyega T, Doumbo OK and Kwiatkowski DP

    Ethox Centre, University of Oxford, Oxford, United Kingdom.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; PHS HHS: 566; Wellcome Trust: 077383/Z/05/Z, 087285/Z/08/Z

    PLoS medicine 2009;6;11;e1000143

  • A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose.

    Paterson AD, Waggott D, Boright AP, Hosseini SM, Shen E, Sylvestre MP, Wong I, Bharaj B, Cleary PA, Lachin JM, MAGIC (Meta-Analyses of Glucose and Insulin-related traits Consortium), Below JE, Nicolae D, Cox NJ, Canty AJ, Sun L, Bull SB and Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group

    Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, Canada.

    Objective: Glycemia is a major risk factor for the development of long-term complications in type 1 diabetes; however, no specific genetic loci have been identified for glycemic control in individuals with type 1 diabetes. To identify such loci in type 1 diabetes, we analyzed longitudinal repeated measures of A1C from the Diabetes Control and Complications Trial.

    Research design and methods: We performed a genome-wide association study using the mean of quarterly A1C values measured over 6.5 years, separately in the conventional (n = 667) and intensive (n = 637) treatment groups of the DCCT. At loci of interest, linear mixed models were used to take advantage of all the repeated measures. We then assessed the association of these loci with capillary glucose and repeated measures of multiple complications of diabetes.

    Results: We identified a major locus for A1C levels in the conventional treatment group near SORCS1 (10q25.1, P = 7 x 10(-10)), which was also associated with mean glucose (P = 2 x 10(-5)). This was confirmed using A1C in the intensive treatment group (P = 0.01). Other loci achieved evidence close to genome-wide significance: 14q32.13 (GSC) and 9p22 (BNC2) in the combined treatment groups and 15q21.3 (WDR72) in the intensive group. Further, these loci gave evidence for association with diabetic complications, specifically SORCS1 with hypoglycemia and BNC2 with renal and retinal complications. We replicated the SORCS1 association in Genetics of Diabetes in Kidneys (GoKinD) study control subjects (P = 0.01) and the BNC2 association with A1C in nondiabetic individuals.

    Conclusions: A major locus for A1C and glucose in individuals with diabetes is near SORCS1. This may influence the design and analysis of genetic studies attempting to identify risk factors for long-term diabetic complications.

    Funded by: Canadian Institutes of Health Research; NIDDK NIH HHS: N01-DK-6-2204, P60 DK020595, P60-DK20595, R01 DK077489, R01 DK077510, R01-DK-077510, R01-DK077489

    Diabetes 2009;59;2;539-49

  • A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

    Funded by: Wellcome Trust

    PLoS genetics 2009;5;7;e1000569

  • Meta-analysis of genome-wide association data identifies two loci influencing age at menarche.

    Perry JR, Stolk L, Franceschini N, Lunetta KL, Zhai G, McArdle PF, Smith AV, Aspelund T, Bandinelli S, Boerwinkle E, Cherkas L, Eiriksdottir G, Estrada K, Ferrucci L, Folsom AR, Garcia M, Gudnason V, Hofman A, Karasik D, Kiel DP, Launer LJ, van Meurs J, Nalls MA, Rivadeneira F, Shuldiner AR, Singleton A, Soranzo N, Tanaka T, Visser JA, Weedon MN, Wilson SG, Zhuang V, Streeten EA, Harris TB, Murray A, Spector TD, Demerath EW, Uitterlinden AG and Murabito JM

    Institute of Biomedical and Clinical Science, Peninsula Medical School, Exeter, UK.

    We conducted a meta-analysis of genome-wide association data to detect genes influencing age at menarche in 17,510 women. The strongest signal was at 9q31.2 (P = 1.7 × 10(-9)), where the nearest genes include TMEM38B, FKTN, FSD1L, TAL2 and ZNF462. The next best signal was near the LIN28B gene (rs7759938; P = 7.0 × 10(-9)), which also influences adult height. We provide the first evidence for common genetic variants influencing female sexual maturation.

    Funded by: Intramural NIH HHS; NCRR NIH HHS: M01 RR 16500, M01 RR016500-02; NHLBI NIH HHS: N01 HC025195, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02 HL64278, N02-HL-6-4278, U01 HL072515, U01 HL072515-06, U01 HL72515; NIA NIH HHS: N.1-AG-1-1, N.1-AG-1-2111, N01 AG012100, N01-AG-12100, N01-AG-5-0002, R01 AR/AG 41398, R21 AG032598, R21 AG032598-02, R21AG032598, U19 AG023122, U19 AG023122-05; NIAMS NIH HHS: R01 AR041398, R01 AR041398-15; NIDDK NIH HHS: P30 DK072488, P30 DK072488-02; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164, 263 MD821336, 263 MD9164 13; Wellcome Trust

    Nature genetics 2009;41;6;648-50

  • Genetic evidence that raised sex hormone binding globulin (SHBG) levels reduce the risk of type 2 diabetes.

    Perry JR, Weedon MN, Langenberg C, Jackson AU, Lyssenko V, Sparsø T, Thorleifsson G, Grallert H, Ferrucci L, Maggio M, Paolisso G, Walker M, Palmer CN, Payne F, Young E, Herder C, Narisu N, Morken MA, Bonnycastle LL, Owen KR, Shields B, Knight B, Bennett A, Groves CJ, Ruokonen A, Jarvelin MR, Pearson E, Pascoe L, Ferrannini E, Bornstein SR, Stringham HM, Scott LJ, Kuusisto J, Nilsson P, Neptin M, Gjesing AP, Pisinger C, Lauritzen T, Sandbaek A, Sampson M, MAGIC, Zeggini E, Lindgren CM, Steinthorsdottir V, Thorsteinsdottir U, Hansen T, Schwarz P, Illig T, Laakso M, Stefansson K, Morris AD, Groop L, Pedersen O, Boehnke M, Barroso I, Wareham NJ, Hattersley AT, McCarthy MI and Frayling TM

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Magdalen Road, Exeter, UK.

    Epidemiological studies consistently show that circulating sex hormone binding globulin (SHBG) levels are lower in type 2 diabetes patients than non-diabetic individuals, but the causal nature of this association is controversial. Genetic studies can help dissect causal directions of epidemiological associations because genotypes are much less likely to be confounded, biased or influenced by disease processes. Using this Mendelian randomization principle, we selected a common single nucleotide polymorphism (SNP) near the SHBG gene, rs1799941, that is strongly associated with SHBG levels. We used data from this SNP, or closely correlated SNPs, in 27 657 type 2 diabetes patients and 58 481 controls from 15 studies. We then used data from additional studies to estimate the difference in SHBG levels between type 2 diabetes patients and controls. The SHBG SNP rs1799941 was associated with type 2 diabetes [odds ratio (OR) 0.94, 95% CI: 0.91, 0.97; P = 2 x 10(-5)], with the SHBG raising allele associated with reduced risk of type 2 diabetes. This effect was very similar to that expected (OR 0.92, 95% CI: 0.88, 0.96), given the SHBG-SNP versus SHBG levels association (SHBG levels are 0.2 standard deviations higher per copy of the A allele) and the SHBG levels versus type 2 diabetes association (SHBG levels are 0.23 standard deviations lower in type 2 diabetic patients compared to controls). Results were very similar in men and women. There was no evidence that this variant is associated with diabetes-related intermediate traits, including several measures of insulin secretion and resistance. Our results, together with those from another recent genetic study, strengthen evidence that SHBG and sex hormones are involved in the aetiology of type 2 diabetes.

    Funded by: Department of Health: DHCS/07/07/008; Intramural NIH HHS; Medical Research Council: G0000649, G016121, G0601261, MC_U106179471; NHGRI NIH HHS: 1 Z01 HG000024; NIA NIH HHS: R01 AG24233-0; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK069922, DK072193; Wellcome Trust: 076113, 077016/Z/05/Z, 083270/Z/07/Z, 090532, GR072960

    Human molecular genetics 2009;19;3;535-44

  • Agouti C57BL/6N embryonic stem cells for mouse genetic resources.

    Pettitt SJ, Liang Q, Rairdan XY, Moran JL, Prosser HM, Beier DR, Lloyd KC, Bradley A and Skarnes WC

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    We report the characterization of a highly germline competent C57BL/6N mouse embryonic stem cell line, JM8. To simplify breeding schemes, the dominant agouti coat color gene was restored in JM8 cells by targeted repair of the C57BL/6 nonagouti mutation. These cells provide a robust foundation for large-scale mouse knockout programs that aim to provide a public resource of targeted mutations in the C57BL/6 genetic background.

    Funded by: NHGRI NIH HHS: U01 HG004080, U01-HG004080; NIH HHS: UM1 OD023221; PHS HHS: U01-42430; Wellcome Trust: 077188, WT077187

    Nature methods 2009;6;7;493-5

  • The Citrobacter rodentium genome sequence reveals convergent evolution with human pathogenic Escherichia coli.

    Petty NK, Bulgin R, Crepin VF, Cerdeño-Tárraga AM, Schroeder GN, Quail MA, Lennard N, Corton C, Barron A, Clark L, Toribio AL, Parkhill J, Dougan G, Frankel G and Thomson NR

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.

    Funded by: Medical Research Council: G0700823

    Journal of bacteriology 2009;192;2;525-38

  • Preparation of bacteriophage lysates and pure DNA.

    Pickard DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Preparation of pure bacteriophage DNA used to rely on using CsCl gradients to give high purity or methods that yielded DNA that was either of low recovery or subject to significant genomic contamination. Recently though, new methods have come along that allow the purification of DNA from plate lysates that are not only capable of high yield but also, for all intents and purposes, free of genomic contamination (i.e. no visible genomic contamination on restriction analysis or when used for bacteriophage sequencing). The protocol that forms the basis of this short section can be used to prepare bacteriophage DNA from one or two 9 cm L-agar plates. For these preps, the use of agarose in the top agar is recommended to avoid any restriction inhibitors that may be present in some agar preparations.

    Methods in molecular biology (Clifton, N.J.) 2009;502;3-9

  • A comprehensive catalogue of somatic mutations from a human cancer genome.

    Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, Mudie LJ, Ning Z, Royce T, Schulz-Trieglaff OB, Spiridou A, Stebbings LA, Szajkowski L, Teague J, Williamson D, Chin L, Ross MT, Campbell PJ, Bentley DR, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867

    Nature 2009;463;7278;191-6

  • A small-cell lung cancer genome with complex signatures of tobacco exposure.

    Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, Costa GL, Lee CC, Minna JD, Gazdar A, Birney E, Rhodes MD, McKernan KJ, Stratton MR, Futreal PA and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in frame, and another two lines carrying PVT1-CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.

    Funded by: NCI NIH HHS: P50 CA070907, P50CA70907; Wellcome Trust: 077012, 077012/Z/05/Z, 088340, 093867

    Nature 2009;463;7278;184-90

  • The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

    Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R and Lipman D

    National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA.

    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

    Funded by: Intramural NIH HHS; NHGRI NIH HHS: 1U54HG004555-01, U54 HG004555; Wellcome Trust: 062023, 077198

    Genome research 2009;19;7;1316-23

  • Improved protocols for the Illumina Genome Analyzer sequencing system.

    Quail MA, Swerdlow H and Turner DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    In this unit, we describe a set of improvements we have made to the standard Illumina Genome Analyzer protocols to make the sequencing process more reliable in a high-throughput environment, reduce amplification bias, narrow the distribution of insert sizes, and reliably obtain high yields of data.

    Funded by: Wellcome Trust: 098051, WT079643

    Current protocols in human genetics 2009;Chapter 18;Unit 18.2

  • A genome-wide association study of testicular germ cell tumor.

    Rapley EA, Turnbull C, Al Olama AA, Dermitzakis ET, Linger R, Huddart RA, Renwick A, Hughes D, Hines S, Seal S, Morrison J, Nsengimana J, Deloukas P, UK Testicular Cancer Collaboration, Rahman N, Bishop DT, Easton DF and Stratton MR

    Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, UK.

    We conducted a genome-wide association study for testicular germ cell tumor (TGCT), genotyping 307,666 SNPs in 730 cases and 1,435 controls from the UK and replicating associations in a further 571 cases and 1,806 controls. We found strong evidence for susceptibility loci on chromosome 5 (per allele OR = 1.37 (95% CI = 1.19-1.58), P = 3 x 10(-13)), chromosome 6 (OR = 1.50 (95% CI = 1.28-1.75), P = 10(-13)) and chromosome 12 (OR = 2.55 (95% CI = 2.05-3.19), P = 10(-31)). KITLG, encoding the ligand for the receptor tyrosine kinase KIT, which has previously been implicated in the pathogenesis of TGCT and the biology of germ cells, may explain the association on chromosome 12.

    Funded by: Cancer Research UK: 10118, 10589, 11022, A4994; Medical Research Council: G0000934, G0700491; Wellcome Trust: 068545/Z/02, 077012

    Nature genetics 2009;41;7;807-10

  • MEROPS: the peptidase database.

    Rawlings ND, Barrett AJ and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfil the need for an integrated source of information about these. The database has a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The classification framework is used for attaching information at each level. An important focus of the database has become distinguishing one peptidase from another through identifying the specificity of the peptidase in terms of where it will cleave substrates and with which inhibitors it will interact. We have collected over 39,000 known cleavage sites in proteins, peptides and synthetic substrates. These allow us to display peptidase specificity and alignments of protein substrates to give an indication of how well a cleavage site is conserved, and thus its probable physiological relevance. While the number of new peptidase families and clans has only grown slowly the number of complete genomes has greatly increased. This has allowed us to add an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    Nucleic acids research 2009;38;Database issue;D227-33

  • Replication and extension of genome-wide association study results for obesity in 4923 adults from northern Sweden.

    Renström F, Payne F, Nordström A, Brito EC, Rolandsson O, Hallmans G, Barroso I, Nordström P, Franks PW and GIANT Consortium

    Department of Public Health and Clinical Medicine, Umeå University Hospital, Umeå, Sweden.

    Recent genome-wide association studies (GWAS) have identified multiple risk loci for common obesity (FTO, MC4R, TMEM18, GNPDA2, SH2B1, KCTD15, MTCH2, NEGR1 and PCSK1). Here we extend those studies by examining associations with adiposity and type 2 diabetes in Swedish adults. The nine single nucleotide polymorphisms (SNPs) were genotyped in 3885 non-diabetic and 1038 diabetic individuals with available measures of height, weight and body mass index (BMI). Adipose mass and distribution were objectively assessed using dual-energy X-ray absorptiometry in a sub-group of non-diabetics (n = 2206). In models with adipose mass traits, BMI or obesity as outcomes, the most strongly associated SNP was FTO rs1121980 (P < 0.001). Five other SNPs (SH2B1 rs7498665, MTCH2 rs4752856, MC4R rs17782313, NEGR1 rs2815752 and GNPDA2 rs10938397) were significantly associated with obesity. To summarize the overall genetic burden, a weighted risk score comprising a subset of SNPs was constructed; those in the top quintile of the score were heavier (+2.6 kg) and had more total (+2.4 kg), gynoid (+191 g) and abdominal (+136 g) adipose tissue than those in the lowest quintile (all P < 0.001). The genetic burden score significantly increased diabetes risk, with those in the highest quintile (n = 193/594 cases/controls) being at 1.55-fold (95% CI 1.21-1.99; P < 0.0001) greater risk of type 2 diabetes than those in the lowest quintile (n = 130/655 cases/controls). In summary, we have statistically replicated six of the previously associated obese-risk loci and our results suggest that the weight-inducing effects of these variants are explained largely by increased adipose accumulation.

    Funded by: Wellcome Trust: 090532

    Human molecular genetics 2009;18;8;1489-96
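
    As an illustrative aside on the quintile comparison reported in the abstract above, the sketch below computes a crude (unadjusted) odds ratio and a Woolf-type 95% confidence interval from the published case/control counts (193/594 in the highest quintile, 130/655 in the lowest). The function name and the choice of the Woolf log-scale interval are assumptions made for this example and are not taken from the paper; the reported 1.55-fold estimate presumably comes from the authors' own model, so the crude figure is not expected to match it exactly.

```python
import math


def odds_ratio_ci(cases_exp, controls_exp, cases_ref, controls_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical helper for illustration only; the paper's own
    estimate may have been derived from an adjusted model.
    """
    odds_ratio = (cases_exp * controls_ref) / (controls_exp * cases_ref)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / cases_exp + 1 / controls_exp + 1 / cases_ref + 1 / controls_ref
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts quoted in the abstract: highest vs lowest risk-score quintile.
crude_or, ci_low, ci_high = odds_ratio_ci(193, 594, 130, 655)
print(f"crude OR = {crude_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

    The crude odds ratio comes out at about 1.64, somewhat above the reported 1.55, which is in keeping with the published figure reflecting an analysis that this sketch does not reproduce.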

  • Genome-wide association study identifies five loci associated with lung function.

    Repapi E, Sayers I, Wain LV, Burton PR, Johnson T, Obeidat M, Zhao JH, Ramasamy A, Zhai G, Vitart V, Huffman JE, Igl W, Albrecht E, Deloukas P, Henderson J, Granell R, McArdle WL, Rudnicka AR, Wellcome Trust Case Control Consortium, Barroso I, Loos RJ, Wareham NJ, Mustelin L, Rantanen T, Surakka I, Imboden M, Wichmann HE, Grkovic I, Jankovic S, Zgaga L, Hartikainen AL, Peltonen L, Gyllensten U, Johansson A, Zaboli G, Campbell H, Wild SH, Wilson JF, Gläser S, Homuth G, Völzke H, Mangino M, Soranzo N, Spector TD, Polasek O, Rudan I, Wright AF, Heliövaara M, Ripatti S, Pouta A, Naluai AT, Olin AC, Torén K, Cooper MN, James AL, Palmer LJ, Hingorani AD, Wannamethee SG, Whincup PH, Smith GD, Ebrahim S, McKeever TM, Pavord ID, MacLeod AK, Morris AD, Porteous DJ, Cooper C, Dennison E, Shaheen S, Karrasch S, Schnabel E, Schulz H, Grallert H, Bouatia-Naji N, Delplanque J, Froguel P, Blakey JD, NSHD Respiratory Study Team, Britton JR, Morris RW, Holloway JW, Lawlor DA, Hui J, Nyberg F, Jarvelin MR, Jackson C, Kähönen M, Kaprio J, Probst-Hensch NM, Koch B, Hayward C, Evans DM, Elliott P, Strachan DP, Hall IP and Tobin MD

    Departments of Health Sciences and Genetics, Adrian Building, University of Leicester, Leicester, UK.

    Pulmonary function measures are heritable traits that predict morbidity and mortality and define chronic obstructive pulmonary disease (COPD). We tested genome-wide association with forced expiratory volume in 1 s (FEV(1)) and the ratio of FEV(1) to forced vital capacity (FVC) in the SpiroMeta consortium (n = 20,288 individuals of European ancestry). We conducted a meta-analysis of top signals with data from direct genotyping (n ≤ 32,184 additional individuals) and in silico summary association data from the CHARGE Consortium (n = 21,209) and the Health 2000 survey (n ≤ 883). We confirmed the reported locus at 4q31 and identified associations with FEV(1) or FEV(1)/FVC and common variants at five additional loci: 2q35 in TNS1 (P = 1.11 x 10(-12)), 4q24 in GSTCD (P = 2.18 x 10(-23)), 5q33 in HTR4 (P = 4.29 x 10(-9)), 6p21 in AGER (P = 3.07 x 10(-15)) and 15q23 in THSD4 (P = 7.24 x 10(-15)). mRNA analyses showed expression of TNS1, GSTCD, AGER, HTR4 and THSD4 in human lung tissue. These associations offer mechanistic insight into pulmonary function regulation and indicate potential targets for interventions to alleviate respiratory disease.

    Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation: PG/06/154/22043, PG/97012, RG/08/013/25942; Cancer Research UK; Chief Scientist Office: CZB/4/710, CZD/16/6/2, CZD/16/6/4; Department of Health: 0020029; Medical Research Council: G0000934, G0000943, G0401540, G0500539, G0501942, G0600331, G0600705, G0800582, G0801056, G0902125, G9815508, G990146, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092721, MC_U127561128, MC_UP_A620_1014, U.1230.00.008.00005.02; NHLBI NIH HHS: 5R01HL087679-02, R01 HL087679; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02, R01 MH063706, RL1 MH083268; Wellcome Trust: 068545/Z/02, 075883, 076113/B/04/Z, 077016/Z/05/Z, 079895, 086160/Z/08/A

    Nature genetics 2009;42;1;36-44

  • Comparative genomic analysis of ten Streptococcus pneumoniae temperate bacteriophages.

    Romero P, Croucher NJ, Hiller NL, Hu FZ, Ehrlich GD, Bentley SD, García E and Mitchell TJ

    Division of Infection and Immunity, Glasgow Biomedical Research Centre, University of Glasgow, Glasgow, United Kingdom.

    Streptococcus pneumoniae is an important human pathogen that often carries temperate bacteriophages. As part of a program to characterize the genetic makeup of prophages associated with clinical strains and to assess the potential roles that they play in the biology and pathogenesis in their host, we performed comparative genomic analysis of 10 temperate pneumococcal phages. All of the genomes are organized into five major gene clusters: lysogeny, replication, packaging, morphogenesis, and lysis clusters. All of the phage particles observed showed a Siphoviridae morphology. The only genes that are well conserved in all the genomes studied are those involved in the integration and the lysis of the host in addition to two genes, of unknown function, within the replication module. We observed that a high percentage of the open reading frames contained no similarities to any sequences catalogued in public databases; however, genes that were homologous to known phage virulence genes, including the pblB gene of Streptococcus mitis and the vapE gene of Dichelobacter nodosus, were also identified. Interestingly, bioinformatic tools showed the presence of a toxin-antitoxin system in the phage phiSpn_6, and this represents the first time that an addiction system in a pneumophage has been identified. Collectively, the temperate pneumophages contain a diverse set of genes with various levels of similarity among them.

    Funded by: NIDCD NIH HHS: DC02148, DC04173, DC05659, R01 DC002148, R01 DC004173, R01 DC005659

    Journal of bacteriology 2009;191;15;4854-62

  • Partial lipodystrophy and insulin resistant diabetes in a patient with a homozygous nonsense mutation in CIDEC.

    Rubio-Cabezas O, Puri V, Murano I, Saudek V, Semple RK, Dash S, Hyden CS, Bottomley W, Vigouroux C, Magré J, Raymond-Barker P, Murgatroyd PR, Chawla A, Skepper JN, Chatterjee VK, Suliman S, Patch AM, Agarwal AK, Garg A, Barroso I, Cinti S, Czech MP, Argente J, O'Rahilly S, Savage DB and LD Screening Consortium

    Department of Endocrinology, Hospital Infantil Universitario Niño Jesús, Madrid, Spain.

    Lipodystrophic syndromes are characterized by adipose tissue deficiency. Although rare, they are of considerable interest as they, like obesity, typically lead to ectopic lipid accumulation, dyslipidaemia and insulin resistant diabetes. In this paper we describe a female patient with partial lipodystrophy (affecting limb, femorogluteal and subcutaneous abdominal fat), white adipocytes with multiloculated lipid droplets and insulin-resistant diabetes, who was found to be homozygous for a premature truncation mutation in the lipid droplet protein cell death-inducing Dffa-like effector C (CIDEC) (E186X). The truncation disrupts the highly conserved CIDE-C domain and the mutant protein is mistargeted and fails to increase the lipid droplet size in transfected cells. In mice, Cidec deficiency also reduces fat mass and induces the formation of white adipocytes with multilocular lipid droplets, but in contrast to our patient, Cidec null mice are protected against diet-induced obesity and insulin resistance. In addition to describing a novel autosomal recessive form of familial partial lipodystrophy, these observations also suggest that CIDEC is required for unilocular lipid droplet formation and optimal energy storage in human fat.

    Funded by: Medical Research Council: G0600414; NIDDK NIH HHS: DK30898, DK32520, DK54387, DK60837, P30 DK032520, P30 DK032520-25, P30 DK032520-26, R01 DK030898, R01 DK054387, R01 DK060837, R37 DK030898, R37 DK030898-23; Wellcome Trust: 077016, 077016/Z/05/Z

    EMBO molecular medicine 2009;1;5;280-7

  • The versatility and adaptation of bacteria from the genus Stenotrophomonas.

    Ryan RP, Monchy S, Cardinale M, Taghavi S, Crossman L, Avison MB, Berg G, van der Lelie D and Dow JM

    BIOMERIT Research Centre, Department of Microbiology, BioSciences Institute, University College Cork, Cork, Ireland.

    The genus Stenotrophomonas comprises at least eight species. These bacteria are found throughout the environment, particularly in close association with plants. Strains of the most predominant species, Stenotrophomonas maltophilia, have an extraordinary range of activities that include beneficial effects for plant growth and health, the breakdown of natural and man-made pollutants that are central to bioremediation and phytoremediation strategies and the production of biomolecules of economic value, as well as detrimental effects, such as multidrug resistance, in human pathogenic strains. Here, we discuss the versatility of the bacteria in the genus Stenotrophomonas and the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to our understanding of the adaptation of this genus to various niches.

    Funded by: Austrian Science Fund FWF: P 20542-B16; Wellcome Trust

    Nature reviews. Microbiology 2009;7;7;514-25

  • The Schistosoma japonicum genome reveals features of host-parasite interplay.

    Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium

    Schistosoma japonicum is a parasitic flatworm that causes human schistosomiasis, which is a significant cause of morbidity in China and the Philippines. Here we present a draft genomic sequence for the worm. The genome provides a global insight into the molecular architecture and host interaction of this complex metazoan pathogen, revealing that it can exploit host nutrients, neuroendocrine hormones and signalling pathways for growth, development and maturation. Having a complex nervous system and a well-developed sensory system, S. japonicum can accept stimulation of the corresponding ligands as a physiological response to different environments, such as fresh water or the tissues of its intermediate and mammalian hosts. Numerous proteases, including cercarial elastase, are implicated in mammalian skin penetration and haemoglobin degradation. The genomic information will serve as a valuable platform to facilitate development of new interventions for schistosomiasis control.

    Funded by: NIAID NIH HHS: AI39461, P50 AI039461; Wellcome Trust: 085775

    Nature 2009;460;7253;345-51

  • Genome flexibility in Neisseria meningitidis.

    Schoen C, Tettelin H, Parkhill J and Frosch M

    Institut für Hygiene und Mikrobiologie, der Universität Würzburg, Josef-Schneider-Strasse 2, Bau E1, Würzburg 97877, Germany.

    Neisseria meningitidis usually lives as a commensal bacterium in the upper airways of humans. However, occasionally some strains can also cause life-threatening diseases such as sepsis and bacterial meningitis. Comparative genomics demonstrates that only very subtle genetic differences between carriage and disease strains might be responsible for the observed virulence differences and that N. meningitidis is, evolutionarily, a very recent species. Comparative genome sequencing also revealed a panoply of genetic mechanisms underlying its enormous genomic flexibility which also might affect the virulence of particular strains. From these studies, N. meningitidis emerges as a paradigm for organisms that use genome variability as an adaptation to changing and thus challenging environments.

    Vaccine 2009;27 Suppl 2;B103-11

  • Genome watch: breaking the ICE.

    Seth-Smith H and Croucher NJ

    Nature reviews. Microbiology 2009;7;5;328-9

  • Co-evolution of genomes and plasmids within Chlamydia trachomatis and the emergence in Sweden of a new variant strain.

    Seth-Smith HM, Harris SR, Persson K, Marsh P, Barron A, Bignell A, Bjartling C, Clark L, Cutcliffe LT, Lambden PR, Lennard N, Lockey SJ, Quail MA, Salim O, Skilton RJ, Wang Y, Holland MJ, Parkhill J, Thomson NR and Clarke IN

    Molecular Microbiology Group, University Medical School, Southampton General Hospital, Southampton, SO16 6YD, UK.

    Background: Chlamydia trachomatis is the most common cause of sexually transmitted infections globally and the leading cause of preventable blindness in the developing world. There are two biovariants of C. trachomatis: 'trachoma', causing ocular and genital tract infections, and the invasive 'lymphogranuloma venereum' strains. Recently, a new variant of the genital tract C. trachomatis emerged in Sweden. This variant escaped routine diagnostic tests because it carries a plasmid with a deletion. Failure to detect this strain has meant it has spread rapidly across the country, provoking a worldwide alert. In addition to being a key diagnostic target, the plasmid has been linked to chlamydial virulence. Analysis of chlamydial plasmids and their cognate chromosomes was undertaken to provide insights into the evolutionary relationship between chromosome and plasmid. This is essential knowledge if the plasmid is to continue to be relied on as a key diagnostic marker, and for an understanding of the evolution of Chlamydia trachomatis.

    Results: The genomes of two new C. trachomatis strains were sequenced, together with plasmids from six C. trachomatis isolates, including the new variant strain from Sweden. The plasmid from the new Swedish variant has a 377 bp deletion in the first predicted coding sequence, abolishing the site used for PCR detection, resulting in negative diagnosis. In addition, the variant plasmid has a 44 bp duplication downstream of the deletion. The region containing the second predicted coding sequence is the most highly conserved region of the plasmids investigated. Phylogenetic analyses of the plasmids and chromosomes are fully congruent. Moreover, these analyses also show that ocular and genital strains diverged from a common C. trachomatis progenitor.

    Conclusion: The evolutionary pathways of the chlamydial genome and plasmid imply that inheritance of the plasmid is tightly linked with its cognate chromosome. These data suggest that the plasmid is not a highly mobile genetic element and does not transfer readily between isolates. Comparative analysis of the plasmid sequences has revealed the most conserved regions that should be used to design future plasmid based nucleic acid amplification tests, to avoid diagnostic failures.

    Funded by: Medical Research Council: G0601640; Wellcome Trust

    BMC genomics 2009;10;239

  • A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations.

    Shi W, Ayub Q, Vermeulen M, Shao RG, Zuniga S, van der Gaag K, de Knijff P, Kayser M, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs., United Kingdom.

    We have investigated human male demographic history using 590 males from 51 populations in the Human Genome Diversity Project - Centre d'Etude du Polymorphisme Humain worldwide panel, typed with 37 Y-chromosomal Single Nucleotide Polymorphisms and 65 Y-chromosomal Short Tandem Repeats and analyzed with the program Bayesian Analysis of Trees With Internal Node Generation. The general patterns we observe show a gradient from the oldest population time to the most recent common ancestors (TMRCAs) and expansion times together with the largest effective population sizes in Africa, to the youngest times and smallest effective population sizes in the Americas. These parameters are significantly negatively correlated with distance from East Africa, and the patterns are consistent with most other studies of human variation and history. In contrast, growth rate showed a weaker correlation in the opposite direction. Y-lineage diversity and TMRCA also decrease with distance from East Africa, supporting a model of expansion with serial founder events starting from this source. A number of individual populations diverge from these general patterns, including previously documented examples such as recent expansions of the Yoruba in Africa, Basques in Europe, and Yakut in Northern Asia. However, some unexpected demographic histories were also found, including low growth rates in the Hazara and Kalash from Pakistan and recent expansion of the Mozabites in North Africa.

    Molecular biology and evolution 2009;27;2;385-93

  • Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens.

    Silby MW, Cerdeño-Tárraga AM, Vernikos GS, Giddens SR, Jackson RW, Preston GM, Zhang XX, Moon CD, Gehrig SM, Godfrey SA, Knight CG, Malone JG, Robinson Z, Spiers AJ, Harris S, Challis GL, Yaxley AM, Harris D, Seeger K, Murphy L, Rutter S, Squares R, Quail MA, Saunders E, Mavromatis K, Brettin TS, Bentley SD, Hothersall J, Stephens E, Thomas CM, Parkhill J, Levy SB, Rainey PB and Thomson NR

    Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Centre for Adaptation Genetics and Drug Resistance, Boston, MA 02111, USA.

    Background: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species.

    Results: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome.

    Conclusions: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.

    Funded by: Biotechnology and Biological Sciences Research Council: 104/P16729, P15257; Wellcome Trust

    Genome biology 2009;10;5;R51

  • Copy number variant detection in inbred strains from short read sequence data.

    Simpson JT, McIntyre RE, Adams DJ and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK.

    Summary: We have developed an algorithm to detect copy number variants (CNVs) in homozygous organisms, such as inbred laboratory strains of mice, from short read sequence data. Our novel approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model (HMM). This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms, to determine genomic copy number. We tested our algorithm on short read sequence data generated from re-sequencing chromosome 17 of the mouse strains A/J and CAST/EiJ with the Illumina platform. In total, we identified 118 copy number variants (43 for A/J and 75 for CAST/EiJ). We investigated the performance of our algorithm through comparison to CNVs previously identified by array-comparative genomic hybridization (array CGH). We performed quantitative-PCR validation on a subset of the calls that differed from the array CGH data sets.

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;26;4;565-7

  • Floxin, a resource for genetically engineering mouse ESCs.

    Singla V, Hunkapiller J, Santos N, Seol AD, Norman AR, Wakenight P, Skarnes WC and Reiter JF

    Department of Biochemistry and Biophysics, Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA.

    We describe a method for the highly efficient and precise targeted modification of gene trap loci in mouse embryonic stem cells (ESCs). Through the Floxin method, gene trap mutations were reverted and new DNA sequences inserted using Cre recombinase and a shuttle vector, pFloxin. Floxin technology is applicable to the existing collection of 24,149 compatible gene trap cell lines, which should enable high-throughput modification of many genes in mouse ESCs.

    Funded by: NIAMS NIH HHS: R01 AR054396, R01 AR054396-01A1, R01AR054396

    Nature methods 2009;7;1;50-2

  • Meta-analysis of genome-wide scans for human adult stature identifies novel loci and associations with measures of skeletal frame size.

    Soranzo N, Rivadeneira F, Chinappen-Horsley U, Malkina I, Richards JB, Hammond N, Stolk L, Nica A, Inouye M, Hofman A, Stephens J, Wheeler E, Arp P, Gwilliam R, Jhamai PM, Potter S, Chaney A, Ghori MJ, Ravindrarajah R, Ermakov S, Estrada K, Pols HA, Williams FM, McArdle WL, van Meurs JB, Loos RJ, Dermitzakis ET, Ahmadi KR, Hart DJ, Ouwehand WH, Wareham NJ, Barroso I, Sandhu MS, Strachan DP, Livshits G, Spector TD, Uitterlinden AG and Deloukas P

    Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Recent genome-wide (GW) scans have identified several independent loci affecting human stature, but their contribution through the different skeletal components of height is still poorly understood. We carried out a genome-wide scan in 12,611 participants, followed by replication in an additional 7,187 individuals, and identified 17 genomic regions with GW-significant association with height. Of these, two are entirely novel (rs11809207 in CATSPER4, combined P-value = 6.1x10(-8) and rs910316 in TMED10, P-value = 1.4x10(-7)) and two had previously been described with weak statistical support (rs10472828 in NPR3, P-value = 3x10(-7) and rs849141 in JAZF1, P-value = 3.2x10(-11)). One locus (rs1182188 at GNA12) identifies the first height eQTL. We also assessed the contribution of height loci to the upper- (trunk) and lower-body (hip axis and femur) skeletal components of height. We find evidence for several loci associated with trunk length (including rs6570507 in GPR126, P-value = 4x10(-5) and rs6817306 in LCORL, P-value = 4x10(-4)), hip axis length (including rs6830062 at LCORL, P-value = 4.8x10(-4) and rs4911494 at UQCC, P-value = 1.9x10(-4)), and femur length (including rs710841 at PRKG2, P-value = 2.4x10(-5) and rs10946808 at HIST1H1D, P-value = 6.4x10(-6)). Finally, we used conditional analyses to explore a possible differential contribution of the height loci to these different skeletal size measurements. In addition to validating four novel loci controlling adult stature, our study represents the first effort to assess the contribution of genetic loci to three skeletal components of height. Further statistical tests in larger numbers of individuals will be required to verify if the height loci affect height preferentially through these subcomponents of height.

    Funded by: Medical Research Council: G0000934, G0701863, MC_QA137934, MC_U106188470; Wellcome Trust: 068545/Z/02

    PLoS genetics 2009;5;4;e1000445

  • Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants?

    Southam L, Soranzo N, Montgomery SB, Frayling TM, McCarthy MI, Barroso I and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Aims/hypothesis: According to the thrifty genotype hypothesis, the high prevalence of type 2 diabetes and obesity is a consequence of genetic variants that have undergone positive selection during historical periods of erratic food supply. The recent expansion in the number of validated type 2 diabetes- and obesity-susceptibility loci, coupled with access to empirical data, enables us to look for evidence in support (or otherwise) of the thrifty genotype hypothesis using proven loci.

    Methods: We employed a range of tests to obtain complementary views of the evidence for selection: we determined whether the risk allele at associated 'index' single-nucleotide polymorphisms is derived or ancestral, calculated the integrated haplotype score (iHS) and assessed the population differentiation statistic fixation index (F(ST)) for 17 type 2 diabetes and 13 obesity loci.

    Results: We found no evidence for significant differences for the derived/ancestral allele test. None of the studied loci showed strong evidence for selection based on the iHS score. We find a high F(ST) for rs7901695 at TCF7L2, the largest type 2 diabetes effect size found to date.

    Conclusions/interpretation: Our results provide some evidence for selection at specific loci, but there are no consistent patterns of selection that provide conclusive confirmation of the thrifty genotype hypothesis. Discovery of more signals and more causal variants for type 2 diabetes and obesity is likely to allow more detailed examination of these issues.

    Funded by: Medical Research Council: G0601261; Wellcome Trust: 077016, 079557, 088885, WT077016/Z/05/Z, WT088885/Z/09/Z

    Diabetologia 2009;52;9;1846-51

  • Pooled analysis indicates that the GSTT1 deletion, GSTM1 deletion, and GSTP1 Ile105Val polymorphisms do not modify breast cancer risk in BRCA1 and BRCA2 mutation carriers.

    Spurdle AB, Fahey P, Chen X, McGuffog L, kConFab, Easton D, Peock S, Cook M, EMBRACE, Simard J, INHERIT, Rebbeck TR, MAGIC, Antoniou AC and Chenevix-Trench G

    Division of Genetics and Population Health, Queensland Institute of Medical Research, 300 Herston Rd, Herston 4006, Australia.

    The GSTP1, GSTM1, and GSTT1 detoxification genes all have functional polymorphisms that are common in the general population. A single study of 320 BRCA1/2 carriers previously assessed their effect in BRCA1 or BRCA2 mutation carriers. This study showed no evidence for altered risk of breast cancer for individuals with the GSTT1 and GSTM1 deletion variants, but did report that the GSTP1 Ile105Val (rs1695) variant was associated with increased breast cancer risk in carriers. We investigated the association between these three GST polymorphisms and breast cancer risk using existing data from 718 women BRCA1 and BRCA2 mutation carriers from Australia, the UK, Canada, and the USA. Data were analyzed within a proportional hazards framework using Cox regression. There was no evidence to show that any of the polymorphisms modified disease risk for BRCA1 or BRCA2 carriers, and there was no evidence for heterogeneity between sites. These results support the need for replication studies to confirm or refute hypothesis-generating studies.

    Funded by: CIHR; Cancer Research UK: 10118, 11022, 11174, C1287/A10118, C1287/A8874; NCI NIH HHS: R01 CA083855, R01 CA083855-01, R01 CA083855-02, R01 CA083855-03, R01 CA083855-04, R01 CA083855-05, R01 CA083855-06, R01 CA083855-07, R01 CA083855-08, R01 CA083855-09, R01 CA083855-10, R01 CA083855-11, R01 CA102776, R01 CA102776-01A1, R01 CA102776-02, R01 CA102776-03, R01 CA102776-04, R01 CA102776-05, R01-CA083855, R01-CA102776

    Breast cancer research and treatment 2009;122;1;281-5

  • Genomic and genic deletions of the FOX gene cluster on 16q24.1 and inactivating mutations of FOXF1 cause alveolar capillary dysplasia and other malformations.

    Stankiewicz P, Sen P, Bhatt SS, Storer M, Xia Z, Bejjani BA, Ou Z, Wiszniewska J, Driscoll DJ, Maisenbacher MK, Bolivar J, Bauer M, Zackai EH, McDonald-McGinn D, Nowaczyk MM, Murray M, Hustead V, Mascotti K, Schultz R, Hallam L, McRae D, Nicholson AG, Newbury R, Durham-O'Donnell J, Knight G, Kini U, Shaikh TH, Martin V, Tyreman M, Simonic I, Willatt L, Paterson J, Mehta S, Rajan D, Fitzgerald T, Gribble S, Prigmore E, Patel A, Shaffer LG, Carter NP, Cheung SW, Langston C and Shaw-Smith C

    Dept of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

    Alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV) is a rare, neonatally lethal developmental disorder of the lung with defining histologic abnormalities typically associated with multiple congenital anomalies (MCA). Using array CGH analysis, we have identified six overlapping microdeletions encompassing the FOX transcription factor gene cluster in chromosome 16q24.1q24.2 in patients with ACD/MPV and MCA. Subsequently, we have identified four different heterozygous mutations (frameshift, nonsense, and no-stop) in the candidate FOXF1 gene in unrelated patients with sporadic ACD/MPV and MCA. Custom-designed, high-resolution microarray analysis of additional ACD/MPV samples revealed one microdeletion harboring FOXF1 and two distinct microdeletions upstream of FOXF1, implicating a position effect. DNA sequence analysis revealed that in six of nine deletions, both breakpoints occurred in the portions of Alu elements showing eight to 43 base pairs of perfect microhomology, suggesting replication error Microhomology-Mediated Break-Induced Replication (MMBIR)/Fork Stalling and Template Switching (FoSTeS) as a mechanism of their formation. In contrast to the association of point mutations in FOXF1 with bowel malrotation, microdeletions of FOXF1 were associated with hypoplastic left heart syndrome and gastrointestinal atresias, probably due to haploinsufficiency for the neighboring FOXC2 and FOXL1 genes. These differences reveal the phenotypic consequences of gene alterations in cis.

    Funded by: Wellcome Trust

    American journal of human genetics 2009;84;6;780-91

  • Common variants conferring risk of schizophrenia.

    Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietiläinen OP, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Børglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Böttcher Y, Olesen J, Breuer R, Möller HJ, Giegling I, Rasmussen HB, Timm S, Mattheisen M, Bitter I, Réthelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TE, Thorsteinsdottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, Genetic Risk and Outcome in Psychosis (GROUP), Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jönsson EG, Terenius L, Agartz I, Petursson H, Nöthen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, St Clair D, Goldstein DB, Stefansson K and Collier DA

    deCODE genetics, Sturlugata 8, IS-101 Reykjavik, Iceland.

    Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.

    Funded by: Department of Health: PDA/02/06/016; NHLBI NIH HHS: 1R01HL087679-01; NIMH NIH HHS: R01 MH078075; Wellcome Trust: 089061

    Nature 2009;460;7256;744-7

  • Genome-wide end-sequenced BAC resources for the NOD/MrkTac and NOD/ShiLtJ mouse genomes.

    Steward CA, Humphray S, Plumb B, Jones MC, Quail MA, Rice S, Cox T, Davies R, Bonfield J, Keane TM, Nefedov M, de Jong PJ, Lyons P, Wicker L, Todd J, Hayashizaki Y, Gulban O, Danska J, Harrow J, Hubbard T, Rogers J and Adams DJ

    The Wellcome Trust Sanger Institute, Hinxton, UK.

    Non-obese diabetic (NOD) mice spontaneously develop type 1 diabetes (T1D) due to the progressive loss of insulin-secreting beta-cells by an autoimmune driven process. NOD mice represent a valuable tool for studying the genetics of T1D and for evaluating therapeutic interventions. Here we describe the development and characterization by end-sequencing of bacterial artificial chromosome (BAC) libraries derived from NOD/MrkTac (DIL NOD) and NOD/ShiLtJ (CHORI-29), two commonly used NOD substrains. The DIL NOD library is composed of 196,032 BACs and the CHORI-29 library is composed of 110,976 BACs. The average depth of genome coverage of the DIL NOD library, estimated from mapping the BAC end-sequences to the reference mouse genome sequence, was 7.1-fold across the autosomes and 6.6-fold across the X chromosome. Clones from this library have an average insert size of 150 kb and map to over 95.6% of the reference mouse genome assembly (NCBIm37), covering 98.8% of Ensembl mouse genes. By the same metric, the CHORI-29 library has an average depth over the autosomes of 5.0-fold and 2.8-fold coverage of the X chromosome, the reduced X chromosome coverage being due to the use of a male donor for this library. Clones from this library have an average insert size of 205 kb and map to 93.9% of the reference mouse genome assembly, covering 95.7% of Ensembl genes. We have identified and validated 191,841 single nucleotide polymorphisms (SNPs) for DIL NOD and 114,380 SNPs for CHORI-29. In total we generated 229,736,133 bp of sequence for the DIL NOD and 121,963,211 bp for the CHORI-29. These BAC libraries represent a powerful resource for functional studies, such as gene targeting in NOD embryonic stem (ES) cell lines, and for sequencing and mapping experiments.

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust: 062023, 077198

    Genomics 2009;95;2;105-10

  • Loci at chromosomes 13, 19 and 20 influence age at natural menopause.

    Stolk L, Zhai G, van Meurs JB, Verbiest MM, Visser JA, Estrada K, Rivadeneira F, Williams FM, Cherkas L, Deloukas P, Soranzo N, de Keyzer JJ, Pop VJ, Lips P, Lebrun CE, van der Schouw YT, Grobbee DE, Witteman J, Hofman A, Pols HA, Laven JS, Spector TD and Uitterlinden AG

    Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands.

    We conducted a genome-wide association study for age at natural menopause in 2,979 European women and identified six SNPs in three loci associated with age at natural menopause: chromosome 19q13.4 (rs1172822; -0.4 year per T allele (39%); P = 6.3 × 10(-11)), chromosome 20p12.3 (rs236114; +0.5 year per A allele (21%); P = 9.7 × 10(-11)) and chromosome 13q34 (rs7333181; +0.5 year per A allele (12%); P = 2.5 × 10(-8)). These common genetic variants regulate timing of ovarian aging, an important risk factor for breast cancer, osteoporosis and cardiovascular disease.

    Funded by: Wellcome Trust: 077011

    Nature genetics 2009;41;6;645-7

  • The cancer genome.

    Stratton MR, Campbell PJ and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.

    Funded by: Wellcome Trust: 077012, 088340

    Nature 2009;458;7239;719-24

  • Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels.

    Sudbery I, Stalker J, Simpson JT, Keane T, Rust AG, Hurles ME, Walter K, Lynch D, Teboul L, Brown SD, Li H, Ning Z, Nadeau JH, Croniger CM, Durbin R and Adams DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.

    Genome sequences are essential tools for comparative and mutational analyses. Here we present the short read sequence of mouse chromosome 17 from the Mus musculus domesticus derived strain A/J, and the Mus musculus castaneus derived strain CAST/Ei. We describe approaches for the accurate identification of nucleotide and structural variation in the genomes of vertebrate experimental organisms, and show how these techniques can be applied to help prioritize candidate genes within quantitative trait loci.

    Funded by: Cancer Research UK; Medical Research Council: G0800024, MC_UP_1502/1; NIAAA NIH HHS: P20 AA017837; Wellcome Trust

    Genome biology 2009;10;10;R112

  • A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation.

    Tarpey PS, Smith R, Pleasance E, Whibley A, Edkins S, Hardy C, O'Meara S, Latimer C, Dicks E, Menzies A, Stephens P, Blow M, Greenman C, Xue Y, Tyler-Smith C, Thompson D, Gray K, Andrews J, Barthorpe S, Buck G, Cole J, Dunmore R, Jones D, Maddison M, Mironenko T, Turner R, Turrell K, Varian J, West S, Widaa S, Wray P, Teague J, Butler A, Jenkinson A, Jia M, Richardson D, Shepherd R, Wooster R, Tejada MI, Martinez F, Carvill G, Goliath R, de Brouwer AP, van Bokhoven H, Van Esch H, Chelly J, Raynaud M, Ropers HH, Abidi FE, Srivastava AK, Cox J, Luo Y, Mallya U, Moon J, Parnau J, Mohammed S, Tolmie JL, Shoubridge C, Corbett M, Gardner A, Haan E, Rujirabanjerd S, Shaw M, Vandeleur L, Fullston T, Easton DF, Boyle J, Partington M, Hackett A, Field M, Skinner C, Stevenson RE, Bobrow M, Turner G, Schwartz CE, Gecz J, Raymond FL, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Large-scale systematic resequencing has been proposed as the key future strategy for the discovery of rare, disease-causing sequence variants across the spectrum of human complex disease. We have sequenced the coding exons of the X chromosome in 208 families with X-linked mental retardation (XLMR), the largest direct screen for constitutional disease-causing mutations thus far reported. The screen has discovered nine genes implicated in XLMR, including SYP, ZNF711 and CASK reported here, confirming the power of this strategy. The study has, however, also highlighted issues confronting whole-genome sequencing screens, including the observation that loss of function of 1% or more of X-chromosome genes is compatible with apparently normal existence.

    Funded by: Cancer Research UK: 10118, 11022; NICHD NIH HHS: HD26202, R01 HD026202; Wellcome Trust: 077012

    Nature genetics 2009;41;5;535-43

  • Microarray-based cytogenetic profiling reveals recurrent and subtype-associated genomic copy number aberrations in feline sarcomas.

    Thomas R, Valli VE, Ellis P, Bell J, Karlsson EK, Cullen J, Lindblad-Toh K, Langford CF and Breen M

    Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA.

    Injection-site-associated sarcomas (ISAS), commonly arising at the site of routine vaccine administration, afflict as many as 22,000 domestic cats annually in the USA. These tumors are typically more aggressive and prone to recurrence than spontaneous sarcomas (non-ISAS), generally receiving a poorer long-term prognosis and warranting a more aggressive therapeutic approach. Although certain clinical and histological factors are highly suggestive of ISAS, timely diagnosis and optimal clinical management may be hindered by the absence of definitive markers that can distinguish between tumors with underlying injection-related etiology and their spontaneous counterpart. Specific nonrandom chromosome copy number aberrations (CNAs) have been associated with the clinical behavior of a vast spectrum of human tumors, providing an extensive resource of potential diagnostic and prognostic biomarkers. Although similar principles are now being applied with great success in other species, their relevance to feline molecular oncology has not yet been investigated in any detail. We report the construction of a genomic microarray platform for detection of recurrent CNAs in feline tumors through cytogenetic assignment of 210 large-insert DNA clones selected at intervals of approximately 15 Mb from the feline genome sequence assembly. Microarray-based profiling of 19 ISAS and 27 non-ISAS cases identified an extensive range of genomic imbalances that were highly recurrent throughout the combined panel of 46 sarcomas. Deletions of two specific regions were significantly associated with the non-ISAS phenotype. Further characterization of these regions may ultimately permit molecular distinction between ISAS and non-ISAS, as a tool for predicting tumor behavior and prognosis, as well as refining means for therapeutic intervention.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;8;987-1000

  • Influence of genetic background on tumor karyotypes: evidence for breed-associated cytogenetic aberrations in canine appendicular osteosarcoma.

    Thomas R, Wang HJ, Tsai PC, Langford CF, Fosmire SP, Jubala CM, Getzy DM, Cutter GR, Modiano JF and Breen M

    Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough Street, Raleigh, NC 27606, USA.

    Recurrent chromosomal aberrations in solid tumors can reveal the genetic pathways involved in the evolution of a malignancy and in some cases predict biological behavior. However, the role of individual genetic backgrounds in shaping karyotypes of sporadic tumors is unknown. The genetic structure of purebred dog breeds, coupled with their susceptibility to spontaneous cancers, provides a robust model with which to address this question. We tested the hypothesis that there is an association between breed and the distribution of genomic copy number imbalances in naturally occurring canine tumors through assessment of a cohort of Golden Retrievers and Rottweilers diagnosed with spontaneous appendicular osteosarcoma. Our findings reveal significant correlations between breed and tumor karyotypes that are independent of gender, age at diagnosis, and histological classification. These data indicate for the first time that individual genetic backgrounds, as defined by breed in dogs, influence tumor karyotypes in a cancer with extensive genomic instability.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;3;365-77

  • Genome-wide haplotype association study identifies the SLC22A3-LPAL2-LPA gene cluster as a risk locus for coronary artery disease.

    Trégouët DA, König IR, Erdmann J, Munteanu A, Braund PS, Hall AS, Grosshennig A, Linsel-Nitschke P, Perret C, DeSuremain M, Meitinger T, Wright BJ, Preuss M, Balmforth AJ, Ball SG, Meisinger C, Germain C, Evans A, Arveiler D, Luc G, Ruidavets JB, Morrison C, van der Harst P, Schreiber S, Neureuther K, Schäfer A, Bugert P, El Mokhtari NE, Schrezenmeir J, Stark K, Rubin D, Wichmann HE, Hengstenberg C, Ouwehand W, Wellcome Trust Case Control Consortium, Cardiogenics Consortium, Ziegler A, Tiret L, Thompson JR, Cambien F, Schunkert H and Samani NJ

    Institut National de la Santé Et de la Recherche Médicale (INSERM) Unité Mixte de Recherche (UMR_S) 525, Université Pierre et Marie Curie (UPMC), Paris 06, Paris 75013, France.

    We identify the SLC22A3-LPAL2-LPA gene cluster as a strong susceptibility locus for coronary artery disease (CAD) through a genome-wide haplotype association (GWHA) study. This locus was not identified from previous genome-wide association (GWA) studies focused on univariate analyses of SNPs. The proposed approach may have wide utility for analyzing GWA data for other complex traits.

    Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust

    Nature genetics 2009;41;3;283-5

  • Next-generation sequencing of vertebrate experimental organisms.

    Turner DJ, Keane TM, Sudbery I and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Next-generation sequencing technologies are revolutionizing biology by allowing for genome-wide transcription factor binding-site profiling, transcriptome sequencing, and more recently, whole-genome resequencing. While it is currently not possible to generate complete de novo assemblies of higher-vertebrate genomes using next-generation sequencing, improvements in sequence read lengths and throughput, coupled with new assembly algorithms for large data sets, will soon make this a reality. These developments will in turn spawn a revolution in how genomic data are used to understand genetics and how model organisms are used for disease gene discovery. This review provides an overview of the current next-generation sequencing platforms and the newest computational tools for the analysis of next-generation sequencing data. We also describe how next-generation sequencing may be applied in the context of vertebrate model organism genetics.

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 2009;20;6;327-38

  • Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a.

    Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky LA, King RJ, Lin AA, Chow CE, Semino O, Battaglia V, Kutuev I, Järve M, Chaubey G, Ayub Q, Mohyuddin A, Mehdi SQ, Sengupta S, Rogaev EI, Khusnutdinova EK, Pshenichnov A, Balanovsky O, Balanovska E, Jeran N, Augustin DH, Baldovic M, Herrera RJ, Thangaraj K, Singh V, Singh L, Majumder P, Rudan P, Primorac D, Villems R and Kivisild T

    Division of Child and Adolescent Psychiatry and Child Development, Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, 1201 Welch Road, Stanford, CA 94304-5485, USA.

    Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.

    European journal of human genetics : EJHG 2009;18;4;479-84

  • A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites.

    Uren AG, Mikkers H, Kool J, van der Weyden L, Lund AH, Wilson CH, Rance R, Jonkers J, van Lohuizen M, Berns A and Adams DJ

    Division of Molecular Genetics, Cancer Genomics Centre, Netherlands Cancer Institute, Plesmanlaan, Amsterdam, The Netherlands.

    Insertional mutagens such as viruses and transposons are a useful tool for performing forward genetic screens in mice to discover cancer genes. These screens are most effective when performed using hundreds of mice; however, until recently, the cost-effective isolation and sequencing of insertion sites has been a major limitation to performing screens on this scale. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA isolation, DNA digestion, linker or splinkerette ligation, primary and secondary PCR amplification, and sequencing. This method, which takes about 1 week to perform, has allowed us to isolate hundreds of thousands of insertion sites from mouse tumors and, unlike other methods, has been specifically optimized for the murine leukemia virus (MuLV), and can easily be performed in a 96-well plate format for the efficient multiplex isolation of insertion sites.

    Funded by: Cancer Research UK: A6542; Wellcome Trust: 098051

    Nature protocols 2009;4;5;789-98

  • Megaoesophagus in Rassf1a-null mice.

    van der Weyden L, Happerfield L, Arends MJ and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Megaoesophagus, or oesophageal achalasia, is a neuromuscular disorder characterized by an absence of peristalsis and flaccid dilatation of the oesophagus, resulting in the retention of ingesta in the dilated segment. The aetiology and pathogenesis of idiopathic (or primary) megaoesophagus are still poorly understood and very little is known about the genetic causes of megaoesophagus in humans. Attempts to develop animal models of this condition have been largely unsuccessful and although the ICRC/HiCri strain of mice spontaneously develop megaoesophagus, the underlying genetic cause remains unknown. In this report, we show that aged Rassf1a-null mice have an enhanced susceptibility to megaoesophagus compared with wild-type littermates (approximately 20% vs. approximately 2% incidence respectively; P = 0.01). Histological examination of the dilated oesophaguses shows a reduction in the numbers of nerve cells (both ganglia and nerve fibres) in the myenteric plexus of the dilated mid and lower oesophagus that was confirmed by S100 immunohistochemistry. There was also a chronic inflammatory infiltrate and subsequent fibrosis of the myenteric plexus and the muscle layers. These appearances closely mimic the gross and histopathological findings in human cases of megaoesophagus/achalasia, thus demonstrating that this is a representative mouse model of the disease. Thus, we have identified a genetic cause of the development of megaoesophagus/achalasia that could be screened for in patients, and may eventually facilitate the development of therapies that could prevent further progression of the disease once it is diagnosed at an early stage.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C515412/1; Cancer Research UK; Wellcome Trust

    International journal of experimental pathology 2009;90;2;101-8

  • Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer.

    van Haaften G, Dalgliesh GL, Davies H, Chen L, Bignell G, Greenman C, Edkins S, Hardy C, O'Meara S, Teague J, Butler A, Hinton J, Latimer C, Andrews J, Barthorpe S, Beare D, Buck G, Campbell PJ, Cole J, Forbes S, Jia M, Jones D, Kok CY, Leroy C, Lin ML, McBride DJ, Maddison M, Maquire S, McLay K, Menzies A, Mironenko T, Mulderrig L, Mudie L, Pleasance E, Shepherd R, Smith R, Stebbings L, Stephens P, Tang G, Tarpey PS, Turner R, Turrell K, Varian J, West S, Widaa S, Wray P, Collins VP, Ichimura K, Law S, Wong J, Yuen ST, Leung SY, Tonon G, DePinho RA, Tai YT, Anderson KC, Kahnoski RJ, Massie A, Khoo SK, Teh BT, Stratton MR and Futreal PA

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Somatically acquired epigenetic changes are present in many cancers. Epigenetic regulation is maintained via post-translational modifications of core histones. Here, we describe inactivating somatic mutations in the histone lysine demethylase gene UTX, pointing to histone H3 lysine methylation deregulation in multiple tumor types. UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes. These data identify UTX as a new human cancer gene.

    Funded by: Wellcome Trust: 077012, 088340

    Nature genetics 2009;41;5;521-3

  • Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms.

    Vermeulen M, Wollstein A, van der Gaag K, Lao O, Xue Y, Wang Q, Roewer L, Knoblauch H, Tyler-Smith C, de Knijff P and Kayser M

    Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, 3000 CA Rotterdam, The Netherlands.

    We analyzed 67 short tandem repeat polymorphisms from the non-recombining part of the Y-chromosome (Y-STRs), including 49 rarely studied simple single-copy (ss)Y-STRs and 18 widely used Y-STRs, in 590 males from 51 populations belonging to 8 worldwide regions (HGDP-CEPH panel). Although autosomal DNA profiling provided no evidence for close relationship, we found 18 Y-STR haplotypes (defined by 67 Y-STRs) that were shared by two to five men in 13 worldwide populations, revealing high and widespread levels of cryptic male relatedness. Maximal (95.9%) haplotype resolution was achieved with the best 25 out of 67 Y-STRs in the global dataset, and with the best 3-16 markers in regional datasets (89.6-100% resolution). From the 49 rarely studied ssY-STRs, the 25 most informative markers were sufficient to reach the highest possible male lineage differentiation in the global (92.2% resolution), and 3-15 markers in the regional datasets (85.4-100%). Considerably lower haplotype resolutions were obtained with the three commonly used Y-STR sets (Minimal Haplotype, PowerPlex Y, and AmpFlSTR Yfiler). Six ssY-STRs (DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643) were most informative to supplement the existing Y-STR kits for increasing haplotype resolution, or - together with additional ssY-STRs - as a new set for maximizing male lineage differentiation. Mutation rates of the 49 ssY-STRs were estimated from 403 meiotic transfers in deep-rooted pedigrees, and ranged from approximately 4.8 x 10(-4) for 31 ssY-STRs with no mutations observed to 1.3 x 10(-2) and 1.5 x 10(-2) for DYS570 and DYS576, respectively, the latter representing the highest mutation rates reported for human Y-STRs so far. Our findings thus demonstrate that ssY-STRs are useful for maximizing global and regional resolution of male lineages, either as a new set, or when added to commonly used Y-STR sets, and support their application to forensic, genealogical and anthropological studies.

    Funded by: Wellcome Trust: 077009

    Forensic science international. Genetics 2009;3;4;205-13
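
    The haplotype-resolution figures in the entry above can be illustrated with a minimal sketch. It assumes that resolution means the share of distinct haplotypes among the sampled men, a definition the abstract does not spell out. The function names and the toy repeat counts are hypothetical and are not data from the study.

```python
from collections import Counter

def haplotype_resolution(haplotypes):
    """Fraction of distinct haplotypes among all sampled men (assumed definition)."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    return len(set(haplotypes)) / len(haplotypes)

def shared_haplotypes(haplotypes):
    """Haplotypes carried by more than one man, with the number of carriers."""
    counts = Counter(haplotypes)
    return {hap: n for hap, n in counts.items() if n > 1}

# Toy data: each tuple holds repeat counts at two hypothetical Y-STR markers.
men = [(14, 22), (14, 22), (15, 21), (16, 23), (17, 20)]

resolution = haplotype_resolution(men)   # 4 distinct haplotypes among 5 men
shared = shared_haplotypes(men)          # one haplotype shared by two men
print(f"resolution = {resolution:.1%}, shared = {shared}")
```

    Under this assumed definition, adding markers can only split haplotypes further, so resolution rises or stays the same as the marker set grows.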

  • Milk and two oligosaccharides.

    Walker A

    Nature reviews. Microbiology 2009;7;7;483

  • Single domain antibodies against the collagen signalling receptor glycoprotein VI are inhibitors of collagen induced thrombus formation.

    Walker A, Pugh N, Garner SF, Stephens J, Maddox B, Ouwehand WH, Farndale RW, Steward M and Bloodomics Consortium

    Domantis Ltd., 315 Cambridge Science Park, Cambridge, UK.

    Human Domain Antibodies (dAbs) that bind to and inhibit the function of platelet glycoprotein VI (GPVI) have been isolated from phage display libraries and their efficacy demonstrated using in vitro models of platelet activation. Here we describe the properties of one such antibody, BLO8-1, which has been shown to specifically inhibit the binding of recombinant human GPVI to cross-linked collagen related peptide (CRP-XL) in vitro. BLO8-1 specifically binds to the platelet cell surface and prevents CRP-XL induced platelet aggregation in platelet-rich plasma, as well as inhibiting thrombus formation in whole blood under arterial shear conditions. Using a series of mutant GPVI molecules, BLO8-1 was shown to recognize an epitope within the collagen binding domain of GPVI, therefore the anti-thrombotic effect of this dAb is predicted to be due to direct blocking of the collagen-GPVI interaction. These data, together with the desirable properties of Domain Antibodies, show that dAbs could potentially be used to generate novel biopharmaceuticals with anti-thrombotic properties.

    Funded by: British Heart Foundation: RG/09/003/27122; Medical Research Council: G0500707

    Platelets 2009;20;4;268-76

  • CLIP: construction of cDNA libraries for high-throughput sequencing from RNAs cross-linked to proteins in vivo.

    Wang Z, Tollervey J, Briese M, Turner D and Ule J

    MRC-Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.

    UV cross-linking and immunoprecipitation assay (CLIP) can identify direct interaction sites between RNA-binding proteins and RNAs in vivo, and has been used to study several proteins in tissues and cell cultures. The main challenge of the method is to specifically amplify the low amount of isolated RNA. The current protocol is optimised for efficient RNA purification and ligation of barcoded RNA adapters. High-throughput sequencing of the multiplexed cDNA library allows for a comprehensive coverage of the target sequences.

    Funded by: Medical Research Council: MC_U105185858; Wellcome Trust: 089701

    Methods (San Diego, Calif.) 2009;48;3;287-93

  • Comparative genomics of the emerging human pathogen Photorhabdus asymbiotica with the insect pathogen Photorhabdus luminescens.

    Wilkinson P, Waterfield NR, Crossman L, Corton C, Sanchez-Contreras M, Vlisidou I, Barron A, Bignell A, Clark L, Ormond D, Mayho M, Bason N, Smith F, Simmonds M, Churcher C, Harris D, Thompson NR, Quail M, Parkhill J and Ffrench-Constant RH

    School of Biosciences, University of Exeter in Cornwall, Penryn TR10 9EZ, UK.

    Background: The Gram-negative bacterium Photorhabdus asymbiotica (Pa) has been recovered from human infections in both North America and Australia. Recently, Pa has been shown to have a nematode vector that can also infect insects, like its sister species the insect pathogen P. luminescens (Pl). To understand the relationship between pathogenicity to insects and humans in Photorhabdus, we have sequenced the complete genome of Pa strain ATCC43949 from North America. This strain (formerly referred to as Xenorhabdus luminescens strain 2) was isolated in 1977 from the blood of an 80-year-old female patient with endocarditis, in Maryland, USA. Here we compare the complete genome of Pa ATCC43949 with that of the previously sequenced insect pathogen P. luminescens strain TT01, which was isolated from its entomopathogenic nematode vector collected from soil in Trinidad and Tobago.

    Results: We found that the human pathogen Pa had a smaller genome (5,064,808 bp) than that of the insect pathogen Pl (5,688,987 bp) but that each pathogen carries approximately one megabase of DNA that is unique to each strain. The reduced size of the Pa genome is associated with a smaller diversity in insecticidal genes such as those encoding the Toxin complexes (Tc's), Makes caterpillars floppy (Mcf) toxins and the Photorhabdus Virulence Cassettes (PVCs). The Pa genome, however, also shows the addition of a plasmid related to pMT1 from Yersinia pestis and several novel pathogenicity islands including a novel Type Three Secretion System (TTSS) encoding island. Together these data suggest that Pa may show virulence against man via the acquisition of the pMT1-like plasmid and specific effectors, such as SopB, that promote its persistence inside human macrophages. Interestingly the loss of insecticidal genes in Pa is not reflected by a loss of pathogenicity towards insects.

    Conclusion: Our results suggest that North American isolates of Pa have acquired virulence against man via the acquisition of a plasmid and specific virulence factors with similarity to those shown to play roles in pathogenicity against humans in other bacteria.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E021328/1

    BMC genomics 2009;10;302

  • Signal initiation in biological systems: the properties and detection of transient extracellular protein interactions.

    Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Individual cells within biological systems frequently coordinate their functions through signals initiated by specific extracellular protein interactions involving receptors that bridge the cellular membrane. Due to their biochemical nature, these membrane-embedded receptor proteins are difficult to manipulate and their interactions are characterised by very weak binding strengths that cannot be detected using popular high throughput assays. This review will provide a general outline of the biochemical attributes of receptor proteins focussing in particular on the biophysical properties of their transient interactions. Methods that are able to detect these weak extracellular binding events and especially those that can be used for identifying novel interactions will be compared. Finally, I discuss the feasibility of constructing a complete and accurate extracellular protein interaction map, and the methods that are likely to be useful in achieving this goal.

    Molecular bioSystems 2009;5;12;1405-12

  • CARM1 is required in embryonic stem cells to maintain pluripotency and resist differentiation.

    Wu Q, Bruce AW, Jedrusik A, Ellis PD, Andrews RM, Langford CF, Glover DM and Zernicka-Goetz M

    Wellcome Trust and Cancer Research UK Gurdon Institute, Cambridge, United Kingdom.

    Histone H3 methylation at R17 and R26 recently emerged as a novel epigenetic mechanism regulating pluripotency in mouse embryos. Blastomeres of four-cell embryos with high H3 methylation at these sites show unrestricted potential, whereas those with lower levels cannot support development when aggregated in chimeras of like cells. Increasing histone H3 methylation, through expression of coactivator-associated-protein-arginine-methyltransferase 1 (CARM1) in embryos, elevates expression of key pluripotency genes and directs cells to the pluripotent inner cell mass. We demonstrate CARM1 is also required for the self-renewal and pluripotency of embryonic stem (ES) cells. In ES cells, CARM1 depletion downregulates pluripotency genes leading to their differentiation. CARM1 associates with Oct4/Pou5f1 and Sox2 promoters that display detectable levels of R17/26 histone H3 methylation. In CARM1 overexpressing ES cells, histone H3 arginine methylation is also at the Nanog promoter to which CARM1 now associates. Such cells express Nanog at elevated levels and delay their response to differentiation signals. Thus, like in four-cell embryo blastomeres, histone H3 arginine methylation by CARM1 in ES cells allows epigenetic modulation of pluripotency.

    Funded by: Medical Research Council: G0300723, G0800784; Wellcome Trust: 064421, 079643

    Stem cells (Dayton, Ohio) 2009;27;11;2637-45

  • Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree.

    Xue Y, Wang Q, Long Q, Ng BL, Swerdlow H, Burton J, Skuce C, Taylor R, Abdellah Z, Zhao Y, Asan, MacArthur DG, Quail MA, Carter NP, Yang H and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, UK.

    Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. In the past, estimates of mutation rates have generally been inferred from phenotypic observations or comparisons of homologous sequences among closely related species. Here, we apply new sequencing technology to measure directly one mutation rate, that of base substitutions on the human Y chromosome. The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to an average depth of 11x or 20x, respectively. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in approximately 10.15 Mb; eight of these had occurred in vitro and four in vivo. The latter could be placed in different positions on the pedigree and led to a mutation-rate measurement of 3.0 x 10(-8) mutations/nucleotide/generation (95% CI: 8.9 x 10(-9)-7.0 x 10(-8)), consistent with estimates of 2.3 x 10(-8)-6.3 x 10(-8) mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons depending on the generation and split times assumed.

    Funded by: Wellcome Trust

    Current biology : CB 2009;19;17;1453-7
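
    The point estimate in the entry above can be reproduced with simple arithmetic. This is a minimal sketch that assumes the rate is the four in vivo mutations divided by the product of the approximately 10.15 Mb examined and the 13 generations separating the two men. The abstract does not say how the confidence interval was derived, so the sketch does not attempt it. The function name is hypothetical.

```python
def mutation_rate(mutations: int, sites: float, generations: int) -> float:
    """Mutations per nucleotide per generation (simple point estimate)."""
    if sites <= 0 or generations <= 0:
        raise ValueError("sites and generations must be positive")
    return mutations / (sites * generations)

# Figures taken from the abstract: 4 in vivo mutations, ~10.15 Mb, 13 generations.
rate = mutation_rate(mutations=4, sites=10.15e6, generations=13)
print(f"{rate:.1e} mutations/nucleotide/generation")  # 3.0e-08
```

    The eight mutations that arose in vitro are left out of the numerator because they occurred in cell culture rather than in the pedigree.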

  • Generation of transgene-free induced pluripotent mouse stem cells by the piggyBac transposon.

    Yusa K, Rad R, Takeda J and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Induced pluripotent stem cells (iPSCs) have been generated from somatic cells by transgenic expression of Oct4 (Pou5f1), Sox2, Klf4 and Myc. A major difficulty in the application of this technology for regenerative medicine, however, is the delivery of reprogramming factors. Whereas retroviral transduction increases the risk of tumorigenicity, transient expression methods have considerably lower reprogramming efficiencies. Here we describe an efficient piggyBac transposon-based approach to generate integration-free iPSCs. Transposons carrying 2A peptide-linked reprogramming factors induced reprogramming of mouse embryonic fibroblasts with equivalent efficiencies to retroviral transduction. We removed transposons from these primary iPSCs by re-expressing transposase. Transgene-free iPSCs could be identified by negative selection. piggyBac excised without a footprint, leaving the iPSC genome without any genetic alteration. iPSCs fulfilled all criteria of pluripotency, such as pluripotency gene expression, teratoma formation and contribution to chimeras. piggyBac transposon-based reprogramming may be used to generate therapeutically applicable iPSCs.

    Funded by: Wellcome Trust: 077187

    Nature methods 2009;6;5;363-9