Sanger Institute - Publications 2009
Number of papers published in 2009: 137
Newly discovered breast cancer susceptibility loci on 3p24 and 17q23.2.
Department of Oncology, University of Cambridge, UK.
Genome-wide association studies (GWAS) have identified seven breast cancer susceptibility loci, but these explain only a small fraction of the familial risk of the disease. Five of these loci were identified through a two-stage GWAS involving 390 familial cases and 364 controls in the first stage, and 3,990 cases and 3,916 controls in the second stage. To identify additional loci, we tested over 800 promising associations from this GWAS in a further two stages involving 37,012 cases and 40,069 controls from 33 studies in the CGEMS collaboration and Breast Cancer Association Consortium. We found strong evidence for additional susceptibility loci on 3p (rs4973768: per-allele OR = 1.11, 95% CI = 1.08-1.13, P = 4.1 x 10(-23)) and 17q (rs6504950: per-allele OR = 0.95, 95% CI = 0.92-0.97, P = 1.4 x 10(-8)). Potential causative genes include SLC4A7 and NEK10 on 3p and COX11 on 17q.
Funded by: Cancer Research UK: 10118, 11021, A10123, C1287/A10118, C1287/A5260, C1287/A7497, C490/A11021; Intramural NIH HHS; NCI NIH HHS: 5UO1CA098233, CA-06-503, CA-58860, CA-92044, CA-95-011, CA49449, CA50385, CA65725, CA67262, CA87969, P30 CA062203, P50 CA116201, R01 CA102740-01A2, R01 CA104021-04, R01 CA122340, U01 CA69398, U01 CA69417, U01 CA69446, U01 CA69467, U01 CA69631, U01 CA69638, UO1 CA098710, UO1 CA69467
Nature genetics 2009;41;5;585-90
Genetic diversity amongst isolates of Neospora caninum, and the development of a multiplex assay for the detection of distinct strains.
Department of Medical and Molecular Biosciences, University of Technology, Sydney, P.O. Box 123, Broadway, New South Wales 2007, Australia.
Infection with Neospora caninum is regarded as a significant cause of abortion in cattle. Despite the economic impact of this infection, relatively little is known about the biology of this parasite. In this study, mini and microsatellite DNAs were detected in the genome of N. caninum and eight loci were identified that each contained repetitive DNA which was polymorphic among different isolates of this parasite. A multiplex PCR assay was developed for the detection of genetic variation within N. caninum based on length polymorphism associated with three different repetitive markers. The utility of the multiplex PCR was demonstrated in that it was able to distinguish amongst strains of N. caninum used as either vaccine or challenge strains in animal vaccination experiments and that it could genotype N. caninum associated with naturally acquired infections of animals. The multiplex PCR is simple, rapid, informative and sensitive and should provide a valuable tool for further studies on the epidemiology of N. caninum in different host species.
Molecular and cellular probes 2009;23;3-4;132-9
SnoopCGH: software for visualizing comparative genomic hybridization data.
Wellcome Trust Sanger Institute, Hinxton, The Weatherall Institute of Molecular Medicine and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. email@example.com
Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of the array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data.
Availability and implementation: SnoopCGH is written in Java and is available from http://snoopcgh.sourceforge.net/
Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust
Bioinformatics (Oxford, England) 2009;25;20;2732-3
Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. firstname.lastname@example.org
Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region.
Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins.
Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.
Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198
BMC genomics 2009;10;606
Testing for rare variant associations in complex diseases.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK. email@example.com.
The study of rare variants holds the promise of accounting for some of the missing heritability in complex traits. Next-generation sequencing technologies enable probing of variation across the full spectrum of allele frequencies. Multiple methods for the analysis of rare variants have been proposed and, recently, Ionita-Laza et al. have presented an approach with the theoretical capacity to detect risk and protective variants. The identification of rare risk variants could have major implications in understanding complex disease etiopathogenesis.
Genome medicine 2009;1;11;24
ABACAS: algorithm-based automatic contiguation of assembled sequences.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK. firstname.lastname@example.org
Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated.
Availability and implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net.
Funded by: Wellcome Trust: WT085775/Z/08/Z
Bioinformatics (Oxford, England) 2009;25;15;1968-9
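The ordering and orientation step described in the ABACAS summary above can be illustrated roughly as follows. This is a simplified sketch, not the actual Perl implementation of ABACAS: the `Hit` record, its field names, the 1-based inclusive coordinates and the example values are all assumptions made for illustration, and real alignments would come from an external aligner.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a contig to the reference (hypothetical record)."""
    contig: str
    ref_start: int    # 1-based, inclusive (assumed convention)
    ref_end: int      # 1-based, inclusive
    strand: str       # '+' if the contig aligns forward, '-' if reversed
    aligned_len: int  # number of aligned bases


def order_and_orient(hits):
    """Keep the longest hit per contig, then sort contigs along the reference."""
    best = {}
    for h in hits:
        if h.contig not in best or h.aligned_len > best[h.contig].aligned_len:
            best[h.contig] = h
    return sorted(best.values(), key=lambda h: h.ref_start)


def gaps_between(ordered_hits):
    """Reference intervals left uncovered between consecutive contigs."""
    gaps = []
    for left, right in zip(ordered_hits, ordered_hits[1:]):
        if right.ref_start > left.ref_end + 1:
            gaps.append((left.ref_end + 1, right.ref_start - 1))
    return gaps


# Hypothetical alignments: contig c2 has a long reverse hit and a short spurious hit.
example_hits = [
    Hit("c2", 500, 900, "-", 400),
    Hit("c1", 1, 300, "+", 300),
    Hit("c2", 2000, 2050, "+", 50),
]
ordered = order_and_orient(example_hits)
layout = [(h.contig, h.strand) for h in ordered]
gap_list = gaps_between(ordered)
```

In this sketch a contig marked '-' would be reverse-complemented before being placed, and each interval in `gap_list` marks a region where gap-closing primers might be designed.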
Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts.
Department of Epidemiology and Biostatistics, Erasmus University Medical Center, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands.
Recent genome-wide association (GWA) studies of lipids have been conducted in samples ascertained for other phenotypes, particularly diabetes. Here we report the first GWA analysis of loci affecting total cholesterol (TC), low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglycerides sampled randomly from 16 population-based cohorts and genotyped using mainly the Illumina HumanHap300-Duo platform. Our study included a total of 17,797-22,562 persons, aged 18-104 years and from geographic regions spanning from the Nordic countries to Southern Europe. We established 22 loci associated with serum lipid levels at a genome-wide significance level (P < 5 x 10(-8)), including 16 loci that were identified by previous GWA studies. The six newly identified loci in our cohort samples are ABCG5 (TC, P = 1.5 x 10(-11); LDL, P = 2.6 x 10(-10)), TMEM57 (TC, P = 5.4 x 10(-10)), CTCF-PRMT8 region (HDL, P = 8.3 x 10(-16)), DNAH11 (LDL, P = 6.1 x 10(-9)), FADS3-FADS2 (TC, P = 1.5 x 10(-10); LDL, P = 4.4 x 10(-13)) and MADD-FOLH1 region (HDL, P = 6 x 10(-11)). For three loci, effect sizes differed significantly by sex. Genetic risk scores based on lipid loci explain up to 4.8% of variation in lipids and were also associated with increased intima media thickness (P = 0.001) and coronary heart disease incidence (P = 0.04). The genetic risk score improves the screening of high-risk groups of dyslipidemia over classical risk factors.
Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: MC_U127561128; NHLBI NIH HHS: 5R01HL087679-02; Wellcome Trust: 089061
Nature genetics 2009;41;1;47-55
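A genetic risk score of the kind mentioned in the lipid-loci abstract above is commonly computed as a weighted sum of risk-allele counts across loci. The following is a minimal sketch under that assumption; the abstract does not state the weighting scheme used in the study, and the weights and genotypes below are hypothetical values chosen only for illustration.

```python
def genetic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per SNP)."""
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    for count in allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    return sum(c * w for c, w in zip(allele_counts, weights))


# Hypothetical per-allele effect sizes for three SNPs (not taken from the paper).
example_weights = [0.10, 0.05, 0.20]
# Hypothetical genotype of one individual: risk-allele copies at each SNP.
example_genotype = [2, 0, 1]
example_score = genetic_risk_score(example_genotype, example_weights)
```

Setting every weight to 1 reduces this to a simple count of risk alleles, which is another commonly used variant.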
A novel system of polymorphic and diverse NK cell receptors in primates.
Department of Primate Genetics, German Primate Centre, Göttingen, Germany.
There are two main classes of natural killer (NK) cell receptors in mammals, the killer cell immunoglobulin-like receptors (KIR) and the structurally unrelated killer cell lectin-like receptors (KLR). While KIR represent the most diverse group of NK receptors in all primates studied to date, including humans, apes, and Old and New World monkeys, KLR represent the functional equivalent in rodents. Here, we report a first digression from this rule in lemurs, where the KLR (CD94/NKG2) rather than KIR constitute the most diverse group of NK cell receptors. We demonstrate that natural selection contributed to such diversification in lemurs and particularly targeted KLR residues interacting with the peptide presented by MHC class I ligands. We further show that lemurs lack a strict ortholog or functional equivalent of MHC-E, the ligands of non-polymorphic KLR in "higher" primates. Our data support the existence of a hitherto unknown system of polymorphic and diverse NK cell receptors in primates and of combinatorial diversity as a novel mechanism to increase NK cell receptor repertoire.
Funded by: Intramural NIH HHS; NIAID NIH HHS: AI 31168, R01 AI031168; PHS HHS: HHSN261200800001E
PLoS genetics 2009;5;10;e1000688
Gene body methylation of the dimethylarginine dimethylamino-hydrolase 2 (Ddah2) gene is an epigenetic biomarker for neural stem cell differentiation.
UCL Cancer Institute, University College London, London WC1E 6BT, UK. email@example.com
DNA methylation is an important epigenetic mark that is involved in the regulation of many cellular processes such as gene expression, genomic imprinting and silencing of repetitive elements. Because of their ability to cause and capture phenotypic plasticity, epigenetic marks such as DNA methylation represent potential biomarkers to distinguish between different types of tissues and stages of differentiation. Here, we have identified differential DNA methylation in the gene body of the nitric oxide inhibitor Ddah2 that discriminates embryonic stem cells from neural stem cells and is positively correlated with differential gene expression.
Funded by: Wellcome Trust: WT-084071
Genomic complexity of the Y-STR DYS19: inversions, deletions and founder lineages carrying duplications.
Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK.
The Y-STR DYS19 is firmly established in the repertoire of Y-chromosomal markers used in forensic analysis yet is poorly understood at the molecular level, lying in a complex genomic environment and exhibiting null alleles, as well as duplications and occasional triplications in population samples. Here, we analyse three null alleles and 51 duplications and show that DYS19 can also be involved in inversion events, so that even its location within the short arm of the Y chromosome is uncertain. Deletion mapping in the three chromosomes carrying null alleles shows that their deletions are less than approximately 300 kb in size. Haplotypic analysis with binary markers shows that they belong to three different haplogroups and so represent independent events. In contrast, a collection of 51 DYS19 duplication chromosomes belong to only four haplogroups: two are singletons and may represent somatic mutation in lymphoblastoid cell lines, but two, in haplogroups G and C3c, represent founder lineages that have spread widely in Central Europe/West Asia and East Asia, respectively. Consideration of candidate mechanisms underlying both deletions and duplications provides no evidence for the involvement of non-allelic homologous recombination, and they are likely to represent sporadic events with low mutation rates. Understanding the basis and population distribution of these DYS19 alleles will aid in the utilisation and interpretation of profiles that contain them.
Funded by: Wellcome Trust: 057559, 077009
International journal of legal medicine 2009;123;1;15-23
Replication analysis identifies TYK2 as a multiple sclerosis susceptibility factor.
Department of Clinical Neuroscience, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK. firstname.lastname@example.org
In a recent genome-wide association study (GWAS) based on 12,374 non-synonymous single nucleotide polymorphisms we identified a number of candidate multiple sclerosis susceptibility genes. Here, we describe the extended analysis of 17 of these loci undertaken using an additional 4234 patients, 2983 controls and 2053 trio families. In the final analysis combining all available data, we found that evidence for association was substantially increased for one of the 17 loci, rs34536443 from the tyrosine kinase 2 (TYK2) gene (P=2.7 x 10(-6), odds ratio=1.32 (1.17-1.47)). This single nucleotide polymorphism results in an amino acid substitution (proline to alanine) in the kinase domain of TYK2, which is predicted to influence the levels of phosphorylation and therefore activity of the protein and so is likely to have a functional role in multiple sclerosis.
Funded by: Medical Research Council: G0000934, G0600329, G0700061, MC_U105292688; NINDS NIH HHS: NS 049477-01A1, R01 NS049477, R01 NS049477-01A1; Wellcome Trust: 061858, 068545/Z/02, 076113, 085475, 090532
European journal of human genetics : EJHG 2009;17;10;1309-13
Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.
Type 1 diabetes (T1D) is a common autoimmune disorder that arises from the action of multiple genetic and environmental risk factors. We report the findings of a genome-wide association study of T1D, combined in a meta-analysis with two previously published studies. The total sample set included 7,514 cases and 9,045 reference samples. Forty-one distinct genomic locations provided evidence for association with T1D in the meta-analysis (P < 10(-6)). After excluding previously reported associations, we further tested 27 regions in an independent set of 4,267 cases, 4,463 controls and 2,319 affected sib-pair (ASP) families. Of these, 18 regions were replicated (P < 0.01; overall P < 5 × 10(-8)) and 4 additional regions provided nominal evidence of replication (P < 0.05). The many new candidate genes suggested by these results include IL10, IL19, IL20, GLIS3, CD69 and IL27.
Funded by: Medical Research Council: G0000934; NIDDK NIH HHS: DK46635, K08 DK002876, K08 DK002876-06, R01 DK046635, R01 DK046635-15, U01 DK062418, U01 DK062418-06; NIMH NIH HHS: MH 63420, MH059565, MH059571, MH059588, MH060879, MH061675, MH067257, MH59566, MH59586, MH59587, MH60870; Wellcome Trust: 061858, 076113
Nature genetics 2009;41;6;703-7
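The type 1 diabetes abstract above does not say how the individual studies were combined. One standard approach to such a meta-analysis is fixed-effect inverse-variance weighting of per-study effect estimates, sketched below. The function name and the input values are assumptions for illustration and are not taken from the paper.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study log odds ratios using inverse-variance weights.

    Returns the pooled estimate, its standard error, the z statistic and
    a two-sided P value from the normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


# Hypothetical log odds ratios and standard errors from three studies.
pooled_beta, pooled_se, pooled_z, pooled_p = fixed_effect_meta(
    [0.12, 0.09, 0.15], [0.04, 0.05, 0.06]
)
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is pulled towards the most precise studies.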
Phospholipid scramblases and Tubby-like proteins belong to a new superfamily of membrane tethered transcription factors.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. email@example.com
Motivation: Phospholipid scramblases (PLSCRs) constitute a family of cytoplasmic membrane-associated proteins that were identified based upon their capacity to mediate a Ca(2+)-dependent bidirectional movement of phospholipids across membrane bilayers, thereby collapsing the normally asymmetric distribution of such lipids in cell membranes. The exact function and mechanism(s) of these proteins nevertheless remains obscure: data from several laboratories now suggest that in addition to their putative role in mediating transbilayer flip/flop of membrane lipids, the PLSCRs may also function to regulate diverse processes including signaling, apoptosis, cell proliferation and transcription. A major impediment to deducing the molecular details underlying the seemingly disparate biology of these proteins is the current absence of any representative molecular structures to provide guidance to the experimental investigation of their function.
Results: Here, we show that the enigmatic PLSCR family of proteins is directly related to another family of cellular proteins with a known structure. The Arabidopsis protein At5g01750 from the DUF567 family was solved by X-ray crystallography and provides the first structural model for this family. This model identifies that the presumed C-terminal transmembrane helix is buried within the core of the PLSCR structure, suggesting that palmitoylation may represent the principal membrane anchorage for these proteins. The fold of the PLSCR family is also shared by Tubby-like proteins. A search of the PDB with the HHpred server suggests a common evolutionary ancestry. Common functional features also suggest that tubby and PLSCR share a functional origin as membrane tethered transcription factors with capacity to modulate phosphoinositide-based signaling.
Funded by: NHLBI NIH HHS: HL036946, HL063819, HL076215; Wellcome Trust: 087656, WT077044/Z/05/Z
Bioinformatics (Oxford, England) 2009;25;2;159-62
Neuroproteomics: understanding the molecular organization and complexity of the brain.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
Advances in technology have equipped the field of neuroproteomics with refined tools for the study of the expression, interaction and function of proteins in the nervous system. In combination with bioinformatics, neuroproteomics can address the organization of dynamic, functional protein networks and macromolecular structures that underlie physiological, anatomical and behavioural processes. Furthermore, neuroproteomics is contributing to the elucidation of disease mechanisms and is a powerful tool for the identification of biomarkers.
Funded by: Medical Research Council; Wellcome Trust
Nature reviews. Neuroscience 2009;10;9;635-46
The genome of the blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. firstname.lastname@example.org
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873, R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845, R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z
Public health. The cholera crisis in Africa.
Indian Council of Medical Research, Ansari Nagar, New Delhi, 110029, India.
Science (New York, N.Y.) 2009;324;5929;885
Calcium-dependent signaling and kinases in apicomplexan parasites.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Calcium controls many critical events in the complex life cycles of apicomplexan parasites including protein secretion, motility, and development. Calcium levels are normally tightly regulated and rapid release of calcium into the cytosol activates a family of calcium-dependent protein kinases (CDPKs), which are normally characteristic of plants. CDPKs present in apicomplexans have acquired a number of unique domain structures likely reflecting their diverse functions. Calcium regulation in parasites is closely linked to signaling by cyclic nucleotides and their associated kinases. This Review summarizes the pivotal roles that calcium- and cyclic nucleotide-dependent kinases play in unique aspects of parasite biology.
Funded by: Medical Research Council: G0501670; NIAID NIH HHS: AI34036, R01 AI034036, R01 AI034036-17, R01 AI082423, R01 AI082423-01, R21 AI067051
Cell host & microbe 2009;5;6;612-22
IRS2 variants and syndromes of severe insulin resistance.
Funded by: Wellcome Trust: 077016, 077016/Z/05/Z, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z
The genome sequence of taurine cattle: a window to ruminant biology and evolution.
To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/13438, BBS/B/13446; NHGRI NIH HHS: U54 HG003273, U54 HG003273-04, U54 HG003273-04S1, U54 HG003273-05, U54 HG003273-05S1, U54 HG003273-05S2, U54 HG003273-06, U54 HG003273-06S1, U54 HG003273-06S2, U54 HG003273-07, U54 HG003273-08; NIDA NIH HHS: P30 DA018310; Wellcome Trust: 062023, 077198
Science (New York, N.Y.) 2009;324;5926;522-8
Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds.
The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.
Funded by: NHGRI NIH HHS: U54 HG003273; NIGMS NIH HHS: R01 GM083606, R01 GM083606-02
Science (New York, N.Y.) 2009;324;5926;528-32
Accurate and sensitive peptide identification with Mascot Percolator.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.
Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well-performing machine learning method for rescoring database search results, and demonstrate it to be amenable to both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand-alone tool or integrated into existing data analysis pipelines.
Funded by: Wellcome Trust: 077198
Journal of proteome research 2009;8;6;3176-81
Functional diversity for REST (NRSF) is defined by in vivo binding affinity hierarchies at the DNA sequence level.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. email@example.com
The molecular events that contribute to, and result from, the in vivo binding of transcription factors to their cognate DNA sequence motifs in mammalian genomes are poorly understood. We demonstrate that variations within the DNA sequence motifs that bind the transcriptional repressor REST (NRSF) encode in vivo DNA binding affinity hierarchies that contribute to regulatory function during lineage-specific and developmental programs in fundamental ways. First, canonical sequence motifs for REST facilitate strong REST binding and control functional classes of REST targets that are common to all cell types, whilst atypical motifs participate in weak interactions and control those targets, which are cell- or tissue-specific. Second, variations in REST binding relate directly to variations in expression and chromatin configurations of REST's target genes. Third, REST clearance from its binding sites is also associated with variations in the RE1 motif. Finally, and most surprisingly, weak REST binding sites reside in DNA sequences that show the highest levels of constraint through evolution, thus facilitating their roles in maintaining tissue-specific functions. These relationships have never been reported in mammalian systems for any transcription factor.
Funded by: Wellcome Trust
Genome research 2009;19;6;994-1005
Genome-wide microarray-based comparative genomic hybridization analysis of lymphoplasmacytic lymphomas reveals heterogeneous aberrations.
Department of Cancer Genetics, Royal College of Surgeons in Ireland, Dublin, Ireland. firstname.lastname@example.org
Lymphoplasmacytic lymphoma (LPL) is not a sharply delineated lymphoma entity, either morphologically, phenotypically, or clinically. The diagnosis is often made by excluding other small cell lymphomas with plasmacytic differentiation, thus a genetic diagnostic marker would be of great benefit. Conventional cytogenetic techniques have previously demonstrated a deletion of 6q in a proportion of cases, varying from 7 to 55%. In this report, we apply array-based comparative genomic hybridization on 11 LPL samples. Genomic aberrations were detected in 9 of 11 cases, and included gains and losses. In general, the number of genetic aberrations was relatively low (two to three abnormalities per case). Recurrent aberrations detected were deletion of 6q (two cases), deletion of chromosome 17 (two cases), gain of 3q (two cases), and gain of chromosome 7 (two cases). This report not only confirms the reported loss of 6q in a proportion of cases but also highlights the genetic heterogeneity of LPL, in accordance with the known immunophenotypical, morphological, and clinical diversity of the disease.
Funded by: Wellcome Trust
Leukemia & lymphoma 2009;50;9;1528-34
The T3SS effector EspT defines a new category of invasive enteropathogenic E. coli (EPEC) which form intracellular actin pedestals.
Centre for Molecular Microbiology and Infection, Division of Cell and Molecular Biology, Imperial College London, London, United Kingdom.
Enteropathogenic Escherichia coli (EPEC) strains are defined as extracellular pathogens which nucleate actin rich pedestal-like membrane extensions on intestinal enterocytes to which they intimately adhere. EPEC infection is mediated by type III secretion system effectors, which modulate host cell signaling. Recently we have shown that the WxxxE effector EspT activates Rac1 and Cdc42 leading to formation of membrane ruffles and lamellipodia. Here we report that EspT-induced membrane ruffles facilitate EPEC invasion into non-phagocytic cells in a process involving Rac1 and Wave2. Internalized EPEC resides within a vacuole and Tir is localized to the vacuolar membrane, resulting in actin polymerization and formation of intracellular pedestals. To the best of our knowledge this is the first time a pathogen has been shown to induce formation of actin comets across a vacuole membrane. Moreover, our data breaks the dogma of EPEC as an extracellular pathogen and defines a new category of invasive EPEC.
Funded by: Medical Research Council: G0700823; Wellcome Trust
PLoS pathogens 2009;5;12;e1000683
Evolution of pathogenicity and sexual reproduction in eight Candida genomes.
UCD School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland. email@example.com
Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F00513X/1, BB/F013566/1; Medical Research Council: G0400284; NHGRI NIH HHS: R01 HG004037, R01 HG004037-02, U54 HG003067, U54 HG003067-06; NIAID NIH HHS: HHSN266200400001C, R01 AI050113, R01 AI075096; NIDCR NIH HHS: R01 DE015873; Wellcome Trust
Somatic and germline genetics at the JAK2 locus.
Wellcome Trust Sanger Institute, Hinxton, UK. firstname.lastname@example.org
Myeloproliferative neoplasms are hematological malignancies frequently associated with somatically acquired mutation of the JAK2 gene. A new study shows that these mutations are preferentially found within a particular inherited JAK2 haplotype, implying the existence of a strong, but uncharacterized, interaction between somatic and germline genetics at this locus.
Funded by: Wellcome Trust: 088340
Nature genetics 2009;41;4;385-6
TLR9 polymorphisms in African populations: no association with severe malaria, but evidence of cis-variants acting on gene expression.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. email@example.com
Background: During malaria infection the Toll-like receptor 9 (TLR9) is activated through induction with plasmodium DNA or another malaria motif not yet identified. Although TLR9 activation by malaria parasites is well reported, its implication for susceptibility to severe malaria is not clear. The aim of this study was to assess the contribution of genetic variation at TLR9 to severe malaria.
Methods: This study explores the contribution of TLR9 genetic variants to severe malaria using two approaches. First, an association study of four common single nucleotide polymorphisms was performed on both family- and population-based studies from Malawian and Gambian populations (n>6000 individuals). Subsequently, it was assessed whether TLR9 expression is affected by cis-acting variants and if these variants could be mapped. For this work, an allele specific expression (ASE) assay on a panel of HapMap cell lines was carried out.
Results: No convincing association was found with polymorphisms in TLR9 for malaria severity, in either Gambian or Malawian populations, using both case-control and family based study designs. Using an allele specific expression assay it was observed that TLR9 expression is affected by cis-acting variants; these results were replicated in a second experiment using biological replicates.
Conclusion: By using the largest cohorts analysed to date, as well as a standardized phenotype definition and study design, no association of TLR9 genetic variants with severe malaria was found. This analysis considered all common variants in the region, but it remains possible that there are rare variants with association signals. This report also shows that TLR9 expression is potentially modulated through cis-regulatory variants, which may lead to differential inflammatory responses to infection between individuals.
Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust
Malaria journal 2009;8;44
Genome watch: What a scorcher!
This month's Genome Watch looks at the publication of four hyperthermophilic archaeal genomes, three of which belong to the Crenarchaeota phylum and one of which belongs to the newly defined Nanoarchaeota phylum.
Nature reviews. Microbiology 2009;7;6;408-9
Induction of antibody responses to African horse sickness virus (AHSV) in ponies after vaccination with recombinant modified vaccinia Ankara (MVA).
Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk, United Kingdom.
Background: African horse sickness virus (AHSV) causes a non-contagious, infectious disease in equids, with mortality rates that can exceed 90% in susceptible horse populations. AHSV vaccines play a crucial role in the control of the disease; however, there are concerns over the use of polyvalent live attenuated vaccines particularly in areas where AHSV is not endemic. Therefore, it is important to consider alternative approaches for AHSV vaccine development. We have carried out a pilot study to investigate the ability of recombinant modified vaccinia Ankara (MVA) vaccines expressing VP2, VP7 or NS3 genes of AHSV to stimulate immune responses against AHSV antigens in the horse.
Methodology/principal findings: VP2, VP7 and NS3 genes from AHSV-4/Madrid87 were cloned into the vaccinia transfer vector pSC11 and recombinant MVA viruses generated. Antigen expression or transcription of the AHSV genes from cells infected with the recombinant viruses was confirmed. Pairs of ponies were vaccinated with MVAVP2, MVAVP7 or MVANS3 and both MVA vector and AHSV antigen-specific antibody responses were analysed. Vaccination with MVAVP2 induced a strong AHSV neutralising antibody response (VN titre up to a value of 2). MVAVP7 also induced AHSV antigen-specific responses, detected by western blotting. NS3 specific antibody responses were not detected.
Conclusions: This pilot study demonstrates the immunogenicity of recombinant MVA vectored AHSV vaccines, in particular MVAVP2, and indicates that further work to investigate whether these vaccines would confer protection from lethal AHSV challenge in the horse is justifiable.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/00654
PloS one 2009;4;6;e5997
Lineage-specific biology revealed by a finished genome assembly of the mouse.
National Center for Biotechnology Information, Bethesda, Maryland, United States of America. firstname.lastname@example.org
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.
Funded by: Medical Research Council: MC_U127561112, MC_U137761446, MC_U142684174; NHGRI NIH HHS: HG002385, U54 HG003273
PLoS biology 2009;7;5;e1000112
Tumor necrosis factor and lymphotoxin-alpha polymorphisms and severe malaria in African populations.
Wellcome Trust Centre for Human Genetics, University of Oxford, Nuffield Department of Medicine, John Radcliffe Hospital, Oxford, United Kingdom. email@example.com
The tumor necrosis factor gene (TNF) and lymphotoxin-alpha gene (LTA) have long attracted attention as candidate genes for susceptibility traits for malaria, and several of their polymorphisms have been found to be associated with severe malaria (SM) phenotypes. In a large study involving >10,000 individuals and encompassing 3 African populations, we found evidence to support the reported associations between the TNF -238 polymorphism and SM in The Gambia. However, no TNF/LTA polymorphisms were found to be associated with SM in cohorts in Kenya and Malawi. It has been suggested that the causal polymorphisms regulating the TNF and LTA responses may be located some distance from the genes. Therefore, more-detailed mapping of variants across TNF/LTA genes and their flanking regions in the Gambian and allied populations may need to be undertaken to find any causal polymorphisms.
Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 076934
The Journal of infectious diseases 2009;199;4;569-75
Neurotransmitters drive combinatorial multistate postsynaptic density networks.
Genes to Cognition, Wellcome Trust Sanger Institute, Cambridgeshire, UK.
The mammalian postsynaptic density (PSD) comprises a complex collection of approximately 1100 proteins. Despite extensive knowledge of individual proteins, the overall organization of the PSD is poorly understood. Here, we define maps of molecular circuitry within the PSD based on phosphorylation of postsynaptic proteins. Activation of a single neurotransmitter receptor, the N-methyl-D-aspartate receptor (NMDAR), changed the phosphorylation status of 127 proteins. Stimulation of ionotropic and metabotropic glutamate receptors and dopamine receptors activated overlapping networks with distinct combinatorial phosphorylation signatures. Using peptide array technology, we identified specific phosphorylation motifs and switching mechanisms responsible for the integration of neurotransmitter receptor pathways and their coordination of multiple substrates in these networks. These combinatorial networks confer high information-processing capacity and functional diversity on synapses, and their elucidation may provide new insights into disease mechanisms and new opportunities for drug discovery.
Funded by: Medical Research Council: G90/93; Wellcome Trust: 066717
Science signaling 2009;2;68;ra19
Encyclopedia of Neuroscience 2009;971-81
Large scale association analysis of novel genetic loci for coronary artery disease.
Background: Combined analysis of 2 genome-wide association studies in cases enriched for family history recently identified 7 loci (on 1p13.3, 1q41, 2q36.3, 6q25.1, 9p21, 10q11.21, and 15q22.33) that may affect risk of coronary artery disease (CAD). Apart from the 9p21 locus, the other loci await substantive replication. Furthermore, the effect of these loci on CAD risk in a broader range of individuals remains to be determined.
Methods and results: We undertook association analysis of single nucleotide polymorphisms at each locus with CAD risk in 11,550 cases and 11,205 controls from 9 European studies. The 9p21.3 locus showed unequivocal association (rs1333049, combined odds ratio [OR]=1.20, 95% CI [1.16 to 1.25], probability value=2.81 x 10(-21)). We also confirmed association signals at 1p13.3 (rs599839, OR=1.13 [1.08 to 1.19], P=1.44 x 10(-7)), 1q41 (rs3008621, OR=1.10 [1.04 to 1.17], P=1.02 x 10(-3)), and 10q11.21 (rs501120, OR=1.11 [1.05 to 1.18], P=4.34 x 10(-4)). The associations with 6q25.1 (rs6922269, P=0.020) and 2q36.3 (rs2943634, P=0.032) were borderline and not statistically significant after correction for multiple testing. The 15q22.33 locus did not replicate. The 10q11.21 locus showed a possible sex interaction (P=0.015), with a significant effect in women (OR=1.29 [1.15 to 1.45], P=1.86 x 10(-5)) but not men (OR=1.03 [0.96 to 1.11], P=0.387). There were no other strong interactions of any of the loci with other traditional risk factors. The loci at 9p21, 1p13.3, 2q36.3, and 10q11.21 acted independently and cumulatively increased CAD risk by 15% (12% to 18%), per additional risk allele.
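As a reading aid for the figures above, the following is a minimal sketch of how an odds ratio, its 95% confidence interval and its P value relate to one another. It assumes a Wald-type interval that is symmetric on the log-odds scale, which the abstract does not state; the function name `wald_from_or_ci` is hypothetical and not taken from the study. Because the published OR and interval are rounded, the recovered P value can only be expected to match the reported one in order of magnitude.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover (SE of log OR, z statistic, two-sided P) from an OR and its 95% CI.

    Assumes the interval is exp(log OR +/- z_crit * SE), i.e. a Wald interval.
    """
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Rounded values reported in the abstract for rs1333049 at 9p21.3.
se, z, p = wald_from_or_ci(1.20, 1.16, 1.25)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

The same calculation can be applied to the other loci listed; small differences from the published P values are expected because the inputs are rounded to two decimal places.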
Conclusions: The findings provide strong evidence for association between at least 4 genetic loci and CAD risk. Cumulatively, these novel loci have a significant impact on risk of CAD at least in European populations.
Funded by: British Heart Foundation: CH/03/001/15569; Medical Research Council: G0401527, G0701863, MC_U106179471; Wellcome Trust: 077011, 082371, 091746
Arteriosclerosis, thrombosis, and vascular biology 2009;29;5;774-80
From small reads do mighty genomes grow.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. firstname.lastname@example.org
This month's Genome Watch discusses the use of next-generation sequencing technologies to assemble draft genomes for two pseudomonad species.
Nature reviews. Microbiology 2009;7;9;621
Influenza-specific amino acid substitution model
KSE 2009 - The 1st International Conference on Knowledge and Systems Engineering. 2009;5361735;19-25
X-box binding protein 1 contributes to induction of the Kaposi's sarcoma-associated herpesvirus lytic cycle under hypoxic conditions.
Department of Infection, UCL, London, United Kingdom.
Kaposi's sarcoma-associated herpesvirus (KSHV), like other herpesviruses, has two stages to its life cycle: latency and lytic replication. KSHV is required for development of Kaposi's sarcoma, a tumor of endothelial origin, and is associated with the B-cell tumor primary effusion lymphoma (PEL) and the plasmablastic variant of multicentric Castleman's disease, all of which are characterized by predominantly latent KSHV infection. Recently, we and others have shown that the activated form of transcription factor X-box binding protein 1 (XBP-1) is a physiological trigger of KSHV lytic reactivation in PEL. Here, we show that XBP-1s transactivates the ORF50/RTA promoter through an ACGT core containing the XBP-1 response element, an element previously identified as a weakly active hypoxia response element (HRE). Hypoxia induces the KSHV lytic cycle, and active HREs that respond to hypoxia-inducible factor 1alpha are present in the ORF50/RTA promoter. Hypoxia also induces active XBP-1s, and here, we show that both transcription factors contribute to the induction of RTA expression, leading to the production of infectious KSHV under hypoxic conditions.
Funded by: Cancer Research UK; Medical Research Council; Wellcome Trust
Journal of virology 2009;83;14;7202-9
A truncation mutation in TBC1D4 in a family with acanthosis nigricans and postprandial hyperinsulinemia.
Departments of Medicine and Clinical Biochemistry, University of Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom.
Tre-2, BUB2, CDC16, 1 domain family member 4 (TBC1D4) (AS160) is a Rab-GTPase activating protein implicated in insulin-stimulated glucose transporter 4 (GLUT4) translocation in adipocytes and myotubes. To determine whether loss-of-function mutations in TBC1D4 might impair GLUT4 translocation and cause insulin resistance in humans, we screened the coding regions of this gene in 156 severely insulin-resistant patients. A female presenting at age 11 years with acanthosis nigricans and extreme postprandial hyperinsulinemia was heterozygous for a premature stop mutation (R363X) in TBC1D4. After demonstrating reduced expression of wild-type TBC1D4 protein and expression of the truncated protein in lymphocytes from the proband, we further characterized the biological effects of the truncated protein in 3T3L1 adipocytes. Prematurely truncated TBC1D4 protein tended to increase basal cell membrane GLUT4 levels (P = 0.053) and significantly reduced insulin-stimulated GLUT4 cell membrane translocation (P < 0.05). When coexpressed with wild-type TBC1D4, the truncated protein dimerized with full-length TBC1D4, suggesting that the heterozygous truncated variant might interfere with its wild-type counterpart in a dominant negative fashion. Two overweight family members with the mutation also manifested normal fasting glucose and insulin levels but disproportionately elevated insulin levels following an oral glucose challenge. This family provides unique genetic evidence of TBC1D4 involvement in human insulin action.
Funded by: British Heart Foundation; Medical Research Council: G0600414; NCI NIH HHS: P30 CA023108; NIDDK NIH HHS: DK25336, R01 DK025336; Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2009;106;23;9350-5
Common regulatory variation impacts gene expression in a cell type-dependent manner.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1HH, Cambridge, UK.
Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.
Funded by: Wellcome Trust: 077011, 077046
Science (New York, N.Y.) 2009;325;5945;1246-50
Ectopic recombination of a malaria var gene during mitosis associated with an altered var switch rate.
Department of Medicine at RMH, University of Melbourne, Parkville 3050, Australia. email@example.com
The Plasmodium falciparum var multigene family encodes P. falciparum erythrocyte membrane protein 1, which is responsible for the pathogenic traits of antigenic variation and adhesion of infected erythrocytes to host receptors during malaria infection. Clonal antigenic variation of P. falciparum erythrocyte membrane protein 1 is controlled by the switching between exclusively transcribed var genes. The tremendous diversity of the var gene repertoire both within and between parasite strains is critical for the parasite's strategy of immune evasion. We show that ectopic recombination between var genes occurs during mitosis, providing P. falciparum with opportunities to diversify its var repertoire, even during the course of a single infection. We show that the regulation of the recombined var gene has been disrupted, resulting in its persistent activation although the regulation of most other var genes is unaffected. The var promoter and intron of the recombined var gene are not responsible for its atypically persistent activity, and we conclude that altered subtelomeric cis sequence is the most likely cause of the persistent activity of the recombined var gene.
Journal of molecular biology 2009;389;3;453-69
Genome-wide association study identifies variants at 9p21 and 22q13 associated with development of cutaneous nevi.
Department of Twin Research & Genetic Epidemiology, Kings College London, St. Thomas' Hospital Campus, London, UK. firstname.lastname@example.org
A high melanocytic nevi count is the strongest known risk factor for cutaneous melanoma. We conducted a genome-wide association study for nevus count using 297,108 SNPs in 1,524 twins, with validation in an independent cohort of 4,107 individuals. We identified strongly associated variants in MTAP, a gene adjacent to the familial melanoma susceptibility locus CDKN2A on 9p21 (rs4636294, combined P = 3.4 x 10(-15)), as well as in PLA2G6 on 22q13.1 (rs2284063, combined P = 3.4 x 10(-8)). In addition, variants in these two loci showed association with melanoma risk in 3,131 melanoma cases from two independent studies, including rs10757257 at 9p21, combined P = 3.4 x 10(-8), OR = 1.23 (95% CI = 1.15-1.30) and rs132985 at 22q13.1, combined P = 2.6 x 10(-7), OR = 1.23 (95% CI = 1.15-1.30). This provides the first report of common variants associated with nevus number and demonstrates association of these variants with melanoma susceptibility.
Funded by: Cancer Research UK: 10589, C588/A4994; Department of Health; NCI NIH HHS: CA88363, R01 CA083115, R01 CA083115-08, R01 CA83115; Wellcome Trust: 077011, 091746
Nature genetics 2009;41;8;915-9
Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.
Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.
The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.
Funded by: Wellcome Trust
Molecular systems biology 2009;5;269
DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources.
Cambridge University Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK. email@example.com
Many patients suffering from developmental disorders harbor submicroscopic deletions or duplications that, by affecting the copy number of dosage-sensitive genes or disrupting normal gene expression, lead to disease. However, many aberrations are novel or extremely rare, making clinical interpretation problematic and genotype-phenotype correlations uncertain. Identification of patients sharing a genomic rearrangement and having phenotypic features in common leads to greater certainty in the pathogenic nature of the rearrangement and enables new syndromes to be defined. To facilitate the analysis of these rare events, we have developed an interactive web-based database called DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations. DECIPHER catalogs common copy-number changes in normal populations and thus, by exclusion, enables changes that are novel and potentially pathogenic to be identified. DECIPHER enhances genetic counseling by retrieving relevant information from a variety of bioinformatics resources. Known and predicted genes within an aberration are listed in the DECIPHER patient report, and genes of recognized clinical importance are highlighted and prioritized. DECIPHER enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community through display in the genome browser Ensembl. By sharing cases worldwide, clusters of rare cases having phenotype and structural rearrangement in common can be identified, leading to the delineation of new syndromes and furthering understanding of gene function.
Funded by: Wellcome Trust: WT077008
American journal of human genetics 2009;84;4;524-33
Rfam: updates to the RNA families database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. firstname.lastname@example.org
Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/.
Funded by: Howard Hughes Medical Institute; Wellcome Trust: 077044
Nucleic acids research 2009;37;Database issue;D136-40
A home for RNA families at RNA Biology
RNA biology 2009;6;2-4
Reduced TFAP2A function causes variable optic fissure closure and retinal defects and sensitizes eye development to mutations in other morphogenetic regulators.
Department of Cell and Developmental Biology, UCL, London, UK.
Mutations in the transcription factor encoding TFAP2A gene underlie branchio-oculo-facial syndrome (BOFS), a rare dominant disorder characterized by distinctive craniofacial, ocular, ectodermal and renal anomalies. To elucidate the range of ocular phenotypes caused by mutations in TFAP2A, we took three approaches. First, we screened a cohort of 37 highly selected individuals with severe ocular anomalies plus variable defects associated with BOFS for mutations or deletions in TFAP2A. We identified one individual with a de novo TFAP2A four amino acid deletion, a second individual with two non-synonymous variations in an alternative splice isoform TFAP2A2, and a sibling-pair with a paternally inherited whole gene deletion with variable phenotypic expression. Second, we determined that TFAP2A is expressed in the lens, neural retina, nasal process, and epithelial lining of the oral cavity and palatal shelves of human and mouse embryos--sites consistent with the phenotype observed in patients with BOFS. Third, we used zebrafish to examine how partial abrogation of the fish ortholog of TFAP2A affects the penetrance and expressivity of ocular phenotypes due to mutations in genes encoding bmp4 or tcf7l1a. In both cases, we observed synthetic, enhanced ocular phenotypes including coloboma and anophthalmia when tfap2a is knocked down in embryos with bmp4 or tcf7l1a mutations. These results reveal that mutations in TFAP2A are associated with a wide range of eye phenotypes and that hypomorphic tfap2a mutations can increase the risk of developmental defects arising from mutations at other loci.
Funded by: Medical Research Council: G0501487, G0700089; Wellcome Trust: 074376, 078047, WT077008
Human genetics 2009;126;6;791-803
Neonates harbour highly active gammadelta T cells with selective impairments in preterm infants.
Peter Gorer Department of Immunobiology, London, UK.
Acknowledgement of the breadth of T-cell pleiotropy has provoked increasing interest in the degree to which functional responsiveness is elicited by environmental cues versus differentiation. This is particularly relevant for young animals requiring rapid responses to acute environmental exposure. In young mice, gammadelta T cells are disproportionately important for immuno-protection. To examine the situation in humans, we compared populations and clones of T cells from term and preterm babies, and adults. By comparison with alphabeta T cells, neonate-derived gammadelta cells show stronger, pleiotropic functional responsiveness, and lack signatory deficits in IFN-gamma production. Emphasising the acquisition of functional competence in utero, IFN-gamma was produced by gammadelta cells sampled from premature births, and, although one month's post-partum environmental exposure invariably increased their TNF-alpha production, it had no consistent effect on IFN-gamma or IL-2. In sum, gammadelta cells seem well positioned at birth to contribute to immuno-protection and immuno-regulation, possibly compensating for selective immaturity in the alphabeta compartment. With regard to the susceptibilities of preterm babies to viral infection, gammadelta cells from preterm neonates were commonly impaired in Toll-like receptor-3 and -7 expression and compared with cells from term babies failed to optimise cytokine production in response to coincident TCR and TLR agonists.
Funded by: PHS HHS: R0161799; Wellcome Trust: 071534
European journal of immunology 2009;39;7;1794-806
A general basis for cognition in the evolution of synapse signaling complexes.
Genes to Cognition Programme, Wellcome Trust Sanger Institute, Cambridge, United Kingdom. email@example.com
Beneath the complexity of the human brain are molecular principles shaped by evolution explaining the origins of the behavioral repertoire. The role of the nervous system is to provide a repertoire of behaviors allowing the animal to respond and adapt to changing environments during the course of its life. Multiprotein complexes in the postsynaptic terminal of synapses control adaptive and cognitive processes in metazoan nervous systems. These multiprotein complexes are organized into molecular networks that detect and respond to patterns of neural activity. Combinations of proteins are used to build different complexes and pathways producing great diversity. These complexes evolved from an ancestral core set of proteins controlling adaptive behaviors in unicellular organisms known as the protosynapse. Later expansion in numbers and interactions resulted in more complex synapses in invertebrates and vertebrates. The resultant combinatorial complexity has contributed to the neuroanatomical, neurophysiological, and behavioral diversity in these species. Mutations in genes encoding the complexes result in many human diseases of the nervous system. This general mechanism of cognition provides a useful template for studying evolution of behavior in all animals.
Funded by: Wellcome Trust
Cold Spring Harbor symposia on quantitative biology 2009;74;249-57
Genetic utility of broadly defined bipolar schizoaffective disorder as a diagnostic concept.
Biostatistics and Bioinformatics Unit and Department of Psychological Medicine, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK.
Background: Psychiatric phenotypes are currently defined according to sets of descriptive criteria. Although many of these phenotypes are heritable, it would be useful to know whether any of the various diagnostic categories in current use identify cases that are particularly helpful for biological-genetic research.
Aims: To use genome-wide genetic association data to explore the relative genetic utility of seven different descriptive operational diagnostic categories relevant to bipolar illness within a large UK case-control bipolar disorder sample.
Method: We analysed our previously published Wellcome Trust Case Control Consortium (WTCCC) bipolar disorder genome-wide association data-set, comprising 1868 individuals with bipolar disorder and 2938 controls genotyped for 276 122 single nucleotide polymorphisms (SNPs) that met stringent criteria for genotype quality. For each SNP we performed a test of association (bipolar disorder group v. control group) and used the number of associated independent SNPs statistically significant at P<0.00001 as a metric for the overall genetic signal in the sample. We next compared this metric with that obtained using each of seven diagnostic subsets of the group with bipolar disorder: Research Diagnostic Criteria (RDC): bipolar I disorder; manic disorder; bipolar II disorder; schizoaffective disorder, bipolar type; DSM-IV: bipolar I disorder; bipolar II disorder; schizoaffective disorder, bipolar type.
Results: The RDC schizoaffective disorder, bipolar type (v. controls) stood out from the other diagnostic subsets as having a significant excess of independent association signals (P<0.003) compared with that expected in samples of the same size selected randomly from the total bipolar disorder group data-set. The strongest association in this subset of participants with bipolar disorder was at rs4818065 (P = 2.42 x 10(-7)). Biological systems implicated included gamma-aminobutyric acid A (GABA(A)) receptors. Genes having at least one associated polymorphism at P<10(-4) included B3GALTS, A2BP1, GABRB1, AUTS2, BSN, PTPRG, GIRK2 and CDH12.
Conclusions: Our findings show that individuals with broadly defined bipolar schizoaffective features either have a particularly strong genetic contribution or, as a group, are genetically more homogeneous than the other phenotypes tested. The results point to the importance of using diagnostic approaches that recognise this group of individuals. Our approach can be applied to similar data-sets for other psychiatric and non-psychiatric phenotypes.
Funded by: Medical Research Council: G0000647, G0000934, G0701003; Wellcome Trust: 060620
The British journal of psychiatry : the journal of mental science 2009;195;1;23-9
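The comparison described in the Method and Results above (an observed count of association signals in one diagnostic subset versus counts from random subsets of the same size) can be illustrated with a small resampling sketch. This is an assumption-laden illustration, not the authors' code: the function names, the `statistic` callable, and the add-one correction in the empirical P-value are hypothetical choices made here, and the toy statistic stands in for a full genome-wide scan.

```python
import random

def resample_null(case_ids, subset_size, statistic, n_resamples, seed=0):
    """Draw random case subsets of a fixed size and score each one.

    `statistic` is a hypothetical callable; in the study's setting it would
    count independent SNPs reaching P<0.00001 for the subset versus controls.
    """
    rng = random.Random(seed)
    return [statistic(rng.sample(case_ids, subset_size))
            for _ in range(n_resamples)]

def empirical_p(observed, null_values):
    """One-sided empirical P-value with an add-one correction (an assumption)."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (1 + exceed) / (1 + len(null_values))

if __name__ == "__main__":
    # Toy data only: each "case" carries an arbitrary score, and the toy
    # statistic is the number of high scorers in the subset.
    cases = list(range(200))
    toy_statistic = lambda subset: sum(1 for c in subset if c % 10 == 0)
    null = resample_null(cases, subset_size=40, statistic=toy_statistic,
                         n_resamples=500)
    print(empirical_p(observed=9, null_values=null))
```

A small P-value from `empirical_p` would indicate that the observed subset carries more signals than random subsets of equal size, which is the logic behind the reported excess for the RDC schizoaffective subset.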
Identification of MAMDC1 as a candidate susceptibility gene for systemic lupus erythematosus (SLE).
Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden.
Background: Systemic lupus erythematosus (SLE) is a complex autoimmune disorder with multiple susceptibility genes. We have previously reported suggestive linkage to the chromosomal region 14q21-q23 in Finnish SLE families.
Genetic fine mapping of this region in the same family material, together with a large collection of parent-affected trios from UK and two independent case-control cohorts from Finland and Sweden, indicated that a novel uncharacterized gene, MAMDC1 (MAM domain containing glycosylphosphatidylinositol anchor 2, also known as MDGA2, MIM 611128), represents a putative susceptibility gene for SLE. In a combined analysis of the whole dataset, significant evidence of association was detected for the MAMDC1 intronic single nucleotide polymorphisms (SNPs) rs961616 (P-value = 0.001, Odds Ratio (OR) = 1.292, 95% CI 1.103-1.513) and rs2297926 (P-value = 0.003, OR = 1.349, 95% CI 1.109-1.640). By Northern blot, real-time PCR (qRT-PCR) and immunohistochemical (IHC) analyses, we show that MAMDC1 is expressed in several tissues and cell types, and that the corresponding mRNA is up-regulated by the pro-inflammatory cytokines tumour necrosis factor alpha (TNF-alpha) and interferon gamma (IFN-gamma) in THP-1 monocytes. Based on its homology to known proteins with similar structure, MAMDC1 appears to be a novel member of the adhesion molecules of the immunoglobulin superfamily (IgCAM), which is involved in cell adhesion, migration, and recruitment to inflammatory sites. Remarkably, some IgCAMs have been shown to interact with ITGAM, the product of another SLE susceptibility gene recently discovered in two independent genome-wide association (GWA) scans.
Significance: Further studies focused on MAMDC1 and other molecules involved in these pathways might thus provide new insight into the pathogenesis of SLE.
PloS one 2009;4;12;e8037
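As a reading aid for the odds ratios and confidence intervals quoted in the abstract above, the following sketch shows how a standard error, z-score and two-sided P-value can be approximately recovered from an OR and its 95% CI. It assumes a Wald interval that is symmetric on the log scale, which the abstract does not state; the function name is hypothetical, and the recovered values are rough because the published figures are rounded.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate (SE, z, two-sided P) from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE), i.e. a Wald
    interval on the log-odds scale (an assumption, not from the source).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_sided

if __name__ == "__main__":
    reported = [
        ("rs961616", 1.292, 1.103, 1.513),
        ("rs2297926", 1.349, 1.109, 1.640),
    ]
    for snp, or_value, low, high in reported:
        se, z, p = wald_from_or_ci(or_value, low, high)
        print(f"{snp}: SE={se:.3f} z={z:.2f} P={p:.4f}")
```

Under these assumptions the recovered P-values land in the same range as the reported 0.001 and 0.003, which suggests the quoted ORs, intervals and P-values are mutually consistent.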
Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Background: Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in humans in Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood.
Methodology/principal findings: The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, approximately 40% of the approximately 2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three approximately 90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors.
Conclusions/significance: The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G019274/1; Wellcome Trust: 089472
PloS one 2009;4;7;e6072
Genomic evidence for the evolution of Streptococcus equi: host restriction, increased virulence, and genetic exchange with human pathogens.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The continued evolution of bacterial pathogens has major implications for both human and animal disease, but the exchange of genetic material between host-restricted pathogens is rarely considered. Streptococcus equi subspecies equi (S. equi) is a host-restricted pathogen of horses that has evolved from the zoonotic pathogen Streptococcus equi subspecies zooepidemicus (S. zooepidemicus). These pathogens share approximately 80% genome sequence identity with the important human pathogen Streptococcus pyogenes. We sequenced and compared the genomes of S. equi 4047 and S. zooepidemicus H70 and screened S. equi and S. zooepidemicus strains from around the world to uncover evidence of the genetic events that have shaped the evolution of the S. equi genome and led to its emergence as a host-restricted pathogen. Our analysis provides evidence of functional loss due to mutation and deletion, coupled with pathogenic specialization through the acquisition of bacteriophage encoding a phospholipase A(2) toxin, and four superantigens, and an integrative conjugative element carrying a novel iron acquisition system with similarity to the high pathogenicity island of Yersinia pestis. We also highlight that S. equi, S. zooepidemicus, and S. pyogenes share a common phage pool that enhances cross-species pathogen evolution. We conclude that the complex interplay of functional loss, pathogenic specialization, and genetic exchange between S. equi, S. zooepidemicus, and S. pyogenes continues to influence the evolution of these important streptococci.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G019274/1; Wellcome Trust: 047072, 087622, 089472
PLoS pathogens 2009;5;3;e1000346
The genome of Burkholderia cenocepacia J2315, an epidemic pathogen of cystic fibrosis patients.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.
Bacterial infections of the lungs of cystic fibrosis (CF) patients cause major complications in the treatment of this common genetic disease. Burkholderia cenocepacia infection is particularly problematic since this organism has high levels of antibiotic resistance, making it difficult to eradicate; the resulting chronic infections are associated with severe declines in lung function and increased mortality rates. B. cenocepacia strain J2315 was isolated from a CF patient and is a member of the epidemic ET12 lineage that originated in Canada or the United Kingdom and spread to Europe. The 8.06-Mb genome of this highly transmissible pathogen comprises three circular chromosomes and a plasmid and encodes a broad array of functions typical of this metabolically versatile genus, as well as numerous virulence and drug resistance functions. Although B. cenocepacia strains can be isolated from soil and can be pathogenic to both plants and man, J2315 is representative of a lineage of B. cenocepacia rarely isolated from the environment and which spreads between CF patients. Comparative analysis revealed that ca. 21% of the genome is unique in comparison to other strains of B. cenocepacia, highlighting the genomic plasticity of this species. Pseudogenes in virulence determinants suggest that the pathogenic response of J2315 may have been recently selected to promote persistence in the CF lung. The J2315 genome contains evidence that its unique and highly adapted genetic content has played a significant role in its success as an epidemic CF pathogen.
Funded by: Wellcome Trust
Journal of bacteriology 2009;191;1;261-77
Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder.
MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Heath Park, CF23 6BQ Cardiff, UK.
We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis.
Funded by: Medical Research Council; Wellcome Trust
American journal of human genetics 2009;85;1;13-24
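The core idea in the abstract above, testing whether a gene-ontology category is overrepresented in a list of significant genes, can be sketched with a plain hypergeometric test. This is deliberately not ALIGATOR: the abstract says that method corrects for linkage disequilibrium, variable gene size and non-independent pathways, none of which this simplified example does. The function name and the toy counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all genes considered
    successes:  genes annotated with the category
    draws:      genes in the significant list
    k:          significant genes that carry the annotation
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total

if __name__ == "__main__":
    # Toy numbers: 1000 genes, 40 in the category, 50 significant genes,
    # 8 of which fall in the category (2 would be expected by chance).
    print(hypergeom_sf(k=8, population=1000, successes=40, draws=50))
```

A naive test like this treats every gene as equally likely to appear in the significant list, which is exactly the assumption that gene-size and LD corrections are meant to relax.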
Complete genome sequence and comparative genome analysis of enteropathogenic Escherichia coli O127:H6 strain E2348/69.
Division of Bioenvironmental Science, Frontier Science Research Center, University of Miyazaki, Miyazaki, Japan.
Enteropathogenic Escherichia coli (EPEC) was the first pathovar of E. coli to be implicated in human disease; however, no EPEC strain has been fully sequenced until now. Strain E2348/69 (serotype O127:H6 belonging to E. coli phylogroup B2) has been used worldwide as a prototype strain to study EPEC biology, genetics, and virulence. Studies of E2348/69 led to the discovery of the locus of enterocyte effacement-encoded type III secretion system (T3SS) and its cognate effectors, which play a vital role in attaching and effacing lesion formation on gut epithelial cells. In this study, we determined the complete genomic sequence of E2348/69 and performed genomic comparisons with other important E. coli strains. We identified 424 E2348/69-specific genes, most of which are carried on mobile genetic elements, and a number of genetic traits specifically conserved in phylogroup B2 strains irrespective of their pathotypes, including the absence of the ETT2-related T3SS, which is present in E. coli strains belonging to all other phylogroups. The genome analysis revealed the entire gene repertoire related to E2348/69 virulence. Interestingly, E2348/69 contains only 21 intact T3SS effector genes, all of which are carried on prophages and integrative elements, compared to over 50 effector genes in enterohemorrhagic E. coli O157. As E2348/69 is the most-studied pathogenic E. coli strain, this study provides a genomic context for the vast amount of existing experimental data. The unexpected simplicity of the E2348/69 T3SS provides the first opportunity to fully dissect the entire virulence strategy of attaching and effacing pathogens in the genomic context.
Funded by: Medical Research Council: G0700151; Wellcome Trust
Journal of bacteriology 2009;191;1;347-54
Common variants at five new loci associated with early-onset inflammatory bowel disease.
Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
The inflammatory bowel diseases (IBD) Crohn's disease and ulcerative colitis are common causes of morbidity in children and young adults in the western world. Here we report the results of a genome-wide association study in early-onset IBD involving 3,426 affected individuals and 11,963 genetically matched controls recruited through international collaborations in Europe and North America, thereby extending the results from a previous study of 1,011 individuals with early-onset IBD. We have identified five new regions associated with early-onset IBD susceptibility, including 16p11 near the cytokine gene IL27 (rs8049439, P = 2.41 x 10(-9)), 22q12 (rs2412973, P = 1.55 x 10(-9)), 10q22 (rs1250550, P = 5.63 x 10(-9)), 2q37 (rs4676410, P = 3.64 x 10(-8)) and 19q13.11 (rs10500264, P = 4.26 x 10(-10)). Our scan also detected associations at 23 of 32 loci previously implicated in adult-onset Crohn's disease and at 8 of 17 loci implicated in adult-onset ulcerative colitis, highlighting the close pathogenetic relationship between early- and adult-onset IBD.
Funded by: Canadian Institutes of Health Research; Chief Scientist Office: CZB/4/540, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: C06-RR11234, M01 RR002172-26, M01-RR00064; NIDDK NIH HHS: DK062423, DK069513, K24 DK060617, K24 DK060617-07, P30 DK040561, P30 DK040561-14, P30 DK043351, T32 DK007477, U01 DK062413, U01 DK062420, U01 DK062420-08; Wellcome Trust: 072789/Z/03/Z
Nature genetics 2009;41;12;1335-40
Transposon-mediated genome manipulation in vertebrates.
Max Delbrück Center for Molecular Medicine, Berlin, Germany.
Transposable elements are DNA segments with the unique ability to move about in the genome. This inherent feature can be exploited to harness these elements as gene vectors for genome manipulation. Transposon-based genetic strategies have been established in vertebrate species over the last decade, and current progress in this field suggests that transposable elements will serve as indispensable tools. In particular, transposons can be applied as vectors for somatic and germline transgenesis, and as insertional mutagens in both loss-of-function and gain-of-function forward mutagenesis screens. In addition, transposons will gain importance in future cell-based clinical applications, including nonviral gene transfer into stem cells and the rapidly developing field of induced pluripotent stem cells. Here we provide an overview of transposon-based methods used in vertebrate model organisms with an emphasis on the mouse system and highlight the most important considerations concerning genetic applications of the transposon systems.
Funded by: NCI NIH HHS: P01 CA016519, P01 CA016519-340010; NIGMS NIH HHS: R01 GM036481
Nature methods 2009;6;6;415-22
Genome-wide and fine-resolution association analysis of malaria in West Africa.
MRC Laboratories, Fajara, Banjul, Gambia.
We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10(-7) to P = 4 × 10(-14), with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.
Funded by: Chief Scientist Office: CZB/4/540, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062, 090532
Nature genetics 2009;41;6;657-65
Effects of calcium signaling on Plasmodium falciparum erythrocyte invasion and post-translational modification of gliding-associated protein 45 (PfGAP45).
Department of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
Plasmodium falciparum erythrocyte invasion is powered by an actin/myosin motor complex that is linked both to the tight junction and to the merozoite cytoskeleton through the Inner Membrane Complex (IMC). The IMC association of the myosin motor, PfMyoA, is maintained by its association with three proteins: PfMTIP, a myosin light chain, PfGAP45, an IMC peripheral membrane protein, and PfGAP50, an integral membrane protein of the IMC. This protein complex is referred to as the glideosome, and given its central role in erythrocyte invasion, this complex is likely the target of several specific regulatory effectors that ensure it is properly localized, assembled, and activated as the merozoite prepares to invade its target cell. However, little is known about how erythrocyte invasion as a whole is regulated, or about how or whether that regulation impacts the glideosome. Here we show that P. falciparum erythrocyte invasion is regulated by the release of intracellular calcium via the cyclic ADP-ribose (cADPR) pathway, but that inhibition of cADPR-mediated calcium release does not affect PfGAP45 phosphorylation or glideosome association. By contrast, the serine/threonine kinase inhibitor, staurosporine, affects both PfGAP45 isoform distribution and the integrity of the glideosome complex. These data identify specific regulatory elements involved in controlling P. falciparum erythrocyte invasion and reveal that the assembly status of the merozoite glideosome, which is central to erythrocyte invasion, is surprisingly dynamic.
Funded by: NIAID NIH HHS: T32 AI055438, T32 AI055438-05
Molecular and biochemical parasitology 2009;168;1;55-62
Vaccines for Biodefense and Emerging and Neglected Diseases 2009;Chapter 57;1147–1161
Support for the involvement of large copy number variants in the pathogenesis of schizophrenia.
Department of Psychological Medicine, Cardiff University, Heath Park, Cardiff, UK.
We investigated the involvement of rare (<1%) copy number variants (CNVs) in 471 cases of schizophrenia and 2792 controls that had been genotyped using the Affymetrix GeneChip 500K Mapping Array. Large CNVs >1 Mb were 2.26 times more common in cases (P = 0.00027), with the effect coming mostly from deletions (odds ratio, OR = 4.53, P = 0.00013) although duplications were also more common (OR = 1.71, P = 0.04). Two large deletions were found in two cases each, but in no controls: a deletion at 22q11.2 known to be a susceptibility factor for schizophrenia and a deletion on 17p12, at 14.0-15.4 Mb. The latter is known to cause hereditary neuropathy with liability to pressure palsies. The same deletion was found in 6 of 4618 (0.13%) cases and 6 of 36 092 (0.017%) controls in the re-analysed data of two recent large CNV studies of schizophrenia (OR = 7.82, P = 0.001), with the combined significance level for all three studies achieving P = 5 x 10(-5). One large duplication on 16p13.1, which has been previously implicated as a susceptibility factor for autism, was found in three cases and six controls (0.6% versus 0.2%, OR = 2.98, P = 0.13). We also provide the first support for a recently reported association between deletions at 15q11.2 and schizophrenia (P = 0.026). This study confirms the involvement of rare CNVs in the pathogenesis of schizophrenia and contributes to the growing list of specific CNVs that are implicated.
Funded by: Medical Research Council; NIMH NIH HHS: 2 P50 MH066392-05A1; Wellcome Trust: 076113
Human molecular genetics 2009;18;8;1497-503
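The carrier counts quoted in the abstract above are enough to reproduce its unadjusted odds ratios and frequencies. The sketch below does that arithmetic; the function names are hypothetical, and it computes only crude odds ratios from the 2x2 counts, not the P-values or any adjusted estimate.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Crude odds ratio from carrier counts in cases and controls."""
    case_odds = case_carriers / (case_total - case_carriers)
    control_odds = control_carriers / (control_total - control_carriers)
    return case_odds / control_odds

def percent(carriers, total):
    """Carrier frequency as a percentage."""
    return 100.0 * carriers / total

if __name__ == "__main__":
    # 17p12 deletion in the re-analysed data: 6 of 4618 cases, 6 of 36 092 controls.
    print(round(odds_ratio(6, 4618, 6, 36092), 2))  # 7.82
    print(round(percent(6, 4618), 2), round(percent(6, 36092), 3))  # 0.13 0.017
    # 16p13.1 duplication: 3 of 471 cases, 6 of 2792 controls.
    print(round(odds_ratio(3, 471, 6, 2792), 2))  # 2.98
```

Both results match the values reported in the abstract (OR = 7.82 and OR = 2.98).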
Meta-analysis of 28,141 individuals identifies common variants within five new loci that influence uric acid concentrations.
Institute of Epidemiology, Helmholtz Zentrum München, National Research Center for Environment and Health, Neuherberg, Germany.
Elevated serum uric acid levels cause gout and are a risk factor for cardiovascular disease and diabetes. To investigate the polygenetic basis of serum uric acid levels, we conducted a meta-analysis of genome-wide association scans from 14 studies totalling 28,141 participants of European descent, resulting in identification of 954 SNPs distributed across nine loci that exceeded the threshold of genome-wide significance, five of which are novel. Overall, the common variants associated with serum uric acid levels fall in the following nine regions: SLC2A9 (p = 5.2x10(-201)), ABCG2 (p = 3.1x10(-26)), SLC17A1 (p = 3.0x10(-14)), SLC22A11 (p = 6.7x10(-14)), SLC22A12 (p = 2.0x10(-9)), SLC16A9 (p = 1.1x10(-8)), GCKR (p = 1.4x10(-9)), LRRC16A (p = 8.5x10(-9)), and near PDZK1 (p = 2.7x10(-9)). Identified variants were analyzed for gender differences. We found that the minor allele for rs734553 in SLC2A9 has greater influence in lowering uric acid levels in women and the minor allele of rs2231142 in ABCG2 elevates uric acid levels more strongly in men compared to women. To further characterize the identified variants, we analyzed their association with a panel of metabolites. rs12356193 within SLC16A9 was associated with DL-carnitine (p = 4.0x10(-26)) and propionyl-L-carnitine (p = 5.0x10(-8)) concentrations, which in turn were associated with serum UA levels (p = 1.4x10(-57) and p = 8.1x10(-54), respectively), forming a triangle between SNP, metabolites, and UA levels. Taken together, these associations highlight additional pathways that are important in the regulation of serum uric acid levels and point toward novel potential targets for pharmacological intervention to prevent or treat hyperuricemia. In addition, these findings strongly support the hypothesis that transport proteins are key in regulating serum uric acid levels.
Funded by: Arthritis Research UK; British Heart Foundation: FS/05/061/19501, PG02/128; Chief Scientist Office: CZB/4/710; Medical Research Council: G0400874, G9521010, G9521010D, MC_U127561128; NIA NIH HHS: N01-AG-1-2109; NIAAA NIH HHS: AA007535; Wellcome Trust: 076113/B/04/Z
PLoS genetics 2009;5;6;e1000504
Parental origin of sequence variants associated with complex diseases.
deCODE genetics, Sturlugata 8, 101 Reykjavík, Iceland.
The effects of susceptibility variants may depend on which parent they are inherited from. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.
Funded by: Medical Research Council: G9723500, MC_U106179471, MC_U106179474, MC_U127592696; NIAMS NIH HHS: K08 AR055688; NIDDK NIH HHS: R01 DK029867; Wellcome Trust: 077016
Common genetic variation in the melatonin receptor 1B gene (MTNR1B) is associated with decreased early-phase insulin response.
MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
Aims/hypothesis: We investigated whether variation in MTNR1B, which was recently identified as a common genetic determinant of fasting glucose levels in healthy, diabetes-free individuals, is associated with measures of beta cell function and whole-body insulin sensitivity.
Methods: We studied 1,276 healthy individuals of European ancestry at 19 centres of the Relationship between Insulin Sensitivity and Cardiovascular disease (RISC) study. Whole-body insulin sensitivity was assessed by euglycaemic-hyperinsulinaemic clamp and indices of beta cell function were derived from a 75 g oral glucose tolerance test (including 30 min insulin response and glucose sensitivity). We studied rs10830963 in MTNR1B using additive genetic models, adjusting for age, sex and recruitment centre.
Results: The minor (G) allele of rs10830963 in MTNR1B (frequency 0.30 in HapMap Centre d'Etude du Polymorphisme [Utah residents with northern and western European ancestry] [CEU]; 0.29 in RISC participants) was associated with higher levels of fasting plasma glucose (standardised beta [95% CI] 0.17 [0.085, 0.25] per G allele, p = 5.8 x 10(-5)), consistent with recent observations. In addition, the G-allele was significantly associated with lower early insulin response (-0.19 [-0.28, -0.10], p = 1.7 x 10(-5)), as well as with decreased beta cell glucose sensitivity (-0.11 [-0.20, -0.027], p = 0.010). No associations were observed with clamp-assessed insulin sensitivity (p = 0.15) or different measures of body size (p > 0.7 for all).
Conclusions/interpretation: Genetic variation in MTNR1B is associated with defective early insulin response and decreased beta cell glucose sensitivity, which may contribute to the higher glucose levels of non-diabetic individuals carrying the minor G allele of rs10830963 in MTNR1B.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0701863, MC_U106188470; Wellcome Trust: 077016, 077016/Z/05/Z
Testing the water: marine metagenomics.
Nature reviews. Microbiology 2009;7;8;552
Antibiotic treatment of Clostridium difficile carrier mice triggers a supershedder state, spore-mediated transmission, and severe disease in immunocompromised hosts.
Microbial Pathogenesis Laboratory and Pathogen Genomics, Hinxton, United Kingdom.
Clostridium difficile persists in hospitals by exploiting an infection cycle that is dependent on humans shedding highly resistant and infectious spores. Here we show that human virulent C. difficile can asymptomatically colonize the intestines of immunocompetent mice, establishing a carrier state that persists for many months. C. difficile carrier mice consistently shed low levels of spores but, surprisingly, do not transmit infection to cohabiting mice. However, antibiotic treatment of carriers triggers a highly contagious supershedder state, characterized by a dramatic reduction in the intestinal microbiota species diversity, C. difficile overgrowth, and excretion of high levels of spores. Stopping antibiotic treatment normally leads to recovery of the intestinal microbiota species diversity and suppresses C. difficile levels, although some mice persist in the supershedding state for extended periods. Spore-mediated transmission to immunocompetent mice treated with antibiotics results in self-limiting mucosal inflammation of the large intestine. In contrast, transmission to mice whose innate immune responses are compromised (Myd88(-/-)) leads to a severe intestinal disease that is often fatal. Thus, mice can be used to investigate distinct stages of the C. difficile infection cycle and can serve as a valuable surrogate for studying the spore-mediated transmission and interactions between C. difficile and the host and its microbiota, and the results obtained should guide infection control measures.
Funded by: Wellcome Trust
Infection and immunity 2009;77;9;3661-9
Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.
Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm(2) reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.
Funded by: Wellcome Trust
Journal of bacteriology 2009;191;17;5377-86
GLIDERS--a web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Background: A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r² ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers.
Description: GLIDERS is an easy-to-use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range). The intuitive web interface handles both manual entry of SNP IDs and uploaded files of SNP IDs. The user can limit the resulting inter-SNP associations with easy-to-use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r² (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab-delimited text file.
Conclusion: GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
Funded by: Wellcome Trust: 079557, 079557MA, 088885/Z/09/Z
BMC bioinformatics 2009;10;367
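The menu options described in the GLIDERS abstract (MAF limit, minimum and maximum distance, r2 threshold) amount to a filter over pairwise LD records. The following is only a minimal sketch of that idea, not the GLIDERS implementation: the LdPair record, its field names and the filter_pairs function are hypothetical, and both SNPs of a pair are assumed to lie on the same chromosome.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LdPair:
    """Hypothetical record for one pairwise SNP association."""
    snp_a: str
    snp_b: str
    pos_a: int      # base-pair position of the first SNP
    pos_b: int      # base-pair position of the second SNP
    r2: float       # squared correlation between the two SNPs
    maf_a: float    # minor allele frequency of the first SNP
    maf_b: float    # minor allele frequency of the second SNP


def filter_pairs(pairs: List[LdPair],
                 min_r2: float = 0.3,
                 min_maf: float = 0.05,
                 min_dist: int = 0,
                 max_dist: Optional[int] = None) -> List[LdPair]:
    """Keep pairs that satisfy the r2, MAF and distance limits."""
    kept = []
    for pair in pairs:
        dist = abs(pair.pos_a - pair.pos_b)
        if pair.r2 < min_r2:
            continue
        if min(pair.maf_a, pair.maf_b) < min_maf:
            continue
        if dist < min_dist:
            continue
        if max_dist is not None and dist > max_dist:
            continue
        kept.append(pair)
    return kept


# Made-up example data; the identifiers are placeholders, not real rs numbers.
example_pairs = [
    LdPair("snpA", "snpB", 1_000_000, 1_700_000, 0.82, 0.21, 0.19),
    LdPair("snpA", "snpC", 1_000_000, 1_010_000, 0.95, 0.21, 0.22),
    LdPair("snpA", "snpD", 1_000_000, 3_000_000, 0.35, 0.21, 0.03),
]

# Long-range pairs only (more than 500 kb apart).
long_range = filter_pairs(example_pairs, min_dist=500_000)
```

With the made-up data above, only the snpA/snpB pair passes: snpC is too close and snpD falls below the MAF limit.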
An ENU-induced mutation of miR-96 associated with progressive hearing loss in mice.
Wellcome Trust Sanger Institute, Hinxton, UK.
Progressive hearing loss is common in the human population, but little is known about the molecular basis. We report a new N-ethyl-N-nitrosourea (ENU)-induced mouse mutant, diminuendo, with a single base change in the seed region of Mirn96. Heterozygotes show progressive loss of hearing and hair cell anomalies, whereas homozygotes have no cochlear responses. Most microRNAs are believed to downregulate target genes by binding to specific sites on their mRNAs, so mutation of the seed should lead to target gene upregulation. Microarray analysis revealed 96 transcripts with significantly altered expression in homozygotes; notably, Slc26a5, Ocm, Gfi1, Ptprq and Pitpnm1 were downregulated. Hypergeometric P-value analysis showed that hundreds of genes were upregulated in mutants. Different genes, with target sites complementary to the mutant seed, were downregulated. This is the first microRNA found associated with deafness, and diminuendo represents a model for understanding and potentially moderating progressive hair cell degeneration in hearing loss more generally.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077189, 077198
Nature genetics 2009;41;5;614-8
The Sequence Alignment/Map format and SAMtools.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA.
Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Funded by: NHGRI NIH HHS: R01 HG004719, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, U54HG002750; Wellcome Trust: 077192/Z/05/Z
Bioinformatics (Oxford, England) 2009;25;16;2078-9
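As an illustration of the SAM format described above, an alignment line is tab-delimited text with eleven mandatory fields followed by optional TAG:TYPE:VALUE fields. The sketch below is a minimal reader for a single line, not a substitute for SAMtools; the SamRecord type and the function names are assumptions made for this example, and the field names follow the current SAM specification (the 2009 paper used slightly different names for some columns).

```python
from typing import Dict, NamedTuple, Union


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields plus a dict of optional tags."""
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)
    tags: Dict[str, Union[int, str]]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (header lines starting with '@' are rejected)."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    tags: Dict[str, Union[int, str]] = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        # Only integer tags are converted; all other types stay as strings.
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], int(fields[7]),
                     int(fields[8]), fields[9], fields[10], tags)


def is_unmapped(rec: SamRecord) -> bool:
    """FLAG bit 0x4 marks an unmapped read."""
    return bool(rec.flag & 0x4)


def is_reverse(rec: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)


# Made-up alignment line for illustration.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
record = parse_sam_line(example)
```

For real data, an established SAM/BAM library is the safer choice; the point here is only that the text format can be inspected with ordinary string handling.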
Chromosomal mobilization and reintegration of Sleeping Beauty and PiggyBac transposons.
The Sleeping Beauty and PiggyBac DNA transposon systems have recently been developed as tools for insertional mutagenesis. We have compared the chromosomal mobilization efficiency and insertion site preference of the two transposons mobilized from the same donor site in mouse embryonic stem (ES) cells under conditions in which there were no selective constraints on the transposons' insertion sites. Compared with Sleeping Beauty, PiggyBac exhibits higher transposition efficiencies, no evidence for local hopping and a significant bias toward reintegration in intragenic regions, which demonstrate its utility for insertional mutagenesis. Although Sleeping Beauty had no detectable genomic bias with respect to insertions in genes or intergenic regions, both Sleeping Beauty and PiggyBac transposons displayed preferential integration into actively transcribed loci.
Genesis (New York, N.Y. : 2000) 2009;47;6;404-8
Genome-wide association scan meta-analysis identifies three loci influencing adiposity and fat distribution.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9x10(-11)) and MSRA (WC, P = 8.9x10(-9)). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6x10(-8)). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.
Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation; Intramural NIH HHS; Medical Research Council: 0600705, G0000649, G0000934, G0500539, G0600705, G0601261, G0701863, G0801056, G9521010, G9521010D, MC_QA137934, MC_U106188470, MC_UP_A620_1014; NHLBI NIH HHS: HL084729, HL087679; NIDDK NIH HHS: DK062370, DK067288, DK07191, DK072193, DK075787, DK079466, DK080145, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, R01 DK029867, R01 DK072193; PHS HHS: G02651; Wellcome Trust: 064890, 068545/Z/02, 081682, 086596/Z/08/Z, 090532, GR069224, GR072960, GR076113
PLoS genetics 2009;5;6;e1000508
HI: haplotype improver using paired-end short reads.
The Wellcome Trust Sanger Institute, Hinxton, Cambs, UK. email@example.com
Summary: We present a program to improve haplotype reconstruction by incorporating information from paired-end reads, and demonstrate its utility on simulated data. We find that given a fixed coverage, longer reads (implying fewer of them) are preferable.
Availability: The executable and user manual can be freely downloaded from ftp://ftp.sanger.ac.uk/pub/zn1/HI.
Funded by: Wellcome Trust
Bioinformatics (Oxford, England) 2009;25;18;2436-7
Biology of Genomes: making sense of sequence.
Human Evolution, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org.
A report on the Biology of Genomes meeting held at Cold Spring Harbor Laboratory, NY, USA, 5-9 May 2009.
Genome medicine 2009;1;6;61
LookSeq: a browser-based viewer for deep sequencing data.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. email@example.com
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust
Genome research 2009;19;11;2125-32
SNP-o-matic.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. firstname.lastname@example.org
Motivation: High throughput sequencing technologies generate large amounts of short reads. Mapping these to a reference sequence consumes large amounts of processing time and memory, and read mapping errors can lead to noisy or incorrect alignments. SNP-o-matic is a fast, memory-efficient and stringent read mapping tool offering a variety of analytical output functions, with an emphasis on genotyping.
Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust
Bioinformatics (Oxford, England) 2009;25;18;2434-5
Donor-recipient mismatch for common gene deletion polymorphisms in graft-versus-host disease.
Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA. email@example.com
Transplantation and pregnancy, in which two diploid genomes reside in one body, can each lead to diseases in which immune cells from one individual target antigens encoded in the other's genome. One such disease, graft-versus-host disease (GVHD) after hematopoietic stem cell transplantation (HSCT, or bone marrow transplant), is common even after transplants between HLA-identical siblings, indicating that cryptic histocompatibility loci exist outside the HLA locus. The immune system of an individual whose genome is homozygous for a gene deletion could recognize epitopes encoded by that gene as alloantigens. Analyzing common gene deletions in three HSCT cohorts (1,345 HLA-identical sibling donor-recipient pairs), we found that risk of acute GVHD was greater (odds ratio (OR) = 2.5; 95% confidence interval (CI) 1.4-4.6) when donor and recipient were mismatched for homozygous deletion of UGT2B17, a gene expressed in GVHD-affected tissues and giving rise to multiple histocompatibility antigens. Human genome structural variation merits investigation as a potential mechanism in diseases of alloimmunity.
Funded by: NCI NIH HHS: CA18029, P01 CA018029, P01 CA018029-270048, P01 CA018029-349016; NHLBI NIH HHS: HL087690, P01 HL070149, P01 HL070149-05, R01 HL087690, R01 HL087690-03; NIAID NIH HHS: AI29530, AI33484, P01 AI029530, P01 AI029530-130007, P01 AI033484, P01 AI033484-13; PHS HHS: HA070149
Nature genetics 2009;41;12;1341-4
Microduplications of 16p11.2 are associated with schizophrenia.
Recurrent microdeletions and microduplications of a 600-kb genomic region of chromosome 16p11.2 have been implicated in childhood-onset developmental disorders. We report the association of 16p11.2 microduplications with schizophrenia in two large cohorts. The microduplication was detected in 12/1,906 (0.63%) cases and 1/3,971 (0.03%) controls (P = 1.2 x 10(-5), OR = 25.8) from the initial cohort, and in 9/2,645 (0.34%) cases and 1/2,420 (0.04%) controls (P = 0.022, OR = 8.3) of the replication cohort. The 16p11.2 microduplication was associated with a 14.5-fold increased risk of schizophrenia (95% CI (3.3, 62)) in the combined sample. A meta-analysis of datasets for multiple psychiatric disorders showed a significant association of the microduplication with schizophrenia (P = 4.8 x 10(-7)), bipolar disorder (P = 0.017) and autism (P = 1.9 x 10(-7)). In contrast, the reciprocal microdeletion was associated only with autism and developmental disorders (P = 2.3 x 10(-13)). Head circumference was larger in patients with the microdeletion than in patients with the microduplication (P = 0.0007).
Funded by: Intramural NIH HHS: ZIA MH002581-19; Medical Research Council: G0800509; NCRR NIH HHS: RR000037; NICHD NIH HHS: HD04147; NIDCR NIH HHS: DE016442; NIGMS NIH HHS: GM081519; NIMH NIH HHS: 1U24MH081810, K99 MH086756, K99 MH086756-01, K99 MH086756-02, MH061009, MH071523, MH074027, MH076431, MH077139, MH081810, MH083989, MH31340, MH44245, N01 MH90001, R00 MH086756, R00 MH086756-03, R01 MH091350; PHS HHS: HF004222; Wellcome Trust: 076113
Nature genetics 2009;41;11;1223-7
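The carrier frequencies and odds ratios quoted in the abstract above can be approximated from the reported counts. The sketch below computes a crude cross-product odds ratio from a 2x2 table; this is an assumption about method, since the abstract does not say how the published estimates were obtained, so small differences from the reported values are to be expected. The function names are hypothetical.

```python
def carrier_percent(carriers: int, total: int) -> float:
    """Percentage of individuals carrying the microduplication."""
    return 100.0 * carriers / total


def crude_odds_ratio(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c) from a 2x2 table."""
    a = case_carriers
    b = case_total - case_carriers          # cases without the duplication
    c = control_carriers
    d = control_total - control_carriers    # controls without the duplication
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when a denominator cell is zero")
    return (a * d) / (b * c)


# Counts as reported in the abstract.
initial_or = crude_odds_ratio(12, 1906, 1, 3971)
replication_or = crude_odds_ratio(9, 2645, 1, 2420)

print(f"initial cohort: {carrier_percent(12, 1906):.2f}% of cases, "
      f"crude OR = {initial_or:.1f}")
print(f"replication cohort: {carrier_percent(9, 2645):.2f}% of cases, "
      f"crude OR = {replication_or:.1f}")
```

The crude estimate for the replication cohort agrees with the reported value to one decimal place, while the crude estimate for the initial cohort comes out slightly below the reported 25.8, which is consistent with the published figure having been derived by a different estimator.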
Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss.
Unidad de Genética Molecular, Hospital Ramón y Cajal, Madrid, Spain.
MicroRNAs (miRNAs) bind to complementary sites in their target mRNAs to mediate post-transcriptional repression, with the specificity of target recognition being crucially dependent on the miRNA seed region. Impaired miRNA target binding resulting from SNPs within mRNA target sites has been shown to lead to pathologies associated with dysregulated gene expression. However, no pathogenic mutations within the mature sequence of a miRNA have been reported so far. Here we show that point mutations in the seed region of miR-96, a miRNA expressed in hair cells of the inner ear, result in autosomal dominant, progressive hearing loss. This is the first study implicating a miRNA in a mendelian disorder. The identified mutations have a strong impact on miR-96 biogenesis and result in a significant reduction of mRNA targeting. We propose that these mutations alter the regulatory role of miR-96 in maintaining gene expression profiles in hair cells required for their normal function.
Funded by: Action on Hearing Loss: G41; Medical Research Council: G0300212, MC_QA137918; Wellcome Trust
Nature genetics 2009;41;5;609-13
Genetic structure of nomadic Bedouin from Kuwait.
Division of Genomic Medicine, University of Sheffield, Sheffield, UK.
Bedouin are traditionally nomadic inhabitants of the Persian Gulf who claim descent from two male lineages: Adnani and Qahtani. We have investigated whether or not this tradition is reflected in the current genetic structure of a sample of 153 Bedouin males from six Kuwaiti tribes, including three tribes from each traditional lineage. Volunteers were genotyped using a panel of autosomal and Y-STRs, and Y-SNPs. The samples clustered with their geographical neighbours in both the autosomal and Y-chromosomal analyses, and showed strong evidence of genetic isolation and drift. Although there was no evidence of segregation into the two male lineages, other aspects of genetic structure were in accord with tradition.
Funded by: Wellcome Trust: 077009
Inferring selection on amino acid preference in protein domains.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. firstname.lastname@example.org
Models that explicitly account for the effect of selection on new mutations have been proposed to account for "codon bias" or the excess of "preferred" codons that results from selection for translational efficiency and/or accuracy. In principle, such models can be applied to any mutation that results in a preferred allele, but in most cases, the fitness effect of a specific mutation cannot be predicted. Here we show that it is possible to assign preferred and unpreferred states to amino acid changing mutations that occur in protein domains. We propose that mutations that lead to more common amino acids (at a given position in a domain) can be considered "preferred alleles" just as are synonymous mutations leading to codons for more abundant tRNAs. We use genome-scale polymorphism data to show that alleles for preferred amino acids in protein domains occur at higher frequencies in the population, as has been shown for preferred codons. We show that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect. As expected, we also observe a reduction in the numbers of polymorphisms and substitutions at more important positions in domains, consistent with stronger selection at those positions. We examine the derived allele frequency distribution and polymorphism to divergence ratios of preferred and unpreferred differences and find evidence for both negative and positive selections acting to maintain protein domains in the human population. Finally, we analyze a model for selection on amino acid preferences in protein domains and find that it is consistent with the quantitative effects that we observe.
Funded by: Wellcome Trust: 077192
Molecular biology and evolution 2009;26;3;527-36
Gene-wide analyses of genome-wide association data sets: evidence for multiple common risk alleles for schizophrenia and bipolar disorder and for overlap in genetic risk.
Department of Psychological Medicine, School of Medicine, Cardiff University, Cardiff, UK.
Genome-wide association (GWAS) analyses have identified susceptibility loci for many diseases, but most risk for any complex disorder remains unattributed. There is therefore scope for complementary approaches to these data sets. Gene-wide approaches potentially offer additional insights. They might identify association to genes through multiple signals. Also, by providing support for genes rather than single nucleotide polymorphisms (SNPs), they offer an additional opportunity to compare the results across data sets. We have undertaken gene-wide analysis of two GWAS data sets: schizophrenia and bipolar disorder. We performed two forms of analysis, one based on the smallest P-value per gene, the other on a truncated product of P method. For each data set and at a range of statistical thresholds, we observed significantly more SNPs within genes (P(min) for excess<0.001) showing evidence for association than expected whereas this was not true for extragenic SNPs (P(min) for excess>0.1). At a range of thresholds of significance, we also observed substantially more associated genes than expected (P(min) for excess in schizophrenia=1.8 x 10(-8), in bipolar=2.4 x 10(-6)). Moreover, an excess of genes showed evidence for association across disorders. Among those genes surpassing thresholds highly enriched for true association, we observed evidence for association to genes reported in other GWAS data sets (CACNA1C) or to closely related family members of those genes including CSF2RB, CACNA1B and DGKI. Our analyses show that association signals are enriched in and around genes, large numbers of genes contribute to both disorders and gene-wide analyses offer useful complementary approaches to more standard methods.
Funded by: Medical Research Council: G0000934, G0800509; Wellcome Trust: 076113, 079643
Molecular psychiatry 2009;14;3;252-60
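The two gene-wide statistics named in the abstract above, the smallest P-value per gene and a truncated product of P-values, can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the truncation threshold tau = 0.05 is an assumed default, and the null distribution here draws independent uniform P-values, which ignores linkage disequilibrium between SNPs in the same gene (a real analysis would typically use permutation of phenotypes instead). All function names are hypothetical.

```python
import math
import random
from typing import Sequence


def min_p(pvalues: Sequence[float]) -> float:
    """Smallest single-SNP P-value within a gene."""
    return min(pvalues)


def truncated_product(pvalues: Sequence[float], tau: float = 0.05) -> float:
    """Log of the product of all P-values at or below tau (0.0 if none)."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def empirical_p(pvalues: Sequence[float], tau: float = 0.05,
                n_sim: int = 10000, seed: int = 0) -> float:
    """Gene-wide P-value by simulation under an independence assumption."""
    rng = random.Random(seed)
    observed = truncated_product(pvalues, tau)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in pvalues]
        if truncated_product(null, tau) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


# Made-up per-SNP P-values for two hypothetical genes.
strong_gene = [1e-4, 1e-3, 0.02, 0.6]
weak_gene = [0.3, 0.5, 0.04, 0.6]

strong_p = empirical_p(strong_gene, n_sim=2000)
weak_p = empirical_p(weak_gene, n_sim=2000)
```

A gene with several moderately small P-values can score well on the truncated product even when its minimum P-value is unremarkable, which is why the two statistics are complementary.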
Novel genes in cell cycle control and lipid metabolism with dynamically regulated binding sites for sterol regulatory element-binding protein 1 and RNA polymerase II in HepG2 cells detected by chromatin immunoprecipitation with microarray detection.
Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.
Sterol regulatory element-binding proteins 1 and 2 (SREBP-1 and SREBP-2) are important regulators of genes involved in cholesterol and fatty acid metabolism, but have also been implicated in the regulation of the cell cycle and have been associated with the pathogenesis of type 2 diabetes, atherosclerosis and obesity, among others. In this study, we aimed to characterize the binding sites of SREBP-1 and RNA polymerase II through chromatin immunoprecipitation and microarray analysis in 1% of the human genome, as defined by the Encyclopaedia of DNA Elements consortium, in a hepatocellular carcinoma cell line (HepG2). Our data identified novel binding sites for SREBP-1 in genes directly or indirectly involved in cholesterol metabolism, e.g. apolipoprotein C-III (APOC3). The most interesting biological findings were the binding sites for SREBP-1 in genes for host cell factor C1 (HCFC1), involved in cell cycle regulation, and for filamin A (FLNA). For RNA polymerase II, we found binding sites at classical promoters, but also in intergenic and intragenic regions. Furthermore, we found evidence of sterol-regulated binding of SREBP-1 and RNA polymerase II to HCFC1 and FLNA. From the results of this work, we infer that SREBP-1 may be involved in processes other than lipid metabolism.
The FEBS journal 2009;276;7;1878-90
Abnormal behavior in a chromosome-engineered mouse model for human 15q11-13 duplication seen in autism.
Osaka Bioscience Institute, Suita, Osaka 565-0874, Japan.
Substantial evidence suggests that chromosomal abnormalities contribute to the risk of autism. The duplication of human chromosome 15q11-13 is known to be the most frequent cytogenetic abnormality in autism. We have modeled this genetic change in mice by using chromosome engineering to generate a 6.3 Mb duplication of the conserved linkage group on mouse chromosome 7. Mice with a paternal duplication display poor social interaction, behavioral inflexibility, abnormal ultrasonic vocalizations, and correlates of anxiety. An increased MBII52 snoRNA within the duplicated region, affecting the serotonin 2c receptor (5-HT2cR), correlates with altered intracellular Ca(2+) responses elicited by a 5-HT2cR agonist in neurons of mice with a paternal duplication. This chromosome-engineered mouse model for autism seems to replicate various aspects of human autistic phenotypes and validates the relevance of the human chromosome abnormality. This model will facilitate forward genetics of developmental brain disorders and serve as an invaluable tool for therapeutic development.
Genome-wide association study identifies eight loci associated with blood pressure.
Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA. email@example.com
Elevated blood pressure is a common, heritable cause of cardiovascular disease worldwide. To date, identification of common genetic variants influencing blood pressure has proven challenging. We tested 2.5 million genotyped and imputed SNPs for association with systolic and diastolic blood pressure in 34,433 subjects of European ancestry from the Global BPgen consortium and followed up findings with direct genotyping (N ≤ 71,225 European ancestry, N ≤ 12,889 Indian Asian ancestry) and in silico comparison (CHARGE consortium, N = 29,136). We identified association between systolic or diastolic blood pressure and common variants in eight regions near the CYP17A1 (P = 7 × 10(-24)), CYP1A2 (P = 1 × 10(-23)), FGF5 (P = 1 × 10(-21)), SH2B3 (P = 3 × 10(-18)), MTHFR (P = 2 × 10(-13)), c10orf107 (P = 1 × 10(-9)), ZNF652 (P = 5 × 10(-9)) and PLCD3 (P = 1 × 10(-8)) genes. All variants associated with continuous blood pressure were associated with dichotomous hypertension. These associations between common variants and blood pressure and hypertension offer mechanistic insights into the regulation of blood pressure and may point to novel targets for interventions to prevent cardiovascular disease.
Funded by: British Heart Foundation: FS/05/061/19501, PG02/128, SP/04/002; Cancer Research UK; Chief Scientist Office: CZB/4/540, ETM/75; Intramural NIH HHS; Medical Research Council: 85374, G0000934, G0400874, G0401527, G0501942, G0600329, G0701863, G0800759, G0801056, G9521010, G9521010D, MC_QA137934, MC_U105630924, MC_U106188470, MC_U137686857; NCRR NIH HHS: U54RR020278; NHGRI NIH HHS: 1Z01HG000024; NHLBI NIH HHS: K23 HL080025, K23 HL080025-04, K23HL083102, K23HL80025, R01 HL056931, R01 HL056931-02, R01 HL056931-03, R01 HL056931-04, R01HL056931, R01HL087676, R01HL087679; NIA NIH HHS: N01-AG-1-2109, N01AG-821336, N01AG-916413; NICHD NIH HHS: N01-HD-1-3107; NIDA NIH HHS: U54DA021519; NIDDK NIH HHS: DK062370, DK072193, R01 DK029867, R01 DK072193, U01DK062418; NIEHS NIH HHS: P30ES007033; NIMH NIH HHS: RL1MH083268; NIMHD NIH HHS: 263MD821336, 263MD916413; PHS HHS: 263-MA-410953; Wellcome Trust: 061858, 068545/Z/02, 070191/Z/03/Z, 076113, 076113/B/04/Z, 077011, 077016, 077016/Z/05/Z, 079557, 079895, 088885, 089061, 090532, WT088885/Z/09/Z
Nature genetics 2009;41;6;666-76
Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies.
Unit of Genetic Epidemiology and Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands.
To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and BRIGHT cohorts in the UK and the DCCT/EDIC cohort from North America. Five loci were significantly associated with QT interval at P<1x10(-6). To validate these findings we performed an in silico comparison with data from two QT consortia: QTSCD (n = 15,842) and QTGEN (n = 13,685). Analysis confirmed the association between common variants near NOS1AP (P = 1.4x10(-83)) and the phospholamban (PLN) gene (P = 1.9x10(-29)). The most associated SNP near NOS1AP (rs12143842) explains 0.82% variance; the SNP near PLN (rs11153730) explains 0.74% variance of QT interval duration. We found no evidence for interaction between these two SNPs (P = 0.99). PLN is a key regulator of cardiac diastolic function and is involved in regulating intracellular calcium cycling; it has only recently been identified as a susceptibility locus for QT interval. These data offer further mechanistic insights into genetic influence on the QT interval which may predispose to life threatening arrhythmias and sudden cardiac death.
Funded by: Biotechnology and Biological Sciences Research Council: G20234; British Heart Foundation: 06/094, FS/05/061/19501, PG02/128, SP/02/001; Department of Health; Medical Research Council: G0400874, G0501942, G9521010, G9521010D; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL054512, HL86694, K23-HL-080025, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85086, N02-HL-6-4278, R01 HL087652, R01HL086694, R01HL087641, R01HL59367, U01 HL080295; NIA NIH HHS: N01-AG-1-2109; NIDDK NIH HHS: N01-DK-6-2204, R01-DK-077510; PHS HHS: 263-MA-410953, HHSN268200625226C; Wellcome Trust: WT088885/Z/09/Z
PloS one 2009;4;7;e6138
Functional genomics in zebrafish permits rapid characterization of novel platelet membrane proteins.
Department of Haematology, University of Cambridge, Cambridge, United Kingdom.
In this study, we demonstrate the suitability of the vertebrate Danio rerio (zebrafish) for functional screening of novel platelet genes in vivo by reverse genetics. Comparative transcript analysis of platelets and their precursor cell, the megakaryocyte, together with nucleated blood cell elements, endothelial cells, and erythroblasts, identified novel platelet membrane proteins with hitherto unknown roles in thrombus formation. We determined the phenotype induced by antisense morpholino oligonucleotide (MO)-based knockdown of 5 of these genes in a laser-induced arterial thrombosis model. To validate the model, the genes for platelet glycoprotein (GP) IIb and the coagulation protein factor VIII were targeted. MO-injected fish showed normal thrombus initiation but severely impaired thrombus growth, consistent with the mouse knockout phenotypes, and concomitant knockdown of both resulted in spontaneous bleeding. Knockdown of 4 of the 5 novel platelet proteins altered arterial thrombosis, as demonstrated by modified kinetics of thrombus initiation and/or development. We identified a putative role for BAMBI and LRRC32 in promotion and DCBLD2 and ESAM in inhibition of thrombus formation. We conclude that phenotypic analysis of MO-injected zebrafish is a fast and powerful method for initial screening of novel platelet proteins for function in thrombosis.
Funded by: Wellcome Trust: WT077037/Z/05/Z, WT082597/Z/07/Z
Somatic mutation databases as tools for molecular epidemiology and molecular pathology of cancer: proposed guidelines for improving data collection, distribution, and integration.
Group of Molecular Carcinogenesis and Biomarkers, International Agency for Research on Cancer, World Health Organization, Lyon, France. firstname.lastname@example.org
There are currently fewer than 40 locus-specific databases (LSDBs) and one large general database that curate data on somatic mutations in human cancer genes. These databases have different scope and use different annotation standards and database systems, resulting in duplicated efforts in data curation, and making it difficult for users to find clear and consistent information. As data related to somatic mutations are generated at an increasing pace, it is urgent to create a framework for improving the collection of this information and making it more accessible to clinicians, scientists, and epidemiologists to facilitate research on biomarkers. Here we propose a data flow for improving the connectivity between existing databases and we provide practical guidelines for data reporting, database contents, and annotation standards. These proposals are based on common standards recommended by the Human Genome Variation Society (HGVS) with additions related to specific requirements of somatic mutations in cancer. Indeed, somatic mutations may be used in molecular pathology and clinical studies to characterize tumor types, help treatment choice, predict response to treatment and patient outcome, or in epidemiological studies as markers for tumor etiology or exposure assessment. Thus, specific annotations are required to cover these diverse research topics. This initiative is meant to promote collaboration and discussion on these issues and the development of adequate resources that would avoid the loss of extremely valuable information generated by years of basic and clinical research.
Human mutation 2009;30;3;275-82
Genetic variation in LIN28B is associated with the timing of puberty.
Medical Research Council (MRC) Epidemiology Unit, Addenbrooke's Hospital, Cambridge, UK. email@example.com
The timing of puberty is highly variable. We carried out a genome-wide association study for age at menarche in 4,714 women and report an association in LIN28B on chromosome 6 (rs314276, minor allele frequency (MAF) = 0.33, P = 1.5 × 10(-8)). In independent replication studies in 16,373 women, each major allele was associated with 0.12 years earlier menarche (95% CI = 0.08-0.16; P = 2.8 × 10(-10); combined P = 3.6 × 10(-16)). This allele was also associated with earlier breast development in girls (P = 0.001; N = 4,271); earlier voice breaking (P = 0.006, N = 1,026) and more advanced pubic hair development in boys (P = 0.01; N = 4,588); a faster tempo of height growth in girls (P = 0.00008; N = 4,271) and boys (P = 0.03; N = 4,588); and shorter adult height in women (P = 3.6 × 10(-7); N = 17,274) and men (P = 0.006; N = 9,840) in keeping with earlier growth cessation. These studies identify variation in LIN28B, a potent and specific regulator of microRNA processing, as the first genetic determinant regulating the timing of human pubertal growth and development.
Funded by: Cancer Research UK; Medical Research Council: 73437, G0000934, G0401527, G0401527(74922), G0701863, G9815508, MC_U105630924, MC_U106179471, MC_U106179472, MC_U106179473, MC_U106188470, MC_U123092720, MC_U123092721, U.1061.00.001 (79471), U.1061.00.004(79472); Wellcome Trust: 068049, 068545/Z/02, 076467/Z/05/Z, 077011, 077016, 077016/Z/05/Z, 079996
Nature genetics 2009;41;6;729-33
Combined effects of three independent SNPs greatly increase the risk estimate for RA at 6q23.
arc-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester M13 9PT, UK. firstname.lastname@example.org
The most consistent finding derived from the WTCCC GWAS for rheumatoid arthritis (RA) was association to a SNP at 6q23. We fine-mapped the 6q23 region to search for additional disease variants. 3962 RA patients and 3531 healthy controls were included in the study. We found 18 SNPs associated with RA. The SNP showing the strongest association was rs6920220 [P = 2.6 x 10(-6), OR (95% CI) 1.22 (1.13-1.33)]. The next most strongly associated SNP was rs13207033 [P = 0.0001, OR (95% CI) 0.86 (0.8-0.93)] which was perfectly correlated with rs10499194, a SNP previously associated with RA in a US/European series. Additionally, we found a number of new potential RA markers, including rs5029937, located in intron 2 of TNFAIP3. Of the 18 associated SNPs, three polymorphisms, rs6920220, rs13207033 and rs5029937, remained significant after conditional logistic regression analysis. The combination of the carriage of both risk alleles of rs6920220 and rs5029937 together with the absence of the protective allele of rs13207033 was strongly associated with RA when compared with carriage of none [OR (95% CI) 1.86 (1.51-2.29)]. This equates to an effect size of 1.50 (95% CI 1.21-1.85) compared with controls and is higher than that obtained for any SNP individually. This is the first study to show that the confirmed loci from the GWA studies, which confer only a modest effect size, could harbour a significantly greater effect once the effect of additional risk variants is accounted for.
Funded by: Arthritis Research UK: 17552, 18475; Medical Research Council: G0000934, G0600329; Wellcome Trust: 061858, 068545/Z/02, 090532
Human molecular genetics 2009;18;14;2693-9
The discovery of genes implicated in myocardial infarction
Journal of Thrombosis and Haemostasis; 22nd Congress of the International-Society-on-Thrombosis-and-Haemostasis. 2009;7;305-7
Ethical data release in genome-wide association studies in developing countries.
Ethox Centre, University of Oxford, Oxford, United Kingdom.
Funded by: Medical Research Council: G0600230, G0600718, G19/9; PHS HHS: 566; Wellcome Trust: 077383/Z/05/Z, 087285/Z/08/Z
PLoS medicine 2009;6;11;e1000143
A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously unannotated regions, and 3'- or 5'-untranslated regions (UTRs). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR-regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.
Funded by: Wellcome Trust
PLoS genetics 2009;5;7;e1000569
Meta-analysis of genome-wide association data identifies two loci influencing age at menarche.
Institute of Biomedical and Clinical Science, Peninsula Medical School, Exeter, UK.
We conducted a meta-analysis of genome-wide association data to detect genes influencing age at menarche in 17,510 women. The strongest signal was at 9q31.2 (P = 1.7 × 10(-9)), where the nearest genes include TMEM38B, FKTN, FSD1L, TAL2 and ZNF462. The next best signal was near the LIN28B gene (rs7759938; P = 7.0 × 10(-9)), which also influences adult height. We provide the first evidence for common genetic variants influencing female sexual maturation.
Funded by: Intramural NIH HHS; NCRR NIH HHS: M01 RR 16500, M01 RR016500-02; NHLBI NIH HHS: N01 HC025195, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02 HL64278, N02-HL-6-4278, U01 HL072515, U01 HL072515-06, U01 HL72515; NIA NIH HHS: N.1-AG-1-1, N.1-AG-1-2111, N01 AG012100, N01-AG-12100, N01-AG-5-0002, R01 AR/AG 41398, R21 AG032598, R21 AG032598-02, R21AG032598, U19 AG023122, U19 AG023122-05; NIAMS NIH HHS: R01 AR041398, R01 AR041398-15; NIDDK NIH HHS: P30 DK072488, P30 DK072488-02; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164, 263 MD821336, 263 MD9164 13; Wellcome Trust
Nature genetics 2009;41;6;648-50
Agouti C57BL/6N embryonic stem cells for mouse genetic resources.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
We report the characterization of a highly germline competent C57BL/6N mouse embryonic stem cell line, JM8. To simplify breeding schemes, the dominant agouti coat color gene was restored in JM8 cells by targeted repair of the C57BL/6 nonagouti mutation. These cells provide a robust foundation for large-scale mouse knockout programs that aim to provide a public resource of targeted mutations in the C57BL/6 genetic background.
Funded by: NHGRI NIH HHS: U01-HG004080; PHS HHS: U01-42430; Wellcome Trust: 077188, WT077187
Nature methods 2009;6;7;493-5
Preparation of bacteriophage lysates and pure DNA.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Preparation of pure bacteriophage DNA used to rely either on CsCl gradients, which give high purity, or on methods that yielded DNA that was either of low recovery or subject to significant genomic contamination. Recently, though, new methods have come along that allow the purification of DNA from plate lysates and that are not only capable of high yield but also, for all intents and purposes, free of genomic contamination (i.e. no visible genomic contamination on restriction analysis or when used for bacteriophage sequencing). The protocol that forms the basis of this short section can be used to prepare bacteriophage DNA from one or two 9 cm L-agar plates. For these preps, the use of agarose in the top agar is recommended to avoid any restriction inhibitors that may be present in some agar preparations.
Methods in molecular biology (Clifton, N.J.) 2009;502;3-9
Variants in MTNR1B influence fasting glucose levels.
Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford OX3 7LJ, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.
To identify previously unknown genetic loci associated with fasting glucose concentrations, we examined the leading association signals in ten genome-wide association scans involving a total of 36,610 individuals of European descent. Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten studies. The strongest signal was observed at rs10830963, where each G allele (frequency 0.30 in HapMap CEU) was associated with an increase of 0.07 (95% CI = 0.06-0.08) mmol/l in fasting glucose levels (P = 3.2 x 10(-50)) and reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, P = 1.1 x 10(-15)). The same allele was associated with an increased risk of type 2 diabetes (odds ratio = 1.09 (1.05-1.12), per G allele P = 3.3 x 10(-7)) in a meta-analysis of 13 case-control studies totaling 18,236 cases and 64,453 controls. Our analyses also confirm previous associations of fasting glucose with variants at the G6PC2 (rs560887, P = 1.1 x 10(-57)) and GCK (rs4607517, P = 1.0 x 10(-25)) loci.
Funded by: Intramural NIH HHS; Medical Research Council: G0000649, G016121, G0500539, G0601261, G0701863, MC_U106179471, MC_U106188470; NCRR NIH HHS: RR-163736; NHGRI NIH HHS: HG-02651, R01 HG002651, R01 HG002651-05; NHLBI NIH HHS: HC-25195, HL-084729, HL-087679, N01 HC025195, N02-HL-6-4278, R01 HL087679-02, U01 HL084729, U01 HL084729-03; NIDA NIH HHS: DA-021519, U54 DA021519, U54 DA021519-04; NIDDK NIH HHS: DK-062370, DK-065978, DK-072193, DK-078616, DK-080140, DK069922, K23 DK065978, K23 DK065978-05, K24 DK080140, K24 DK080140-01, K24 DK080140-02, R01 DK029867, R01 DK062370, R01 DK062370-05, R01 DK069922, R01 DK069922-02, R01 DK072193, R01 DK072193-04, R01 DK078616, R01 DK078616-01A1; NIMH NIH HHS: MH059160, R01 MH059160, R01 MH059160-04; Wellcome Trust: 076113, 077011, 077016, 079557, 083948, 089061, 090532, GR069224, GR072960
Nature genetics 2009;41;1;77-81
The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.
National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA. Pruitt@ncbi.nlm.nih.gov
Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates the manual review of inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically annotated, protein-coding regions.
Funded by: Intramural NIH HHS; NHGRI NIH HHS: 1U54HG004555-01, U54 HG004555; Wellcome Trust: 062023, 077198, WT062023, WT077198
Genome research 2009;19;7;1316-23
Improved protocols for the Illumina Genome Analyzer sequencing system.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
In this unit, we describe a set of improvements we have made to the standard Illumina Genome Analyzer protocols to make the sequencing process more reliable in a high-throughput environment, reduce amplification bias, narrow the distribution of insert sizes, and reliably obtain high yields of data.
Funded by: Wellcome Trust: 098051, WT079643
Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 2009;Chapter 18;Unit 18.2
A genome-wide association study of testicular germ cell tumor.
Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, UK.
We conducted a genome-wide association study for testicular germ cell tumor (TGCT), genotyping 307,666 SNPs in 730 cases and 1,435 controls from the UK and replicating associations in a further 571 cases and 1,806 controls. We found strong evidence for susceptibility loci on chromosome 5 (per allele OR = 1.37 (95% CI = 1.19-1.58), P = 3 x 10(-13)), chromosome 6 (OR = 1.50 (95% CI = 1.28-1.75), P = 10(-13)) and chromosome 12 (OR = 2.55 (95% CI = 2.05-3.19), P = 10(-31)). KITLG, encoding the ligand for the receptor tyrosine kinase KIT, which has previously been implicated in the pathogenesis of TGCT and the biology of germ cells, may explain the association on chromosome 12.
Funded by: Cancer Research UK: 10118, 10589, A4994; Medical Research Council: G0000934, G0700491; Wellcome Trust: 068545/Z/02, 077012
Nature genetics 2009;41;7;807-10
Replication and extension of genome-wide association study results for obesity in 4923 adults from northern Sweden.
Department of Public Health and Clinical Medicine, Umeå University Hospital, Umeå, Sweden.
Recent genome-wide association studies (GWAS) have identified multiple risk loci for common obesity (FTO, MC4R, TMEM18, GNPDA2, SH2B1, KCTD15, MTCH2, NEGR1 and PCSK1). Here we extend those studies by examining associations with adiposity and type 2 diabetes in Swedish adults. The nine single nucleotide polymorphisms (SNPs) were genotyped in 3885 non-diabetic and 1038 diabetic individuals with available measures of height, weight and body mass index (BMI). Adipose mass and distribution were objectively assessed using dual-energy X-ray absorptiometry in a sub-group of non-diabetics (n = 2206). In models with adipose mass traits, BMI or obesity as outcomes, the most strongly associated SNP was FTO rs1121980 (P < 0.001). Five other SNPs (SH2B1 rs7498665, MTCH2 rs4752856, MC4R rs17782313, NEGR1 rs2815752 and GNPDA2 rs10938397) were significantly associated with obesity. To summarize the overall genetic burden, a weighted risk score comprising a subset of SNPs was constructed; those in the top quintile of the score were heavier (+2.6 kg) and had more total (+2.4 kg), gynoid (+191 g) and abdominal (+136 g) adipose tissue than those in the lowest quintile (all P < 0.001). The genetic burden score significantly increased diabetes risk, with those in the highest quintile (n = 193/594 cases/controls) being at 1.55-fold (95% CI 1.21-1.99; P < 0.0001) greater risk of type 2 diabetes than those in the lowest quintile (n = 130/655 cases/controls). In summary, we have statistically replicated six of the previously associated obese-risk loci and our results suggest that the weight-inducing effects of these variants are explained largely by increased adipose accumulation.
Funded by: Wellcome Trust: 090532
Human molecular genetics 2009;18;8;1489-96
Comparative genomic analysis of ten Streptococcus pneumoniae temperate bacteriophages.
Division of Infection and Immunity, Glasgow Biomedical Research Centre, University of Glasgow, Glasgow, United Kingdom.
Streptococcus pneumoniae is an important human pathogen that often carries temperate bacteriophages. As part of a program to characterize the genetic makeup of prophages associated with clinical strains and to assess the potential roles that they play in the biology and pathogenesis of their host, we performed comparative genomic analysis of 10 temperate pneumococcal phages. All of the genomes are organized into five major gene clusters: lysogeny, replication, packaging, morphogenesis, and lysis clusters. All of the phage particles observed showed a Siphoviridae morphology. The only genes that are well conserved in all the genomes studied are those involved in the integration and the lysis of the host, in addition to two genes of unknown function within the replication module. We observed that a high percentage of the open reading frames contained no similarities to any sequences catalogued in public databases; however, genes that were homologous to known phage virulence genes, including the pblB gene of Streptococcus mitis and the vapE gene of Dichelobacter nodosus, were also identified. Interestingly, bioinformatic tools showed the presence of a toxin-antitoxin system in the phage phiSpn_6, and this represents the first time that an addiction system in a pneumophage has been identified. Collectively, the temperate pneumophages contain a diverse set of genes with various levels of similarity among them.
Funded by: NIDCD NIH HHS: DC02148, DC04173, DC05659
Journal of bacteriology 2009;191;15;4854-62
Partial lipodystrophy and insulin resistant diabetes in a patient with a homozygous nonsense mutation in CIDEC.
Department of Endocrinology, Hospital Infantil Universitario Niño Jesús, Madrid, Spain.
Lipodystrophic syndromes are characterized by adipose tissue deficiency. Although rare, they are of considerable interest as they, like obesity, typically lead to ectopic lipid accumulation, dyslipidaemia and insulin resistant diabetes. In this paper we describe a female patient with partial lipodystrophy (affecting limb, femorogluteal and subcutaneous abdominal fat), white adipocytes with multiloculated lipid droplets and insulin-resistant diabetes, who was found to be homozygous for a premature truncation mutation in the lipid droplet protein cell death-inducing Dffa-like effector C (CIDEC) (E186X). The truncation disrupts the highly conserved CIDE-C domain and the mutant protein is mistargeted and fails to increase the lipid droplet size in transfected cells. In mice, Cidec deficiency also reduces fat mass and induces the formation of white adipocytes with multilocular lipid droplets, but in contrast to our patient, Cidec null mice are protected against diet-induced obesity and insulin resistance. In addition to describing a novel autosomal recessive form of familial partial lipodystrophy, these observations also suggest that CIDEC is required for unilocular lipid droplet formation and optimal energy storage in human fat.
Funded by: Medical Research Council: G0600414; NIDDK NIH HHS: DK30898, DK32520, DK54387, DK60837, P30 DK032520, P30 DK032520-25, P30 DK032520-26, R01 DK054387, R37 DK030898, R37 DK030898-23; Wellcome Trust: 077016, 077016/Z/05/Z
EMBO molecular medicine 2009;1;5;280-7
The versatility and adaptation of bacteria from the genus Stenotrophomonas.
BIOMERIT Research Centre, Department of Microbiology, BioSciences Institute, University College Cork, Cork, Ireland.
The genus Stenotrophomonas comprises at least eight species. These bacteria are found throughout the environment, particularly in close association with plants. Strains of the most predominant species, Stenotrophomonas maltophilia, have an extraordinary range of activities that include beneficial effects for plant growth and health, the breakdown of natural and man-made pollutants that are central to bioremediation and phytoremediation strategies and the production of biomolecules of economic value, as well as detrimental effects, such as multidrug resistance, in human pathogenic strains. Here, we discuss the versatility of the bacteria in the genus Stenotrophomonas and the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to our understanding of the adaptation of this genus to various niches.
Funded by: Austrian Science Fund FWF: P 20542-B16; Wellcome Trust
Nature reviews. Microbiology 2009;7;7;514-25
Presence of interstereocilial links in waltzer mutants suggests Cdh23 is not essential for tip link formation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.
Cadherin23 has been proposed to form the upper part of the tip link, an interstereocilial link believed to control opening of transducer channels of sensory hair cells. However, we detect tip link-like links in mouse mutants with null alleles of Cdh23, suggesting the presence of other components that permit formation of a link between the tip of one stereocilium and the side of the adjacent taller stereocilium.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust
Genome-wide association analysis of metabolic traits in a birth cohort from a founder population.
Department of Human Genetics and Los Angeles, Los Angeles, California 90095, USA.
Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
Funded by: NCRR NIH HHS: U54 RR020278; NIGMS NIH HHS: GM053275-14; NIMH NIH HHS: MH083268; Wellcome Trust: 089061
Nature genetics 2009;41;1;35-46
The Schistosoma japonicum genome reveals features of host-parasite interplay.
Schistosoma japonicum is a parasitic flatworm that causes human schistosomiasis, which is a significant cause of morbidity in China and the Philippines. Here we present a draft genomic sequence for the worm. The genome provides a global insight into the molecular architecture and host interaction of this complex metazoan pathogen, revealing that it can exploit host nutrients, neuroendocrine hormones and signalling pathways for growth, development and maturation. Having a complex nervous system and a well-developed sensory system, S. japonicum can accept stimulation of the corresponding ligands as a physiological response to different environments, such as fresh water or the tissues of its intermediate and mammalian hosts. Numerous proteases, including cercarial elastase, are implicated in mammalian skin penetration and haemoglobin degradation. The genomic information will serve as a valuable platform to facilitate development of new interventions for schistosomiasis control.
Funded by: NIAID NIH HHS: AI39461; Wellcome Trust: 085775
Genome flexibility in Neisseria meningitidis.
Institut für Hygiene und Mikrobiologie, der Universität Würzburg, Josef-Schneider-Strasse 2, Bau E1, Würzburg 97877, Germany.
Neisseria meningitidis usually lives as a commensal bacterium in the upper airways of humans. However, occasionally some strains can also cause life-threatening diseases such as sepsis and bacterial meningitis. Comparative genomics demonstrates that only very subtle genetic differences between carriage and disease strains might be responsible for the observed virulence differences and that N. meningitidis is, evolutionarily, a very recent species. Comparative genome sequencing also revealed a panoply of genetic mechanisms underlying its enormous genomic flexibility which also might affect the virulence of particular strains. From these studies, N. meningitidis emerges as a paradigm for organisms that use genome variability as an adaptation to changing and thus challenging environments.
Vaccine 2009;27 Suppl 2;B103-11
Genome watch: breaking the ICE.
Nature reviews. Microbiology 2009;7;5;328-9
Co-evolution of genomes and plasmids within Chlamydia trachomatis and the emergence in Sweden of a new variant strain.
Molecular Microbiology Group, University Medical School, Southampton General Hospital, Southampton, SO16 6YD, UK.
Background: Chlamydia trachomatis is the most common cause of sexually transmitted infections globally and the leading cause of preventable blindness in the developing world. There are two biovariants of C. trachomatis: 'trachoma', causing ocular and genital tract infections, and the invasive 'lymphogranuloma venereum' strains. Recently, a new variant of the genital tract C. trachomatis emerged in Sweden. This variant escaped routine diagnostic tests because it carries a plasmid with a deletion. Failure to detect this strain has meant it has spread rapidly across the country, provoking a worldwide alert. In addition to being a key diagnostic target, the plasmid has been linked to chlamydial virulence. Analysis of chlamydial plasmids and their cognate chromosomes was undertaken to provide insights into the evolutionary relationship between chromosome and plasmid. This is essential knowledge if the plasmid is to continue to be relied on as a key diagnostic marker, and for an understanding of the evolution of Chlamydia trachomatis.
Results: The genomes of two new C. trachomatis strains were sequenced, together with plasmids from six C. trachomatis isolates, including the new variant strain from Sweden. The plasmid from the new Swedish variant has a 377 bp deletion in the first predicted coding sequence, abolishing the site used for PCR detection and resulting in negative diagnosis. In addition, the variant plasmid has a 44 bp duplication downstream of the deletion. The region containing the second predicted coding sequence is the most highly conserved region of the plasmids investigated. Phylogenetic analyses of the plasmids and chromosomes are fully congruent. Moreover, this analysis also shows that ocular and genital strains diverged from a common C. trachomatis progenitor.
Conclusion: The evolutionary pathways of the chlamydial genome and plasmid imply that inheritance of the plasmid is tightly linked with its cognate chromosome. These data suggest that the plasmid is not a highly mobile genetic element and does not transfer readily between isolates. Comparative analysis of the plasmid sequences has revealed the most conserved regions that should be used to design future plasmid based nucleic acid amplification tests, to avoid diagnostic failures.
Funded by: Medical Research Council: G0601640; Wellcome Trust
BMC genomics 2009;10;239
Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens.
Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Centre for Adaptation Genetics and Drug Resistance, Boston, MA 02111, USA.
Background: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species.
Results: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome.
Conclusions: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.
Funded by: Biotechnology and Biological Sciences Research Council: 104/P16729, P15257; Wellcome Trust
Genome biology 2009;10;5;R51
Meta-analysis of genome-wide scans for human adult stature identifies novel loci and associations with measures of skeletal frame size.
Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Recent genome-wide (GW) scans have identified several independent loci affecting human stature, but their contribution through the different skeletal components of height is still poorly understood. We carried out a genome-wide scan in 12,611 participants, followed by replication in an additional 7,187 individuals, and identified 17 genomic regions with GW-significant association with height. Of these, two are entirely novel (rs11809207 in CATSPER4, combined P-value = 6.1x10(-8) and rs910316 in TMED10, P-value = 1.4x10(-7)) and two had previously been described with weak statistical support (rs10472828 in NPR3, P-value = 3x10(-7) and rs849141 in JAZF1, P-value = 3.2x10(-11)). One locus (rs1182188 at GNA12) identifies the first height eQTL. We also assessed the contribution of height loci to the upper- (trunk) and lower-body (hip axis and femur) skeletal components of height. We find evidence for several loci associated with trunk length (including rs6570507 in GPR126, P-value = 4x10(-5) and rs6817306 in LCORL, P-value = 4x10(-4)), hip axis length (including rs6830062 at LCORL, P-value = 4.8x10(-4) and rs4911494 at UQCC, P-value = 1.9x10(-4)), and femur length (including rs710841 at PRKG2, P-value = 2.4x10(-5) and rs10946808 at HIST1H1D, P-value = 6.4x10(-6)). Finally, we used conditional analyses to explore a possible differential contribution of the height loci to these different skeletal size measurements. In addition to validating four novel loci controlling adult stature, our study represents the first effort to assess the contribution of genetic loci to three skeletal components of height. Further statistical tests in larger numbers of individuals will be required to verify if the height loci affect height preferentially through these subcomponents of height.
Funded by: Medical Research Council: G0000934, G0701863, MC_QA137934, MC_U106188470; Wellcome Trust: 068545/Z/02
PLoS genetics 2009;5;4;e1000445
Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants?
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Aims/hypothesis: According to the thrifty genotype hypothesis, the high prevalence of type 2 diabetes and obesity is a consequence of genetic variants that have undergone positive selection during historical periods of erratic food supply. The recent expansion in the number of validated type 2 diabetes- and obesity-susceptibility loci, coupled with access to empirical data, enables us to look for evidence in support (or otherwise) of the thrifty genotype hypothesis using proven loci.
Methods: We employed a range of tests to obtain complementary views of the evidence for selection: we determined whether the risk allele at associated 'index' single-nucleotide polymorphisms is derived or ancestral, calculated the integrated haplotype score (iHS) and assessed the population differentiation statistic fixation index (F(ST)) for 17 type 2 diabetes and 13 obesity loci.
Results: We found no evidence for significant differences for the derived/ancestral allele test. None of the studied loci showed strong evidence for selection based on the iHS score. We find a high F(ST) for rs7901695 at TCF7L2, the largest type 2 diabetes effect size found to date.
Conclusions/interpretation: Our results provide some evidence for selection at specific loci, but there are no consistent patterns of selection that provide conclusive confirmation of the thrifty genotype hypothesis. Discovery of more signals and more causal variants for type 2 diabetes and obesity is likely to allow more detailed examination of these issues.
Funded by: Medical Research Council: G0601261; Wellcome Trust: 077016, 079557, 088885, WT077016/Z/05/Z, WT088885/Z/09/Z
Genomic and genic deletions of the FOX gene cluster on 16q24.1 and inactivating mutations of FOXF1 cause alveolar capillary dysplasia and other malformations.
Dept of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
Alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV) is a rare, neonatally lethal developmental disorder of the lung with defining histologic abnormalities typically associated with multiple congenital anomalies (MCA). Using array CGH analysis, we have identified six overlapping microdeletions encompassing the FOX transcription factor gene cluster in chromosome 16q24.1q24.2 in patients with ACD/MPV and MCA. Subsequently, we have identified four different heterozygous mutations (frameshift, nonsense, and no-stop) in the candidate FOXF1 gene in unrelated patients with sporadic ACD/MPV and MCA. Custom-designed, high-resolution microarray analysis of additional ACD/MPV samples revealed one microdeletion harboring FOXF1 and two distinct microdeletions upstream of FOXF1, implicating a position effect. DNA sequence analysis revealed that in six of nine deletions, both breakpoints occurred in the portions of Alu elements showing eight to 43 base pairs of perfect microhomology, suggesting that a replication-error mechanism, Microhomology-Mediated Break-Induced Replication (MMBIR)/Fork Stalling and Template Switching (FoSTeS), underlies their formation. In contrast to the association of point mutations in FOXF1 with bowel malrotation, microdeletions of FOXF1 were associated with hypoplastic left heart syndrome and gastrointestinal atresias, probably due to haploinsufficiency for the neighboring FOXC2 and FOXL1 genes. These differences reveal the phenotypic consequences of gene alterations in cis.
Funded by: Wellcome Trust
American journal of human genetics 2009;84;6;780-91
Common variants conferring risk of schizophrenia.
deCODE genetics, Sturlugata 8, IS-101 Reykjavik, Iceland.
Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.
Funded by: Department of Health: PDA/02/06/016; NHLBI NIH HHS: 1R01HL087679-01; NIMH NIH HHS: R01 MH078075; Wellcome Trust: 089061
Loci at chromosomes 13, 19 and 20 influence age at natural menopause.
Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands.
We conducted a genome-wide association study for age at natural menopause in 2,979 European women and identified six SNPs in three loci associated with age at natural menopause: chromosome 19q13.4 (rs1172822; -0.4 year per T allele (39%); P = 6.3 × 10(-11)), chromosome 20p12.3 (rs236114; +0.5 year per A allele (21%); P = 9.7 × 10(-11)) and chromosome 13q34 (rs7333181; +0.5 year per A allele (12%); P = 2.5 × 10(-8)). These common genetic variants regulate timing of ovarian aging, an important risk factor for breast cancer, osteoporosis and cardiovascular disease.
Funded by: Wellcome Trust: 077011
Nature genetics 2009;41;6;645-7
The cancer genome.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.
Funded by: Wellcome Trust: 077012, 088340
Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.
Genome sequences are essential tools for comparative and mutational analyses. Here we present the short read sequence of mouse chromosome 17 from the Mus musculus domesticus derived strain A/J, and the Mus musculus castaneus derived strain CAST/Ei. We describe approaches for the accurate identification of nucleotide and structural variation in the genomes of vertebrate experimental organisms, and show how these techniques can be applied to help prioritize candidate genes within quantitative trait loci.
Funded by: Cancer Research UK; Medical Research Council: G0800024; NIAAA NIH HHS: P20 AA017837; Wellcome Trust
Genome biology 2009;10;10;R112
Human Molecular Genetics
3rd Edition; Pearson Educational; ISBN 978-0132051576; 2009
A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Large-scale systematic resequencing has been proposed as the key future strategy for the discovery of rare, disease-causing sequence variants across the spectrum of human complex disease. We have sequenced the coding exons of the X chromosome in 208 families with X-linked mental retardation (XLMR), the largest direct screen for constitutional disease-causing mutations thus far reported. The screen has discovered nine genes implicated in XLMR, including SYP, ZNF711 and CASK reported here, confirming the power of this strategy. The study has, however, also highlighted issues confronting whole-genome sequencing screens, including the observation that loss of function of 1% or more of X-chromosome genes is compatible with apparently normal existence.
Funded by: Cancer Research UK: 10118; NICHD NIH HHS: HD26202; Wellcome Trust: 077012
Nature genetics 2009;41;5;535-43
Microarray-based cytogenetic profiling reveals recurrent and subtype-associated genomic copy number aberrations in feline sarcomas.
Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA.
Injection-site-associated sarcomas (ISAS), commonly arising at the site of routine vaccine administration, afflict as many as 22,000 domestic cats annually in the USA. These tumors are typically more aggressive and prone to recurrence than spontaneous sarcomas (non-ISAS), generally receiving a poorer long-term prognosis and warranting a more aggressive therapeutic approach. Although certain clinical and histological factors are highly suggestive of ISAS, timely diagnosis and optimal clinical management may be hindered by the absence of definitive markers that can distinguish between tumors with underlying injection-related etiology and their spontaneous counterpart. Specific nonrandom chromosome copy number aberrations (CNAs) have been associated with the clinical behavior of a vast spectrum of human tumors, providing an extensive resource of potential diagnostic and prognostic biomarkers. Although similar principles are now being applied with great success in other species, their relevance to feline molecular oncology has not yet been investigated in any detail. We report the construction of a genomic microarray platform for detection of recurrent CNAs in feline tumors through cytogenetic assignment of 210 large-insert DNA clones selected at intervals of approximately 15 Mb from the feline genome sequence assembly. Microarray-based profiling of 19 ISAS and 27 non-ISAS cases identified an extensive range of genomic imbalances that were highly recurrent throughout the combined panel of 46 sarcomas. Deletions of two specific regions were significantly associated with the non-ISAS phenotype. Further characterization of these regions may ultimately permit molecular distinction between ISAS and non-ISAS, as a tool for predicting tumor behavior and prognosis, as well as refining means for therapeutic intervention.
Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;8;987-1000
Influence of genetic background on tumor karyotypes: evidence for breed-associated cytogenetic aberrations in canine appendicular osteosarcoma.
Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough Street, Raleigh, NC 27606, USA.
Recurrent chromosomal aberrations in solid tumors can reveal the genetic pathways involved in the evolution of a malignancy and in some cases predict biological behavior. However, the role of individual genetic backgrounds in shaping karyotypes of sporadic tumors is unknown. The genetic structure of purebred dog breeds, coupled with their susceptibility to spontaneous cancers, provides a robust model with which to address this question. We tested the hypothesis that there is an association between breed and the distribution of genomic copy number imbalances in naturally occurring canine tumors through assessment of a cohort of Golden Retrievers and Rottweilers diagnosed with spontaneous appendicular osteosarcoma. Our findings reveal significant correlations between breed and tumor karyotypes that are independent of gender, age at diagnosis, and histological classification. These data indicate for the first time that individual genetic backgrounds, as defined by breed in dogs, influence tumor karyotypes in a cancer with extensive genomic instability.
Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2009;17;3;365-77
Genome-wide haplotype association study identifies the SLC22A3-LPAL2-LPA gene cluster as a risk locus for coronary artery disease.
Institut National de la Santé Et de la Recherche Médicale (INSERM) Unité Mixte de Recherche (UMR_S) 525, Université Pierre et Marie Curie (UPMC). Paris 06, Paris 75013, France.
We identify the SLC22A3-LPAL2-LPA gene cluster as a strong susceptibility locus for coronary artery disease (CAD) through a genome-wide haplotype association (GWHA) study. This locus was not identified from previous genome-wide association (GWA) studies focused on univariate analyses of SNPs. The proposed approach may have wide utility for analyzing GWA data for other complex traits.
Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust
Nature genetics 2009;41;3;283-5
Next-generation sequencing of vertebrate experimental organisms.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Next-generation sequencing technologies are revolutionizing biology by allowing for genome-wide transcription factor binding-site profiling, transcriptome sequencing, and more recently, whole-genome resequencing. While it is currently not possible to generate complete de novo assemblies of higher-vertebrate genomes using next-generation sequencing, improvements in sequence read lengths and throughput, coupled with new assembly algorithms for large data sets, will soon make this a reality. These developments will in turn spawn a revolution in how genomic data are used to understand genetics and how model organisms are used for disease gene discovery. This review provides an overview of the current next-generation sequencing platforms and the newest computational tools for the analysis of next-generation sequencing data. We also describe how next-generation sequencing may be applied in the context of vertebrate model organism genetics.
Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust
Mammalian genome : official journal of the International Mammalian Genome Society 2009;20;6;327-38
A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites.
Division of Molecular Genetics, Cancer Genomics Centre, Netherlands Cancer Institute, Plesmanlaan, Amsterdam, The Netherlands.
Insertional mutagens such as viruses and transposons are a useful tool for performing forward genetic screens in mice to discover cancer genes. These screens are most effective when performed using hundreds of mice; however, until recently, the cost-effective isolation and sequencing of insertion sites has been a major limitation to performing screens on this scale. Here we present a method for the high-throughput isolation of insertion sites using a highly efficient splinkerette-PCR method coupled with capillary or 454 sequencing. This protocol includes a description of the procedure for DNA isolation, DNA digestion, linker or splinkerette ligation, primary and secondary PCR amplification, and sequencing. This method, which takes about 1 week to perform, has allowed us to isolate hundreds of thousands of insertion sites from mouse tumors and, unlike other methods, has been specifically optimized for the murine leukemia virus (MuLV), and can easily be performed in a 96-well plate format for the efficient multiplex isolation of insertion sites.
Funded by: Cancer Research UK: A6542; Wellcome Trust: 098051
Nature protocols 2009;4;5;789-98
Megaoesophagus in Rassf1a-null mice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Megaoesophagus, or oesophageal achalasia, is a neuromuscular disorder characterized by an absence of peristalsis and flaccid dilatation of the oesophagus, resulting in the retention of ingesta in the dilated segment. The aetiology and pathogenesis of idiopathic (or primary) megaoesophagus are still poorly understood and very little is known about the genetic causes of megaoesophagus in humans. Attempts to develop animal models of this condition have been largely unsuccessful and although the ICRC/HiCri strain of mice spontaneously develop megaoesophagus, the underlying genetic cause remains unknown. In this report, we show that aged Rassf1a-null mice have an enhanced susceptibility to megaoesophagus compared with wild-type littermates (approximately 20% vs. approximately 2% incidence, respectively; P = 0.01). Histological examination of the dilated oesophaguses shows a reduction in the numbers of nerve cells (both ganglia and nerve fibres) in the myenteric plexus of the dilated mid and lower oesophagus that was confirmed by S100 immunohistochemistry. There was also a chronic inflammatory infiltrate and subsequent fibrosis of the myenteric plexus and the muscle layers. These appearances closely mimic the gross and histopathological findings in human cases of megaoesophagus/achalasia, thus demonstrating that this is a representative mouse model of the disease. Thus, we have identified a genetic cause of the development of megaoesophagus/achalasia that could be screened for in patients, and may eventually facilitate the development of therapies that could prevent further progression of the disease once it is diagnosed at an early stage.
Funded by: Biotechnology and Biological Sciences Research Council: BB/C515412/1; Cancer Research UK; Wellcome Trust
International journal of experimental pathology 2009;90;2;101-8
Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer.
Wellcome Trust Sanger Institute, Hinxton, UK.
Somatically acquired epigenetic changes are present in many cancers. Epigenetic regulation is maintained via post-translational modifications of core histones. Here, we describe inactivating somatic mutations in the histone lysine demethylase gene UTX, pointing to histone H3 lysine methylation deregulation in multiple tumor types. UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes. These data identify UTX as a new human cancer gene.
Funded by: Wellcome Trust: 077012, 088340
Nature genetics 2009;41;5;521-3
Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms.
Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, 3000 CA Rotterdam, The Netherlands.
We analyzed 67 short tandem repeat polymorphisms from the non-recombining part of the Y-chromosome (Y-STRs), including 49 rarely studied simple single-copy (ss)Y-STRs and 18 widely used Y-STRs, in 590 males from 51 populations belonging to 8 worldwide regions (HGDP-CEPH panel). Although autosomal DNA profiling provided no evidence for close relationship, we found 18 Y-STR haplotypes (defined by 67 Y-STRs) that were shared by two to five men in 13 worldwide populations, revealing high and widespread levels of cryptic male relatedness. Maximal (95.9%) haplotype resolution was achieved with the best 25 out of 67 Y-STRs in the global dataset, and with the best 3-16 markers in regional datasets (89.6-100% resolution). From the 49 rarely studied ssY-STRs, the 25 most informative markers were sufficient to reach the highest possible male lineage differentiation in the global (92.2% resolution), and 3-15 markers in the regional datasets (85.4-100%). Considerably lower haplotype resolutions were obtained with the three commonly used Y-STR sets (Minimal Haplotype, PowerPlex Y, and AmpFlSTR Yfiler). Six ssY-STRs (DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643) were most informative to supplement the existing Y-STR kits for increasing haplotype resolution, or - together with additional ssY-STRs - as a new set for maximizing male lineage differentiation. Mutation rates of the 49 ssY-STRs were estimated from 403 meiotic transfers in deep-rooted pedigrees, and ranged from approximately 4.8 x 10(-4) for 31 ssY-STRs with no mutations observed to 1.3 x 10(-2) and 1.5 x 10(-2) for DYS570 and DYS576, respectively, the latter representing the highest mutation rates reported for human Y-STRs so far. Our findings thus demonstrate that ssY-STRs are useful for maximizing global and regional resolution of male lineages, either as a new set, or when added to commonly used Y-STR sets, and support their application to forensic, genealogical and anthropological studies.
Funded by: Wellcome Trust: 077009
Forensic science international. Genetics 2009;3;4;205-13
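The "haplotype resolution" figures in the Y-STR entry above can be illustrated with a minimal Python sketch. It assumes that resolution means the number of distinct haplotypes divided by the number of males typed; the abstract does not spell out the definition, so this is an assumption, and the function name and toy repeat counts are hypothetical.

```python
from typing import Hashable, Iterable


def haplotype_resolution(haplotypes: Iterable[Hashable]) -> float:
    """Fraction of distinct haplotypes among all typed individuals.

    ASSUMPTION: resolution = distinct haplotypes / individuals.
    This definition is not stated in the abstract.
    """
    observed = list(haplotypes)
    if not observed:
        raise ValueError("at least one haplotype is required")
    return len(set(observed)) / len(observed)


# Hypothetical toy data: each tuple holds repeat counts at two Y-STR markers.
toy_panel = [(14, 22), (14, 22), (15, 22), (14, 23)]
print(haplotype_resolution(toy_panel))  # 3 distinct haplotypes / 4 males = 0.75
```

Under this reading, adding a marker can only split existing haplotype groups, so resolution never decreases as markers are added, which fits the abstract's comparison of marker sets of different sizes.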
Milk and two oligosaccharides.
Nature reviews. Microbiology 2009;7;7;483
Single domain antibodies against the collagen signalling receptor glycoprotein VI are inhibitors of collagen induced thrombus formation.
Domantis Ltd., 315 Cambridge Science Park, Cambridge, UK. Adam.Walker@Domantis.com
Human Domain Antibodies (dAbs) that bind to and inhibit the function of platelet glycoprotein VI (GPVI) have been isolated from phage display libraries and their efficacy demonstrated using in vitro models of platelet activation. Here we describe the properties of one such antibody, BLO8-1, which has been shown to specifically inhibit the binding of recombinant human GPVI to cross-linked collagen related peptide (CRP-XL) in vitro. BLO8-1 specifically binds to the platelet cell surface and prevents CRP-XL induced platelet aggregation in platelet-rich plasma, as well as inhibiting thrombus formation in whole blood under arterial shear conditions. Using a series of mutant GPVI molecules, BLO8-1 was shown to recognize an epitope within the collagen binding domain of GPVI; the anti-thrombotic effect of this dAb is therefore predicted to be due to direct blocking of the collagen-GPVI interaction. These data, together with the desirable properties of Domain Antibodies, show that dAbs could potentially be used to generate novel biopharmaceuticals with anti-thrombotic properties.
Funded by: British Heart Foundation: RG/09/003/27122; Medical Research Council: G0500707
CLIP: construction of cDNA libraries for high-throughput sequencing from RNAs cross-linked to proteins in vivo.
MRC-Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.
UV cross-linking and immunoprecipitation assay (CLIP) can identify direct interaction sites between RNA-binding proteins and RNAs in vivo, and has been used to study several proteins in tissues and cell cultures. The main challenge of the method is to specifically amplify the low amount of isolated RNA. The current protocol is optimised for efficient RNA purification and ligation of barcoded RNA adapters. High-throughput sequencing of the multiplexed cDNA library allows for a comprehensive coverage of the target sequences.
Funded by: Medical Research Council: MC_U105185858; Wellcome Trust: 089701
Methods (San Diego, Calif.) 2009;48;3;287-93
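The CLIP entry above mentions barcoded RNA adapters and a multiplexed cDNA library. As a rough illustration of what multiplexing implies downstream, the Python sketch below sorts reads back into samples by barcode. It is not the protocol's own software: the barcode sequences, sample names, and the assumption that the barcode sits at the very start of each read are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by sample and trim the barcode.

    ASSUMPTIONS: the barcode is the first bases of the read, matching is
    exact, and no barcode is a prefix of another barcode.
    """
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep only the insert sequence that follows the barcode.
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read untrimmed for inspection.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
example_reads = ["ACGTGGGA", "TGCATTTC", "NNNNAAAA"]
print(demultiplex(example_reads, example_barcodes))
```

Exact prefix matching keeps the sketch short; a real pipeline would usually tolerate sequencing errors in the barcode, which is outside the scope of this illustration.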
Comparative genomics of the emerging human pathogen Photorhabdus asymbiotica with the insect pathogen Photorhabdus luminescens.
School of Biosciences, University of Exeter in Cornwall, Penryn TR10 9EZ, UK. P.A.Wilkinson@exeter.ac.uk
Background: The Gram-negative bacterium Photorhabdus asymbiotica (Pa) has been recovered from human infections in both North America and Australia. Recently, Pa has been shown to have a nematode vector that can also infect insects, like its sister species the insect pathogen P. luminescens (Pl). To understand the relationship between pathogenicity to insects and humans in Photorhabdus we have sequenced the complete genome of Pa strain ATCC43949 from North America. This strain (formerly referred to as Xenorhabdus luminescens strain 2) was isolated in 1977 from the blood of an 80 year old female patient with endocarditis, in Maryland, USA. Here we compare the complete genome of Pa ATCC43949 with that of the previously sequenced insect pathogen P. luminescens strain TT01 which was isolated from its entomopathogenic nematode vector collected from soil in Trinidad and Tobago.
Results: We found that the human pathogen Pa had a smaller genome (5,064,808 bp) than that of the insect pathogen Pl (5,688,987 bp) but that each pathogen carries approximately one megabase of DNA that is unique to each strain. The reduced size of the Pa genome is associated with a smaller diversity in insecticidal genes such as those encoding the Toxin complexes (Tc's), Makes caterpillars floppy (Mcf) toxins and the Photorhabdus Virulence Cassettes (PVCs). The Pa genome, however, also shows the addition of a plasmid related to pMT1 from Yersinia pestis and several novel pathogenicity islands including a novel Type Three Secretion System (TTSS) encoding island. Together these data suggest that Pa may show virulence against man via the acquisition of the pMT1-like plasmid and specific effectors, such as SopB, that promote its persistence inside human macrophages. Interestingly the loss of insecticidal genes in Pa is not reflected by a loss of pathogenicity towards insects.
Conclusion: Our results suggest that North American isolates of Pa have acquired virulence against man via the acquisition of a plasmid and specific virulence factors with similarity to those shown to play roles in pathogenicity against humans in other bacteria.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E021328/1
BMC genomics 2009;10;302
Signal initiation in biological systems: the properties and detection of transient extracellular protein interactions.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Individual cells within biological systems frequently coordinate their functions through signals initiated by specific extracellular protein interactions involving receptors that bridge the cellular membrane. Due to their biochemical nature, these membrane-embedded receptor proteins are difficult to manipulate and their interactions are characterised by very weak binding strengths that cannot be detected using popular high throughput assays. This review will provide a general outline of the biochemical attributes of receptor proteins focussing in particular on the biophysical properties of their transient interactions. Methods that are able to detect these weak extracellular binding events and especially those that can be used for identifying novel interactions will be compared. Finally, I discuss the feasibility of constructing a complete and accurate extracellular protein interaction map, and the methods that are likely to be useful in achieving this goal.
Molecular bioSystems 2009;5;12;1405-12
CARM1 is required in embryonic stem cells to maintain pluripotency and resist differentiation.
Wellcome Trust and Cancer Research UK Gurdon Institute, Cambridge, United Kingdom.
Histone H3 methylation at R17 and R26 recently emerged as a novel epigenetic mechanism regulating pluripotency in mouse embryos. Blastomeres of four-cell embryos with high H3 methylation at these sites show unrestricted potential, whereas those with lower levels cannot support development when aggregated in chimeras of like cells. Increasing histone H3 methylation, through expression of coactivator-associated-protein-arginine-methyltransferase 1 (CARM1) in embryos, elevates expression of key pluripotency genes and directs cells to the pluripotent inner cell mass. We demonstrate CARM1 is also required for the self-renewal and pluripotency of embryonic stem (ES) cells. In ES cells, CARM1 depletion downregulates pluripotency genes leading to their differentiation. CARM1 associates with Oct4/Pou5f1 and Sox2 promoters that display detectable levels of R17/26 histone H3 methylation. In CARM1-overexpressing ES cells, histone H3 arginine methylation is also present at the Nanog promoter, with which CARM1 now associates. Such cells express Nanog at elevated levels and delay their response to differentiation signals. Thus, as in four-cell embryo blastomeres, histone H3 arginine methylation by CARM1 in ES cells allows epigenetic modulation of pluripotency.
Funded by: Medical Research Council: G0300723, G0800784; Wellcome Trust: 064421, 079643
Stem cells (Dayton, Ohio) 2009;27;11;2637-45
Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree.
The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, UK.
Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. In the past, estimates of mutation rates have generally been inferred from phenotypic observations or comparisons of homologous sequences among closely related species. Here, we apply new sequencing technology to measure directly one mutation rate, that of base substitutions on the human Y chromosome. The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to an average depth of 11x or 20x, respectively. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in approximately 10.15 Mb; eight of these had occurred in vitro and four in vivo. The latter could be placed in different positions on the pedigree and led to a mutation-rate measurement of 3.0 x 10(-8) mutations/nucleotide/generation (95% CI: 8.9 x 10(-9)-7.0 x 10(-8)), consistent with estimates of 2.3 x 10(-8)-6.3 x 10(-8) mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons depending on the generation and split times assumed.
Funded by: Wellcome Trust
Current biology : CB 2009;19;17;1453-7
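The point estimate in the Y-chromosome mutation-rate entry above can be checked from the figures quoted in the abstract: four in vivo mutations, approximately 10.15 Mb of sequence, and 13 generations separating the two men. The Python sketch below assumes the rate is simply the mutation count divided by (bases x generations); the published analysis may include corrections not described in the abstract, and the function name is hypothetical. The confidence interval is not reproduced here because the abstract does not state how it was derived.

```python
def mutation_rate(mutations: int, bases: float, generations: int) -> float:
    """Mutations per nucleotide per generation.

    ASSUMPTION: a plain ratio, mutations / (bases * generations).
    """
    if bases <= 0 or generations <= 0:
        raise ValueError("bases and generations must be positive")
    return mutations / (bases * generations)


# Figures quoted in the abstract: 4 in vivo mutations, ~10.15 Mb, 13 generations.
rate = mutation_rate(mutations=4, bases=10.15e6, generations=13)
print(f"{rate:.1e}")  # 3.0e-08, matching the reported 3.0 x 10(-8)
```

Only the four in vivo mutations enter the calculation; the eight that arose in vitro (in cell culture) are excluded because they do not reflect germline transmission.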
Generation of Paint Probes by Flow-Sorted and Microdissected Chromosomes
Fluorescence In Situ Hybridization (FISH) - Application Guide. 2009;35-52
Generation of transgene-free induced pluripotent mouse stem cells by the piggyBac transposon.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Induced pluripotent stem cells (iPSCs) have been generated from somatic cells by transgenic expression of Oct4 (Pou5f1), Sox2, Klf4 and Myc. A major difficulty in the application of this technology for regenerative medicine, however, is the delivery of reprogramming factors. Whereas retroviral transduction increases the risk of tumorigenicity, transient expression methods have considerably lower reprogramming efficiencies. Here we describe an efficient piggyBac transposon-based approach to generate integration-free iPSCs. Transposons carrying 2A peptide-linked reprogramming factors induced reprogramming of mouse embryonic fibroblasts with equivalent efficiencies to retroviral transduction. We removed transposons from these primary iPSCs by re-expressing transposase. Transgene-free iPSCs could be identified by negative selection. piggyBac excised without a footprint, leaving the iPSC genome without any genetic alteration. iPSCs fulfilled all criteria of pluripotency, such as pluripotency gene expression, teratoma formation and contribution to chimeras. piggyBac transposon-based reprogramming may be used to generate therapeutically applicable iPSCs.
Funded by: Wellcome Trust: 077187, WT077187
Nature methods 2009;6;5;363-9