Sanger Institute - Publications 2007
Number of papers published in 2007: 131
Predicted functions and linkage specificities of the products of the Streptococcus pneumoniae capsular biosynthetic loci.
Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary's Hospital, Norfolk Place, London W2 1PG, United Kingdom.
The sequences of the capsular biosynthetic (cps) loci of 90 serotypes of Streptococcus pneumoniae have recently been determined. Bioinformatic procedures were used to predict the general functions of 1,973 of the 1,999 gene products and to identify proteins within the same homology group, Pfam family, and CAZy glycosyltransferase family. Correlating cps gene content with the 54 known capsular polysaccharide (CPS) structures provided tentative assignments of the specific functions of the different homology groups of each functional class (regulatory proteins, enzymes for synthesis of CPS constituents, polymerases, flippases, initial sugar transferases, glycosyltransferases [GTs], phosphotransferases, acetyltransferases, and pyruvyltransferases). Assignment of the glycosidic linkages catalyzed by the 342 GTs (92 homology groups) is problematic, but tentative assignments could be made by using this large set of cps loci and CPS structures to correlate the presence of particular GTs with specific glycosidic linkages, by correlating inverting or retaining linkages in CPS repeat units with the inverting or retaining mechanisms of the GTs predicted from their CAZy family membership, and by comparing the CPS structures of serotypes that have very similar cps gene contents. These large-scale comparisons between structure and gene content assigned the linkages catalyzed by 72% of the GTs, and all linkages were assigned in 32 of the serotypes with known repeat unit structures. Clear examples where very similar initial sugar transferases or glycosyltransferases catalyze different linkages in different serotypes were also identified. These assignments should provide a stimulus for biochemical studies to evaluate the reactions that are proposed.
Funded by: Wellcome Trust
Journal of bacteriology 2007;189;21;7856-76
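The abstract describes two correlation steps: keeping the linkages that occur in every serotype carrying a given GT homology group, and discarding linkages whose inverting or retaining character disagrees with the GT's CAZy-predicted mechanism. The sketch below combines both as a set intersection. It is an illustration under stated assumptions, not the authors' procedure. The function name `candidate_linkages`, the `(label, mechanism)` tuple encoding and the toy serotypes are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: a linkage is (label, mechanism), where mechanism
# is "inverting" or "retaining".
Linkage = Tuple[str, str]


def candidate_linkages(
    gts_by_serotype: Dict[str, Set[str]],
    linkages_by_serotype: Dict[str, Set[Linkage]],
    gt_mechanism: Dict[str, str],
) -> Dict[str, Set[Linkage]]:
    """Map each GT homology group to the linkages that are present in every
    serotype carrying that group and that match its predicted mechanism."""
    candidates: Dict[str, Set[Linkage]] = {}
    for serotype, gts in gts_by_serotype.items():
        linkages = linkages_by_serotype.get(serotype)
        if linkages is None:
            continue  # no known repeat-unit structure for this serotype
        for gt in gts:
            # Keep only linkages consistent with the GT's predicted mechanism.
            compatible = {lk for lk in linkages if lk[1] == gt_mechanism[gt]}
            if gt in candidates:
                candidates[gt] &= compatible  # must occur in every carrier
            else:
                candidates[gt] = compatible
    return candidates


# Hypothetical toy data, not taken from the paper.
gts = {"s1": {"GT-A", "GT-B"}, "s2": {"GT-A", "GT-C"}}
structures = {
    "s1": {("aGlc(1-3)", "retaining"), ("bGal(1-4)", "inverting")},
    "s2": {("aGlc(1-3)", "retaining"), ("bGlcNAc(1-2)", "inverting")},
}
mechanisms = {"GT-A": "retaining", "GT-B": "inverting", "GT-C": "inverting"}
result = candidate_linkages(gts, structures, mechanisms)
```

A group whose candidate set shrinks to a single linkage would count as a tentative assignment. An empty set flags a conflict, such as the cases the abstract mentions where very similar GTs catalyse different linkages in different serotypes.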
WebACT: an online genome comparison suite.
Centre for Bioinformatics, Imperial College London, UK.
Comparison of related genomes is an enormously powerful technique for explaining phenotypic differences and revealing recent evolutionary events. Genomes evolve through a host of mechanisms including long- and short-range intragenomic rearrangements, insertion of laterally acquired DNA, gene loss, and single-nucleotide polymorphisms. The Artemis Comparison Tool (ACT) was developed to enable the intuitive visualization of the consequences of such events in the context of two or more aligned genomes. WebACT is an online resource designed to allow the alignment of up to five genomic sequences within the ACT environment without the need for local software installation. Comparisons can be carried out between uploaded sequences, or those selected from the EMBL or RefSeq databases, using BLASTZ, MUMmer, or Basic Local Alignment Search Tool (BLAST). Precomputed comparisons can be selected from a database covering all the completed bacterial chromosome and plasmid sequences in the Genome Reviews database (1). This allows the rapid visualization of regions of interest, without the need to handle the full genome sequences. Here, we describe the process of using WebACT to prepare comparisons for visualization, and the selection of precomputed comparisons from the database. The use of ACT to view the selected comparison is then explored using examples from bacterial genomes.
Funded by: Wellcome Trust
Methods in molecular biology (Clifton, N.J.) 2007;395;57-74
BCL11B is required for positive selection and survival of double-positive thymocytes.
Center for Cell Biology and Cancer Research, Albany Medical College, Albany, NY 12208, USA.
Transcriptional control of gene expression in double-positive (DP) thymocytes remains poorly understood. We show that the transcription factor BCL11B plays a critical role in DP thymocytes by controlling positive selection of both CD4 and CD8 lineages. BCL11B-deficient DP thymocytes rearrange T cell receptor (TCR) alpha; however, they display impaired proximal TCR signaling and attenuated extracellular signal-regulated kinase phosphorylation and calcium flux, which are all required for initiation of positive selection. Further, provision of transgenic TCRs did not improve positive selection of BCL11B-deficient DP thymocytes. BCL11B-deficient DP thymocytes have altered expression of genes with a role in positive selection, TCR signaling, and other signaling pathways intersecting the TCR, which may account for the defect. BCL11B-deficient DP thymocytes also presented increased susceptibility to spontaneous apoptosis associated with high levels of cleaved caspase-3 and an altered balance of proapoptotic/prosurvival factors. This latter susceptibility was manifested even in the absence of TCR signaling and was only partially rescued by provision of the BCL2 transgene, indicating that control of DP thymocyte survival by BCL11B is nonredundant and, at least in part, independent of BCL2 prosurvival factors.
Funded by: NHLBI NIH HHS: T32 HL007194, T32-HL-07194; NIAID NIH HHS: R01 AI067846, R01 AI067846-01A2; NIAMS NIH HHS: K01 AR-02194, K01 AR002194
The Journal of experimental medicine 2007;204;12;3003-15
SISYPHUS--structural alignments for proteins with non-trivial relationships.
MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK.
With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis, but their automatic identification using existing computational tools remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem to multiple alignment algorithms and to impede comparative modeling studies. They support a new emerging concept of an evolutionarily changeable protein fold, which creates practical difficulties for the hierarchical classifications of protein structures. To facilitate the understanding of, and to provide a comprehensive annotation of proteins with such non-trivial structural relationships we have created SISYPHUS ([Sigmaomeganuphiomicronzeta]--in Greek crafty), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units, or individual domains to fragments of different size. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from http://sisyphus.mrc-cpe.cam.ac.uk.
Funded by: Medical Research Council: G0100305, MC_U105192716; Wellcome Trust: 077198
Nucleic acids research 2007;35;Database issue;D253-9
The genome of Salmonella enterica serovar Typhi.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
The generation of complete genome sequences provides a blueprint that facilitates the genetic characterization of pathogens and their hosts. The genome of Salmonella enterica serovar Typhi (S. Typhi) harbors ~5 million base pairs encoding some 4000 genes, of which >200 are functionally inactive. Comparison of S. Typhi isolates from around the world indicates that they are highly related (clonal) and that they emerged from a single point of origin ~30,000-50,000 years ago. Evidence suggests that, as well as undergoing gene degradation, S. Typhi has also recently acquired genes, such as those encoding the Vi antigen, by horizontal transfer events.
Funded by: Wellcome Trust
Clinical infectious diseases : an official publication of the Infectious Diseases Society of America 2007;45 Suppl 1;S29-33
A linear plasmid truncation induces unidirectional flagellar phase change in H:z66 positive Salmonella Typhi.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
The process by which bacteria regulate flagellar expression is known as phase variation, and in Salmonella enterica this process permits the expression of one of two flagellin genes, fliC or fljB, at any one time. Salmonella Typhi (S. Typhi) is normally not capable of phase variation of flagellar antigen expression, as isolates harbour only the fliC gene (H:d) and lack an equivalent fljB locus. However, some S. Typhi isolates, exclusively from Indonesia, harbour an fljB equivalent encoded on a linear plasmid, pBSSB1, that drives the expression of a novel flagellin named H:z66. H:z66+ S. Typhi isolates were stimulated to change flagellar phase and genetically analysed for the mechanism of variation. The phase change was demonstrated to be unidirectional, reverting to expression from the resident chromosomal fliC gene. DNA sequencing demonstrated that pBSSB1 linear DNA was still detectable but that these derivatives had undergone deletion and were lacking fljA(z66) (encoding a flagellar repressor) and fljB(z66). The deletion end-point was found to involve one of the plasmid termini and a palindromic repeat sequence within fljB(z66), distinct from that found at the terminus of pBSSB1. These data demonstrate that, like some Streptomyces linear elements, at least one of the terminal inverted repeats of pBSSB1 is non-essential, but that a palindromic repeat sequence may be necessary for replication.
Funded by: Wellcome Trust: 076962
Molecular microbiology 2007;66;5;1207-18
SCOOP: a simple method for identification of novel protein superfamily relationships.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
Motivation: Profile searches of sequence databases are a sensitive way to detect sequence relationships. Sophisticated profile-profile comparison algorithms that have been recently introduced increase search sensitivity even further.
Results: In this article, a simpler approach than profile-profile comparison is presented that has a comparable performance to state-of-the-art tools such as COMPASS, HHsearch and PRC. This approach is called SCOOP (Simple Comparison Of Outputs Program), and is shown to find known relationships between families in the Pfam database as well as detect novel distant relationships between families. Several novel discoveries are presented including the discovery that a domain of unknown function (DUF283) found in Dicer proteins is related to double-stranded RNA-binding domains.
Availability: SCOOP is freely available under a GNU GPL license from http://www.sanger.ac.uk/Users/agb/SCOOP/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Funded by: Wellcome Trust: 087656
Bioinformatics (Oxford, England) 2007;23;7;809-14
Nature reviews. Microbiology 2007;5;3;170-1
Meningococcal genetic variation mechanisms viewed through comparative analysis of serogroup C strain FAM18.
Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
The bacterium Neisseria meningitidis is commonly found harmlessly colonising the mucosal surfaces of the human nasopharynx. Occasionally strains can invade host tissues causing septicaemia and meningitis, making the bacterium a major cause of morbidity and mortality in both the developed and developing world. The species is known to be diverse in many ways, as a product of its natural transformability and of a range of recombination and mutation-based systems. Previous work on pathogenic Neisseria has identified several mechanisms for the generation of diversity of surface structures, including phase variation based on slippage-like mechanisms and sequence conversion of expressed genes using information from silent loci. Comparison of the genome sequences of two N. meningitidis strains, serogroup B MC58 and serogroup A Z2491, suggested further mechanisms of variation, including C-terminal exchange in specific genes and enhanced localised recombination and variation related to repeat arrays. We have sequenced the genome of N. meningitidis strain FAM18, a representative of the ST-11/ET-37 complex, providing the first genome sequence for the disease-causing serogroup C meningococci; it has 1,976 predicted genes, of which 60 do not have orthologues in the previously sequenced serogroup A or B strains. Through genome comparison with Z2491 and MC58 we have further characterised specific mechanisms of genetic variation in N. meningitidis, describing specialised loci for generation of cell surface protein variants and measuring the association between noncoding repeat arrays and sequence variation in flanking genes. Here we provide a detailed view of novel genetic diversification mechanisms in N. meningitidis. Our analysis provides evidence for the hypothesis that the noncoding repeat arrays in neisserial genomes (neisserial intergenic mosaic elements) provide a crucial mechanism for the generation of surface antigen variants. 
Such variation will have an impact on the interaction with the host tissues, and understanding these mechanisms is important to aid our understanding of the intimate and complex relationship between the human nasopharynx and the meningococcus.
Funded by: Wellcome Trust
PLoS genetics 2007;3;2;e23
Variety is the spice of eukaryotic life.
Nature reviews. Microbiology 2007;5;9;660-1
Genome plasticity of BCG and impact on vaccine efficacy.
Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 28 Rue du Docteur Roux, 75724 Paris Cedex 15, France.
To understand the evolution, attenuation, and variable protective efficacy of bacillus Calmette-Guérin (BCG) vaccines, Mycobacterium bovis BCG Pasteur 1173P2 has been subjected to comparative genome and transcriptome analysis. The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2. DU1 is restricted to BCG Pasteur, while four forms of DU2 exist: DU2-I is confined to early BCG vaccines, such as BCG Japan, whereas DU2-III and DU2-IV occur in the late vaccines. The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol. Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans. An evolutionary scheme for BCG vaccines was established by analyzing DU2 and other markers. Lesions in genes encoding sigma-factors and pleiotropic transcriptional regulators, like PhoR and Crp, were also uncovered in various BCG strains; together with gene amplification, these affect gene expression levels, immunogenicity, and, possibly, protection against tuberculosis. Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.
Funded by: Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2007;104;13;5596-601
Generation of an inducible and optimized piggyBac transposon system.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
Genomic studies in the mouse have been slowed by the lack of transposon-mediated mutagenesis. However, since the resurrection of Sleeping Beauty (SB), the possibility of performing forward genetics in mice has been reinforced. Recently, piggyBac (PB), a functional transposon from insects, was also described to work in mammals. As the activity of PB is higher than that of SB11 and SB12, two hyperactive SB transposases, we have characterized and improved the PB system in mouse ES cells. We have generated a mouse codon-optimized version of the PB transposase coding sequence (CDS) which provides transposition levels greater than the original. We have also found that the promoter sequence predicted in the 5'-terminal repeat of the PB transposon is active in the mammalian context. Finally, we have engineered inducible versions of the optimized piggyBac transposase fused with ERT2. One of them, when induced, provides higher levels of transposition than the native piggyBac CDS, whereas in the absence of induction its activity is indistinguishable from background. We expect that these tools, adaptable to perform mouse-germline mutagenesis, will facilitate the identification of genes involved in pathological and physiological processes, such as cancer or ES cell differentiation.
Funded by: Wellcome Trust
Nucleic acids research 2007;35;12;e87
Methods and strategies for analyzing copy number variation using DNA microarrays.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
The association of DNA copy-number variation (CNV) with specific gene function and human disease has long been known, but the wide scope and prevalence of this form of variation has only recently been fully appreciated. The latest studies using microarray technology have demonstrated that as much as 12% of the human genome and thousands of genes are variable in copy number, and this diversity is likely to be responsible for a significant proportion of normal phenotypic variation. Current challenges involve developing methods not only for detecting and cataloging CNVs in human populations at increasingly higher resolution but also for determining the association of CNVs with biological function, recent human evolution, and common and complex human disease.
Funded by: Wellcome Trust: 077008
Nature genetics 2007;39;7 Suppl;S16-21
A recombineering based approach for high-throughput conditional knockout targeting vector construction.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Functional analysis of mammalian genes in vivo is primarily achieved through analysing knockout mice. Now that the sequencing of several mammalian genomes has been completed, understanding the functions of all genes represents the next major challenge in the post-genome era. Many research groups have generated knockout mice, but only by making individual knockouts one at a time. New technological advances and the refinements of existing technologies are critical for genome-wide targeted mutagenesis in the mouse. We describe here new recombineering reagents and protocols that enable recombineering to be carried out in a 96-well format. Consequently, we are able to construct 96 conditional knockout targeting vectors simultaneously. Our new recombineering system makes it a reality to generate large numbers of precisely engineered DNA constructs for functional genomics studies.
Funded by: Intramural NIH HHS; Wellcome Trust
Nucleic acids research 2007;35;8;e64
Serodiagnosis of Salmonella enterica serovar Typhi and S. enterica serovars Paratyphi A, B and C human infections.
Laboratory of Enteric Pathogens, Department of Gastrointestinal Infections, Centre for Infections, Health Protection Agency, 61 Colindale Avenue, London NW9 5EQ, UK. Henrik.Chart@hpa.org.uk
The aim of this study was to evaluate an immunoassay for the detection of human serum antibodies to the LPS and flagellar antigens of Salmonella Typhi and Salmonella Paratyphi A, B and C, and to the Vi capsular polysaccharide of S. Typhi and S. Paratyphi C. A total of 330 sera were used; these originated from 15 patients who were culture-positive for S. Typhi and 15 healthy controls, together with 300 sera submitted to the Laboratory of Enteric Pathogens for Salmonella serodiagnosis. By SDS-PAGE/immunoblotting, all 15 sera from culture-positive patients had serum antibodies to the 9,12 LPS antigens and 10 had antibodies to the 'd' flagellar antigens. Of the 300 reference sera, 22 had antibodies to the 9,12 LPS antigens, one to the 1,4,5,12 LPS antigens and 12 to the 6,7 LPS antigens. Only two sera had antibodies to flagellar antigens, one of which bound to the 'b' and the other to the 'd' antigen. An ELISA was developed that successfully detected serum antibodies to the Vi capsular polysaccharides, but because of the kinetics of serum antibody production to the Vi, these antibodies may be of limited value in the serodiagnosis of acute infection with S. Typhi and S. Paratyphi C. The immunoassays described here provide a sensitive means of detecting serum antibodies to the LPS, flagellar and Vi antigens of S. Typhi and S. Paratyphi, and constitute a viable replacement for the Widal assay for the screening of sera. The Salmonella serodiagnosis protocols described here are the new standard operating procedures used by the Health Protection Agency's National Salmonella Reference Centre based in the Laboratory of Enteric Pathogens, Colindale, UK.
Journal of medical microbiology 2007;56;Pt 9;1161-6
Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.
Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C. briggsae and C. remanei. On a set of approximately 1500 confirmed C. elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.
Availability: Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix
Funded by: Wellcome Trust: 077192
Bioinformatics (Oxford, England) 2007;23;12;1468-75
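The Genomix results are reported as exon-level sensitivity and specificity. Under the common convention in gene-prediction benchmarks, an exon counts as correct only when both boundaries match the annotation exactly; sensitivity is the fraction of annotated exons recovered and specificity is the fraction of predicted exons that are correct. The sketch below assumes that convention (the paper's exact counting rules are not stated here), and the coordinates are hypothetical.

```python
from typing import Iterable, Set, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end) coordinates on one strand


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) using exact-boundary exon matches."""
    pred: Set[Exon] = set(predicted)
    true: Set[Exon] = set(annotated)
    exact = pred & true  # exons whose start and end both match
    sensitivity = len(exact) / len(true) if true else 0.0
    specificity = len(exact) / len(pred) if pred else 0.0
    return sensitivity, specificity


annotated = [(100, 200), (300, 400), (500, 650)]
predicted = [(100, 200), (300, 410), (500, 650), (800, 900)]
sn, sp = exon_accuracy(predicted, annotated)  # 2 of 3 recovered, 2 of 4 correct
```

Here one predicted exon has a wrong end coordinate and one is spurious, so sensitivity is 2/3 and specificity is 0.5.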
Adiponectin receptor genes: mutation screening in syndromes of insulin resistance and association studies for type 2 diabetes and metabolic traits in UK populations.
Metabolic Disease Group, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
Aims/hypothesis: Adiponectin is an adipokine with insulin-sensitising and anti-atherogenic properties. Several reports suggest that genetic variants in the adiponectin gene are associated with circulating levels of adiponectin, insulin sensitivity and type 2 diabetes risk. Recently two receptors for adiponectin have been cloned. Genetic studies have yielded conflicting results on the role of these genes and type 2 diabetes predisposition. In this study we aimed to evaluate the potential role of genetic variation in these genes in syndromes of severe insulin resistance, type 2 diabetes and in related metabolic traits in UK Europid populations.
Materials and methods: Exons and splice junctions of the adiponectin receptor 1 and 2 genes (ADIPOR1; ADIPOR2) were sequenced in patients from our severe insulin resistance cohort (n=129). Subsequently, 24 polymorphisms were tested for association with type 2 diabetes in population-based type 2 diabetes case-control studies (n=2,127) and with quantitative traits in a population-based longitudinal study (n=1,721).
Results: No missense or nonsense mutations in ADIPOR1 and ADIPOR2 were detected in the cohort of patients with severe insulin resistance. None of the 24 polymorphisms (allele frequency 2.3-48.3%) tested was associated with type 2 diabetes in the case-control study. Similarly, none of the polymorphisms was associated with fasting plasma insulin, fasting and 2-h post-load plasma glucose, 30-min insulin increment or BMI.
Conclusions/interpretation: Genetic variation in ADIPOR1 and ADIPOR2 is not a major cause of extreme insulin resistance in humans, nor does it contribute in a significant manner to type 2 diabetes risk and related traits in UK Europid populations.
Funded by: Medical Research Council: MC_U106179471; Wellcome Trust
The population genetics of structural variation.
Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA.
Population genetics is central to our understanding of human variation, and by linking medical and evolutionary themes, it enables us to understand the origins and impacts of our genomic differences. Despite current limitations in our knowledge of the locations, sizes and mutational origins of structural variants, our characterization of their population genetics is developing apace, bringing new insights into recent human adaptation, genome biology and disease. We summarize recent dramatic advances, describe the diverse mutational origins of chromosomal rearrangements and argue that their complexity necessitates a re-evaluation of existing population genetic methods.
Funded by: Wellcome Trust: 077014
Nature genetics 2007;39;7 Suppl;S30-6
Sink or swim.
Nature reviews. Microbiology 2007;5;11;834-5
Tissue-specific histone modification and transcription factor binding in alpha globin gene expression.
Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK.
To address the mechanism by which the human globin genes are activated during erythropoiesis, we have used a tiled microarray to analyze the pattern of transcription factor binding and associated histone modifications across the telomeric region of human chromosome 16 in primary erythroid and nonerythroid cells. This 220-kb region includes the alpha globin genes and 9 widely expressed genes flanking the alpha globin locus. This unbiased, comprehensive analysis of transcription factor binding and histone modifications (acetylation and methylation) described here not only identified all known cis-acting regulatory elements in the human alpha globin cluster but also demonstrated that there are no additional erythroid-specific regulatory elements in the 220-kb region tested. In addition, the pattern of histone modification distinguished promoter elements from potential enhancer elements across this region. Finally, comparison of the human and mouse orthologous regions in a unique mouse model, with both regions coexpressed in the same animal, showed significant differences that may explain how these 2 clusters are regulated differently in vivo.
Funded by: Medical Research Council: MC_U137961145, MC_U137961147; NHGRI NIH HHS: U01 HG003168; Wellcome Trust
Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions.
Grup de Recerca en Informática Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minor splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.
Funded by: NCI NIH HHS: N01CO12400; NHGRI NIH HHS: U01 HG003147, U01 HG003150, U01HG03147, U01HG03150; PHS HHS: N01C012400; Wellcome Trust: 077198
Genome research 2007;17;6;746-59
Large-scale discovery of promoter motifs in Drosophila melanogaster.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.
Funded by: Wellcome Trust: 077198
PLoS computational biology 2007;3;1;e7
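Once a motif such as those inferred here has been found, its occurrences in promoter sequences are typically located by scoring every window with a position weight matrix. The sketch below shows a generic log-odds PWM scan with a pseudocount and a uniform background. It is not NestedMICA's inference algorithm, and the motif counts, pseudocount and background frequency are hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-column base counts into log2(p / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of the motif's width; return (offset, score) pairs."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        hits.append((i, score))
    return hits


# Hypothetical motif: 10 observed sites, all reading TATA.
counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_matrix(counts)
hits = scan("GGTATAGG", pwm)
best = max(hits, key=lambda h: h[1])
```

The pseudocount keeps unseen bases from scoring minus infinity, so a window with one mismatch is penalised rather than excluded outright.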
An H-NS-like stealth protein aids horizontal DNA transmission in bacteria.
Department of Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin 2, Ireland.
The Sfh protein is encoded by self-transmissible plasmids involved in human typhoid and is closely related to the global regulator H-NS. We have found that Sfh provides a stealth function that allows the plasmids to be transmitted to new bacterial hosts with minimal effects on their fitness. Introducing the plasmid without the sfh gene imposes a mild H-NS(-) phenotype and a severe loss of fitness due to titration of the cellular pool of H-NS by the A+T-rich plasmid. This stealth strategy seems to be used widely to aid horizontal DNA transmission and has important implications for bacterial evolution.
Funded by: Wellcome Trust
Science (New York, N.Y.) 2007;315;5809;251-2
Evolution of genes and genomes on the Drosophila phylogeny.
Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Funded by: Intramural NIH HHS: Z01 DK015600-12; Medical Research Council: MC_U105161047, MC_U137761446; NHGRI NIH HHS: R01 HG000747, R01 HG000747-16, R01 HG002779-05, R01 HG002779-06, R01 HG004037; NIGMS NIH HHS: F32 GM067504, R01 GM074813-04; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09
Genome-wide association study identifies novel breast cancer susceptibility loci.
CR-UK Genetic Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK.
Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls from 22 studies. We used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r² > 0.5. SNPs in five novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10^-7). Four of these contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the P < 0.05 level compared with an estimated 1,343 that would be expected by chance, indicating that many additional common susceptibility alleles may be identifiable by this approach.
Funded by: Breast Cancer Now: 2004NOV49, BREAST CANCER NOW RESEARCH CENTRE; Cancer Research UK: A3353
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Funded by: NCI NIH HHS: F32 CA108313; NHGRI NIH HHS: K22 HG003169, K22 HG003169-01A1, P41 HG002371, P41 HG002371-03S1, R01 HG002238, R01 HG002238-15, R01 HG003110, R01 HG003110-03, R01 HG003129-03, R01 HG003143, R01 HG003143-04, R01 HG003521, R01 HG003521-01, R01 HG003532, R01 HG003532-01, R01 HG003541, R01 HG003541-03, U01 HG002523, U01 HG002523-01, U01 HG003147, U01 HG003147-02, U01 HG003150, U01 HG003150-03, U01 HG003151, U01 HG003151-03, U01 HG003156, U01 HG003156-03, U01 HG003157, U01 HG003157-03, U01 HG003161, U01 HG003161-03, U01 HG003162, U01 HG003162-03, U01 HG003168-02, U54 HG003067, U54 HG003067-01, U54 HG003079, U54 HG003079-01, U54 HG003273, U54 HG003273-01; Wellcome Trust: 062023, 077198
Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
Cambridge Institute for Medical Research, University Department of Clinical Biochemistry, Addenbrooke's Hospital, Cambridge, United Kingdom.
Background: A single family has been described in which obesity results from a mutation in the leptin-receptor gene (LEPR), but the prevalence of such mutations in severe, early-onset obesity has not been systematically examined.
Methods: We sequenced LEPR in 300 subjects with hyperphagia and severe early-onset obesity, including 90 probands from consanguineous families, and investigated the extent to which mutations cosegregated with obesity and affected receptor function. We evaluated metabolic, endocrine, and immune function in probands and affected relatives.
Results: Of the 300 subjects, 8 (3%) had nonsense or missense LEPR mutations: 7 were homozygotes, and 1 was a compound heterozygote. All missense mutations resulted in impaired receptor signaling. Affected subjects were characterized by hyperphagia, severe obesity, alterations in immune function, and delayed puberty due to hypogonadotropic hypogonadism. Serum leptin levels were within the range predicted by the elevated fat mass in these subjects. Their clinical features were less severe than those of subjects with congenital leptin deficiency.
Conclusions: The prevalence of pathogenic LEPR mutations in a cohort of subjects with severe, early-onset obesity was 3%. Circulating levels of leptin were not disproportionately elevated, suggesting that serum leptin cannot be used as a marker for leptin-receptor deficiency. Congenital leptin-receptor deficiency should be considered in the differential diagnosis in any child with hyperphagia and severe obesity in the absence of developmental delay or dysmorphism.
Funded by: Medical Research Council: G0502115; Telethon: GJT04008; Wellcome Trust: 067457, 068086, 077016
The New England journal of medicine 2007;356;3;237-47
High resolution array-CGH analysis of single cells.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Heterogeneity in the genome copy number of tissues is of particular importance in solid tumor biology. Furthermore, many clinical applications, such as pre-implantation and non-invasive prenatal diagnosis, would benefit from the ability to characterize individual cells. Because the amount of DNA in a single cell is so small, several PCR protocols have been developed in an attempt to achieve unbiased amplification. Many of these approaches are suitable for subsequent cytogenetic analyses using conventional methodologies such as comparative genomic hybridization (CGH) to metaphase spreads. However, attempts to harness array-CGH for single-cell analysis to provide improved resolution have been disappointing. Here we describe a strategy that combines single-cell amplification using GenomePlex library technology (GenomePlex Single Cell Whole Genome Amplification Kit, Sigma-Aldrich, UK) with detailed analysis of genomic copy number changes by high-resolution array-CGH. We show that single-copy changes as small as 8.3 Mb are detected reliably in single cells derived from various tumor cell lines, as well as from patients presenting with trisomy 21 and Prader-Willi syndrome. Our results demonstrate the potential of this technology for studies of tumor biology and for clinical diagnostics.
Funded by: Wellcome Trust
Nucleic acids research 2007;35;3;e15
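Array-CGH reports copy number as a log2 ratio of test to reference signal. Assuming a diploid reference, a single-copy gain such as trisomy 21 is expected near log2(3/2) ≈ 0.585, and a single-copy loss is expected near -1. The minimal sketch below shows that arithmetic. It uses hypothetical gain and loss cutoffs and a simple median-per-segment call. It is not the analysis pipeline used in the study.

```python
import math
from statistics import median


def expected_log2_ratio(test_copies, reference_copies=2):
    """Expected log2(test/reference) for a given copy number against a diploid reference."""
    return math.log2(test_copies / reference_copies)


def call_segment(probe_log2_ratios, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Classify a segment from the median of its probe log2 ratios.

    The cutoffs are hypothetical; amplified single-cell DNA is noisy, so real
    thresholds would have to be calibrated on known normal and aneuploid cells.
    """
    m = median(probe_log2_ratios)
    if m >= gain_cutoff:
        return "gain"
    if m <= loss_cutoff:
        return "loss"
    return "no change"


if __name__ == "__main__":
    print(round(expected_log2_ratio(3), 3))     # trisomy: 0.585
    print(expected_log2_ratio(1))               # single-copy loss: -1.0
    print(call_segment([0.55, 0.62, 0.48]))     # gain
```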
ProServer: a simple, extensible Perl DAS server.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Summary: The increasing size and complexity of biological databases has led to a growing trend to federate rather than duplicate them. In order to share data between federated databases, protocols for the exchange mechanism must be developed. One such data exchange protocol that is widely used is the Distributed Annotation System (DAS). For example, DAS has enabled small experimental groups to integrate their data into the Ensembl genome browser. We have developed ProServer, a simple, lightweight, Perl-based DAS server that does not depend on a separate HTTP server. The ProServer package is easily extensible, allowing data to be served from almost any underlying data model. Recent additions to the DAS protocol have enabled both structure and alignment (sequence and structural) data to be exchanged. ProServer allows both of these data types to be served.
Availability: ProServer can be downloaded from http://www.sanger.ac.uk/proserver/ or CPAN http://search.cpan.org/~rpettett/. Details on the system requirements and installation of ProServer can be found at http://www.sanger.ac.uk/proserver/.
Funded by: Medical Research Council: G0100305; Wellcome Trust
Bioinformatics (Oxford, England) 2007;23;12;1568-70
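A DAS client retrieves annotation by sending HTTP requests to a data source, such as a `features` command, and parsing the DASGFF XML response. The Python sketch below assumes the DAS 1.5 URL and document layout: `/das/<source>/features?segment=<ref>:<start>,<stop>`, with `FEATURE` elements nested inside `SEGMENT`. The server URL, source name and sample response are hypothetical. Nothing here is ProServer's own API.

```python
import xml.etree.ElementTree as ET


def build_features_url(server, source, ref, start, stop):
    """DAS 'features' request for one segment, e.g. .../das/src/features?segment=1:1000,2000."""
    return f"{server.rstrip('/')}/das/{source}/features?segment={ref}:{start},{stop}"


def parse_dasgff(xml_text):
    """Extract basic fields from each FEATURE element of a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("FEATURE"):
        features.append({
            "id": feature.get("id"),
            "type": feature.findtext("TYPE"),
            "start": int(feature.findtext("START")),
            "end": int(feature.findtext("END")),
            "orientation": feature.findtext("ORIENTATION"),
        })
    return features


# Hypothetical response, trimmed to the elements parsed above.
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/hypothetical_source/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="example feature">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="manual">manual</METHOD>
        <START>1200</START>
        <END>1350</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    print(build_features_url("http://example.org", "hypothetical_source", "1", 1000, 2000))
    print(parse_dasgff(SAMPLE_DASGFF))
```

Against a live server, the URL would be fetched, for example with `urllib.request.urlopen`, and the body would be passed to `parse_dasgff`. Keeping the request builder separate from the parser makes the parsing step testable offline.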
PPARGC1A coding variation may initiate impaired NEFA clearance during glucose challenge.
Genetic Epidemiology and Clinical Research Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University Hospital, Umeå, Sweden.
Aims/hypothesis: The peroxisome proliferator-activated receptor gamma coactivator 1-alpha protein, encoded by the PPARGC1A gene, transcriptionally activates a complex pathway of lipid and glucose metabolism and is expressed primarily in tissues of high metabolic activity such as liver, heart and exercising oxidative skeletal muscle fibre. Ppargc1a-null mice develop systemic dyslipidaemia and hepatic steatosis. In humans, NEFAs downregulate PPARGC1A expression in skeletal muscle. Furthermore, a common non-synonymous coding variant at PPARGC1A (Gly482Ser, rs8192678) is associated with decreased PPARGC1A mRNA levels and increased type 2 diabetes risk.
Materials and methods: In a population-based sample of 691 healthy middle-aged Europids we assessed whether Gly482Ser is associated with levels of NEFA when fasting and in response to an oral glucose challenge. We also assessed the potential effect-modifying role of adipose tissue mass on these phenotypes.
Results: After adjustment for age, sex, fat mass and fat-free mass, the Ser482 allele was associated with higher NEFA at 30 min and 2 h and with NEFA AUC (all p ≤ 0.02). Furthermore, suggestive evidence of interaction between fat mass and Gly482Ser was observed for fasting NEFA (p = 0.059). After stratification by level of obesity, genotype associations were observed in the obese for fasting NEFA (p = 0.028), NEFA at 30 min (p = 0.013) and 2 h (p = 0.002), and NEFA AUC (p = 0.005), but no significant associations were observed in lean individuals (all p > 0.6).
Conclusions/interpretation: Our observations indicate that NEFA clearance is blunted following a glucose load in carriers of the PPARGC1A Ser482 allele. This association is augmented by obesity.
Funded by: Medical Research Council: MC_U106179471, MC_U106179473; Wellcome Trust: 077016
A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity.
Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter, UK.
Obesity is a serious international health problem that increases the risk of several common diseases. The genetic factors predisposing to obesity are poorly understood. A genome-wide search for type 2 diabetes-susceptibility genes identified a common variant in the FTO (fat mass and obesity associated) gene that predisposes to diabetes through an effect on body mass index (BMI). An additive association of the variant with BMI was replicated in 13 cohorts with 38,759 participants. The 16% of adults who are homozygous for the risk allele weighed about 3 kilograms more and had 1.67-fold increased odds of obesity when compared with those not inheriting a risk allele. This association was observed from age 7 years upward and reflects a specific increase in fat mass.
Funded by: Intramural NIH HHS: Z99 AG999999; Medical Research Council: G0000934, G0500070, G0600705, G9815508, MC_U106179471, MC_U106188470; Wellcome Trust: 079557, 090532
Science (New York, N.Y.) 2007;316;5826;889-94
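The 16% homozygote figure can be converted into an allele frequency. This requires assuming Hardy-Weinberg equilibrium, which the abstract does not state. Under that assumption, a homozygote frequency of q² = 0.16 gives a risk allele frequency of q = 0.4. Roughly 48% of adults would then be heterozygous carriers and 36% non-carriers. A minimal sketch of the calculation:

```python
import math


def hwe_genotype_frequencies(hom_risk_freq):
    """Genotype frequencies implied by a risk-homozygote frequency, assuming Hardy-Weinberg equilibrium."""
    if not 0 <= hom_risk_freq <= 1:
        raise ValueError("frequency must be between 0 and 1")
    q = math.sqrt(hom_risk_freq)  # risk allele frequency
    p = 1 - q                     # frequency of the other allele
    return {
        "risk_allele": q,
        "risk/risk": q * q,
        "risk/other": 2 * p * q,
        "other/other": p * p,
    }


if __name__ == "__main__":
    for name, value in hwe_genotype_frequencies(0.16).items():
        print(f"{name}: {value:.2f}")
```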
Definition of the zebrafish genome using flow cytometry and cytogenetic mapping.
Background: The zebrafish (Danio rerio) is an important vertebrate model organism for biomedical research. The syntenic conservation between the zebrafish and human genome allows one to investigate the function of human genes using the zebrafish model. To facilitate analysis of the zebrafish genome, genetic maps have been constructed and sequence annotation of a reference zebrafish genome is ongoing. However, the duplicative nature of teleost genomes, including the zebrafish, complicates accurate assembly and annotation of a representative genome sequence. Cytogenetic approaches provide "anchors" that can be integrated with accumulating genomic data.
Results: Here, we cytogenetically define the zebrafish genome by first estimating the size of each linkage group (LG) chromosome using flow cytometry, followed by the cytogenetic mapping of 575 bacterial artificial chromosome (BAC) clones onto metaphase chromosomes. Of the 575 BAC clones, 544 clones localized to apparently unique chromosomal locations. 93.8% of these clones were assigned to a specific LG chromosome location using fluorescence in situ hybridization (FISH) and compared to the LG chromosome assignment reported in the zebrafish genome databases. Thirty-one BAC clones localized to multiple chromosomal locations in several different hybridization patterns. From these data, a refined second generation probe panel for each LG chromosome was also constructed.
Conclusion: The chromosomal mapping of the 575 large-insert DNA clones allows for these clones to be integrated into existing zebrafish mapping data. An accurately annotated zebrafish reference genome serves as a valuable resource for investigating the molecular basis of human diseases using zebrafish mutant models.
Funded by: NCI NIH HHS: R01 CA111560, R01-CA111560; Wellcome Trust
BMC genomics 2007;8;195
The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase.
Chemistry Research Laboratory and Oxford Centre for Integrative Systems Biology, University of Oxford, 12 Mansfield Road, Oxford, Oxon OX1 3TA, UK.
Variants in the FTO (fat mass and obesity associated) gene are associated with increased body mass index in humans. Here, we show by bioinformatics analysis that FTO shares sequence motifs with Fe(II)- and 2-oxoglutarate-dependent oxygenases. We find that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide. Consistent with a potential role in nucleic acid demethylation, Fto localizes to the nucleus in transfected cells. Studies of wild-type mice indicate that Fto messenger RNA (mRNA) is most abundant in the brain, particularly in hypothalamic nuclei governing energy balance, and that Fto mRNA levels in the arcuate nucleus are regulated by feeding and fasting. Studies can now be directed toward determining the physiologically relevant FTO substrate and how nucleic acid methylation status is linked to increased fat mass.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D011523/1; Medical Research Council: G108/617, G9824984, MC_U137761446; NIGMS NIH HHS: U54 GM064346; Wellcome Trust: 068086, 077016
Science (New York, N.Y.) 2007;318;5855;1469-72
Draft genome of the filarial nematode parasite Brugia malayi.
Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA. GhedinE@dom.pitt.edu
Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the approximately 90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict approximately 11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during approximately 350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.
Funded by: NIAID NIH HHS: R01 AI048562, R01 AI048562-09, U01-AI50903; NIEHS NIH HHS: R15 ES013128, R15 ES013128-01; NLM NIH HHS: R01 LM006845, R01 LM006845-08, R01 LM007938, R01 LM007938-04
Science (New York, N.Y.) 2007;317;5845;1756-60
Ultra-high resolution array painting facilitates breakpoint sequencing.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Objective: To describe a considerably advanced method of array painting, which allows the rapid, ultra-high resolution mapping of translocation breakpoints such that rearrangement junction fragments can be amplified directly and sequenced.
Method: Ultra-high resolution array painting involves the hybridisation of probes generated by the amplification of small numbers of flow-sorted derivative chromosomes to oligonucleotide arrays designed to tile breakpoint regions at extremely high resolution.
Results and discussion: Using four balanced translocation cases, we demonstrate that ultra-high resolution array painting rapidly and efficiently maps breakpoints to the point where junction fragments can be easily amplified and sequenced. With this new development, breakpoints can be mapped using just two array experiments. The first uses whole-genome array painting on tiling-resolution large-insert clone arrays, and the second uses ultra-high resolution oligonucleotide arrays targeted to the breakpoint regions. In this way, breakpoints can be mapped and then sequenced within a few weeks.
Funded by: Wellcome Trust: 077008
Journal of medical genetics 2007;44;1;51-8
Improving the power to detect differentially expressed genes in comparative microarray experiments by including information from self-self hybridizations.
Medical Research Council-Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK. Arief.Gusnanto@mrc-bsu.cam.ac.uk
Our ability to detect differentially expressed genes in a microarray experiment can be hampered when the number of biological samples of interest is limited. In this situation, we propose using information from self-self hybridizations to sharpen our inference of differential expression. A unified modelling strategy is developed to allow better estimation of the error variance. This principle is similar to the use of a pooled variance estimate in the two-sample t-test. Results from real datasets suggest that more differentially expressed genes can be detected with the combined models. Our simulation study provides evidence that this method increases sensitivity compared with using information from comparative hybridizations alone, given the same control of the false discovery rate. The largest increase in sensitivity occurs when the amount of information in the comparative hybridizations is limited.
Funded by: Medical Research Council: MC_U105260799, MC_U105261167
Computational biology and chemistry 2007;31;3;178-85
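The pooled-variance analogy can be sketched for a single gene. This is a simplification. It assumes equal error variance in comparative and self-self log ratios and combines them exactly as in the classic two-sample pooled estimate. The authors' unified model is more general, and the data values below are made up.

```python
import math
from statistics import mean


def sum_of_squares(values):
    """Sum of squared deviations from the sample mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


def comparative_only_t(comparative):
    """One-sample t for mean log ratio != 0, using only comparative arrays."""
    n = len(comparative)
    s2 = sum_of_squares(comparative) / (n - 1)
    return mean(comparative) / math.sqrt(s2 / n), n - 1


def pooled_t(comparative, self_self):
    """Same statistic, but with the error variance pooled over comparative and self-self arrays."""
    n, m = len(comparative), len(self_self)
    df = (n - 1) + (m - 1)
    s2 = (sum_of_squares(comparative) + sum_of_squares(self_self)) / df
    return mean(comparative) / math.sqrt(s2 / n), df


if __name__ == "__main__":
    comparative = [1.0, 1.2, 0.8]            # hypothetical log2 ratios, test vs reference
    self_self = [0.1, -0.1, 0.0, 0.2, -0.2]  # hypothetical log2 ratios, sample vs itself
    print(comparative_only_t(comparative))
    print(pooled_t(comparative, self_self))
```

With these values, pooling raises the degrees of freedom from 2 to 6 and the statistic from about 8.66 to 10.0. This illustrates how extra self-self arrays can increase sensitivity when comparative arrays are few.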
Schistosoma mansoni genome: closing in on a final gene set.
The J.C. Venter Institute, Rockville, MD 20850, USA.
The Schistosoma mansoni genome sequencing consortium has recently released the latest versions of the genome assembly as well as an automated preliminary gene structure annotation. The combined datasets constitute a vast resource for researchers to exploit in a variety of post-genomic studies, with an emphasis on transcriptomic and proteomic tools. Here we present an innovative method for combining diverse sources of evidence, including ab initio gene predictions, protein and transcript sequence homologies, and cross-genome sequence homologies between S. mansoni and Schistosoma japonicum, to define a comprehensive list of protein-coding genes.
Funded by: NIAID NIH HHS: AI48828; Wellcome Trust: 13557021
Experimental parasitology 2007;117;3;225-8
Lessons learned from the initial sequencing of the pig genome: comparative analysis of an 8 Mb region of pig chromosome 17.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Background: We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage.
Results: Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs.
Conclusion: We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS.
Funded by: Biotechnology and Biological Sciences Research Council: BBE0116401; Wellcome Trust: 077198
Genome biology 2007;8;8;R168
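The coverage trend reported above matches what the classic Lander-Waterman model predicts for an idealized random shotgun. Beyond very low coverage, more reads yield fewer, longer contigs. The sketch uses that textbook model, not the assembly analysis in this study. The 700 bp read length and 40 bp minimum detectable overlap are hypothetical. Real assemblies fall short of the model because of repeats and uneven coverage.

```python
import math


def lander_waterman(genome_len, read_len, coverage, min_overlap=0):
    """Expected contig count and mean contig length (bp) under the Lander-Waterman model.

    coverage    -- redundancy c = N * L / G
    min_overlap -- overlap T (bp) needed to detect that two reads join
    """
    sigma = 1 - min_overlap / read_len          # fraction of a read usable for overlap detection
    n_reads = coverage * genome_len / read_len  # N = c * G / L
    expected_contigs = n_reads * math.exp(-coverage * sigma)
    expected_length = read_len * ((math.exp(coverage * sigma) - 1) / coverage + (1 - sigma))
    return expected_contigs, expected_length


if __name__ == "__main__":
    region = 8_000_000  # an 8 Mb region, matching the size studied here
    for c in (1, 2, 4, 8):
        contigs, length = lander_waterman(region, read_len=700, coverage=c, min_overlap=40)
        print(f"{c}x: ~{contigs:,.0f} contigs, mean length ~{length:,.0f} bp")
```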
Specialist fungi, versatile genomes.
Nature reviews. Microbiology 2007;5;5;332-3
Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome.
Allegheny General Hospital, Allegheny-Singer Research Institute, Center for Genomic Sciences, Pittsburgh, PA 15212, USA.
The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.
Funded by: NIDCD NIH HHS: DC02148, DC04173, DC05659, R01 DC002148, R01 DC004173, R01 DC005659; Wellcome Trust
Journal of bacteriology 2007;189;22;8186-95
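The core/distributed partition and the per-pair genic differences described above can be sketched from a cluster-by-strain presence table. The toy strains and clusters are hypothetical. The final print only checks the reported percentages (1,454 and 1,716 of 3,170 clusters). The finite-supragenome model itself is not reproduced here.

```python
def partition_clusters(presence):
    """Split clusters into core (present in every strain) and distributed (missing from some)."""
    all_strains = set().union(*presence.values())
    core = {c for c, strains in presence.items() if strains == all_strains}
    return core, set(presence) - core


def pairwise_difference(a, b, presence):
    """Number of clusters present in exactly one of strains a and b."""
    return sum(1 for strains in presence.values() if (a in strains) != (b in strains))


def noncore_fraction(strain, presence, core):
    """Fraction of a strain's clusters that are not core."""
    own = {c for c, strains in presence.items() if strain in strains}
    return len(own - core) / len(own)


# Hypothetical presence table: cluster -> set of strains carrying it.
PRESENCE = {
    "c1": {"S1", "S2", "S3"},
    "c2": {"S1", "S2", "S3"},
    "c3": {"S1"},
    "c4": {"S2", "S3"},
}

if __name__ == "__main__":
    core, distributed = partition_clusters(PRESENCE)
    print(sorted(core), sorted(distributed))        # ['c1', 'c2'] ['c3', 'c4']
    print(pairwise_difference("S1", "S2", PRESENCE))  # 2
    print(round(noncore_fraction("S1", PRESENCE, core), 3))  # 0.333
    # Check of the percentages reported in the abstract.
    print(round(100 * 1454 / 3170), round(100 * 1716 / 3170))  # 46 54
```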
Complete genome of acute rheumatic fever-associated serotype M5 Streptococcus pyogenes strain Manfredo.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Comparisons of the 1.84-Mb genome of serotype M5 Streptococcus pyogenes strain Manfredo with previously sequenced genomes emphasized the role of prophages in diversification of S. pyogenes and the close relationship between strain Manfredo and MGAS8232, another acute rheumatic fever-associated strain.
Funded by: Wellcome Trust
Journal of bacteriology 2007;189;4;1473-7
Multidrug-resistant Salmonella enterica serovar paratyphi A harbors IncHI1 plasmids similar to those found in serovar typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Salmonella enterica serovars Typhi and Paratyphi A cause systemic infections in humans which are referred to as enteric fever. Multidrug-resistant (MDR) serovar Typhi isolates emerged in the 1980s, and in recent years MDR serovar Paratyphi A infections have become established as a significant problem across Asia. MDR in serovar Typhi is almost invariably associated with IncHI1 plasmids, but the genetic basis of MDR in serovar Paratyphi A has remained predominantly undefined. The DNA sequence of an IncHI1 plasmid, pAKU_1, encoding MDR in a serovar Paratyphi A strain has been determined. Significantly, this plasmid shares a common IncHI1-associated DNA backbone with the serovar Typhi plasmid pHCM1 and an S. enterica serovar Typhimurium plasmid pR27. Plasmids pAKU_1 and pHCM1 share 14 antibiotic resistance genes encoded within similar mobile elements, which appear to form a 24-kb composite transposon that has transferred as a single unit into different positions in their IncHI1 backbones. Thus, these plasmids have acquired similar antibiotic resistance genes independently via the horizontal transfer of mobile DNA elements. Furthermore, two IncHI1 plasmids from a Vietnamese isolate of serovar Typhi were found to contain features of the backbone sequence of pAKU_1 rather than pHCM1, with the composite transposon inserted in the same location as in the pAKU_1 sequence. Our data show that these serovar Typhi and Paratyphi A IncHI1 plasmids share highly conserved core DNA and have acquired similar mobile elements encoding antibiotic resistance genes in past decades.
Funded by: Medical Research Council: G0600805; Wellcome Trust
Journal of bacteriology 2007;189;11;4257-64
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and European hedgehog; the fish genomes of stickleback and medaka; and second examples of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Major features added during the year include the first complete gene sets for genomes with low-sequence coverage, new strain variation data, and new orthology/paralogy annotations based on gene trees.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E010768/1, BB/E011640/1, BBS/B/13438, BBS/B/13446, BBS/B/13462, BBS/B/13470; Wellcome Trust: 062023
Nucleic acids research 2007;35;Database issue;D610-7
Identification of common genetic variation that modulates alternative splicing.
University Department of Paediatrics, John Radcliffe Hospital, Oxford, United Kingdom.
Alternative splicing of genes is an efficient means of generating variation in protein function. Several disease states have been associated with rare genetic variants that affect splicing patterns. Conversely, splicing efficiency of some genes is known to vary between individuals without apparent ill effects. What is not clear is whether commonly observed phenotypic variation in splicing patterns, and hence potential variation in protein function, is to a significant extent determined by naturally occurring DNA sequence variation and in particular by single nucleotide polymorphisms (SNPs). In this study, we surveyed the splicing patterns of 250 exons in 22 individuals who had been previously genotyped by the International HapMap Project. We identified 70 simple cassette exon alternative splicing events in our experimental system; for six of these, we detected consistent differences in splicing pattern between individuals, with a highly significant association between splice phenotype and neighbouring SNPs. Remarkably, for five out of six of these events, the strongest correlation was found with the SNP closest to the intron-exon boundary, although the distance between these SNPs and the intron-exon boundary ranged from 2 bp to greater than 1,000 bp. Two of these SNPs were further investigated using a minigene splicing system, and in each case the SNPs were found to exert cis-acting effects on exon splicing efficiency in vitro. The functional consequences of these SNPs could not be predicted using bioinformatic algorithms. Our findings suggest that phenotypic variation in splicing patterns is determined by the presence of SNPs within flanking introns or exons. Effects on splicing may represent an important mechanism by which SNPs influence gene function.
Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 074318
PLoS genetics 2007;3;6;e99
Completing the map of human genetic variation.
Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Funded by: Wellcome Trust: 077008
A high utility integrated map of the pig genome.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Background: The domestic pig is being increasingly exploited as a system for modeling human disease. It also has substantial economic importance for meat-based protein production. Physical clone maps have underpinned large-scale genomic sequencing and enabled focused cloning efforts for many genomes. Comparative genetic maps indicate that there is more structural similarity between pig and human than between, for example, mouse and human, and we have used this close relationship between human and pig as a way of facilitating map construction.
Results: Here we report the construction of the most highly continuous bacterial artificial chromosome (BAC) map of any mammalian genome, for the pig (Sus scrofa domestica) genome. The map provides a template for the generation and assembly of high-quality anchored sequence across the genome. The physical map integrates previous landmark maps with restriction fingerprints and BAC end sequences from over 260,000 BACs derived from 4 BAC libraries and takes advantage of alignments to the human genome to improve the continuity and local ordering of the clone contigs. We estimate that over 98% of the euchromatin of the 18 pig autosomes and the X chromosome along with localized coverage on Y is represented in 172 contigs, with chromosome 13 (218 Mb) represented by a single contig. The map is accessible through pre-Ensembl, where links to marker and sequence data can be found.
Conclusion: The map will enable immediate electronic positional cloning of genes, benefiting the pig research community and further facilitating use of the pig as an alternative animal model for human disease. The clone map and BAC end sequence data can also help to support the assembly of maps and genome sequences of other artiodactyls.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E010520/1, BB/E010520/2, BBE0116401; Wellcome Trust: 077198
Genome biology 2007;8;7;R139
A second generation human haplotype map of over 3.1 million SNPs.
The Scripps Research Institute, 10550 North Torrey Pines Road MEM275, La Jolla, California 92037, USA.
We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
Funded by: Wellcome Trust: 077008, 077011, 077046, 081682
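The HapMap abstract above reports coverage as "average maximum r2", where r2 is the squared correlation between alleles at two SNPs. As an illustration only (not the HapMap analysis code), the Python sketch below computes r^2 from phased two-locus haplotypes using the standard definitions D = p_AB - p_A * p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). The function names `ld_r2` and `mean_max_r2`, and the convention of writing each haplotype as a two-character string with upper case for the counted allele, are hypothetical. "Average maximum r^2" is read here as the mean, over untyped SNPs, of each SNP's best r^2 with any typed SNP.

```python
from typing import Sequence


def ld_r2(haplotypes: Sequence[str]) -> float:
    """Return r^2 between two biallelic SNPs from phased two-locus haplotypes.

    Each haplotype is a two-character string such as "AB" or "ab":
    character 0 is the allele at SNP 1, character 1 the allele at SNP 2.
    Upper case marks the allele being counted (a convention of this sketch).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0].isupper() for h in haplotypes) / n    # frequency of A at SNP 1
    p_b = sum(h[1].isupper() for h in haplotypes) / n    # frequency of B at SNP 2
    p_ab = sum(h[0].isupper() and h[1].isupper() for h in haplotypes) / n
    d = p_ab - p_a * p_b                                 # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def mean_max_r2(r2_matrix: Sequence[Sequence[float]]) -> float:
    """Average over untyped SNPs (rows) of the best r^2 to any typed SNP (columns)."""
    return sum(max(row) for row in r2_matrix) / len(r2_matrix)


# Perfect LD: alleles A and B always travel together.
print(ld_r2(["AB", "AB", "ab", "ab"]))  # 1.0
# Linkage equilibrium: all four haplotypes equally common.
print(ld_r2(["AB", "Ab", "aB", "ab"]))  # 0.0
```

Real LD estimation from unphased genotypes first has to infer haplotype frequencies (for example by expectation-maximization); this sketch assumes phase is already known.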
Genome variation and evolution of the malaria parasite Plasmodium falciparum.
Informatics Division, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Hinxton, UK.
Infections with the malaria parasite Plasmodium falciparum result in more than 1 million deaths each year worldwide. Deciphering the evolutionary history and genetic variation of P. falciparum is critical for understanding the evolution of drug resistance, identifying potential vaccine candidates and appreciating the effect of parasite variation on prevalence and severity of malaria in humans. Most studies of natural variation in P. falciparum have been either in depth over small genomic regions (up to the size of a small chromosome) or genome wide but only at low resolution. In an effort to complement these studies with genome-wide data, we undertook shotgun sequencing of a Ghanaian clinical isolate (with fivefold coverage), the IT laboratory isolate (with onefold coverage) and the chimpanzee parasite P. reichenowi (with twofold coverage). We compared these sequences with the fully sequenced P. falciparum 3D7 isolate genome. We describe the most salient features of P. falciparum polymorphism and adaptive evolution with relation to gene function, transcript and protein expression and cellular localization. This analysis uncovers the primary evolutionary changes that have occurred since the P. falciparum-P. reichenowi speciation and changes that are occurring within P. falciparum.
Funded by: Wellcome Trust: 077046, 079643
Nature genetics 2007;39;1;120-5
In silico functional and structural characterisation of ferlin proteins by mapping disease-causing mutations and evolutionary information onto three-dimensional models of their C2 domains.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Ferlins are C2 domain proteins involved in membrane fusion events, including membrane repair and synaptic exocytosis, and their deficiency can result in muscular dystrophy and deafness. We have undertaken a structural study of their C2 domains by sequence comparison and homology modelling to understand the function of these poorly characterised proteins and to predict the molecular impact of disease-causing mutations. We observe that non-conservative mutations affecting buried residues tend to result in detrimental phenotypes, likely because of decreased protein stability, whereas most variants with replacements in surface residues do not. The few cases of exposed residues altered in variants known to cause diseases are found in conserved areas of functional importance, including essential calcium-binding regions, as deduced by analogy to other characterised C2 domains. Furthermore, we report distinct features of some C2 domains in the two known ferlin subfamilies that correlate with the presence or absence of the DysF domains. Taken together, our results highlight potential targets for further experimental analyses to understand the function of ferlin proteins. We believe our modelling data will aid the diagnosis of diseases associated with ferlin mutations and the development of therapeutic strategies.
Funded by: Wellcome Trust
Journal of the neurological sciences 2007;260;1-2;114-23
Immunohistochemical characterization of cytokeratins in the abnormal corneal endothelium of posterior polymorphous corneal dystrophy patients.
Ocular Tissue Bank, General Teaching Hospital and Charles University, U Nemocnice 2, Prague 128 08, Czech Republic.
Posterior polymorphous corneal dystrophy (PPCD) is a hereditary bilateral disorder affecting Descemet's membrane and the endothelium. The aim of the present study was to determine the spectrum of cytokeratin (CK) expression in cells on the posterior surface of the cornea in PPCD patients. Ten corneal buttons and one specimen of the trabecular meshwork (TM) from PPCD patients who underwent graft or glaucoma surgery were used, as well as six corneal buttons and two TM specimens obtained from healthy donors as controls. Cryosections were fixed and indirect immunofluorescent staining was performed using antibodies directed against a wide spectrum of cytokeratins (CKs). The number of positive cells and the intensity of the staining were assessed using fluorescent microscopy. All 10 PPCD corneal specimens had areas of endothelium displaying typical endothelial morphology as well as areas consisting of layers two to six cells thick with both flat endothelial-like cells and polygonal cells with round nuclei and a large cytoplasm. Both of these morphologically distinct cell types showed strong immunostaining for CK7, CK19, CK8 and CK18, while weaker positive signals were observed for CK1, CK3/12, CK4, CK5/6, CK10, CK10/13, CK14, CK16 and CK17. PPCD endothelium was completely negative for CK2e, CK9, CK15, and CK20. Focal positivity was detected in PPCD TM for CK4, CK7 and CK19. CK8 and CK18 were the only CKs expressed in control endothelium. PPCD and control epithelium displayed similar staining patterns. The distinct positivity for CK3/12, CK4, CK5/6, CK10/13, CK14, CK16 and CK17 was observed in aberrant PPCD endothelium for the first time. We demonstrate that the abnormal endothelium of PPCD patients expresses a mixture of CKs, with CK7 and CK19 predominating. In terms of CK composition, the aberrant PPCD endothelium shares features of both simple and squamous stratified epithelium with a proliferative capacity. 
The wide spectrum of CK expression is most probably not indicative of the transformation of endothelial cells to a distinct epithelial phenotype, but more likely reflects the modified differentiation of metaplastic epithelium.
Experimental eye research 2007;84;4;680-6
Mapping the platelet profile for functional genomic studies and demonstration of the effect size of the GP6 locus.
Department of Cardiovascular Sciences, University of Leicester, Leicester, UK.
Background: Evidence suggests the wide variation in platelet response within the population is genetically controlled. Unraveling the complex relationship between sequence variation and platelet phenotype requires accurate and reproducible measurement of platelet response.
Objective: To develop a methodology suitable for measuring signaling pathway-specific platelet phenotype, to use this to measure platelet response in a large cohort, and to demonstrate the effect size of sequence variation in a relevant model gene.
Methods: Three established platelet assays were evaluated: mobilization of [Ca(2+)](i), aggregometry and flow cytometry, each in response to adenosine 5'-diphosphate (ADP) or the glycoprotein (GP) VI-specific crosslinked collagen-related peptide (CRP). Flow cytometric measurement of fibrinogen binding and P-selectin expression in response to a single, intermediate dose of each agonist gave the best combination of reproducibility and inter-individual variability and was used to measure the platelet response in 506 healthy volunteers. Pathway specificity was ensured by blocking the main subsidiary signaling pathways.
Results: Individuals were identified who were hypo- or hyper-responders for both pathways, or who responded differently to the two agonists or between the two outcome measures (fibrinogen binding and P-selectin expression). Eighty-nine individuals retested three months later using the same methodology showed high concordance between the two visits in all four assays (r(2) = 0.872, 0.868, 0.766 and 0.549), with all subjects retaining their phenotype at recall. Sequence variation at the GP6 locus accounted for approximately 35% of the variation in the CRP-XL response.
Conclusion: Genotype-phenotype association studies in a well-characterized, large cohort provide a powerful strategy to measure the effect of sequence variation in genes regulating the platelet response.
Funded by: Medical Research Council: G0500707, MC_U105260799, MC_U105261167
Journal of thrombosis and haemostasis : JTH 2007;5;8;1756-65
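The platelet abstract states that sequence variation at GP6 "accounted for approximately 35% of the variation" in the CRP-XL response but does not say how that figure was computed. One simple way to express variance explained by a genotype is the between-group share of the total sum of squares (eta squared from a one-way ANOVA). The Python sketch below shows that calculation under this assumption; it is not the authors' analysis, and the function name `variance_explained` and the genotype labels are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence


def variance_explained(genotypes: Sequence[str], responses: Sequence[float]) -> float:
    """Fraction of total variance in `responses` explained by genotype group means.

    Computed as between-group sum of squares / total sum of squares
    (eta squared from a one-way ANOVA on genotype).
    """
    if len(genotypes) != len(responses) or not responses:
        raise ValueError("need matching, non-empty inputs")
    grand_mean = fmean(responses)
    groups: Dict[str, List[float]] = defaultdict(list)
    for genotype, value in zip(genotypes, responses):
        groups[genotype].append(value)
    ss_total = sum((y - grand_mean) ** 2 for y in responses)
    ss_between = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values())
    if ss_total == 0:
        raise ValueError("responses have zero variance")
    return ss_between / ss_total


# Hypothetical genotypes and platelet responses for four volunteers.
print(variance_explained(["AA", "AA", "AB", "AB"], [1.0, 1.0, 3.0, 3.0]))  # 1.0
```

A regression model with covariates (age, sex, assay batch) would usually be preferred in practice; the ANOVA form is used here only because it makes "share of variation" explicit.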
Reduced ENaC protein abundance contributes to the lower blood pressure observed in pendrin-null mice.
Department of Medicine, Emory University, Atlanta, Georgia, USA.
Pendrin (encoded by Pds, Slc26a4) is a Cl(-)/HCO(3)(-) exchanger expressed in the apical regions of type B and non-A, non-B intercalated cells of kidney and mediates renal Cl(-) absorption, particularly when upregulated. Aldosterone increases blood pressure by increasing absorption of both Na(+) and Cl(-) through increased protein abundance and function of Na(+) transporters, such as the epithelial Na(+) channel (ENaC) and the Na(+)-Cl(-) cotransporter (NCC), as well as Cl(-) transporters, such as pendrin. Because aldosterone analogs do not increase blood pressure in Slc26a4(-/-) mice, we asked whether Na(+) excretion and Na(+) transporter protein abundance are altered in kidneys from these mutant mice. Thus wild-type and Slc26a4-null mice were given a NaCl-replete diet, a NaCl-restricted diet, or a NaCl-replete diet plus aldosterone or aldosterone analogs. Abundance of the major renal Na(+) transporters was examined with immunoblots and immunohistochemistry. Slc26a4-null mice showed an impaired ability to conserve Na(+) during dietary NaCl restriction. Under treatment conditions in which circulating aldosterone is increased, alpha-, beta-, and 85-kDa gamma-ENaC subunit protein abundances were reduced 15-35%, whereas abundance of the 70-kDa fragment of gamma-ENaC was reduced approximately 70% in Slc26a4-null relative to wild-type mice. Moreover, ENaC-dependent changes in transepithelial voltage were much lower in cortical collecting ducts from Slc26a4-null than from wild-type mice. Thus, in kidney, ENaC protein abundance and function are modulated by pendrin or through a pendrin-dependent downstream event. The reduced ENaC protein abundance and function observed in Slc26a4-null mice contribute to their lower blood pressure and reduced ability to conserve Na(+) during NaCl restriction.
Funded by: PHS HHS: P01 061521
American journal of physiology. Renal physiology 2007;293;4;F1314-24
Arginine methylation at histone H3R2 controls deposition of H3K4 trimethylation.
Gurdon Institute and Department of Pathology, Tennis Court Road, Cambridge CB2 1QN, UK.
Modifications on histones control important biological processes through their effects on chromatin structure. Methylation at lysine 4 on histone H3 (H3K4) is found at the 5' end of active genes and contributes to transcriptional activation by recruiting chromatin-remodelling enzymes. An adjacent arginine residue (H3R2) is also known to be asymmetrically dimethylated (H3R2me2a) in mammalian cells, but its location within genes and its function in transcription are unknown. Here we show that H3R2 is also methylated in budding yeast (Saccharomyces cerevisiae), and by using an antibody specific for H3R2me2a in a chromatin immunoprecipitation-on-chip analysis we determine the distribution of this modification on the entire yeast genome. We find that H3R2me2a is enriched throughout all heterochromatic loci and inactive euchromatic genes and is present at the 3' end of moderately transcribed genes. In all cases the pattern of H3R2 methylation is mutually exclusive with the trimethyl form of H3K4 (H3K4me3). We show that methylation at H3R2 abrogates the trimethylation of H3K4 by the Set1 methyltransferase. The specific effect on H3K4me3 results from the occlusion of Spp1, a Set1 methyltransferase subunit necessary for trimethylation. Thus, the inability of Spp1 to recognize H3 methylated at R2 prevents Set1 from trimethylating H3K4. These results provide the first mechanistic insight into the function of arginine methylation on chromatin.
Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118, 092096
Paired-end mapping reveals extensive structural variation in the human genome.
Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA.
Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.
Funded by: NCRR NIH HHS: RR19895; Wellcome Trust: 077008, 077014
Science (New York, N.Y.) 2007;318;5849;420-6
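The PEM abstract describes mapping the two ends of ~3-kb fragments onto a reference genome to detect structural variants. The core signal can be sketched in a few lines of Python: a pair whose ends map farther apart than the fragment size suggests a deletion in the sample, a closer pair suggests an insertion, and an unexpected orientation suggests an inversion. This is a minimal illustration, not the published PEM pipeline; the `ReadPair` record, the assumption that concordant pairs map to opposite strands, and the 1-kb `tolerance` are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """One paired-end read pair after alignment to the reference (hypothetical format)."""
    chrom1: str
    pos1: int     # reference coordinate of end 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int     # reference coordinate of end 2
    strand2: str


def classify_pair(pair: ReadPair, expected_span: int = 3000, tolerance: int = 1000) -> str:
    """Label a read pair by comparing its mapped span and orientation with expectations.

    Assumes a concordant pair maps to opposite strands of one chromosome at a
    distance close to the ~3-kb fragment size; `tolerance` is an illustrative cutoff.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        return "inversion"    # one end is flipped relative to the reference
    span = abs(pair.pos2 - pair.pos1)
    if span > expected_span + tolerance:
        return "deletion"     # reference carries sequence missing from the sample
    if span < expected_span - tolerance:
        return "insertion"    # sample carries sequence absent from the reference
    return "concordant"


print(classify_pair(ReadPair("chr1", 10_000, "+", "chr1", 13_100, "-")))  # concordant
print(classify_pair(ReadPair("chr1", 10_000, "+", "chr1", 18_000, "-")))  # deletion
```

A practical caller would only report an SV when several independent pairs support the same breakpoint region; this sketch classifies single pairs.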
New tools and expanded data analysis capabilities at the Protein Structure Prediction Center.
Genome Center, University of California, Davis, California 95616, USA.
We outline the main tasks performed by the Protein Structure Prediction Center in support of the CASP7 experiment and provide a brief review of the major measures used in the automatic evaluation of predictions. We describe in more detail the software developed to facilitate analysis of modeling success over and beyond the available templates and the adopted Java-based tool enabling visualization of multiple structural superpositions between target and several models/templates. We also give an overview of the CASP infrastructure provided by the Center and discuss the organization of the results web pages available through http://predictioncenter.org.
Funded by: NLM NIH HHS: LM07085-01; Wellcome Trust: 077198
Proteins 2007;69 Suppl 8;19-26
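The CASP abstract refers to "the major measures used in the automatic evaluation of predictions" without naming them. GDT_TS is the best-known of these, so the Python sketch below uses it as an assumed example: it averages, over distance cutoffs of 1, 2, 4 and 8 angstroms, the fraction of target residues whose model position lies within the cutoff. The function name `gdt_ts` is hypothetical, and the sketch takes per-residue distances from a single, already computed superposition; the full GDT procedure searches many superpositions and keeps the best fraction for each cutoff, so this simplified value is a lower bound on the true score.

```python
from typing import Sequence

# Distance cutoffs (in angstroms) averaged by GDT_TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """GDT_TS (0-100) from one model-target distance per target residue.

    Distances come from a single superposition, so the result is a lower
    bound on the GDT_TS obtained by searching over superpositions.
    """
    if not distances:
        raise ValueError("need at least one residue")
    hits = sum(d <= cutoff for cutoff in GDT_TS_CUTOFFS for d in distances)
    return 100.0 * hits / (len(distances) * len(GDT_TS_CUTOFFS))


print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))  # 50.0
```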
An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation.
University Chemical Laboratory, Lensfield Road, Cambridge CB2 1EW, UK.
Guanine-rich nucleic acid sequences can adopt noncanonical four-stranded secondary structures called guanine (G)-quadruplexes. Bioinformatics analysis suggests that G-quadruplex motifs are prevalent in genomes, which raises the need to elucidate their function. There is now evidence for the existence of DNA G-quadruplexes at telomeres with associated biological function. A recent hypothesis supports the notion that gene promoter elements contain DNA G-quadruplex motifs that control gene expression at the transcriptional level. We discovered a highly conserved, thermodynamically stable RNA G-quadruplex in the 5' untranslated region (UTR) of the gene transcript of the human NRAS proto-oncogene. Using a cell-free translation system coupled to a reporter gene assay, we have demonstrated that this NRAS RNA G-quadruplex modulates translation. This is the first example of translational repression by an RNA G-quadruplex. Bioinformatics analysis has revealed 2,922 other 5' UTR RNA G-quadruplex elements in the human genome. We propose that RNA G-quadruplexes in the 5' UTR modulate gene expression at the translational level.
Funded by: Cancer Research UK: A4081
Nature chemical biology 2007;3;4;218-21
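The NRAS abstract mentions a bioinformatics search that found 2,922 other 5' UTR RNA G-quadruplex elements but does not give the motif definition. A commonly used rule for putative G-quadruplexes is four runs of at least three guanines separated by loops of one to seven nucleotides; the Python sketch below assumes that rule and should not be read as the authors' exact search. The function name `find_g4_motifs` is hypothetical, and the regular expression is a simplification that reports non-overlapping matches only.

```python
import re
from typing import List, Tuple

# Four runs of >= 3 guanines separated by loops of 1-7 nucleotides (assumed rule).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGUN]{1,7}){3}G{3,}")


def find_g4_motifs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of non-overlapping putative G-quadruplex motifs.

    DNA input is accepted: T is converted to U before searching.
    """
    rna = seq.upper().replace("T", "U")
    return [m.span() for m in G4_PATTERN.finditer(rna)]


print(find_g4_motifs("AAGGGAGGGAGGGAGGGAA"))  # [(2, 17)]
print(find_g4_motifs("GGGAGGGAGGG"))          # [] (only three G-runs)
```

Sequence matches of this kind only nominate candidates; as the abstract notes, stability and function had to be shown experimentally.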
A comprehensive antibody panel for immunohistochemical analysis of formalin-fixed, paraffin-embedded hematopoietic neoplasms of mice: analysis of mouse specific and human antibodies cross-reactive with murine tissue.
GSF Research Center for Environment and Health, Institute of Pathology, Neuherberg 85764, Germany.
Immunohistochemistry is an indispensable tool in human pathology enabling immunophenotypic characterization of tumor cells. Immunohistochemical analyses of mouse models of human hematopoietic neoplasias have become an important aspect for comparison of murine entities with their human counterparts. The aim of this study was to establish a diagnostic antibody panel for analysis of murine lymphomas/leukemias, useful in formalin-fixed/paraffin-embedded tissue. Overall, 48 antibodies (4 rabbit monoclonal, 12 rabbit polyclonal, 2 goat polyclonal, 11 rat, and 19 mouse monoclonal), which were either mouse-specific (14) or cross-reactive with murine tissue (34) were tested for staining quality and diagnostic value in 468 murine hematopoietic neoplasms. Specific staining was achieved with 29 antibodies, of which 18 were human antibodies cross-reactive with murine tissue. Only 23 (B220, BCL-2, BCL-6, CD117, CD138 (2x), CD3 (2x), CD43, CD45, CD5, CD79 alpha cy, cyclin D1, Ki-67 (2x), Mac-3, Mac-2, lysozyme, mast cell tryptase, MPO, Pax-5, TdT, and TER-119) were regarded as valuable for diagnostic evaluation. Immunohistochemistry was also established in an automated immunostainer for high throughput analysis. The antibody panel developed is useful for the classification of murine lymphomas and leukemias analyzed, and a valuable tool for human and veterinary pathologists involved in the diagnostic interpretation of murine models of hematopoietic neoplasias.
Toxicologic pathology 2007;35;3;366-75
hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes.
Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
Complete sets of cloned protein-encoding open reading frames (ORFs), or ORFeomes, are essential tools for large-scale proteomics and systems biology studies. Here we describe human ORFeome version 3.1 (hORFeome v3.1), currently the largest publicly available resource of full-length human ORFs (available at ). Generated by Gateway recombinational cloning, this collection contains 12,212 ORFs, representing 10,214 human genes, and corresponds to a 51% expansion of the original hORFeome v1.1. An online human ORFeome database, hORFDB, was built and serves as the central repository for all cloned human ORFs (http://horfdb.dfci.harvard.edu). This expansion of the original ORFeome resource greatly increases the potential experimental search space for large-scale proteomics studies, which will lead to the generation of more comprehensive datasets.
Common ABCB1 polymorphisms are not associated with multidrug resistance in epilepsy using a gene-wide tagging approach.
Imperial College London, London, UK.
P-glycoprotein, the product of the ABCB1 gene, is a proposed mechanism of pharmacoresistance in epilepsy. Previous attempts to correlate the ABCB1 C3435T SNP, or a three-SNP haplotype containing C3435T with epilepsy pharmacoresistance have produced discordant findings. We analysed these single nucleotide polymorphisms (SNPs), plus a more comprehensive set of tagging SNPs describing common variation in ABCB1 in a case-control study. No significant association of C3435T (P=0.55), the three-SNP haplotype (lowest P=0.14) or any gene-wide tagging SNP (lowest P=0.17) with multidrug resistance in epilepsy was identified. Meta-analysis of studies using the same definition of multidrug resistance (n=1064) also demonstrated no significant association of C3435T with multidrug resistance (P=0.31). These findings suggest that C3435T is unlikely to be a marker for epilepsy multidrug resistance. In addition, no evidence for a role of other common ABCB1 polymorphisms was found using a potentially more powerful gene-wide tagging approach.
Funded by: Wellcome Trust
Pharmacogenetics and genomics 2007;17;3;217-20
The association between polymorphisms in RLIP76 and drug response in epilepsy.
Imperial College London, Division of Neuroscience, Charing Cross Campus, Room 10E07, St Dunstan's Road, London W6 8RF, UK.
Introduction: Approximately 30% of patients with epilepsy are resistant to treatment with anti-epileptic drugs (AEDs). The ABC drug transporter proteins are hypothesized to mediate drug resistance in epilepsy. More recently, a non-ABC putative transporter, RLIP76, has also been proposed to be involved in the mechanism of pharmacoresistance. One previous association study of six polymorphisms in RLIP76 failed to find any association with drug resistance in a retrospective cohort of epilepsy patients. We aimed to look for an association with outcomes reflecting drug response in a larger prospective cohort, with gene-wide coverage.
Patients and methods: We investigated the role of common polymorphisms in RLIP76 in epilepsy pharmacoresistance by genotyping 23 common RLIP76 polymorphisms in a prospective cohort of 503 epilepsy patients, from the standard and new anti-epileptic drugs (SANAD) prospective study of new and old AEDs. A total of 13 of these were tested for association with four outcomes reflecting response to drugs: time to first seizure, time to 12-month remission, time to withdrawal due to inadequate seizure control, and time to withdrawal due to unacceptable adverse drug events.
Results: No significant associations, allowing for multiple testing, were found in the whole cohort. There was also no effect in a subgroup of patients on carbamazepine, which is thought to be a RLIP76 substrate, although two polymorphisms were associated with time to first seizure (p = 0.007).
Discussion: We failed to demonstrate any association between RLIP76 polymorphisms and four different measures of drug response in the larger cohort, but a subgroup analysis of patients receiving carbamazepine suggested an association that should be investigated further.
Conclusions: Our data suggest that common variants in RLIP76 are unlikely to contribute to epilepsy drug response.
Sequencing and analysis of chromosome 1 of Eimeria tenella reveals a unique segmental organization.
Malaysia Genome Institute, UKM-MTDC Smart Technology Centre, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia.
Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resistance and known to differ between virulent and attenuated strains of the parasite. The chromosome, which appears to be representative of the genome, is gene-dense and rich in simple-sequence repeats, many of which appear to give rise to repetitive amino acid tracts in the predicted proteins. Most striking is the segmentation of the chromosome into repeat-rich regions peppered with transposon-like elements and telomere-like repeats, alternating with repeat-free regions. Predicted genes differ in character between the two types of segment, and the repeat-rich regions appear to be associated with strain-to-strain variation.
Funded by: Biotechnology and Biological Sciences Research Council: S19705; Medical Research Council: MC_U105131672; Wellcome Trust
Genome research 2007;17;3;311-9
Comment on "A common genetic variant is associated with adult and childhood obesity".
Medical Research Council Epidemiology Unit, Cambridge, UK.
Herbert et al. (Reports, 14 April 2006, p. 279) found that the rs7566605 genetic variant, located upstream of the INSIG2 gene, was consistently associated with increased body mass index. However, we found no evidence of association between rs7566605 and body mass index in two large ethnically homogeneous population-based cohorts. Instead, a tendency in the opposite direction was observed.
Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106188470; Wellcome Trust: 077016
Science (New York, N.Y.) 2007;315;5809;187; author reply 187
TCF7L2 polymorphisms modulate proinsulin levels and beta-cell function in a British Europid population.
Medical Research Council Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.
Rapidly accumulating evidence shows that common polymorphisms in the T-cell transcription factor 7-like 2 gene (TCF7L2) confer risk of type 2 diabetes through unknown mechanisms. We examined the association between four TCF7L2 single nucleotide polymorphisms (SNPs), including rs7903146, and measures of insulin sensitivity and insulin secretion in 1,697 Europid men and women of the population-based MRC (Medical Research Council)-Ely study. The minor T allele of rs7903146 was strongly and positively associated with fasting proinsulin (P = 4.55 x 10(-9)) and 32,33 split proinsulin (P = 1.72 x 10(-4)) relative to total insulin levels; differences between T/T and C/C homozygotes amounted to 21.9% and 18.4%, respectively. Notably, the insulin-to-glucose ratio (IGR) at 30 min of the oral glucose tolerance test (OGTT), a frequently used surrogate of first-phase insulin secretion, was not associated with rs7903146 (P > 0.7). However, the insulin response (IGR) at 60 min of the OGTT was significantly lower in T-allele carriers (P = 3.5 x 10(-3)). The T allele was also associated with higher A1C concentrations (P = 1.2 x 10(-2)) and reduced beta-cell function, assessed by homeostasis model assessment of beta-cell function (P = 2.8 x 10(-2)). Similar results were obtained for the other TCF7L2 SNPs. Of note, both major genes involved in proinsulin processing (PC1, PC2) contain TCF-binding sites in their promoters. Our findings suggest that the TCF7L2 risk allele may predispose to type 2 diabetes by impairing beta-cell proinsulin processing. The risk allele increases proinsulin levels and diminishes the 60-min but not the 30-min insulin response during the OGTT. The strong association between the TCF7L2 risk allele and fasting proinsulin but not insulin levels is notable because, in this unselected and largely normoglycemic population, external influences on beta-cell stress are unlikely to be major factors influencing the efficiency of proinsulin processing.
Funded by: Medical Research Council: G9824984, MC_U106179471, MC_U106179472, MC_U106188470; Wellcome Trust: 071187, 077016
Comparative gene expression profiling of in vitro differentiated megakaryocytes and erythroblasts identifies novel activatory and inhibitory platelet membrane proteins.
Department of Haematology, University of Cambridge, Cambridge, UK.
To identify previously unknown platelet receptors we compared the transcriptomes of in vitro differentiated megakaryocytes (MKs) and erythroblasts (EBs). RNA was obtained from purified, biologically paired MK and EB cultures and compared using cDNA microarrays. Bioinformatic analysis of MK-up-regulated genes identified 151 transcripts encoding transmembrane domain-containing proteins. Although many of these were known platelet genes, a number of previously unidentified or poorly characterized transcripts were also detected. Many of these transcripts, including G6b, G6f, LRRC32, LAT2, and the G protein-coupled receptor SUCNR1, encode proteins with structural features or functions that suggest they may be involved in the modulation of platelet function. Immunoblotting on platelets confirmed the presence of the encoded proteins, and flow cytometric analysis confirmed the expression of G6b, G6f, and LRRC32 on the surface of platelets. Through comparative analysis of expression in platelets and other blood cells we demonstrated that G6b, G6f, and LRRC32 are restricted to the platelet lineage, whereas LAT2 and SUCNR1 were also detected in other blood cells. The identification of the succinate receptor SUCNR1 in platelets is of particular interest, because physiologically relevant concentrations of succinate were shown to potentiate the effect of low doses of a variety of platelet agonists.
Funded by: Medical Research Council: MC_U105260799
Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization.
Computational Biology Group, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK. J.Marioni@damtp.cam.ac.uk
Background: Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.
Results: We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm, CNVmix, that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.
Conclusion: Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
Funded by: Wellcome Trust
Genome biology 2007;8;10;R228
Renin enhancer is crucial for full response in renin expression to an in vivo stimulus.
Basic & Clinical Genomics Laboratory, School of Medical Sciences and Bosch Institute, Building F13, University of Sydney, NSW 2006, Australia.
We showed recently that deletion of a strong enhancer located 2.7 kb upstream of the renin gene in mice produces a strain with mild hypotension and salt-sensitivity. Here we set out to compare responses in renin expression in kidney and extrarenal tissues in these "REKO" mice. REKO and wild-type mice were placed on a low NaCl/enalapril regimen for 1 week, and then Ren-1(c) mRNA and renin enzyme activities were measured in tissues and plasma. In untreated REKO mice, renin and Ren-1(c) mRNA were reduced significantly in kidney, submandibular gland, adrenal, heart, and brain. In situ hybridization indicated a marked reduction in Ren-1(c) mRNA in juxtaglomerular cells and in granular ducts of the submandibular gland. After the chronic stimulus, the response in renal Ren-1(c) mRNA in REKO mice was blunted by 54% compared with wild-type mice and was accompanied by almost complete exhaustion of renin stores. The response in plasma renin was blunted by 47%, and this was mirrored in heart (54% decline), in which renin is derived mostly from the bloodstream. In adrenal, a 55% reduction was seen. These data are consistent with an inability of REKO mice to adequately replenish renal renin stores during chronic stimulation of renin secretion. In conclusion, the renin enhancer is critical for replenishment of renin stores and for the response in renin to a chronic in vivo stimulus.
Hypertension (Dallas, Tex. : 1979) 2007;50;5;933-8
Chromosomally unstable mouse tumours have genomic alterations similar to diverse human cancers.
Department of Medical Oncology, Dana Farber Cancer Institute, Boston, Massachusetts 02115, USA.
Highly rearranged and mutated cancer genomes present major challenges in the identification of pathogenetic events driving the neoplastic transformation process. Here we engineered lymphoma-prone mice with chromosomal instability to assess the usefulness of mouse models in cancer gene discovery and the extent of cross-species overlap in cancer-associated copy number aberrations. Along with targeted re-sequencing, our comparative oncogenomic studies identified FBXW7 and PTEN as commonly deleted both in murine lymphomas and in human T-cell acute lymphoblastic leukaemia/lymphoma (T-ALL). The murine cancers acquire widespread recurrent amplifications and deletions targeting loci syntenic to those not only in human T-ALL but also in diverse human haematopoietic, mesenchymal and epithelial tumours. These results indicate that murine and human tumours experience common biological processes driven by orthologous genetic events in their malignant evolution. The highly concordant nature of genomic events encourages the use of genomically unstable murine cancer models in the discovery of biological driver events in the human oncogenome.
Funded by: Medical Research Council: G0500389; Wellcome Trust: 077012, 088340
Genetic relatedness of the Streptococcus pneumoniae capsular biosynthetic loci.
Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary's Hospital, Norfolk Place, London W2 1PG, United Kingdom.
Streptococcus pneumoniae (the pneumococcus) produces 1 of 91 capsular polysaccharides (CPS) that define the serotype. The cps loci of 88 pneumococcal serotypes whose CPS is synthesized by the Wzy-dependent pathway were compared with each other and with additional streptococcal polysaccharide biosynthetic loci and were clustered according to the proportion of shared homology groups (HGs), weighted for the sequence similarities between the genes encoding the shared HGs. The cps loci of the 88 pneumococcal serotypes were distributed into eight major clusters and 21 subclusters. All serotypes within the same serogroup fell into the same major cluster, but in six cases, serotypes within the same serogroup were in different subclusters and, conversely, nine subclusters included completely different serotypes. The closely related cps loci within a subcluster were compared to the known CPS structures to relate gene content to structure. The Streptococcus oralis and Streptococcus mitis polysaccharide biosynthetic loci clustered within the pneumococcal cps loci and were in a subcluster that also included the cps locus of pneumococcal serotype 21, whereas the Streptococcus agalactiae cps loci formed a single cluster that was not closely related to any of the pneumococcal cps clusters.
Funded by: Wellcome Trust
Journal of bacteriology 2007;189;21;7841-55
Lamin A/C polymorphisms, type 2 diabetes, and the metabolic syndrome: case-control and quantitative trait studies.
Medical Research Council Epidemiology Unit, Cambridge, U.K.
Mutations in the LMNA gene, encoding the nuclear envelope protein lamin A/C, are responsible for a number of distinct disease entities including Dunnigan-type familial partial lipodystrophy. Dunnigan-type lipodystrophy is characterized by loss of subcutaneous adipose tissue, insulin resistance, dyslipidemia, and type 2 diabetes and shares many of the features of the metabolic syndrome. Furthermore, several genome-wide linkage scans for type 2 diabetes have found evidence of linkage at chromosome 1q21.2, the region that harbors the LMNA gene. Therefore, LMNA is a biological and positional candidate for type 2 diabetes susceptibility. Previous studies have reported association between a common LMNA variant (1908C>T; rs4641) and adverse metabolic traits in ethnically diverse populations from Asia and North America. In the present study, we characterized the common variation across the LMNA gene (including rs4641) and tested for association with type 2 diabetes in two large case-control studies (n = 2,052) and with features of the metabolic syndrome in a separate cohort study (n = 1,572). Despite our study being sufficiently powered to detect effects similar to, or even smaller in magnitude than, those previously reported, none of the LMNA single nucleotide polymorphisms were statistically significantly associated with type 2 diabetes or the metabolic syndrome. Thus, it appears unlikely that variation at LMNA substantially increases the risk of type 2 diabetes or related traits in U.K. Europids.
Funded by: Medical Research Council: MC_U106179471, MC_U106179472, MC_U106188470; Wellcome Trust: 077016
Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.
Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
Funded by: Medical Research Council: MC_U137761446; Wellcome Trust: 062023
Critical assessment of methods of protein structure prediction-Round VII.
Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.
This paper is an introduction to the supplemental issue of the journal PROTEINS, dedicated to the seventh CASP experiment to assess the state of the art in protein structure prediction. The paper describes the conduct of the experiment and the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are improvements in model accuracy relative to that obtainable from knowledge of a single best template structure; convergence of the accuracy of models produced by automatic servers toward that produced by human modeling teams; the emergence of methods for predicting the quality of models; and rapidly increasing practical applications of the methods.
Funded by: NIGMS NIH HHS: GM072354; NLM NIH HHS: LM07085; Wellcome Trust: 077198
Proteins 2007;69 Suppl 8;3-9
Mouse Phenotype Database Integration Consortium: integration [corrected] of mouse phenome data resources.
Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this, it will be essential to be able to characterize the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first-line phenotyping data on novel mutations, data on the normal features of inbred lines) at many sites worldwide. For the most efficient use of these data sets, we have initiated a process to develop standards for the description of phenotypes (using ontologies) and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing and needs to be supported by the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.
Funded by: Medical Research Council: MC_U127527203, MC_U142684171, MC_U142684172, MC_U142684175
Mammalian genome : official journal of the International Mammalian Genome Society 2007;18;3;157-63
New developments in the InterPro database.
EMBL Outstation - European Bioinformatics Institute, Hinxton, Cambridge, UK.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two, Gene3D and PANTHER, have been integrated as new member databases since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Medical Research Council: G0100305; Wellcome Trust: 087656
Nucleic acids research 2007;35;Database issue;D224-8
Localization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A.
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute of Medical Research, University of Cambridge, CB2 0XY, UK.
The major histocompatibility complex (MHC) on chromosome 6 is associated with susceptibility to more common diseases than any other region of the human genome, including almost all disorders classified as autoimmune. In type 1 diabetes the major genetic susceptibility determinants have been mapped to the MHC class II genes HLA-DQB1 and HLA-DRB1 (refs 1-3), but these genes cannot completely explain the association between type 1 diabetes and the MHC region. Owing to the region's extreme gene density, the multiplicity of disease-associated alleles, strong associations between alleles, limited genotyping capability, and inadequate statistical approaches and sample sizes, which, and how many, loci within the MHC determine susceptibility remains unclear. Here, in several large type 1 diabetes data sets, we analyse a combined total of 1,729 polymorphisms, and apply statistical methods (recursive partitioning and regression) to pinpoint disease susceptibility to the MHC class I genes HLA-B and HLA-A (risk ratios >1.5; P(combined) = 2.01 x 10(-19) and 2.35 x 10(-13), respectively) in addition to the established associations of the MHC class II genes. Other loci with smaller and/or rarer effects might also be involved, but to find these, future searches must take into account both the HLA class II and class I genes and use even larger samples. Taken together with previous studies, we conclude that MHC-class-I-mediated events, principally involving HLA-B*39, contribute to the aetiology of type 1 diabetes.
Funded by: Medical Research Council: G0000934, G0600681; Wellcome Trust: 076113
Modeling insertional mutagenesis using gene length and expression in murine embryonic stem cells.
Department of Medicine, MacDonald Medical Research Laboratories, University of California at Los Angeles, California, USA.
Background: High-throughput mutagenesis of the mammalian genome is a powerful means to facilitate analysis of gene function. Gene trapping in embryonic stem cells (ESCs) is the most widely used form of insertional mutagenesis in mammals. However, the rules governing its efficiency are not fully understood, and the effects of vector design on the likelihood of gene-trapping events have not been tested on a genome-wide scale.
Methodology/principal findings: In this study, we used public gene-trap data to model gene-trap likelihood. Using the association of gene length and gene expression with gene-trap likelihood, we constructed spline-based regression models that characterize which genes are susceptible and which genes are resistant to gene-trapping techniques. We report results for three classes of gene-trap vectors, showing that both length and expression are significant determinants of trap likelihood for all vectors. Using our models, we also quantitatively identified hotspots of gene-trap activity, which represent loci where the high likelihood of vector insertion is controlled by factors other than length and expression. These formalized statistical models describe a high proportion of the variance in the likelihood of a gene being trapped by expression-dependent vectors and a lower, but still significant, proportion of the variance for vectors that are predicted to be independent of endogenous gene expression.
Conclusions/significance: The findings of significant expression and length effects reported here further the understanding of the determinants of vector insertion. Results from this analysis can be applied to help identify other determinants of this biological phenomenon and could assist the planning of large-scale mutagenesis efforts.
Funded by: NHGRI NIH HHS: HG002766; NHLBI NIH HHS: HL66621, U01 HL066621
PloS one 2007;2;7;e617
Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility.
Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 2QQ, UK.
A genome-wide association scan in individuals with Crohn's disease by the Wellcome Trust Case Control Consortium detected strong association at four novel loci. We tested 37 SNPs from these and other loci for association in an independent case-control sample. We obtained replication for the autophagy-inducing IRGM gene on chromosome 5q33.1 (replication P = 6.6 x 10(-4), combined P = 2.1 x 10(-10)) and for nine other loci, including NKX2-3, PTPN2 and gene deserts on chromosomes 1q and 5p13.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 072029
Nature genetics 2007;39;7;830-2
Interaction analysis of the CBLB and CTLA4 genes in type 1 diabetes.
Juvenile Diabetes Research Foundation/Wellcome Trust, Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge, UK.
Gene-gene interaction analyses have been suggested as a potential strategy to help identify common disease susceptibility genes. Recently, evidence of a statistical interaction between polymorphisms in two negative immunoregulatory genes, CBLB and CTLA4, has been reported in type 1 diabetes (T1D). This study, in 480 Danish families, reported an association between T1D and a synonymous coding SNP in exon 12 of the CBLB gene (rs3772534 G>A; minor allele frequency, MAF=0.24; derived relative risk, RR for G allele=1.78; P=0.046). Furthermore, evidence of a statistical interaction with the known T1D susceptibility-associated CTLA4 polymorphism rs3087243 (laboratory name CT60, G>A) was reported (P<0.0001), such that the CBLB SNP rs3772534 G allele was overtransmitted to offspring with the CTLA4 rs3087243 G/G genotype. We have, therefore, attempted to obtain additional support for this finding in both large family and case-control collections. In a primary analysis, no evidence for an association of the CBLB SNP rs3772534 with disease was found in either sample set (2162 parent-child trios, P=0.33; 3453 cases and 3655 controls, P=0.69). In the case-only statistical interaction analysis between rs3772534 and rs3087243, there was also no support for an effect (1994 T1D affected offspring, and 3215 cases, P=0.92). Given the low prior probability of identifying reproducible evidence of gene-gene interactions in the analysis of common disease-associated variants in human populations, these data highlight the need for large, well-characterized populations in which initial observations can be tested for additional support.
Funded by: Medical Research Council: G0000934; Wellcome Trust
Journal of leukocyte biology 2007;81;3;581-3
Comparative genomic analysis of three Leishmania species that cause diverse human disease.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, unlike the Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
Funded by: Medical Research Council: G0000508; Wellcome Trust: 076355, 085775
Nature genetics 2007;39;7;839-47
Environmental and genetic modifiers of squint penetrance during zebrafish embryogenesis.
Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
The Nodal-related subgroup of the TGFbeta superfamily of secreted cytokines regulates the specification of the mesodermal and endodermal germ layers during gastrulation. Two Nodal-related proteins, Squint (Sqt) and Cyclops (Cyc), are expressed during germ-layer specification in zebrafish. Genetic sqt mutant phenotypes have defined a variable requirement for zygotic Sqt, but not for maternal Sqt, in midline mesendoderm development. However, a comparison of phenotypes arising from oocytes or zygotes injected with Sqt antisense morpholinos has suggested a novel requirement for maternal Sqt in dorsal specification. In this study we examined maternal-zygotic mutants for each of two sqt alleles and we also compared phenotypes of closely related zygotic and maternal-zygotic sqt mutants. Each of these approaches indicated there is no general requirement for maternal Sqt. To better understand the dispensability of maternal and zygotic Sqt, we sought out developmental contexts that more rigorously demand intact Sqt signalling. We found that sqt penetrance is influenced by genetic modifiers, by environmental temperature, by levels of residual Activin-like activity and by Heat-Shock Protein 90 (HSP90) activity. Therefore, Sqt may confer an evolutionary advantage by protecting early-stage embryos against detrimental interacting alleles and environmental challenges.
Funded by: Intramural NIH HHS: Z01 HG200309-05, Z99 HG999999; Wellcome Trust
Developmental biology 2007;308;2;368-78
Diet and the evolution of human amylase gene copy number variation.
School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona 85287, USA.
Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number-variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.
Funded by: NCRR NIH HHS: C06 RR014491, C06 RR014491-01, C06 RR016483, C06 RR016483-01, RR014491, RR015087, RR016483, U42 RR015087, U42 RR015087-01; Wellcome Trust
Nature genetics 2007;39;10;1256-60
Integrating sequence and structural biology with DAS.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Background: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequence.
Results: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands for sharing alignments and three-dimensional molecular structure data, add support for the registration and discovery of DAS servers, and define a convention for providing different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources.
Conclusion: Our DAS extensions are essential for the management of the growing number of services and the exchange of diverse biological data sets. In addition, the extensions allow new types of applications to be developed and new scientific questions to be addressed. The registry of DAS sources is available at http://www.dasregistry.org.
Funded by: Wellcome Trust: 062023, 077198
BMC bioinformatics 2007;8;333
Wnt5a functions in planar cell polarity regulation in mice.
Department of Cell Biology, Emory University School of Medicine, Atlanta, GA 30322, USA.
Planar cell polarity (PCP) refers to the polarization of cells within the plane of a cell sheet. A distinctive epithelial PCP in vertebrates is the uniform orientation of stereociliary bundles of the sensory hair cells in the mammalian cochlea. In addition to establishing epithelial PCP, planar polarization is also required for convergent extension (CE), a polarized cellular movement that occurs during neural tube closure and cochlear extension. Studies in Drosophila and vertebrates have revealed a conserved PCP pathway, including Frizzled (Fz) receptors. Here we use the cochlea as a model system to explore the involvement of known ligands of Fz, Wnt morphogens, in PCP regulation. We show that Wnt5a forms a reciprocal expression pattern with a Wnt antagonist, the secreted frizzled-related protein 3 (Sfrp3 or Frzb), along the axis of planar polarization in the cochlear epithelium. We further demonstrate that Wnt5a antagonizes Frzb in regulating cochlear extension and stereociliary bundle orientation in vitro, and that Wnt5a(-/-) animals have a shortened and widened cochlea. Finally, we show that Wnt5a is required for proper subcellular distribution of a PCP protein, Ltap/Vangl2, and that Wnt5a interacts genetically with Ltap/Vangl2 for uniform orientation of stereocilia, cochlear extension, and neural tube closure. Together, these findings demonstrate that Wnt5a functions in PCP regulation in mice.
Funded by: Medical Research Council: G0300212, MC_QA137918; NIDCD NIH HHS: R01 DC005213, R01 DC005213-06, R01 DC007423, R01 DC007423-01A2; Wellcome Trust
Developmental biology 2007;306;1;121-33
PALB2, which encodes a BRCA2-interacting protein, is a breast cancer susceptibility gene.
Section of Cancer Genetics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK.
PALB2 interacts with BRCA2, and biallelic mutations in PALB2 (also known as FANCN), similar to biallelic BRCA2 mutations, cause Fanconi anemia. We identified monoallelic truncating PALB2 mutations in 10/923 individuals with familial breast cancer compared with 0/1,084 controls (P = 0.0004) and show that such mutations confer a 2.3-fold higher risk of breast cancer (95% confidence interval (c.i.) = 1.4-3.9, P = 0.0025). The results show that PALB2 is a breast cancer susceptibility gene and further demonstrate the close relationship of the Fanconi anemia-DNA repair pathway and breast cancer predisposition.
Funded by: Wellcome Trust: 068545/Z/02, 077012
Nature genetics 2007;39;2;165-7
Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia.
Department of Molecular Virology, Immunology, and Medical Genetics, Human Cancer Genetics Program, The Comprehensive Cancer Center at The Ohio State University, Columbus, OH 43214, USA.
The heritability of B cell chronic lymphocytic leukemia (CLL) is relatively high; however, no predisposing mutation has been convincingly identified. We show that loss or reduced expression of death-associated protein kinase 1 (DAPK1) underlies cases of heritable predisposition to CLL and the majority of sporadic CLL. Epigenetic silencing of DAPK1 by promoter methylation occurs in almost all sporadic CLL cases. Furthermore, we defined a disease haplotype, which segregates with the CLL phenotype in a large family. DAPK1 expression of the CLL allele is downregulated by 75% in germline cells due to increased HOXB7 binding. In the blood cells from affected family members, promoter methylation results in additional loss of DAPK1 expression. Thus, reduced expression of DAPK1 can result from germline predisposition, as well as epigenetic or somatic events causing or contributing to the CLL phenotype.
Funded by: NCI NIH HHS: 5U01 CA86389, CA101956, CA110496, CA81534, P30 CA16058, T32 CA106196; Wellcome Trust
Mutations in ZDHHC9, which encodes a palmitoyltransferase of NRAS and HRAS, cause X-linked mental retardation associated with a Marfanoid habitus.
Cambridge Institute of Medical Research, University of Cambridge, Cambridge, CB2 2XY, UK.
We have identified one frameshift mutation, one splice-site mutation, and two missense mutations in highly conserved residues in ZDHHC9 at Xq26.1 in 4 of 250 families with X-linked mental retardation (XLMR). In three of the families, the mental retardation phenotype is associated with a Marfanoid habitus, although none of the affected individuals meets the Ghent criteria for Marfan syndrome. ZDHHC9 is a palmitoyltransferase that catalyzes the posttranslational modification of NRAS and HRAS. The degree of palmitoylation determines the temporal and spatial location of these proteins in the plasma membrane and Golgi complex. The finding of mutations in ZDHHC9 suggests that alterations in the concentrations and cellular distribution of target proteins are sufficient to cause disease. This is the first XLMR gene to be reported that encodes a posttranslational modification enzyme, palmitoyltransferase. Furthermore, now that the first palmitoyltransferase that causes mental retardation has been identified, defects in other palmitoyltransferases become good candidates for causing other mental retardation syndromes.
Funded by: NICHD NIH HHS: HD26202, R01 HD026202; Wellcome Trust
American journal of human genetics 2007;80;5;982-7
Evolutionary and biomedical insights from the rhesus macaque genome.
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with those from chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
Funded by: NHGRI NIH HHS: R01 HG002939, U54 HG003068, U54 HG003079, U54 HG003273; Wellcome Trust: 062023
Science (New York, N.Y.) 2007;316;5822;222-34
Requirement of bic/microRNA-155 for normal immune function.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
MicroRNAs are a class of small RNAs that are increasingly being recognized as important regulators of gene expression. Although hundreds of microRNAs are present in the mammalian genome, genetic studies addressing their physiological roles are at an early stage. We have shown that mice deficient for bic/microRNA-155 are immunodeficient and display increased lung airway remodeling. We demonstrate a requirement of bic/microRNA-155 for the function of B and T lymphocytes and dendritic cells. Transcriptome analysis of bic/microRNA-155-deficient CD4+ T cells identified a wide spectrum of microRNA-155-regulated genes, including cytokines, chemokines, and transcription factors. Our work suggests that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.
Funded by: Medical Research Council: G117/424; Wellcome Trust: 077187
Science (New York, N.Y.) 2007;316;5824;608-11
Genome-wide detection and characterization of positive selection in human populations.
Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02139, USA.
With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.
Funded by: Wellcome Trust: 077008, 077011, 077046, 081682
Genomewide association analysis of coronary artery disease.
University of Leicester, Leicester, United Kingdom.
Background: Modern genotyping platforms permit a systematic search for inherited components of complex diseases. We performed a joint analysis of two genomewide association studies of coronary artery disease.
Methods: We first identified chromosomal loci that were strongly associated with coronary artery disease in the Wellcome Trust Case Control Consortium (WTCCC) study (which involved 1926 case subjects with coronary artery disease and 2938 controls) and looked for replication in the German MI [Myocardial Infarction] Family Study (which involved 875 case subjects with myocardial infarction and 1644 controls). Data on other single-nucleotide polymorphisms (SNPs) that were significantly associated with coronary artery disease in either study (P<0.001) were then combined to identify additional loci with a high probability of true association. Genotyping in both studies was performed with the use of the GeneChip Human Mapping 500K Array Set (Affymetrix).
Results: Of thousands of chromosomal loci studied, the same locus had the strongest association with coronary artery disease in both the WTCCC and the German studies: chromosome 9p21.3 (SNP, rs1333049) (P=1.80x10(-14) and P=3.40x10(-6), respectively). Overall, the WTCCC study revealed nine loci that were strongly associated with coronary artery disease (P<1.2x10(-5) and less than a 50% chance of being falsely positive). In addition to chromosome 9p21.3, two of these loci were successfully replicated (adjusted P<0.05) in the German study: chromosome 6q25.1 (rs6922269) and chromosome 2q36.3 (rs2943634). The combined analysis of the two studies identified four additional loci significantly associated with coronary artery disease (P<1.3x10(-6)) and a high probability (>80%) of a true association: chromosomes 1p13.3 (rs599839), 1q41 (rs17465637), 10q11.21 (rs501120), and 15q22.33 (rs17228212).
Conclusions: We identified several genetic loci that, individually and in aggregate, substantially affect the risk of development of coronary artery disease.
Funded by: Medical Research Council: G0501942, G9806740; Wellcome Trust: 076113, 077011
The New England journal of medicine 2007;357;5;443-53
Common variants in WFS1 confer risk of type 2 diabetes.
UK Medical Research Council (MRC) Epidemiology Unit, Strangeways Research Laboratory, Cambridge CB1 8RN, UK.
We studied genes involved in pancreatic beta cell function and survival, identifying associations between SNPs in WFS1 and diabetes risk in UK populations that we replicated in an Ashkenazi population and in additional UK studies. In a pooled analysis comprising 9,533 cases and 11,389 controls, SNPs in WFS1 were strongly associated with diabetes risk. Rare mutations in WFS1 cause Wolfram syndrome; using a gene-centric approach, we show that variation in WFS1 also predisposes to common type 2 diabetes.
Funded by: Medical Research Council: G0500070, MC_U106179471; Wellcome Trust: 068545/z/02, 077016
Nature genetics 2007;39;8;951-3
Challenges and standards in integrating surveys of structural variation.
The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, 101 College Street, Room 14-701, Ontario M5G 1L7, Canada.
There has been an explosion of data describing newly recognized structural variants in the human genome. In the flurry of reporting, there has been no standard approach to collecting the data, assessing its quality or describing identified features. This risks becoming a rampant problem, in particular with respect to surveys of copy number variation and their application to disease studies. Here, we consider the challenges in characterizing and documenting genomic structural variants. From this, we derive recommendations for standards to be adopted, with the aim of ensuring the accurate presentation of this form of genetic variation to facilitate ongoing research.
Funded by: Wellcome Trust: 077008, 077014
Nature genetics 2007;39;7 Suppl;S7-15
Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.
Clostridium botulinum is a heterogeneous Gram-positive species that comprises four genetically and physiologically distinct groups of bacteria that share the ability to produce botulinum neurotoxin, the most poisonous toxin known to man and the causative agent of botulism, a severe disease of humans and animals. We report here the complete genome sequence of a representative of Group I (proteolytic) C. botulinum (strain Hall A, ATCC 3502). The genome consists of a chromosome (3,886,916 bp) and a plasmid (16,344 bp), which carry 3650 and 19 predicted genes, respectively. Consistent with the proteolytic phenotype of this strain, the genome harbors a large number of genes encoding secreted proteases and enzymes involved in uptake and metabolism of amino acids. The genome also reveals a hitherto unknown ability of C. botulinum to degrade chitin. There is a significant lack of recently acquired DNA, indicating a stable genomic content, in strong contrast to the fluid genome of Clostridium difficile, which can form longer-term relationships with its host. Overall, the genome indicates that C. botulinum is adapted to a saprophytic lifestyle in both soil and aquatic environments. This pathogen relies on its toxin to rapidly kill a wide range of prey species and, to gain access to nutrient sources, releases a large number of extracellular enzymes to soften and destroy rotting or decayed tissues.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D522797/1; Medical Research Council: G0700837; Wellcome Trust
Genome research 2007;17;7;1082-92
A more convenient truth.
Nature reviews. Microbiology 2007;5;4;248-50
Different evolutionary histories of the two classical class I genes BF1 and BF2 illustrate drift and selection within the stable MHC haplotypes of chickens.
Institute for Animal Health, Compton, Berkshire, United Kingdom.
Compared with the MHC of typical mammals, the chicken MHC (BF/BL region) of the B12 haplotype is smaller, simpler, and rearranged, with two classical class I genes of which only one is highly expressed. In this study, we describe the development of long-distance PCR to amplify some or all of each class I gene separately, allowing us to make the following points. First, six other haplotypes have the same genomic organization as B12, with a poorly expressed (minor) BF1 gene between DMB2 and TAP2 and a well-expressed (major) BF2 gene between TAP2 and C4. Second, the expression of the BF1 gene is crippled in three different ways in these haplotypes: enhancer A deletion (B12, B19), enhancer A divergence and transcription start site deletion (B2, B4, B21), and insertion/rearrangement leading to pseudogenes (B14, B15). Third, the three kinds of alterations in the BF1 gene correspond to dendrograms of the BF1 and poorly expressed class II B (BLB1) genes reflecting mostly neutral changes, while the dendrograms of the BF2 and well-expressed class II (BLB2) genes each have completely different topologies reflecting selection. The common pattern for the poorly expressed genes reflects the fact that the BF/BL region undergoes little recombination and allows us to propose a pattern of descent for these chicken MHC haplotypes from a common ancestor. Taken together, these data explain how stable MHC haplotypes predominantly express a single class I molecule, which in turn leads to striking associations of the chicken MHC with resistance to infectious pathogens and response to vaccines.
Funded by: Wellcome Trust
Journal of immunology (Baltimore, Md. : 1950) 2007;178;9;5744-52
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures.
The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA.
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
Funded by: NHGRI NIH HHS: R01 HG002779-05, R01 HG002779-06, R01 HG004037, R01 HG004037-01A1; NIGMS NIH HHS: R01 GM067031, R01 GM067031-04, R01 GM083300
Salmonella enterica serovar Typhimurium exploits inflammation to compete with the intestinal microbiota.
Institute of Microbiology, Swiss Institute of Technology Zurich, Zurich, Switzerland.
Most mucosal surfaces of the mammalian body are colonized by microbial communities ("microbiota"). A high density of commensal microbiota inhabits the intestine and shields the host from infection ("colonization resistance"). The virulence strategies allowing enteropathogenic bacteria to successfully compete with the microbiota and overcome colonization resistance are poorly understood. Here, we investigated manipulation of the intestinal microbiota by the enteropathogenic bacterium Salmonella enterica subspecies 1 serovar Typhimurium (S. Tm) in a mouse colitis model: we found that inflammatory host responses induced by S. Tm changed microbiota composition and suppressed its growth. In contrast to wild-type S. Tm, an avirulent invG sseD mutant failing to trigger colitis was outcompeted by the microbiota. This competitive defect was reverted if inflammation was provided concomitantly by mixed infection with wild-type S. Tm or in mice (IL10(-/-), VILLIN-HA(CL4-CD8)) with inflammatory bowel disease. Thus, inflammation is necessary and sufficient for overcoming colonization resistance. This reveals a new concept in infectious disease: in contrast to current thinking, inflammation is not always detrimental for the pathogen. Triggering the host's immune defence can shift the balance between the protective microbiota and the pathogen in favour of the pathogen.
PLoS biology 2007;5;10;2177-89
Relative impact of nucleotide and copy number variation on gene expression phenotypes.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely, single-nucleotide polymorphisms (SNPs). In recent years, another type of common genetic variation has been characterized, namely, structural variation, including copy number variants (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.
Funded by: Wellcome Trust: 065535, 076113, 077009, 077014, 077046
Science (New York, N.Y.) 2007;315;5813;848-53
Population genomics of human gene expression.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.
Funded by: Wellcome Trust: 077011, 077046
Nature genetics 2007;39;10;1217-24
Replication timing profile reflects the distinct functional and genomic features of the MHC class II region.
Human Cytogenetics Laboratory, Cancer Research UK London Research Institute, London, UK.
The timing of DNA replication generally correlates with transcription, gene density and sequence composition. How is the timing affected if a genomic region has a combination of features that individually correlate with either early or late replication? The major histocompatibility complex (MHC) class II region is an AT-rich isochore that would be expected to replicate late, but it also contains coordinately regulated genes that are highly expressed in antigen-presenting cells and are strongly inducible in other cell types. Using cytological and biochemical assays, we find that the entire MHC replicates within the first half of S-phase, and that the class II region replicates slightly later than the adjacent regions irrespective of gene expression. These data suggest that despite AT-richness, an early-to-middle replication time in the class II region is defined by an open chromatin conformation that allows rapid transcriptional activation as a defence against pathogens.
Funded by: Cancer Research UK: A8318, C5321/A8318; Medical Research Council: MC_U120027516
Cell cycle (Georgetown, Tex.) 2007;6;19;2393-8
Analysis of genetic variation in Akt2/PKB-beta in severe insulin resistance, lipodystrophy, type 2 diabetes, and related metabolic phenotypes.
Metabolic Disease Group, Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, U.K.
We previously reported a family in which a heterozygous missense mutation in AKT2 led to a dominantly inherited syndrome of insulin-resistant diabetes and partial lipodystrophy. To determine whether genetic variation in AKT2 plays a broader role in human metabolic disease, we sequenced the entire coding region and splice junctions of AKT2 in 94 unrelated patients with severe insulin resistance, 35 of whom had partial lipodystrophy. Two rare missense mutations (R208K and R467W) were identified in single individuals. However, insulin-stimulated kinase activities of these variants were indistinguishable from wild type. In two large case-control studies (total number of participants 2,200), none of 11 common single nucleotide polymorphisms (SNPs) in AKT2 showed significant association with type 2 diabetes. In a quantitative trait study of 1,721 extensively phenotyped individuals from the U.K., no association was found with any relevant intermediate metabolic trait. In summary, although heterozygous loss-of-function mutations in AKT2 can cause a syndrome of severe insulin resistance and lipodystrophy in humans, such mutations are uncommon causes of these syndromes. Furthermore, genetic variation in and around the AKT2 locus is unlikely to contribute significantly to the risk of type 2 diabetes or related intermediate metabolic traits in U.K. populations.
Funded by: Medical Research Council: MC_U106179471; Wellcome Trust: 077016
Mutations in UPF3B, a member of the nonsense-mediated mRNA decay complex, cause syndromic and nonsyndromic mental retardation.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Nonsense-mediated mRNA decay (NMD) is of universal biological significance. It has emerged as an important global RNA, DNA and translation regulatory pathway. By systematically sequencing 737 genes (annotated in the Vertebrate Genome Annotation database) on the human X chromosome in 250 families with X-linked mental retardation, we identified mutations in the UPF3 regulator of nonsense transcripts homolog B (yeast) (UPF3B) leading to protein truncations in three families: two with the Lujan-Fryns phenotype and one with the FG phenotype. We also identified a missense mutation in another family with nonsyndromic mental retardation. Three mutations lead to the introduction of a premature termination codon and subsequent NMD of mutant UPF3B mRNA. Protein blot analysis using lymphoblastoid cell lines from affected individuals showed an absence of the UPF3B protein in two families. The UPF3B protein is an important component of the NMD surveillance machinery. Our results directly implicate abnormalities of NMD in human disease and suggest at least partial redundancy of NMD pathways.
Funded by: NICHD NIH HHS: HD26202, R01 HD026202; Wellcome Trust: 077012
Nature genetics 2007;39;9;1127-33
A genotype calling algorithm for the Illumina BeadArray platform.
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
Motivation: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated, easy-to-use software for calling genotypes.
Results: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy.
Availability: The C++ executable for the algorithm described here is available by request from the authors.
Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 077011, 082370
Bioinformatics (Oxford, England) 2007;23;20;2741-6
Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet.
Centre for Ecology and Hydrology-Oxford, Oxford, UK.
The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp., and the majority of the remainder showed similarity to other gamma-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonads than is currently appreciated or understood.
Funded by: Wellcome Trust: 082372
The ISME journal 2007;1;4;331-40
Rheumatoid arthritis association at 6q23.
Arthritis Research Campaign (arc)-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester M13 9PT, UK.
The Wellcome Trust Case Control Consortium (WTCCC) identified nine SNPs putatively associated with rheumatoid arthritis at P = 1 x 10(-5) - 5 x 10(-7) in a genome-wide association screen. One, rs6920220, was unequivocally replicated (trend P = 1.1 x 10(-8)) in a validation study, as described here. This SNP maps to 6q23, between the genes oligodendrocyte lineage transcription factor 3 (OLIG3) and tumor necrosis factor-alpha-induced protein 3 (TNFAIP3).
Funded by: Arthritis Research UK: 17552; Medical Research Council: G0000934, G0000934(68341); Wellcome Trust: 068545, 068545/Z/02, 076113, 090532
Nature genetics 2007;39;12;1431-3
Convergent adaptation of human lactase persistence in Africa and Europe.
Department of Biology, University of Maryland, College Park, Maryland 20742, USA. Tishkoff@umd.edu
A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability to digest milk as adults (lactase persistence) in Europeans, but the genetic basis of lactase persistence in Africans was previously unknown. We conducted a genotype-phenotype association study in 470 Tanzanians, Kenyans and Sudanese and identified three SNPs (G/C-14010, T/G-13915 and C/G-13907) that are associated with lactase persistence and that have derived alleles that significantly enhance transcription from the LCT promoter in vitro. These SNPs originated on different haplotype backgrounds from the European C/T-13910 SNP and from each other. Genotyping across a 3-Mb region demonstrated haplotype homozygosity extending >2.0 Mb on chromosomes carrying C-14010, consistent with a selective sweep over the past approximately 7,000 years. These data provide a marked example of convergent evolution due to strong selective pressure resulting from shared cultural traits: animal domestication and adult milk consumption.
Funded by: NHGRI NIH HHS: F32 HG003801, F32HG03801, HG002772-1, R01 HG002772; NIGMS NIH HHS: R01 GM076637, R01GM076637; Wellcome Trust: 076113
Nature genetics 2007;39;1;31-40
Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes.
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0XY, UK.
The Wellcome Trust Case Control Consortium (WTCCC) primary genome-wide association (GWA) scan on seven diseases, including the multifactorial autoimmune disease type 1 diabetes (T1D), shows associations at P < 5 x 10(-7) between T1D and six chromosome regions: 12q24, 12q13, 16p13, 18p11, 12p13 and 4q27. Here, we attempted to validate these and six other top findings in 4,000 individuals with T1D, 5,000 controls and 2,997 family trios independent of the WTCCC study. We confirmed unequivocally the associations of 12q24, 12q13, 16p13 and 18p11 (P(follow-up) <or= 1.35 x 10(-9); P(overall) <or= 1.15 x 10(-14)), leaving eight regions with small effects or false-positive associations. We also obtained evidence for chromosome 18q22 (P(overall) = 1.38 x 10(-8)) from a GWA study of nonsynonymous SNPs. Several regions, including 18q22 and 18p11, showed association with autoimmune thyroid disease. This study increases the number of T1D loci with compelling evidence from six to at least ten.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 061858, 061859, 089989
Nature genetics 2007;39;7;857-64
Look who's talking too: graduates developing skills through communication.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
Greater opportunities for young scientists to present their doctoral research to large general audiences will encourage development of transferable skills and involvement in the scientific community. We look at ways students communicate their research and explore the benefits of student-led meetings. The organization of the first Sanger-Cambridge Ph.D. Symposium provides an example of how students can act to establish forums for their work and we call on other young scientists to do the same.
Nature reviews. Genetics 2007;8;9;724-6
The implications of alternative splicing in the ENCODE protein complement.
Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain.
Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.
Funded by: Wellcome Trust: 062023, 077198
Proceedings of the National Academy of Sciences of the United States of America 2007;104;13;5495-500
Network activity-independent coordinated gene expression program for synapse assembly.
Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
Global biological datasets generated by genomics, transcriptomics, and proteomics provide new approaches to understanding the relationship between the genome and the synapse. Combined transcriptome analysis and multielectrode recordings of neuronal network activity were used in mouse embryonic primary neuronal cultures to examine synapse formation and activity-dependent gene regulation. Evidence for a coordinated gene expression program for assembly of synapses was observed in the expression of 642 genes encoding postsynaptic and plasticity proteins. This synaptogenesis gene expression program preceded protein expression of synapse markers and onset of spiking activity. Continued expression was followed by maturation of morphology and electrical neuronal networks, which was then followed by the expression of activity-dependent genes. Thus, two distinct sequentially active gene expression programs underlie the genomic programs of synapse function.
Funded by: Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2007;104;11;4658-63
The Ras-association domain family (RASSF) members and their role in human tumourigenesis.
Experimental Cancer Genetics Laboratory, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK.
Ras proteins play a direct causal role in human cancer with activating mutations in Ras occurring in approximately 30% of tumours. Ras effectors also contribute to cancer, as mutations occur in Ras effectors, notably B-Raf and PI3-K, and drugs blocking elements of these pathways are in clinical development. In 2000, a new Ras effector was identified, RAS-association domain family 1 (RASSF1), and expression of the RASSF1A isoform of this gene is silenced in tumours by methylation of its promoter. Since methylation is reversible and demethylating agents are currently being used in clinical trials, detection of RASSF1A silencing by promoter hypermethylation has potential clinical uses in cancer diagnosis, prognosis and treatment. RASSF1A belongs to a new family of RAS effectors, of which there are currently 8 members (RASSF1-8). RASSF1-6 each contain a variable N-terminal segment followed by a Ras-association (RA) domain of the Ral-GDS/AF6 type, and a specialised coiled-coil structure known as a SARAH domain extending to the C-terminus. RASSF7-8 contain an N-terminal RA domain and a variable C-terminus. Members of the RASSF family are thought to function as tumour suppressors by regulating the cell cycle and apoptosis. This review will summarise our current knowledge of each member of the RASSF family and in particular what role they play in tumourigenesis, with a special focus on RASSF1A, whose promoter methylation is one of the most frequent alterations found in human tumours.
Funded by: Cancer Research UK: A6997; Wellcome Trust
Biochimica et biophysica acta 2007;1776;1;58-85
A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21.
Centre for Gastroenterology, Institute of Cell and Molecular Science, Queen Mary University of London, London E1 2AT, UK.
We tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls. Outside the HLA region, the most significant finding (rs13119723; P = 2.0 x 10(-7)) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block. We independently confirmed association in two further collections (strongest association at rs6822844, 24 kb 5' of IL21; meta-analysis P = 1.3 x 10(-14), odds ratio = 0.63), suggesting that genetic variation in this region predisposes to celiac disease.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068094, 068545/Z/02, GR068094MA
Nature genetics 2007;39;7;827-9
Definition of a minimal region of deletion of chromosome 7 in uterine leiomyomas by tiling-path microarray CGH and mutation analysis of known genes in this region.
Department of Medical Genetics, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland.
Somatic interstitial deletions of chromosome segment 7q22-q31 in uterine leiomyomas are a frequent event, thought to be indicative of a tumor suppressor gene in the region. Previous LOH and CGH studies have refined this region to 7q22.3-q31, although the target gene has not been identified. Here, we have used tiling-path resolution microarray CGH to further refine the region and to identify homozygous deletions in fibroids. Furthermore, we have screened all manually annotated genes in the region for mutations. We have refined the minimum deleted region at 7q22.3-q31 to 2.79 Mbp and identified a second region of deletion at 7q34. However, we identified no pathogenic coding variation.
Genes, chromosomes & cancer 2007;46;5;451-8
Distinct cytokine-driven responses of activated blood gammadelta T cells: insights into unconventional T cell pleiotropy.
Peter Gorer Department of Immunobiology, Guy's, King's and St. Thomas' Medical School, King's College London, London, UK.
Human Vgamma9/Vdelta2 T cells comprise a small population of peripheral blood T cells that in many infectious diseases respond to the microbial metabolite, (E)-4-hydroxy-3-methyl-but-2-enyl pyrophosphate (HMB-PP), expanding to up to 50% of CD3(+) cells. This "transitional response," occurring temporally between the rapid innate and slower adaptive response, is widely viewed as proinflammatory and/or cytolytic. However, increasing evidence that different cytokines drive widely different effector functions in alphabeta T cells provoked us to apply cDNA microarrays to explore the potential pleiotropy of HMB-PP-activated Vgamma9/Vdelta2 T cells. The data and accompanying validations show that the related cytokines, IL-2, IL-4, or IL-21, each drive proliferation and comparable CD69 up-regulation but induce distinct effector responses that differ from prototypic alphabeta T cell responses. For example, the Th1-like response to IL-2 also includes expression of IL-5 and IL-13 that conversely are not induced by IL-4. The data identify specific molecules that may mediate gammadelta T cell effects. Thus, IL-21 induces a lymphoid-homing phenotype and high, unexpected expression of the follicular B cell-attracting chemokine CXCL13/BCA-1, suggesting a novel follicular B-helper-like T cell that may play a hitherto underappreciated role in humoral immunity early in infection. Such broad plasticity emphasizes the capacity of gammadelta T cells to influence the nature of the immune response to different challenges and has implications for the ongoing clinical application of cytokines together with Vgamma9/Vdelta2 TCR agonists.
Funded by: Wellcome Trust: 071534
Journal of immunology (Baltimore, Md. : 1950) 2007;178;7;4304-14
microRNA-155 regulates the generation of immunoglobulin class-switched plasma cells.
Laboratory of Lymphocyte Signalling and Development, The Babraham Institute, Cambridge, CB22 3AT, UK.
microRNA-155 (miR-155) is expressed by cells of the immune system after activation and has been shown to be required for antibody production after vaccination with attenuated Salmonella. Here we show the intrinsic requirement for miR-155 in B cell responses to thymus-dependent and -independent antigens. B cells lacking miR-155 generated reduced extrafollicular and germinal center responses and failed to produce high-affinity IgG1 antibodies. Gene-expression profiling of activated B cells indicated that miR-155 regulates an array of genes with diverse function, many of which are predicted targets of miR-155. The transcription factor Pu.1 is validated as a direct target of miR-155-mediated inhibition. When Pu.1 is overexpressed in wild-type B cells, fewer IgG1 cells are produced, indicating that loss of Pu.1 regulation is a contributing factor to the miR-155-deficient phenotype. Our results implicate post-transcriptional regulation of gene expression for establishing the terminal differentiation program of B cells.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/00001206, BBS/E/B/0000C223, BBS/E/B/0000M206; Medical Research Council: G0700287, G117/424, G8402371, MC_U105178806; Wellcome Trust: 079643
Say hello to our little friends.
Nature reviews. Microbiology 2007;5;8;572-3
This place is big enough for both of us.
Nature reviews. Microbiology 2007;5;2;90-2
Nature reviews. Microbiology 2007;5;10;748-9
A recessive genetic screen for host factors required for retroviral infection in a library of insertionally mutated Blm-deficient embryonic stem cells.
Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, PR China.
Background: Host factors required for retroviral infection are potential targets for the modulation of diseases caused by retroviruses. During the retroviral life cycle, numerous cellular factors interact with the virus and play an essential role in infection. Cultured embryonic stem (ES) cells are susceptible to retroviral infection, therefore providing access to all of the genes required for this process to take place. In order to identify the host factors involved in retroviral infection, we designed and implemented a scheme for identifying ES cells that are resistant to retroviral infection and subsequent cloning of the mutated gene.
Results: A library of mutant ES cells was established by genome-wide insertional mutagenesis in Blm-deficient ES cells, and a screen was performed by superinfection of the library at high multiplicity with a recombinant retrovirus carrying a positive and negative selection cassette. Stringent negative selection was then used to exclude the infected ES cells. We successfully recovered five independent clones of ES cells that are resistant to retroviral infection. Analysis of the mutations in these clones revealed four different homozygous and one compound heterozygous mutation in the mCat-1 locus, which confirms that mCat-1 is the ecotropic murine leukemia virus receptor in ES cells.
Conclusion: We have demonstrated the feasibility and reliability of this recessive genetic approach to identifying critical genes required for retroviral infection in ES cells; the approach provides a unique opportunity to recover other cellular factors required for retroviral infection. The resulting insertionally mutated Blm-deficient ES cell library might also provide access to essential host cell components that are required for infection and replication for other types of virus.
Funded by: Wellcome Trust
Genome biology 2007;8;4;R48
A Sall4 mutant mouse model useful for studying the role of Sall4 in early embryonic development and organogenesis.
SALL4 is a homologue of the Drosophila homeotic gene spalt, a zinc finger transcription factor, required for inner cell mass proliferation in early embryonic development. It also interacts with other transcription factors to control the development of the anorectal region, kidney, heart, limbs, and brain. Truncating mutations in SALL4 cause Okihiro syndrome, manifest as Duane anomaly, radial ray defects and sensorineural and conductive deafness. We report the characterization of a novel murine Sall4 null allele created by bacterial recombineering in ES cells. Homozygous mutant mice exhibit early embryonic lethality. Heterozygous mutant mice recapitulate phenotypic features of Okihiro syndrome including deafness, lower anogenital tract abnormalities, renal hypoplasia, anencephaly, Hirschsprung's disease, and skeletal defects. This phenotype shows important differences in cardiac and ear manifestations to previously characterized Sall4 mutant alleles and should prove useful for the investigation of the influence of modifier alleles and protein interactions on the transcriptional regulatory function of Sall4.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust: 077187
Genesis (New York, N.Y. : 2000) 2007;45;1;51-8
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 x 10(-7): 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10(-5) and 5 x 10(-7)) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders.
We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.
Funded by: Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0100594, G0501942, G0600329, G0600705, G0800759, G0901461, G19/9, G90/106, G9806740, G9810900; Wellcome Trust: 076113, 077011, 090532
Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants.
Genetic Epidemiology Group, Department of Health Sciences, University of Leicester, Adrian Building, University Road, Leicester LE1 7RH, UK.
We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.
Funded by: Arthritis Research UK: 17552; Cancer Research UK: A4994; Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0501942, G0600329, G0600705, G0701003, G0800759, G19/9, G90/106, G9810900; Multiple Sclerosis Society: 730; NCRR NIH HHS: M01 RR000425, UL1 RR024148; NIAMS NIH HHS: R01 AR046208, R01 AR048465; Wellcome Trust: 057097, 076113, 081682, 089989, 090532
Nature genetics 2007;39;11;1329-37
Esophageal atresia, hypoplasia of zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation--new MCA/MR syndrome in two affected sibs and a mildly affected mother?
Institut für Humangenetik, Universitätsklinikum Essen, Germany, and Department of Medical Genetics, Addenbrooke's Hospital, Cambridge, UK.
The previously undescribed combination of esophageal atresia, hypoplasia of the zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation was diagnosed in two siblings of different sexes, with the brother being more severely affected. The mother presented with zygomatic arch hypoplasia of the right side only. We discuss major differential diagnoses: Goldenhar, Feingold, CHARGE, and Treacher Collins syndromes show a few overlapping clinical features, but these diagnoses are unlikely as the clinical findings are unusual for Goldenhar syndrome and mutational screening of the MYCN, the CHD7, and the TCOF1 genes did not reveal any abnormalities. Autosomal recessive oto-facial syndrome, hypomandibular faciocranial dysostosis, and Ozkan syndromes were clinically excluded. A microdeletion 22q11.2 was excluded by FISH analysis, a microdeletion 2p23-p24 by microsatellite analyses, a subtelomeric chromosomal aberration by MLPA, and a small genomic deletion/duplication by CGH array. As X-inactivation studies did not show skewed X-inactivation in the mother, we consider X-chromosomal recessive inheritance of this condition less likely. We discuss autosomal dominant inheritance with variable expressivity or mosaicism in the mother as the likely genetic mechanism in this new multiple congenital anomaly/mental retardation (MCA/MR) syndrome.
American journal of medical genetics. Part A 2007;143A;11;1135-42
The Israeli-Palestinian Science Organization.
Science (New York, N.Y.) 2007;315;5808;39
Finding cis-regulatory modules in Drosophila using phylogenetic hidden Markov models.
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
Motivation: Finding the regulatory modules for transcription factor binding is an important step in elucidating the complex molecular mechanisms underlying regulation of gene expression. There are numerous methods available for solving this problem; however, very few of them take advantage of the increasing availability of comparative genomic data.
Results: We develop a method for finding regulatory modules in eukaryotic species using phylogenetic data. Using computer simulations and analysis of real data, we show that the use of a phylogenetic hidden Markov model can lead to an increase in accuracy of prediction over methods that do not take advantage of the data from multiple species.
Availability: The new method is made accessible under GPL in a new publicly available JAVA program: EvoPromoter. It can be downloaded at http://sourceforge.net/projects/evopromoter/.
Funded by: Wellcome Trust
Bioinformatics (Oxford, England) 2007;23;16;2031-7
Interleukin-2 gene variation impairs regulatory T cell function and causes autoimmunity.
Julia McFarlane Diabetes Research Centre (JMDRC) and Department of Microbiology and Infectious Diseases, Institute of Inflammation, Infection and Immunity, Faculty of Medicine, The University of Calgary, Calgary, Alberta T2N 4N1, Canada.
Autoimmune diseases are thought to result from imbalances in normal immune physiology and regulation. Here, we show that autoimmune disease susceptibility and resistance alleles on mouse chromosome 3 (Idd3) correlate with differential expression of the key immunoregulatory cytokine interleukin-2 (IL-2). In order to test directly that an approximately twofold reduction in IL-2 underpins the Idd3-linked destabilization of immune homeostasis, we show that engineered haplodeficiency of Il2 gene expression not only reduces T cell IL-2 production by twofold but also mimics the autoimmune dysregulatory effects of the naturally occurring susceptibility alleles of Il2. Reduced IL-2 production achieved by either genetic mechanism correlates with reduced function of CD4(+) CD25(+) regulatory T cells, which are critical for maintaining immune homeostasis.
Funded by: Wellcome Trust: 061859
Nature genetics 2007;39;3;329-37
Genome-wide association study of prostate cancer identifies a second risk locus at 8q24.
SAIC-Frederick, National Cancer Institute (NCI)-Frederick Cancer Research and Development Center, Frederick, Maryland 21702, USA.
Recently, common variants on human chromosome 8q24 were found to be associated with prostate cancer risk. While conducting a genome-wide association study in the Cancer Genetic Markers of Susceptibility project with 550,000 SNPs in a nested case-control study (1,172 cases and 1,157 controls of European origin), we identified a new association at 8q24 with an independent effect on prostate cancer susceptibility. The most significant signal is 70 kb centromeric to the previously reported SNP, rs1447295, but shows little evidence of linkage disequilibrium with it. A combined analysis with four additional studies (total: 4,296 cases and 4,299 controls) confirms association with prostate cancer for rs6983267 in the centromeric locus (P = 9.42 x 10(-13); heterozygote odds ratio (OR): 1.26, 95% confidence interval (c.i.): 1.13-1.41; homozygote OR: 1.58, 95% c.i.: 1.40-1.78). Each SNP remained significant in a joint analysis after adjusting for the other (rs1447295 P = 1.41 x 10(-11); rs6983267 P = 6.62 x 10(-10)). These observations, combined with compelling evidence for a recombination hotspot between the two markers, indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry. We estimate that the population attributable risk of the new locus, marked by rs6983267, is higher than the locus marked by rs1447295 (21% versus 9%).
Funded by: CCR NIH HHS: N01-RC-37004, N01-RC-45035; Intramural NIH HHS; NCI NIH HHS: 5U01CA098233-04, CA55075, N01-CN-45165, T32 CA 09001, U01 CA098710; Wellcome Trust
Nature genetics 2007;39;5;645-9
Insights into modern disease from our distant evolutionary past.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
An EMBO workshop entitled 'Human Evolution and Disease' was held recently (6-9 December 2006, Hyderabad, India) where 141 scientists from many disciplines came together to discuss recent studies of human variation, origins and dispersal, natural selection and disease susceptibility. The meeting tackled the subject of human evolution and disease from the different perspectives of archaeology, linguistics, genetics and genomics based on both new and publicly available data sets. In this report, we highlight the latest fashion crazes in the discipline, in particular, the use of large public data sets and new methods to analyse modern human variation and the links between human evolution and disease susceptibility.
European journal of human genetics : EJHG 2007;15;5;603-6
The V103I polymorphism of the MC4R gene and obesity: population based studies and meta-analysis of 29 563 individuals.
MRC Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK.
Background: Previous studies have suggested that a variant in the melanocortin-4 receptor (MC4R) gene is important in protecting against common obesity. Larger studies are needed, however, to confirm this relation.
Methods: We assessed the association between the V103I polymorphism in the MC4R gene and obesity in three UK population based cohort studies, totalling 8304 individuals. We also did a meta-analysis of relevant studies, involving 10 975 cases and 18 588 controls, to place our findings in context.
Findings: In an analysis of all studies, individuals carrying the isoleucine allele had an 18% (95% confidence interval 4-30%, P=0.015) lower risk of obesity compared with non-carriers. There was no heterogeneity among studies and no apparent publication bias.
Interpretation: This study confirms that the V103I polymorphism protects against human obesity at a population level. As such it provides proof of principle that specific gene variants may, at least in part, explain susceptibility and resistance to common forms of human obesity. A better understanding of the mechanisms underlying this association will help determine whether changes in MC4R activity have therapeutic potential.
Funded by: Medical Research Council: G0100103, G9824984, MC_U106179471, MC_U106188470; Wellcome Trust: 068086, 077016
International journal of obesity (2005) 2007;31;9;1437-41
A new function for the fragile X mental retardation protein in regulation of PSD-95 mRNA stability.
Dipartimento di Biologia, Università Tor Vergata, Via della Ricerca Scientifica 1, 00133 Rome, Italy.
Fragile X syndrome (FXS) results from the loss of the fragile X mental retardation protein (FMRP), an RNA-binding protein that regulates a variety of cytoplasmic mRNAs. FMRP regulates mRNA translation and may be important in mRNA localization to dendrites. We report a third cytoplasmic regulatory function for FMRP: control of mRNA stability. In mice, we found that FMRP binds, in vivo, the mRNA encoding PSD-95, a key molecule that regulates neuronal synaptic signaling and learning. This interaction occurs through the 3' untranslated region of the PSD-95 (also known as Dlg4) mRNA, increasing message stability. Moreover, stabilization is further increased by mGluR activation. Although we also found that the PSD-95 mRNA is synaptically localized in vivo, localization occurs independently of FMRP. Through our functional analysis of this FMRP target we provide evidence that dysregulation of mRNA stability may contribute to the cognitive impairments in individuals with FXS.
Funded by: Telethon: GGP05269; Wellcome Trust: 056523, 077155
Nature neuroscience 2007;10;5;578-87
Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.
Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, OX3 7LJ, UK.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1924 diabetic cases and 2938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3757 additional cases and 5346 controls and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B, and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insight into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
Funded by: Medical Research Council: G0000934, G0500070; Wellcome Trust: 083948, 090532
Science (New York, N.Y.) 2007;316;5829;1336-41
Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed a systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
Funded by: NCI NIH HHS: N01CO12400; NHGRI NIH HHS: U01 HG003147, U01 HG003150, U01 HG003156, U01HG03147, U01HG03150, U01HG03156; PHS HHS: N01C012400; Wellcome Trust: 077198
Genome research 2007;17;6;839-51