Sanger Institute - Publications 2006

Number of papers published in 2006: 93

  • Renin enhancer is critical for control of renin gene expression and cardiovascular function.

    Adams DJ, Head GA, Markus MA, Lovicu FJ, van der Weyden L, Köntgen F, Arends MJ, Thiru S, Mayorov DN and Morris BJ

    School of Medical Sciences and Bosch Institute, University of Sydney, Sydney, New South Wales 2006, Australia.

    The important cardiovascular regulator renin contains a strong in vitro enhancer 2.7 kb upstream of its gene. Here we tested the in vivo role of the mouse Ren-1c enhancer. In renin-expressing As4.1 cells stably transfected with Ren-1c promoter with or without enhancer, expression of linked beta-geo reporter, stable expression, and colony formation were dependent on the presence of the enhancer. We then generated mice carrying a targeted deletion of the enhancer (REKO mice) and found marked depletion of renin in renal juxtaglomerular and submandibular ductal cells, as well as hyperplasia of macula densa cells. Plasma creatinine was increased, but electrolytes were normal. Male REKO mice implanted with telemetry devices had 9 +/- 1 mm Hg lower mean arterial pressure (p < 0.001), which was partly normalized by a high NaCl diet. Locomotor activity was lower, and baroreflex sensitivity was normal. Markedly reduced mean arterial pressure variability in the midfrequency band indicated a contribution of reduced sympathetic vasomotor tone to the hypotension. In conclusion, the renin enhancer is critical for renin gene expression and physiological sequelae, including response to alteration in salt intake. The REKO mouse may be useful as a low renin expression model.

    The Journal of biological chemistry 2006;281;42;31753-61

  • Non-DNA binding, dominant-negative, human PPARgamma mutations cause lipodystrophic insulin resistance.

    Agostini M, Schoenmakers E, Mitchell C, Szatmari I, Savage D, Smith A, Rajanayagam O, Semple R, Luan J, Bath L, Zalin A, Labib M, Kumar S, Simpson H, Blom D, Marais D, Schwabe J, Barroso I, Trembath R, Wareham N, Nagy L, Gurnell M, O'Rahilly S and Chatterjee K

    Department of Medicine, University of Cambridge, United Kingdom.

    PPARgamma is essential for adipogenesis and metabolic homeostasis. We describe mutations in the DNA and ligand binding domains of human PPARgamma in lipodystrophic, severe insulin resistance. These receptor mutants lack DNA binding and transcriptional activity but can translocate to the nucleus, interact with PPARgamma coactivators and inhibit coexpressed wild-type receptor. Expression of PPARgamma target genes is markedly attenuated in mutation-containing versus receptor haploinsufficient primary cells, indicating that such dominant-negative inhibition operates in vivo. Our observations suggest that these mutants restrict wild-type PPARgamma action via a non-DNA binding, transcriptional interference mechanism, which may involve sequestration of functionally limiting coactivators.

    Funded by: Wellcome Trust: 080237

    Cell metabolism 2006;4;4;303-11

  • Meta-analysis of the Gly482Ser variant in PPARGC1A in type 2 diabetes and related phenotypes.

    Barroso I, Luan J, Sandhu MS, Franks PW, Crowley V, Schafer AJ, O'Rahilly S and Wareham NJ

    The Wellcome Trust Sanger Institute, Metabolic Disease Group, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ib1@sanger.ac.uk

    Peroxisome proliferator-activated receptor-gamma co-activator-1alpha (PPARGC1A) is a transcriptional co-activator with a central role in energy expenditure and glucose metabolism. Several studies have suggested that the common PPARGC1A polymorphism Gly482Ser may be associated with risk of type 2 diabetes, with conflicting results. To clarify the role of Gly482Ser in type 2 diabetes and related human metabolic phenotypes we genotyped this polymorphism in a case-control study and performed a meta-analysis of relevant published data.

    Methods: Gly482Ser was genotyped in a type 2 diabetes case-control study (N=1,096) using MassArray technology. A literature search revealed publications that examined Gly482Ser for association with type 2 diabetes and related metabolic phenotypes. Meta-analysis of the current study and relevant published data was undertaken.

    Results: In the pooled meta-analysis, including data from this study and seven published reports (3,718 cases, 4,818 controls), there was evidence of between-study heterogeneity (p<0.1). In the fixed-effects meta-analysis, the pooled odds ratio for risk of type 2 diabetes per Ser482 allele was 1.07 (95% CI 1.00-1.15, p=0.044). Elimination of one of the studies from the meta-analysis gave a summary odds ratio of 1.11 (95% CI 1.04-1.20, p=0.004), with no between-study heterogeneity (p=0.475). For quantitative metabolic traits in normoglycaemic subjects, we also found significant between-study heterogeneity. However, no significant association was observed between Gly482Ser and BMI, fasting glucose or fasting insulin.

    Conclusion: This meta-analysis of data from the current and published studies supports a modest role for the Gly482Ser PPARGC1A variant in type 2 diabetes risk.

    Funded by: Wellcome Trust

    Diabetologia 2006;49;3;501-5
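
    The fixed-effects pooling reported in the entry above can be illustrated with a minimal sketch. It assumes inverse-variance weighting of log odds ratios, with each standard error recovered from a 95% confidence interval; the abstract does not state which fixed-effects estimator was used, so this is an assumption. The function name `pool_fixed_effect` and the input values are hypothetical and are not the studies pooled in the paper.

```python
import math


def pool_fixed_effect(studies, z=1.96):
    """Pool per-study odds ratios with inverse-variance (fixed-effect) weights.

    studies: list of (odds_ratio, ci_low, ci_high), each with a 95% CI.
    Returns the pooled odds ratio, its 95% CI, and Cochran's Q with its
    degrees of freedom (a heterogeneity statistic).
    """
    log_ors, weights = [], []
    for odds_ratio, ci_low, ci_high in studies:
        # Standard error of the log odds ratio, recovered from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
        log_ors.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)

    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))

    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "q": q,
        "df": len(studies) - 1,
    }


# Hypothetical inputs for illustration only.
example = [(1.20, 1.00, 1.44), (0.95, 0.80, 1.13), (1.10, 0.95, 1.27)]
result = pool_fixed_effect(example)
print(result)
```

    A large Q relative to its degrees of freedom signals between-study heterogeneity, the situation in which the abstract reports the effect of removing one study.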

  • Genetic analysis of the capsular biosynthetic locus from all 90 pneumococcal serotypes.

    Bentley SD, Aanensen DM, Mavroidi A, Saunders D, Rabbinowitsch E, Collins M, Donohoe K, Harris D, Murphy L, Quail MA, Samuel G, Skovsted IC, Kaltoft MS, Barrell B, Reeves PR, Parkhill J and Spratt BG

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom. sdb@sanger.ac.uk

    Several major invasive bacterial pathogens are encapsulated. Expression of a polysaccharide capsule is essential for survival in the blood, and thus for virulence, but also is a target for host antibodies and the basis for effective vaccines. Encapsulated species typically exhibit antigenic variation and express one of a number of immunochemically distinct capsular polysaccharides that define serotypes. We provide the sequences of the capsular biosynthetic genes of all 90 serotypes of Streptococcus pneumoniae and relate these to the known polysaccharide structures and patterns of immunological reactivity of typing sera, thereby providing the most complete understanding of the genetics and origins of bacterial polysaccharide diversity, laying the foundations for molecular serotyping. This is the first time, to our knowledge, that a complete repertoire of capsular biosynthetic genes has been available, enabling a holistic analysis of a bacterial polysaccharide biosynthesis system. Remarkably, the total size of alternative coding DNA at this one locus exceeds 1.8 Mbp, almost equivalent to the entire S. pneumoniae chromosomal complement.

    Funded by: Wellcome Trust

    PLoS genetics 2006;2;3;e31

  • Ensembl 2006.

    Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Gräf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kähäri A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, Parker A, Proctor G, Prlic A, Rae M, Rios D, Redmond S, Schuster M, Sealy I, Searle S, Severin J, Slater G, Smedley D, Smith J, Stabenau A, Stalker J, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C and Hubbard TJ

    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. birney@ebi.ac.uk

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/13446, BBS/B/13462, BBS/B/13470; Wellcome Trust

    Nucleic acids research 2006;34;Database issue;D556-61

  • Just one cross appears capable of dramatically altering the population biology of a eukaryotic pathogen like Toxoplasma gondii.

    Boyle JP, Rajasekar B, Saeij JP, Ajioka JW, Berriman M, Paulsen I, Roos DS, Sibley LD, White MW and Boothroyd JC

    Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.

    Toxoplasma gondii, an obligate intracellular protozoan of the phylum Apicomplexa, is estimated to infect over a billion people worldwide as well as a great many other mammalian and avian hosts. Despite this ubiquity, the vast majority of human infections in Europe and North America are thought to be due to only three genotypes. Using a genome-wide analysis of single-nucleotide polymorphisms, we have constructed a genealogy for these three lines. The data indicate that types I and III are second- and first-generation offspring, respectively, of a cross between a type II strain and one of two ancestral strains. An extant T. gondii strain (P89) appears to be the modern descendant of the non-type II parent of type III, making the full genealogy of the type III clonotype known. The simplicity of this family tree demonstrates that even a single cross can lead to the emergence and dominance of a new clonal genotype that completely alters the population biology of a sexual pathogen.

    Funded by: NIAID NIH HHS: AI045806, AI05093, AI21423, AI41014, F32AI60306, R01 AI036629

    Proceedings of the National Academy of Sciences of the United States of America 2006;103;27;10514-9

  • Notch, epidermal growth factor receptor, and beta1-integrin pathways are coordinated in neural stem cells.

    Campos LS, Decker L, Taylor V and Skarnes W

    INSERM U368, Biologie Moléculaire du Développement, Ecole Normale Supérieure, Paris, France. lsc@sanger.ac.uk

    Notch1 and beta1-integrins are cell surface receptors involved in the recognition of the niche that surrounds stem cells through cell-cell and cell-extracellular matrix interactions, respectively. Notch1 is also involved in the control of cell fate choices in the developing central nervous system (Lewis, J. (1998) Semin. Cell Dev. Biol. 9, 583-589). Here we report that Notch and beta1-integrins are co-expressed and that these proteins cooperate with the epidermal growth factor receptor in neural progenitors. We describe data that suggest that beta1-integrins may affect Notch signaling through 1) physical interaction (sequestration) of the Notch intracellular domain fragment by the cytoplasmic tail of the beta1-integrin and 2) affecting trafficking of the Notch intracellular domain via caveolin-mediated mechanisms. Our findings suggest that caveolin 1-containing lipid rafts play a role in the coordination and coupling of beta1-integrin, Notch1, and tyrosine kinase receptor signaling pathways. We speculate that this will require the presence of the adequate beta1-activating extracellular matrix or growth factors in restricted regions of the central nervous system, namely in neurogenic niches.

    Funded by: Wellcome Trust

    The Journal of biological chemistry 2006;281;8;5300-9

  • Genetic and genomic prospects for Xenopus tropicalis research.

    Carruthers S and Stemple DL

    Vertebrate Development and Genetics, The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Research using Xenopus laevis has made enormous contributions to our understanding of vertebrate development, control of the eukaryotic cell cycle and the cytoskeleton. One limitation, however, has been the lack of systematic genetic studies in Xenopus to complement molecular and cell biological investigations. Work with the closely related diploid frog Xenopus tropicalis is beginning to address this limitation. Here, we review the resources that will make genetic studies using X. tropicalis a reality.

    Funded by: NICHD NIH HHS: 1 R01 HD4 2276-01; Wellcome Trust

    Seminars in cell & developmental biology 2006;17;1;146-53

  • Animal, vegetable or mineral?

    Cerdeño-Tárraga AM and Bentley S

    Nature reviews. Microbiology 2006;4;10;725-6

  • Data sharing and intellectual property in a genomic epidemiology network: policies for large-scale research collaboration.

    Chokshi DA, Parker M and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, England. daveash@med.upenn.edu

    Genomic epidemiology is a field of research that seeks to improve the prevention and management of common diseases through an understanding of their molecular origins. It involves studying thousands of individuals, often from different populations, with exacting techniques. The scale and complexity of such research has required the formation of research consortia. Members of these consortia need to agree on policies for managing shared resources and handling genetic data. Here we consider data-sharing and intellectual property policies for an international research consortium working on the genomic epidemiology of malaria. We outline specific guidelines governing how samples and data are transferred among its members; how results are released into the public domain; when to seek protection for intellectual property; and how intellectual property should be managed. We outline some pragmatic solutions founded on the basic principles of promoting innovation and access.

    Funded by: Medical Research Council: G0200454, G19/9; Wellcome Trust

    Bulletin of the World Health Organization 2006;84;5;382-7

  • Molecular characterization and comparison of the components and multiprotein complexes in the postsynaptic proteome.

    Collins MO, Husi H, Yu L, Brandon JM, Anderson CN, Blackstock WP, Choudhary JS and Grant SG

    Genes to Cognition, The Wellcome Trust Sanger Institute, Hinxton, UK.

    Characterization of the composition of the postsynaptic proteome (PSP) provides a framework for understanding the overall organization and function of the synapse in normal and pathological conditions. We have identified 698 proteins from the postsynaptic terminal of mouse CNS synapses using a series of purification strategies and analysis by liquid chromatography tandem mass spectrometry and large-scale immunoblotting. Some 620 proteins were found in purified postsynaptic densities (PSDs), nine in AMPA-receptor immuno-purifications, 100 in isolates using an antibody against the NMDA receptor subunit NR1, and 170 by peptide-affinity purification of complexes with the C-terminus of NR2B. Together, the NR1 and NR2B complexes contain 186 proteins, collectively referred to as membrane-associated guanylate kinase-associated signalling complexes. We extracted data from six other synapse proteome experiments and combined these with our data to provide a consensus on the composition of the PSP. In total, 1124 proteins are present in the PSP, of which 466 were validated by their detection in two or more studies, forming what we have designated the Consensus PSD. These synapse proteome data sets offer a basis for future research in synaptic biology and will provide useful information in brain disease and mental disorder studies.

    Funded by: Wellcome Trust

    Journal of neurochemistry 2006;97 Suppl 1;16-23

  • A high-resolution survey of deletion polymorphism in the human genome.

    Conrad DF, Andrews TD, Carter NP, Hurles ME and Pritchard JK

    Department of Human Genetics, The University of Chicago, 920 East 58th Street, Chicago, Illinois 60637, USA.

    Recent work has shown that copy number polymorphism is an important class of genetic variation in human genomes. Here we report a new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions. We applied this method to data from the International HapMap Project to produce the first high-resolution population surveys of deletion polymorphism. Approximately 100 of these deletions have been experimentally validated using comparative genome hybridization on tiling-resolution oligonucleotide microarrays. Our analysis identifies a total of 586 distinct regions that harbor deletion polymorphisms in one or more of the families. Notably, we estimate that typical individuals are hemizygous for roughly 30-50 deletions larger than 5 kb, totaling around 550-750 kb of euchromatic sequence across their genomes. The detected deletions span a total of 267 known and predicted genes. Overall, however, the deleted regions are relatively gene-poor, consistent with the action of purifying selection against deletions. Deletion polymorphisms may well have an important role in the genetics of complex traits; however, they are not directly observed in most current gene mapping studies. Our new method will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.

    Funded by: NIGMS NIH HHS: GM07197; Wellcome Trust

    Nature genetics 2006;38;1;75-81
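
    As a rough illustration of how trio SNP genotypes can hint at a deletion (the idea behind the entry above), the sketch below flags SNPs where the child appears homozygous for one allele while a parent appears homozygous for the other. A parent who is hemizygous at the site and transmits the deleted copy would produce this pattern. This is a simplified assumption for illustration, not the authors' published method; the function names, the genotype encoding ("AA", "AB", "BB") and the `min_snps` threshold are hypothetical.

```python
def deletion_compatible(father, mother, child):
    """Return True if one SNP shows a Mendelian inconsistency that a
    transmitted deletion could explain.

    Genotypes are two-character strings such as "AA", "AB" or "BB".
    Pattern: the child looks homozygous for allele X, one parent looks
    homozygous for the other allele, and the remaining parent carries X.
    """
    if len(set(child)) != 1:
        return False  # a heterozygous child carries two real copies
    allele = child[0]
    for carrier, other in ((father, mother), (mother, father)):
        if len(set(carrier)) == 1 and carrier[0] != allele and allele in other:
            return True
    return False


def candidate_runs(trio_calls, min_snps=2):
    """Return (start, end) index pairs for runs of at least min_snps
    consecutive deletion-compatible SNPs along a chromosome."""
    runs, start = [], None
    for i, (father, mother, child) in enumerate(trio_calls):
        if deletion_compatible(father, mother, child):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(trio_calls) - start >= min_snps:
        runs.append((start, len(trio_calls) - 1))
    return runs


# Hypothetical genotype calls (father, mother, child) at four adjacent SNPs.
calls = [("AA", "AB", "BB"), ("BB", "AB", "AA"), ("AB", "AB", "AB"), ("AA", "AB", "BB")]
print(candidate_runs(calls))
```

    Requiring several adjacent flagged SNPs is a crude guard against isolated genotyping errors; real data would also contain uninformative SNPs inside a deletion, which this sketch does not handle.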

  • Budget genome.

    Crossman L

    Nature reviews. Microbiology 2006;4;5;326-7

  • Peddling the nitrogen cycle.

    Crossman L and Thomson N

    Nature reviews. Microbiology 2006;4;7;494-5

  • TranscriptSNPView: a genome-wide catalog of mouse coding variation.

    Cunningham F, Rios D, Griffiths M, Smith J, Ning Z, Cox T, Flicek P, Marin-Garcia P, Herrero J, Rogers J, van der Weyden L, Bradley A, Birney E and Adams DJ

    Funded by: Wellcome Trust: 062023, 077187

    Nature genetics 2006;38;8;853

  • A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC.

    de Bakker PI, McVean G, Sabeti PC, Miretti MM, Green T, Marchini J, Ke X, Monsuur AJ, Whittaker P, Delgado M, Morrison J, Richardson A, Walsh EC, Gao X, Galver L, Hart J, Hafler DA, Pericak-Vance M, Todd JA, Daly MJ, Trowsdale J, Wijmenga C, Vyse TJ, Beck S, Murray SS, Carrington M, Gregory S, Deloukas P and Rioux JD

    Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Seven Cambridge Center, Cambridge, Massachusetts 02142, USA.

    The proteins encoded by the classical HLA class I and class II genes in the major histocompatibility complex (MHC) are highly polymorphic and are essential in self versus non-self immune recognition. HLA variation is a crucial determinant of transplant rejection and susceptibility to a large number of infectious and autoimmune diseases. Yet identification of causal variants is problematic owing to linkage disequilibrium that extends across multiple HLA and non-HLA genes in the MHC. We therefore set out to characterize the linkage disequilibrium patterns between the highly polymorphic HLA genes and background variation by typing the classical HLA genes and >7,500 common SNPs and deletion-insertion polymorphisms across four population samples. The analysis provides informative tag SNPs that capture much of the common variation in the MHC region and that could be used in disease association studies, and it provides new insight into the evolutionary dynamics and ancestral origins of the HLA loci and their haplotypes.

    Funded by: Medical Research Council: G9800943; NCI NIH HHS: N01-CO-12400; NIAID NIH HHS: U19 AI050864; Wellcome Trust: 077011

    Nature genetics 2006;38;10;1166-72

  • A machine learning strategy to identify candidate binding sites in human protein-coding sequence.

    Down T, Leong B and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. td2@sanger.ac.uk

    Background: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard since exon sequences are also constrained by their functional role in coding for proteins.

    Results: This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence.

    Conclusion: We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for previously unrecognized proteins which influence RNA splicing, as well as other regulatory elements.

    Funded by: Wellcome Trust: 077198

    BMC bioinformatics 2006;7;419

  • Conserved noncoding sequences are selectively constrained and not mutation cold spots.

    Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET and Hirschhorn JN

    Program in Genomics and Division of Endocrinology, Children's Hospital, Boston, Massachusetts 02115, USA.

    Noncoding genetic variants are likely to influence human biology and disease, but recognizing functional noncoding variants is difficult. Approximately 3% of noncoding sequence is conserved among distantly related mammals, suggesting that these evolutionarily conserved noncoding regions (CNCs) are selectively constrained and contain functional variation. However, CNCs could also merely represent regions with lower local mutation rates. Here we address this issue and show that CNCs are selectively constrained in humans by analyzing HapMap genotype data. Specifically, new (derived) alleles of SNPs within CNCs are rarer than new alleles in nonconserved regions (P = 3 x 10(-18)), indicating that evolutionary pressure has suppressed CNC-derived allele frequencies. Intronic CNCs and CNCs near genes show greater allele frequency shifts, with magnitudes comparable to those for missense variants. Thus, conserved noncoding variants are more likely to be functional. Allele frequency distributions highlight selectively constrained genomic regions that should be intensively surveyed for functionally important variation.

    Funded by: Wellcome Trust

    Nature genetics 2006;38;2;223-7

  • Minimizing the risk of reporting false positives in large-scale RNAi screens.

    Echeverri CJ, Beachy PA, Baum B, Boutros M, Buchholz F, Chanda SK, Downward J, Ellenberg J, Fraser AG, Hacohen N, Hahn WC, Jackson AL, Kiger A, Linsley PS, Lum L, Ma Y, Mathey-Prévôt B, Root DE, Sabatini DM, Taipale J, Perrimon N and Bernards R

    Cenix BioScience GmbH, Tatzberg 47, Dresden, 10307, Germany. echeverri@cenix-bioscience.com

    Large-scale RNA interference (RNAi)-based analyses, very much as other 'omic' approaches, have inherent rates of false positives and negatives. The variability in the standards of care applied to validate results from these studies, if left unchecked, could eventually begin to undermine the credibility of RNAi as a powerful functional approach. This Commentary is an invitation to an open discussion started among various users of RNAi to set forth accepted standards that would ensure the quality and accuracy of information in the large datasets coming out of genome-scale screens.

    Nature methods 2006;3;10;777-9

  • DNA methylation profiling of human chromosomes 6, 20 and 22.

    Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA, Haefliger C, Horton R, Howe K, Jackson DK, Kunde J, Koenig C, Liddle J, Niblett D, Otto T, Pettett R, Seemann S, Thompson C, West T, Rogers J, Olek A, Berlin K and Beck S

    Epigenomics AG, Kleine Präsidentstrasse 1, 10178 Berlin, Germany.

    DNA methylation is the most stable type of epigenetic modification modulating the transcriptional plasticity of mammalian genomes. Using bisulfite DNA sequencing, we report high-resolution methylation profiles of human chromosomes 6, 20 and 22, providing a resource of about 1.9 million CpG methylation values derived from 12 different tissues. Analysis of six annotation categories showed that evolutionarily conserved regions are the predominant sites for differential DNA methylation and that a core region surrounding the transcriptional start site is an informative surrogate for promoter methylation. We find that 17% of the 873 analyzed genes are differentially methylated in their 5' UTRs and that about one-third of the differentially methylated 5' UTRs are inversely correlated with transcription. Despite the fact that our study controlled for factors reported to affect DNA methylation such as sex and age, we did not find any significant attributable effects. Our data suggest DNA methylation to be ontogenetically more stable than previously thought.

    Funded by: Wellcome Trust: 084071

    Nature genetics 2006;38;12;1378-85

  • PARL Leu262Val is not associated with fasting insulin levels in UK populations.

    Fawcett KA, Wareham NJ, Luan J, Syddall H, Cooper C, O'Rahilly S, Day IN, Sandhu MS and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    PARL, the gene encoding presenilins-associated rhomboid-like protein, maps to chromosome 3q27 within a quantitative trait locus that influences components of the metabolic syndrome. Recently, an amino acid substitution (Leu262Val, rs3732581) in PARL was associated with fasting plasma insulin levels in a US white population (N=1031). This variant was also found to modify the positive association between age and fasting insulin. The aim of this study was to test whether these findings could be replicated in two UK population-based cohorts.

    Methods: Participants from the Medical Research Council Ely and Hertfordshire cohort studies were genotyped for this variant using a SNaPshot primer extension assay and Taqman assay respectively. Full phenotypic and genotypic data were available for 3,666 study participants.

    Results: Based on a dominant model, we found no association between the Leu262Val polymorphism and fasting insulin levels (p=0.79) or BMI (p=0.98). We did not observe the previously reported interaction between age and genotype on fasting insulin (p=0.14).

    Conclusion: Despite having greater statistical power, our data do not support the previously reported association between PARL Leu262Val and fasting plasma insulin levels, a measure of insulin resistance. Our findings indicate that this variant is unlikely to be an important contributor to insulin resistance in UK populations.

    Funded by: Medical Research Council: MC_U106179471, MC_U147585824, MC_UP_A620_1014, U.1061.00.001 (79471); Wellcome Trust: 077016

    Diabetologia 2006;49;11;2649-52

  • Accurate and reliable high-throughput detection of copy number variation in the human genome.

    Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis P, Feuk L, French L, Hunt P, Kalaitzopoulos D, Larkin J, Montgomery L, Perry GH, Plumb BW, Porter K, Rigby RE, Rigler D, Valsesia A, Langford C, Humphray SJ, Scherer SW, Lee C, Hurles ME and Carter NP

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    This study describes a new tool for accurate and reliable high-throughput detection of copy number variation in the human genome. We have constructed a large-insert clone DNA microarray covering the entire human genome in tiling path resolution that we have used to identify copy number variation in human populations. Crucial to this study has been the development of a robust array platform and analytic process for the automated identification of copy number variants (CNVs). The array consists of 26,574 clones covering 93.7% of euchromatic regions. Clones were selected primarily from the published "Golden Path," and mapping was confirmed by fingerprinting and BAC-end sequencing. Array performance was extensively tested by a series of validation assays. These included determining the hybridization characteristics of each individual clone on the array by chromosome-specific add-in experiments. Estimation of data reproducibility and false-positive/negative rates was carried out using self-self hybridizations, replicate experiments, and independent validations of CNVs. Based on these studies, we developed a variance-based automatic copy number detection analysis process (CNVfinder) and have demonstrated its robustness by comparison with the SW-ARRAY method.

    Funded by: Wellcome Trust

    Genome research 2006;16;12;1566-74

  • COSMIC 2005.

    Forbes S, Clements J, Dawson E, Bamford S, Webb T, Dogan A, Flanagan A, Teague J, Wooster R, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    The Catalogue Of Somatic Mutations In Cancer (COSMIC) database and web site was developed to preserve somatic mutation data and share it with the community. Over the past 25 years, approximately 350 cancer genes have been identified, of which 311 are somatically mutated. COSMIC has been expanded and now holds data previously reported in the scientific literature for 28 known cancer genes. In addition, there is data from the systematic sequencing of 518 protein kinase genes. The total gene count in COSMIC stands at 538; 25 have a mutation frequency above 5% in one or more tumour type, no mutations were found in 333 genes and 180 are rarely mutated with frequencies <5% in any tumour set. The COSMIC web site has been expanded to give more views and summaries of the data and provide faster query routes and downloads. In addition, there is a new section describing mutations found through a screen of known cancer genes in 728 cancer cell lines including the NCI-60 set of cancer cell lines.

    Funded by: Wellcome Trust

    British journal of cancer 2006;94;2;318-22

  • Pseudo-messenger RNA: phantoms of the transcriptome.

    Frith MC, Wilming LG, Forrest A, Kawaji H, Tan SL, Wahlestedt C, Bajic VB, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bailey TL and Huminiecki L

    Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan.

    The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.

    PLoS genetics 2006;2;4;e23

  • Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs.

    Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ and Schier AF

    Developmental Genetics Program, Skirball Institute of Biomolecular Medicine, and Department of Cell Biology, New York University School of Medicine, New York, NY 10016, USA. giraldez@mcb.harvard.edu

    MicroRNAs (miRNAs) comprise 1 to 3% of all vertebrate genes, but their in vivo functions and mechanisms of action remain largely unknown. Zebrafish miR-430 is expressed at the onset of zygotic transcription and regulates morphogenesis during early development. By using a microarray approach and in vivo target validation, we find that miR-430 directly regulates several hundred target messenger RNA molecules (mRNAs). Most targets are maternally expressed mRNAs that accumulate in the absence of miR-430. We also show that miR-430 accelerates the deadenylation of target mRNAs. These results suggest that miR-430 facilitates the deadenylation and clearance of maternal mRNAs during early embryogenesis.

    Science (New York, N.Y.) 2006;312;5770;75-9

  • Genetic screens for mutations affecting development of Xenopus tropicalis.

    Goda T, Abu-Daya A, Carruthers S, Clark MD, Stemple DL and Zimmerman LB

    Division of Developmental Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.

    We present here the results of forward and reverse genetic screens for chemically-induced mutations in Xenopus tropicalis. In our forward genetic screen, we have uncovered 77 candidate phenotypes in diverse organogenesis and differentiation processes. Using a gynogenetic screen design, which minimizes time and husbandry space expenditures, we find that if a phenotype is detected in the gynogenetic F2 of a given F1 female twice, it is highly likely to be a heritable abnormality (29/29 cases). We have also demonstrated the feasibility of reverse genetic approaches for obtaining carriers of mutations in specific genes, and have directly determined an induced mutation rate by sequencing specific exons from a mutagenized population. The Xenopus system, with its well-understood embryology, fate map, and gain-of-function approaches, can now be coupled with efficient loss-of-function genetic strategies for vertebrate functional genomics and developmental genetics.

    Funded by: Medical Research Council: MC_U117560482; NICHD NIH HHS: 1 R01 HD4 2276-01; Wellcome Trust

    PLoS genetics 2006;2;6;e91

  • Geminin is essential to prevent endoreduplication and to form pluripotent cells during mammalian development.

    Gonzalez MA, Tachibana KE, Adams DJ, van der Weyden L, Hemberger M, Coleman N, Bradley A and Laskey RA

    Medical Research Council Cancer Cell Unit, Hutchison/MRC Research Centre, UK. mg322@hutchison-mrc.cam.ac.uk

    In multicellular eukaryotes, geminin prevents overreplication of DNA in proliferating cells. Here, we show that genetic ablation of geminin in the mouse prevents formation of inner cell mass (ICM) and causes premature endoreduplication at eight cells, rather than 32 cells. All cells in geminin-deficient embryos commit to the trophoblast cell lineage and consist of trophoblast giant cells (TGCs) only. Geminin is also down-regulated in TGCs of wild-type blastocysts during S and gap-like phases by proteasome-mediated degradation, suggesting that loss of geminin is part of the mechanism regulating endoreduplication.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/0000M100; Medical Research Council: G120/824, MC_U105359878

    Genes & development 2006;20;14;1880-4

  • The DNA sequence and biological annotation of human chromosome 1.

    Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CC, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RI, Aubin K, Babbage AK, Bagguley CL, Bailey J, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Buckley D, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dunn M, Earthrowl M, Ellington AG, Errington H, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MR, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ES, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NS, McLaren S, Milne S, Mistry S, Moore MJ, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR, Banerjee R, Bryant SP, Burford DC, Burrill WD, Clegg SM, Dhami P, Dovey O, Faulkner LM, Gribble SM, Langford CF, Pandian RD, Porter KM and Prigmore E

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. sgregory@chg.duhs.duke.edu

    The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.

    Funded by: Medical Research Council: G0000107; Wellcome Trust

    Nature 2006;441;7091;315-21

  • EGASP: the human ENCODE Genome Annotation Assessment Project.

    Guigó R, Flicek P, Abril JF, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic VB, Birney E, Castelo R, Eyras E, Ucla C, Gingeras TR, Harrow J, Hubbard T, Lewis SE and Reese MG

    Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain. rguigo@imim.es

    Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment.

    Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.

    Conclusion: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.

    Funded by: Medical Research Council: G8225539

    Genome biology 2006;7 Suppl 1;S2.1-31

  • A shared Y-chromosomal heritage between Muslims and Hindus in India.

    Gutala R, Carvalho-Silva DR, Jin L, Yngvadottir B, Avadhanula V, Nanne K, Singh L, Chakraborty R and Tyler-Smith C

    Department of Medicine, University of Texas Health Science Center, San Antonio, TX, USA.

    Arab forces conquered the Indus Delta region in 711 AD and, although a Muslim state was established there, their influence was barely felt in the rest of South Asia at that time. By the end of the tenth century, Central Asian Muslims moved into India from the northwest and expanded throughout the subcontinent. Muslim communities are now the largest minority religion in India, comprising more than 138 million people in a predominantly Hindu population of over one billion. It is unclear whether the Muslim expansion in India was a purely cultural phenomenon or had a genetic impact on the local population. To address this question from a male perspective, we typed eight microsatellite loci and 16 binary markers from the Y chromosome in 246 Muslims from Andhra Pradesh, and compared them to published data on 4,204 males from East Asia, Central Asia, other parts of India, Sri Lanka, Pakistan, Iran, the Middle East, Turkey, Egypt and Morocco. We find that the Muslim populations in general are genetically closer to their non-Muslim geographical neighbors than to other Muslims in India, and that there is a highly significant correlation between genetics and geography (but not religion). Our findings indicate that, despite the documented practice of marriage between Muslim men and Hindu women, Islamization in India did not involve large-scale replacement of Hindu Y chromosomes. The Muslim expansion in India was predominantly a cultural change and was not accompanied by significant gene flow, as seen in other places, such as China and Central Asia.

    Funded by: Wellcome Trust: 077009

    Human genetics 2006;120;4;543-51

  • Polymorphisms in the gene encoding sterol regulatory element-binding factor-1c are associated with type 2 diabetes.

    Harding AH, Loos RJ, Luan J, O'Rahilly S, Wareham NJ and Barroso I

    MRC Epidemiology Unit, Cambridge, UK.

    The sterol regulatory element-binding factor (SREBF)-1c is a transcription factor involved in the regulation of lipid and glucose metabolism. We have previously found evidence that a common SREBF1c single-nucleotide polymorphism (SNP), located between exons 18c and 19c, is associated with an increased risk of type 2 diabetes. The present study aimed to replicate our previously reported association in a larger case-control study and to examine an additional five SREBF1c SNPs for their association with diabetes risk and plasma glucose concentrations.

    Methods: We genotyped six SREBF1c SNPs in two case-control studies (n=1,938) and in a large cohort study (n=1,721) and tested for association with type 2 diabetes and with plasma glucose concentrations (fasting and 120-min post-glucose load), respectively.

    Results: In the case-control studies, carriers of the minor allele of the previously reported SNP (rs11868035) had a significantly increased diabetes risk (odds ratio [OR]=1.20 [95% CI 1.04-1.38], p=0.015). Also, three other SNPs (rs2236513, rs6502618 and rs1889018), located in the 5' region, were significantly associated with diabetes risk (OR≥1.21, p≤0.006). Furthermore, two SNPs (rs2236513 and rs1889018) in the 5' region were weakly (p<0.09) associated with plasma glucose concentrations in the cohort study. Rare homozygotes had increased (p≤0.05) 120-min post-load glucose concentrations compared with carriers of the wild-type allele. Haplotype analyses showed significant (p=0.04) association with diabetes risk and confirmed the single SNP analyses.

    In summary, we replicated our previous finding and found evidence for SNPs in the 5' region of the SREBF1c gene to be associated with the risk of type 2 diabetes and plasma glucose concentration.

    Funded by: Medical Research Council: MC_U106179471, MC_U106188470; Wellcome Trust: 077016

    Diabetologia 2006;49;11;2642-8

  • GENCODE: producing a reference annotation for ENCODE.

    Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE and Guigo R

    Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK. jla1@sanger.ac.uk

    Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.

    Results: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.

    Conclusion: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.

    Genome biology 2006;7 Suppl 1;S4.1-9

  • The grapes of wrath.

    Holden M, Lindsay J and Bentley S

    Nature reviews. Microbiology 2006;4;11;806-7

  • Insights into social insects from the genome of the honeybee Apis mellifera.

    Honeybee Genome Sequencing Consortium

    Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement.

    Funded by: Medical Research Council: MC_U137761447; NIGMS NIH HHS: R01 GM058634, R01 GM058634-08, R01 GM067317, R01 GM067317-03, R37 GM041247; NINDS NIH HHS: R01 NS040296, R01 NS040296-06, R01 NS043244; Wellcome Trust: 062023

    Nature 2006;443;7114;931-49

  • The LRC haplotype project: a resource for killer immunoglobulin-like receptor-linked association studies.

    Horton R, Coggill P, Miretti MM, Sambrook JG, Traherne JA, Ward R, Sims S, Palmer S, Sehra H, Harrow J, Rogers J, Carrington M, Trowsdale J and Beck S

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    There is increasing evidence for epistatic interactions between gene products (e.g. KIR) encoded within the Leukocyte Receptor Complex (LRC) with those (e.g. HLA) of the Major Histocompatibility Complex (MHC), resulting in susceptibility to disease. Identification of such associations at the DNA level requires comprehensive knowledge of the genetic variation and haplotype structure of the underlying loci. The LRC haplotype project aims to provide this knowledge by sequencing common LRC haplotypes.

    Funded by: Medical Research Council: G0401569, G9800943; NCI NIH HHS: N01-CO-12400; Wellcome Trust: 077198

    Tissue antigens 2006;68;5;450-2

  • A hypermutation phenotype and somatic MSH6 mutations in recurrent human malignant gliomas after alkylator chemotherapy.

    Hunter C, Smith R, Cahill DP, Stephens P, Stevens C, Teague J, Greenman C, Edkins S, Bignell G, Davies H, O'Meara S, Parker A, Avis T, Barthorpe S, Brackenbury L, Buck G, Butler A, Clements J, Cole J, Dicks E, Forbes S, Gorton M, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, Kosmidou V, Laman R, Lugg R, Menzies A, Perry J, Petty R, Raine K, Richardson D, Shepherd R, Small A, Solomon H, Tofts C, Varian J, West S, Widaa S, Yates A, Easton DF, Riggins G, Roy JE, Levine KK, Mueller W, Batchelor TT, Louis DN, Stratton MR, Futreal PA and Wooster R

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    Malignant gliomas have a very poor prognosis. The current standard of care for these cancers consists of extended adjuvant treatment with the alkylating agent temozolomide after surgical resection and radiotherapy. Although a statistically significant increase in survival has been reported with this regimen, nearly all gliomas recur and become insensitive to further treatment with this class of agents. We sequenced 500 kb of genomic DNA corresponding to the kinase domains of 518 protein kinases in each of nine gliomas. Large numbers of somatic mutations were observed in two gliomas recurrent after alkylating agent treatment. The pattern of mutations in these cases showed strong similarity to that induced by alkylating agents in experimental systems. Further investigation revealed inactivating somatic mutations of the mismatch repair gene MSH6 in each case. We propose that inactivating somatic mutations of MSH6 confer resistance to alkylating agents in gliomas in vivo and concurrently unleash accelerated mutagenesis in resistant clones as a consequence of continued exposure to alkylating agents in the presence of defective mismatch repair. The evidence therefore suggests that when MSH6 is inactivated in gliomas, alkylating agents convert from induction of tumor cell death to promotion of neoplastic progression. These observations highlight the potential of large scale sequencing for revealing and elucidating mutagenic processes operative in individual human cancers.

    Funded by: Wellcome Trust

    Cancer research 2006;66;8;3987-91

  • Recombination hotspots in nonallelic homologous recombination

    Hurles ME and Lupski JR

    Genomic Disorders: The Genomic Basis of Disease 2006;Chapter 24;341-355

  • Y-chromosomal rearrangements and azoospermia

    Hurles ME and Tyler-Smith C

    Genomic Disorders: The Genomic Basis of Disease 2006;19;273-288

  • Mutation analysis of 24 known cancer genes in the NCI-60 cell line set.

    Ikediobi ON, Davies H, Bignell G, Edkins S, Stevens C, O'Meara S, Santarius T, Avis T, Barthorpe S, Brackenbury L, Buck G, Butler A, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Hunter C, Jenkinson A, Jones D, Kosmidou V, Lugg R, Menzies A, Mironenko T, Parker A, Perry J, Raine K, Richardson D, Shepherd R, Small A, Smith R, Solomon H, Stephens P, Teague J, Tofts C, Varian J, Webb T, West S, Widaa S, Yates A, Reinhold W, Weinstein JN, Stratton MR, Futreal PA and Wooster R

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. mrs@sanger.ac.uk.

    The panel of 60 human cancer cell lines (the NCI-60) assembled by the National Cancer Institute for anticancer drug discovery is a widely used resource. The NCI-60 has been characterized pharmacologically and at the molecular level more extensively than any other set of cell lines. However, no systematic mutation analysis of genes causally implicated in oncogenesis has been reported. This study reports the sequence analysis of 24 known cancer genes in the NCI-60 and an assessment of 4 of the 24 genes for homozygous deletions. One hundred thirty-seven oncogenic mutations were identified in 14 (APC, BRAF, CDKN2, CTNNB1, HRAS, KRAS, NRAS, SMAD4, PIK3CA, PTEN, RB1, STK11, TP53, and VHL) of the 24 genes. All lines have at least one mutation among the cancer genes examined, with most lines (73%) having more than one. Identification of those cancer genes mutated in the NCI-60, in combination with pharmacologic and molecular profiles of the cells, will allow for more informed interpretation of anticancer agent screening and will enhance the use of the NCI-60 cell lines for molecularly targeted screens.

    Funded by: Wellcome Trust: 077012

    Molecular cancer therapeutics 2006;5;11;2606-12

  • Genome assembly comparison identifies structural variants in the human genome.

    Khaja R, Zhang J, MacDonald JR, He Y, Joseph-George AM, Wei J, Rafiq MA, Qian C, Shago M, Pantano L, Aburatani H, Jones K, Redon R, Hurles M, Armengol L, Estivill X, Mural RJ, Lee C, Scherer SW and Feuk L

    Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto and The Centre for Applied Genomics, MaRS Centre, Toronto, Ontario, M5G 1L7, Canada.

    Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs and intermediate-sized variants (ISVs). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.

    Funded by: Wellcome Trust: 077014

    Nature genetics 2006;38;12;1413-8

  • Common inheritance of chromosome Ia associated with clonal expansion of Toxoplasma gondii.

    Khan A, Böhme U, Kelly KA, Adlem E, Brooks K, Simmonds M, Mungall K, Quail MA, Arrowsmith C, Chillingworth T, Churcher C, Harris D, Collins M, Fosker N, Fraser A, Hance Z, Jagels K, Moule S, Murphy L, O'Neil S, Rajandream MA, Saunders D, Seeger K, Whitehead S, Mayr T, Xuan X, Watanabe J, Suzuki Y, Wakaguri H, Sugano S, Sugimoto C, Paulsen I, Mackey AJ, Roos DS, Hall N, Berriman M, Barrell B, Sibley LD and Ajioka JW

    Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri 63110, USA.

    Toxoplasma gondii is a globally distributed protozoan parasite that can infect virtually all warm-blooded animals and humans. Despite the existence of a sexual phase in the life cycle, T. gondii has an unusual population structure dominated by three clonal lineages that predominate in North America and Europe (Types I, II, and III). These lineages were founded by common ancestors approximately 10,000 yr ago. The recent origin and widespread distribution of the clonal lineages is attributed to the circumvention of the sexual cycle by a new mode of transmission: asexual transmission between intermediate hosts. Asexual transmission appears to be multigenic and although the specific genes mediating this trait are unknown, it is predicted that all members of the clonal lineages should share the same alleles. Genetic mapping studies suggested that chromosome Ia was unusually monomorphic compared with the rest of the genome. To investigate this further, we sequenced chromosome Ia and chromosome Ib in the Type I strain, RH, and the Type II strain, ME49. Comparative genome analyses of the two chromosomal sequences revealed that the same copy of chromosome Ia was inherited in each lineage, whereas chromosome Ib maintained the same high frequency of between-strain polymorphism as the rest of the genome. Sampling of chromosome Ia sequence in seven additional representative strains from the three clonal lineages supports a monomorphic inheritance, which is unique within the genome. Taken together, our observations implicate a specific combination of alleles on chromosome Ia in the recent origin and widespread success of the clonal lineages of T. gondii.

    Funded by: NIAID NIH HHS: R01 AI059176; Wellcome Trust

    Genome research 2006;16;9;1119-25

  • Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays.

    Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME, Lee C, Scherer SW, Jones KW, Shapero MH, Huang J and Aburatani H

    Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan.

    Recent reports indicate that copy number variations (CNVs) within the human genome contribute to nucleotide diversity to a larger extent than single nucleotide polymorphisms (SNPs). In addition, the contribution of CNVs to human disease susceptibility may be greater than previously expected, although our understanding of the phenotypic consequences of CNVs is incomplete. We have recently reported a comprehensive view of CNVs among 270 HapMap samples using high-density SNP genotyping arrays and BAC array CGH. In this report, we describe a novel algorithm using Affymetrix GeneChip Human Mapping 500K Early Access (500K EA) arrays that identified 1203 CNVs ranging in size from 960 bp to 3.4 Mb. The algorithm consists of three steps: (1) Intensity pre-processing to improve the resolution between pairwise comparisons by directly estimating the allele-specific affinity as well as to reduce signal noise by incorporating probe and target sequence characteristics via an improved version of the Genomic Imbalance Map (GIM) algorithm; (2) CNV extraction using an adapted SW-ARRAY procedure to automatically and robustly detect candidate CNV regions; and (3) copy number inference in which all pairwise comparisons are summarized to more precisely define CNV boundaries and accurately estimate CNV copy number. Independent testing of a subset of CNVs by quantitative PCR and mass spectrometry demonstrated a >90% verification rate. The use of high-resolution oligonucleotide arrays relative to other methods may allow more precise boundary information to be extracted, thereby enabling a more accurate analysis of the relationship between CNVs and other genomic features.

    Genome research 2006;16;12;1575-84

  • Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways.

    Lehner B, Crombie C, Tischler J, Fortunato A and Fraser AG

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    Most heritable traits, including disease susceptibility, are affected by interactions between multiple genes. However, we understand little about how genes interact because very few possible genetic interactions have been explored experimentally. We have used RNA interference in Caenorhabditis elegans to systematically test approximately 65,000 pairs of genes for their ability to interact genetically. We identify approximately 350 genetic interactions between genes functioning in signaling pathways that are mutated in human diseases, including components of the EGF/Ras, Notch and Wnt pathways. Most notably, we identify a class of highly connected 'hub' genes: inactivation of these genes can enhance the phenotypic consequences of mutation of many different genes. These hub genes all encode chromatin regulators, and their activity as genetic hubs seems to be conserved across animals. We propose that these genes function as general buffers of genetic variation and that these hub genes may act as modifier genes in multiple, mechanistically unrelated genetic diseases in humans.

    Funded by: Wellcome Trust

    Nature genetics 2006;38;8;896-903

  • RNAi screens in Caenorhabditis elegans in a 96-well liquid format and their application to the systematic identification of genetic interactions.

    Lehner B, Tischler J and Fraser AG

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    We describe a protocol for performing RNA interference (RNAi) screens in Caenorhabditis elegans in liquid culture in 96-well plates. The procedure allows a single researcher to set up and score RNAi experiments at approximately 2,000 genes per day. By comparing RNAi phenotypes between wild-type worms and worms carrying a defined genetic mutation, we have used this protocol to identify synthetic lethal interactions between genes systematically. We also describe how the protocol can be adapted to target two genes simultaneously by combinatorial RNAi.

    Funded by: Wellcome Trust

    Nature protocols 2006;1;3;1617-20

  • TreeFam: a curated database of phylogenetic trees of animal gene families.

    Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J and Durbin R

    Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.

    TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11,646 families. These represent over 128,000 genes from nine fully sequenced animal genomes and over 45,000 other animal proteins from UniProt; approximately 40-85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available at http://www.treefam.org and http://treefam.genomics.org.cn.

    Funded by: Wellcome Trust

    Nucleic acids research 2006;34;Database issue;D572-80

  • A chromosomal rearrangement hotspot can be identified from population genetic variation and is coincident with a hotspot for allelic recombination.

    Lindsay SJ, Khajavi M, Lupski JR and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Insights into the origins of structural variation and the mutational mechanisms underlying genomic disorders would be greatly improved by a genomewide map of hotspots of nonallelic homologous recombination (NAHR). Moreover, our understanding of sequence variation within the duplicated sequences that are substrates for NAHR lags far behind that of sequence variation within the single-copy portion of the genome. Perhaps the best-characterized NAHR hotspot lies within the 24-kb-long Charcot-Marie-Tooth disease type 1A (CMT1A)-repeats (REPs) that sponsor deletions and duplications that cause peripheral neuropathies. We investigated structural and sequence diversity within the CMT1A-REPs, both within and between species. We discovered a high frequency of retroelement insertions, accelerated sequence evolution after duplication, extensive paralogous gene conversion, and a greater than twofold enrichment of SNPs in humans relative to the genome average. We identified an allelic recombination hotspot underlying the known NAHR hotspot, which suggests that the two processes are intimately related. Finally, we used our data to develop a novel method for inferring the location of an NAHR hotspot from sequence variation within segmental duplications and applied it to identify a putative NAHR hotspot within the LCR22 repeats that sponsor velocardiofacial syndrome deletions. We propose that a large-scale project to map sequence variation within segmental duplications would reveal a wealth of novel chromosomal-rearrangement hotspots.

    Funded by: Wellcome Trust

    American journal of human genetics 2006;79;5;890-902

  • Chromosome-engineered mouse models

    Liu P

    Genomic Disorders 2006;VI;373-387

  • Tmc1 is necessary for normal functional maturation and survival of inner and outer hair cells in the mouse cochlea.

    Marcotti W, Erven A, Johnson SL, Steel KP and Kros CJ

    School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, UK.

    The deafness (dn) and Beethoven (Bth) mutant mice are models for profound congenital deafness (DFNB7/B11) and progressive hearing loss (DFNA36), respectively, caused by recessive and dominant mutations of transmembrane cochlear-expressed gene 1 (TMC1), which encodes a transmembrane protein of unknown function. In the mouse cochlea Tmc1 is expressed in both outer (OHCs) and inner (IHCs) hair cells from early stages of development. Immature hair cells of mutant mice seem normal in appearance and biophysical properties. From around P8 for OHCs and P12 for IHCs, mutants fail to acquire (dn/dn) or show reduced expression (Bth/Bth and, to a lesser extent, Bth/+) of the K+ currents which contribute to their normal functional maturation (the BK-type current IK,f in IHCs, and the delayed rectifier IK,n in both cell types). Moreover, the exocytotic machinery in mutant IHCs does not develop normally as judged by the persistence of immature features of the Ca2+ current and exocytosis into adulthood. Mutant mice exhibited progressive hair cell damage and loss. The compound action potential (CAP) thresholds of Bth/+ mice were raised and correlated with the degree of hair cell loss. Homozygous mutants (dn/dn and Bth/Bth) never showed CAP responses, even at ages where many hair cells were still present in the apex of the cochlea, suggesting their hair cells never function normally. We propose that Tmc1 is involved in trafficking of molecules to the plasma membrane or serves as an intracellular regulatory signal for differentiation of immature hair cells into fully functional auditory receptors.

    Funded by: Medical Research Council: G0100798, G0300212, MC_QA137918

    The Journal of physiology 2006;574;Pt 3;677-98

  • Two quantitative trait loci affecting progressive hearing loss in 101/H mice.

    Mashimo T, Erven AE, Spiden SL, Guénet JL and Steel KP

    Département de Biologie du Développement, Institut Pasteur, Paris, France.

    Although recent progress in identifying genes involved in deafness has been remarkable, the genetic basis of progressive hearing loss (or age-related hearing loss) is poorly understood because of the extreme difficulty in studying such a late-onset, complex disease in human populations. Several inbred strains of mice such as 129P1/ReJ, C57BL/6J, DBA/2J, and BALB/cByJ have been reported to exhibit age-related hearing loss and provide valuable models for human nonsyndromic progressive deafness. In this article we show that 101/H mice also exhibit progressive deafness with early onset. Linkage analysis of F(2) populations derived from crosses between the 101/H and the MAI/Pas and MBT/Pas wild-derived mice suggested at least two major quantitative trait loci (QTLs) that influence progressive hearing loss. A first QTL, designated Phl1, was mapped with a maximum LOD score of 6.7 to the centromeric region of Chromosome 17, where no deafness-related QTL has been mapped so far. A second QTL, designated Phl2, mapped to Chromosome 10 and exhibited a maximum LOD score of 5.3. The map position of Phl2 near the well-known QTL of age-related hearing loss (Ahl) suggested the possibility of allelism, although the Ahl mutation itself did not segregate in these crosses. Finally, we found some evidence of epistatic interaction between Phl1 and Phl2.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 2006;17;8;841-50

  • Mapping trait loci by use of inferred ancestral recombination graphs.

    Minichiello MJ and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, United Kingdom.

    Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.

    Funded by: Wellcome Trust

    American journal of human genetics 2006;79;5;910-22
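
    The second stage described in the abstract above (testing a genealogical tree for a clustering of cases beneath a branch) can be illustrated with a minimal sketch. This is not the authors' implementation: the nested-tuple tree, the haplotype names, the `best_case_branch` helper and the choice of a 2x2 chi-square statistic are all assumptions made for illustration, and the averaging over an ensemble of inferred ARGs is not shown.

```python
def leaves(node):
    """Return the leaf labels below a node (a leaf is a string)."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def best_case_branch(tree, is_case):
    """Find the internal branch whose clade is most enriched for cases."""
    all_leaves = leaves(tree)
    n = len(all_leaves)
    total_cases = sum(is_case[x] for x in all_leaves)
    total_controls = n - total_cases
    best = (0.0, None)
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # terminal branches are not scored in this sketch
        stack.extend(node)
        if node is tree:
            continue  # the root has no branch above it
        below = leaves(node)
        a = sum(is_case[x] for x in below)  # cases beneath the branch
        b = len(below) - a                  # controls beneath the branch
        if a * n <= total_cases * len(below):
            continue  # only consider clades enriched for cases
        score = chi_square_2x2(a, b, total_cases - a, total_controls - b)
        if score > best[0]:
            best = (score, tuple(sorted(below)))
    return best


# Hypothetical marginal tree at one locus; h1-h3 are case haplotypes.
tree = ((("h1", "h2"), "h3"), (("h4", "h5"), ("h6", ("h7", "h8"))))
is_case = {f"h{i}": i <= 3 for i in range(1, 9)}
score, clade = best_case_branch(tree, is_case)
print(score, clade)
```

    In this toy tree the clade containing h1, h2 and h3 separates cases from controls perfectly, so it receives the highest score; in practice the statistic would need a permutation-based significance assessment.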

  • Detection of novel Y SNPs provides further insights into Y chromosomal variation in Pakistan.

    Mohyuddin A, Ayub Q, Underhill PA, Tyler-Smith C and Mehdi SQ

    Biomedical and Genetic Engineering Laboratories, G. P. O Box 2891, 44000, Islamabad, Pakistan.

    Biallelic polymorphisms on the Y chromosome have been extensively used to study the history, evolution, and migration patterns of world populations. In this study we screened 8.5 kb of Y chromosomal DNA for single nucleotide polymorphisms (SNPs) in a panel of 95 male individuals belonging to different haplogroups. Five novel Y-SNPs (PK1-5) were identified, four in the Pakistani sample and one in an African sample. The ancestral state of each SNP was determined in two chimpanzee samples and a variety of Pakistani ethnic groups. In addition to these novel Y-SNPs, 77 additional markers on the Y chromosome were analyzed to place the SNPs on the phylogenetic tree of Y chromosomal lineages and to further investigate extant human Y chromosomal variation within Pakistan. BATWING analysis gave an estimate of between 2,500 and 7,300 YBP for population expansion in Pakistan, which coincides with the period of the Indus Valley civilizations.

    Journal of human genetics 2006;51;4;375-8

  • Molecular analysis of fluoroquinolone-resistant Salmonella Paratyphi A isolate, India.

    Nair S, Unnikrishnan M, Turner K, Parija SC, Churcher C, Wain J and Harish N

    The Wellcome Trust Sanger Institute, Cambridgeshire, United Kingdom.

    Salmonella enterica serovar Paratyphi A is increasingly a cause of enteric fever. Sequence analysis of an Indian isolate showed a unique strain with high-level resistance to ciprofloxacin associated with double mutations in the DNA gyrase subunit gyrA (Ser83→Phe and Asp87→Gly) and a mutation in topoisomerase IV subunit parC (Ser80→Arg).

    Funded by: Wellcome Trust

    Emerging infectious diseases 2006;12;3;489-91

  • Factors affecting flow karyotype resolution.

    Ng BL and Carter NP

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. bln@sanger.ac.uk

    Background: One of the major factors which influences the chromosome purity achievable particularly during high speed sorting is the analytical resolution of individual chromosome peaks in the flow karyotype, as well as the amount of debris and fragmented chromosomes. We have investigated the factors involved in the preparation of chromosome suspensions that influence karyotype resolution.

    Methods: Chromosomes were isolated from various human and animal cell types using a series of polyamine buffer isolation protocols modified with respect to pH, salt concentration, and chromosome staining time. Each preparation was analyzed on a MoFlo sorter (DAKO) configured for high speed sorting and the resolution of the flow karyotypes compared.

    Results: High resolution flow cytometric data was obtained with chromosomes optimally isolated using hypotonic solution buffered at pH 8.0 and polyamine isolation buffer (with NaCl excluded) between pH 7.50 and 8.0. Extending staining time to more than 8 h with chromosome suspensions isolated from cell lines subjected to sufficient metaphase arrest times gave the best result with the lowest percentage of debris generated, tighter chromosome peaks with overall lower coefficients of variation, and a 1- to 5-fold increase in the yield of isolated chromosomes.

    Conclusions: Optimization of buffer pH and the length of staining improved karyotype resolution particularly for larger chromosomes and reduced the presence of chromosome fragments (debris). However, the most interesting and surprising finding was that the exclusion of NaCl in PAB buffer improved the yield and resolution of larger chromosomes.

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2006;69;9;1028-36

  • The International Gene Trap Consortium Website: a portal to all publicly available gene trap cell lines in mouse.

    Nord AS, Chang PJ, Conklin BR, Cox AV, Harper CA, Hicks GG, Huang CC, Johns SJ, Kawamoto M, Liu S, Meng EC, Morris JH, Rossant J, Ruiz P, Skarnes WC, Soriano P, Stanford WL, Stryke D, von Melchner H, Wurst W, Yamamura K, Young SG, Babbitt PC and Ferrin TE

    University of California San Francisco, 600 16th Street, San Francisco, CA 94143-2240, USA.

    Gene trapping is a method of generating murine embryonic stem (ES) cell lines containing insertional mutations in known and novel genes. A number of international groups have used this approach to create sizeable public cell line repositories available to the scientific community for the generation of mutant mouse strains. The major gene trapping groups worldwide have recently joined together to centralize access to all publicly available gene trap lines by developing a user-oriented Website for the International Gene Trap Consortium (IGTC). This collaboration provides an impressive public informatics resource comprising approximately 45 000 well-characterized ES cell lines which currently represent approximately 40% of known mouse genes, all freely available for the creation of knockout mice on a non-collaborative basis. To standardize annotation and provide high confidence data for gene trap lines, a rigorous identification and annotation pipeline has been developed combining genomic localization and transcript alignment of gene trap sequence tags to identify trapped loci. This information is stored in a new bioinformatics database accessible through the IGTC Website interface. The IGTC Website (www.genetrap.org) allows users to browse and search the database for trapped genes, BLAST sequences against gene trap sequence tags, and view trapped genes within biological pathways. In addition, IGTC data have been integrated into major genome browsers and bioinformatics sites to provide users with outside portals for viewing this data. The development of the IGTC Website marks a major advance by providing the research community with the data and tools necessary to effectively use public gene trap resources for the large-scale characterization of mammalian gene function.

    Funded by: NCRR NIH HHS: P41 RR01081; NHLBI NIH HHS: U01 HL66600; Wellcome Trust

    Nucleic acids research 2006;34;Database issue;D642-8

  • Hot and sexy moulds!

    Pain A, Böhme U and Berriman M

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Nature reviews. Microbiology 2006;4;4;244-5

  • YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms.

    Penkett CJ, Morris JA, Wood V and Bähler J

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    We present YOGY, a web-based resource for orthologous proteins from nine eukaryotic organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Schizosaccharomyces pombe and Saccharomyces cerevisiae. Using a gene name from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, HomoloGene, OrthoMCL and a table of curated fission and budding yeast orthologs. Associated Gene Ontology (GO) terms of orthologs can also be retrieved for functional inference. Integrating these different and complementary datasets provides a straightforward tool to identify known and predicted orthologs of proteins from a variety of species. This resource should be useful for bench scientists looking for functional clues for their genes of interest as well as for curators looking for information that can be transferred based on orthology and for rapidly identifying the relevant GO terms as an aid to literature curation. YOGY is accessible online at http://www.sanger.ac.uk/PostGenomics/S_pombe/YOGY/.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Nucleic acids research 2006;34;Web Server issue;W330-4

  • The proteomes of neurotransmitter receptor complexes form modular networks with distributed functionality underlying plasticity and behaviour.

    Pocklington AJ, Cumiskey M, Armstrong JD and Grant SG

    School of Informatics, Edinburgh University, Edinburgh, UK.

    Neuronal synapses play fundamental roles in information processing, behaviour and disease. Neurotransmitter receptor complexes, such as the mammalian N-methyl-D-aspartate receptor complex (NRC/MASC) comprising 186 proteins, are major components of the synapse proteome. Here we investigate the organisation and function of NRC/MASC using a systems biology approach. Systematic annotation showed that the complex contained proteins implicated in a wide range of cognitive processes, synaptic plasticity and psychiatric diseases. Protein domains were evolutionarily conserved from yeast, but enriched with signalling domains associated with the emergence of multicellularity. Mapping of protein-protein interactions to create a network representation of the complex revealed that simple principles underlie the functional organisation of both proteins and their clusters, with modularity reflecting functional specialisation. The known functional roles of NRC/MASC proteins suggest the complex co-ordinates signalling to diverse effector pathways underlying neuronal plasticity. Importantly, using quantitative data from synaptic plasticity experiments, our model correctly predicts robustness to mutations and drug interference. These studies of synapse proteome organisation suggest that molecular networks with simple design principles underpin synaptic signalling properties with important roles in physiology, behaviour and disease.

    Funded by: Medical Research Council: G90/93; Wellcome Trust

    Molecular systems biology 2006;2;2006.0023

  • Essential and overlapping roles for laminin alpha chains in notochord and blood vessel formation.

    Pollard SM, Parsons MJ, Kamei M, Kettleborough RN, Thomas KA, Pham VN, Bae MK, Scott A, Weinstein BM and Stemple DL

    Division of Developmental Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London, UK.

    Laminins are major constituents of basement membranes and have wide-ranging functions during development and in the adult. They are a family of heterotrimeric molecules created through association of an alpha, beta and gamma chain. We previously reported that two zebrafish loci, grumpy (gup) and sleepy (sly), encode laminin beta1 and gamma1, which are important both for notochord differentiation and for proper intersegmental blood vessel (ISV) formation. In this study we show that bashful (bal) encodes laminin alpha1 (lama1). Although the strongest allele, bal(m190), is fully penetrant, when compared to gup or sly mutant embryos, bal mutants are not as severely affected, as only anterior notochord fails to differentiate and ISVs are unaffected. This suggests that other alpha chains, and hence other isoforms, act redundantly to laminin 1 in posterior notochord and ISV development. We identified cDNA sequences for lama2, lama4 and lama5 and disrupted the expression of each alone or in mutant embryos also lacking laminin alpha1. When expression of laminin alpha4 and laminin alpha1 is simultaneously disrupted, notochord differentiation and ISVs are as severely affected as in sly or gup mutants. Moreover, live imaging of transgenic embryos expressing enhanced green fluorescent protein in forming ISVs reveals that the vascular defects in these embryos are due to an inability of ISV sprouts to migrate correctly along the intersegmental, normally laminin-rich regions.

    Developmental biology 2006;289;1;64-76

  • Global variation in copy number in the human genome.

    Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

    Funded by: NHLBI NIH HHS: T32 HL007627; Wellcome Trust: 077008, 077009, 077014

    Nature 2006;444;7118;444-54

  • Tectonic, a novel regulator of the Hedgehog pathway required for both activation and inhibition.

    Reiter JF and Skarnes WC

    Developmental and Stem Cell Biology Program, and Diabetes Center, University of California, San Francisco, 94143-0525, USA. jreiter@diabetes.ucsf.edu

    We report the identification of a novel protein that participates in Hedgehog-mediated patterning of the neural tube. This protein, named Tectonic, is the founding member of a previously undescribed family of evolutionarily conserved secreted and transmembrane proteins. During neural tube development, mouse Tectonic is required for formation of the most ventral cell types and for full Hedgehog (Hh) pathway activation. Epistasis analyses reveal that Tectonic modulates Hh signal transduction downstream of Smoothened (Smo) and Rab23. Interestingly, characterization of Tectonic Shh and Tectonic Smo double mutants indicates that Tectonic plays an additional role in repressing Hh pathway activity.

    Genes & development 2006;20;1;22-7

  • The genomic sequence and analysis of the swine major histocompatibility complex.

    Renard C, Hart E, Sehra H, Beasley H, Coggill P, Howe K, Harrow J, Gilbert J, Sims S, Rogers J, Ando A, Shigenari A, Shiina T, Inoko H, Chardon P and Beck S

    LREG INRA CEA, Jouy en Josas, France.

    We describe the generation and analysis of an integrated sequence map of a 2.4-Mb region of pig chromosome 7, comprising the classical class I region, the extended and classical class II regions, and the class III region of the major histocompatibility complex (MHC), also known as swine leukocyte antigen (SLA) complex. We have identified and manually annotated 151 loci, of which 121 are known genes (predicted to be functional), 18 are pseudogenes, 8 are novel CDS loci, 3 are novel transcripts, and 1 is a putative gene. Nearly all of these loci have homologues in other mammalian genomes but orthologues could be identified with confidence for only 123 genes. The 28 genes (including all the SLA class I genes) for which unambiguous orthology to genes within the human reference MHC could not be established are of particular interest with respect to porcine-specific MHC function and evolution. We have compared the porcine MHC to other mammalian MHC regions and identified the differences between them. In comparison to the human MHC, the main differences include the absence of HLA-A and other class I-like loci, the absence of HLA-DP-like loci, and the separation of the extended and classical class II regions from the rest of the MHC by insertion of the centromere. We show that the centromere insertion has occurred within a cluster of BTNL genes located at the boundary of the class II and III regions, which might have resulted in the loss of an orthologue to human C6orf10 from this region.

    Funded by: Wellcome Trust

    Genomics 2006;88;1;96-110

  • ATM mutations that cause ataxia-telangiectasia are breast cancer susceptibility alleles.

    Renwick A, Thompson D, Seal S, Kelly P, Chagtai T, Ahmed M, North B, Jayatilake H, Barfoot R, Spanova K, McGuffog L, Evans DG, Eccles D, Breast Cancer Susceptibility Collaboration (UK), Easton DF, Stratton MR and Rahman N

    Section of Cancer Genetics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK.

    We screened individuals from 443 familial breast cancer pedigrees and 521 controls for ATM sequence variants and identified 12 mutations in affected individuals and two in controls (P = 0.0047). The results demonstrate that ATM mutations that cause ataxia-telangiectasia in biallelic carriers are breast cancer susceptibility alleles in monoallelic carriers, with an estimated relative risk of 2.37 (95% confidence interval (c.i.) = 1.51-3.78, P = 0.0003). There was no evidence that other classes of ATM variant confer a risk of breast cancer.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02

    Nature genetics 2006;38;8;873-5

  • Evolutionary history of Salmonella typhi.

    Roumagnac P, Weill FX, Dolecek C, Baker S, Brisse S, Chinh NT, Le TA, Acosta CJ, Farrar J, Dougan G and Achtman M

    Max-Planck-Institut für Infektionsbiologie, Department of Molecular Biology, Charitéplatz 1, 10117 Berlin, Germany.

    For microbial pathogens, phylogeographic differentiation seems to be relatively common. However, the neutral population structure of Salmonella enterica serovar Typhi reflects the continued existence of ubiquitous haplotypes over millennia. In contrast, clinical use of fluoroquinolones has yielded at least 15 independent gyrA mutations within a decade and stimulated clonal expansion of haplotype H58 in Asia and Africa. Yet, antibiotic-sensitive strains and haplotypes other than H58 still persist despite selection for antibiotic resistance. Neutral evolution in Typhi appears to reflect the asymptomatic carrier state, and adaptive evolution depends on the rapid transmission of phenotypic changes through acute infections.

    Funded by: Wellcome Trust: 076962

    Science (New York, N.Y.) 2006;314;5803;1301-4

  • Otoferlin, defective in a human deafness form, is essential for exocytosis at the auditory ribbon synapse.

    Roux I, Safieddine S, Nouvian R, Grati M, Simmler MC, Bahloul A, Perfettini I, Le Gall M, Rostaing P, Hamard G, Triller A, Avan P, Moser T and Petit C

    Inserm UMRS587, Unité de Génétique des Déficits Sensoriels, Collège de France, Institut Pasteur, 25 rue du Dr Roux, 75015 Paris, France.

    The auditory inner hair cell (IHC) ribbon synapse operates with an exceptional temporal precision and maintains a high level of neurotransmitter release. However, the molecular mechanisms underlying IHC synaptic exocytosis are largely unknown. We studied otoferlin, a predicted C2-domain transmembrane protein, which is defective in a recessive form of human deafness. We show that otoferlin expression in the hair cells correlates with afferent synaptogenesis and find that otoferlin localizes to ribbon-associated synaptic vesicles. Otoferlin binds Ca(2+) and displays Ca(2+)-dependent interactions with the SNARE proteins syntaxin1 and SNAP25. Otoferlin deficient mice (Otof(-/-)) are profoundly deaf. Exocytosis in Otof(-/-) IHCs is almost completely abolished, despite normal ribbon synapse morphogenesis and Ca(2+) current. Thus, otoferlin is essential for a late step of synaptic vesicle exocytosis and may act as the major Ca(2+) sensor triggering membrane fusion at the IHC ribbon synapse.

    Cell 2006;127;2;277-89

  • An explosive-degrading cytochrome P450 activity and its targeted application for the phytoremediation of RDX.

    Rylott EL, Jackson RG, Edwards J, Womack GL, Seth-Smith HM, Rathbone DA, Strand SE and Bruce NC

    CNAP, Department of Biology, University of York, PO Box 373, York, YO10 5YW, UK.

    The widespread presence in the environment of hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX), one of the most widely used military explosives, has raised concern owing to its toxicity and recalcitrance to degradation. To investigate the potential of plants to remove RDX from contaminated soil and water, we engineered Arabidopsis thaliana to express a bacterial gene xplA encoding an RDX-degrading cytochrome P450 (ref. 1). We demonstrate that the P450 domain of XplA is fused to a flavodoxin redox partner and catalyzes the degradation of RDX in the absence of oxygen. Transgenic A. thaliana expressing xplA removed and detoxified RDX from liquid media. As a model system for RDX phytoremediation, A. thaliana expressing xplA was grown in RDX-contaminated soil and found to be resistant to RDX phytotoxicity, producing shoot and root biomasses greater than those of wild-type plants. Our work suggests that expression of xplA in landscape plants may provide a suitable remediation strategy for sites contaminated by this class of explosives.

    Nature biotechnology 2006;24;2;216-9

  • Truncating mutations in the Fanconi anemia J gene BRIP1 are low-penetrance breast cancer susceptibility alleles.

    Seal S, Thompson D, Renwick A, Elliott A, Kelly P, Barfoot R, Chagtai T, Jayatilake H, Ahmed M, Spanova K, North B, McGuffog L, Evans DG, Eccles D, Breast Cancer Susceptibility Collaboration (UK), Easton DF, Stratton MR and Rahman N

    Section of Cancer Genetics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK.

    We identified constitutional truncating mutations of the BRCA1-interacting helicase BRIP1 in 9/1,212 individuals with breast cancer from BRCA1/BRCA2 mutation-negative families but in only 2/2,081 controls (P = 0.0030), and we estimate that BRIP1 mutations confer a relative risk of breast cancer of 2.0 (95% confidence interval = 1.2-3.2, P = 0.012). Biallelic BRIP1 mutations were recently shown to cause Fanconi anemia complementation group J. Thus, inactivating truncating mutations of BRIP1, similar to those in BRCA2, cause Fanconi anemia in biallelic carriers and confer susceptibility to breast cancer in monoallelic carriers.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02

    Nature genetics 2006;38;11;1239-41
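
    The carrier counts quoted in the abstract above (9 of 1,212 cases versus 2 of 2,081 controls) form a 2x2 table, and the following minimal sketch shows how such a table can be tested. The abstract does not say which test produced the reported P value, so the use of a two-sided Fisher's exact test here is an assumption, as is the function name `fisher_exact_two_sided`. The relative-risk estimate in the abstract comes from a separate analysis that this sketch does not reproduce.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases / controls; columns are carriers / non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that x of the col1 carriers are cases
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Counts taken from the abstract: 9/1,212 cases and 2/2,081 controls.
p_value = fisher_exact_two_sided(9, 1212 - 9, 2, 2081 - 2)
print(f"two-sided P = {p_value:.4f}")
```

    The same function can be applied to any other carrier counts arranged as cases versus controls; only the four cell counts change.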

  • Colonic irritation.

    Sebaihia M and Thomson NR

    Nature reviews. Microbiology 2006;4;12;882-3

  • Comparison of the genome sequence of the poultry pathogen Bordetella avium with those of B. bronchiseptica, B. pertussis, and B. parapertussis reveals extensive diversity in surface structures associated with host interaction.

    Sebaihia M, Preston A, Maskell DJ, Kuzmiak H, Connell TD, King ND, Orndorff PE, Miyamoto DM, Thomson NR, Harris D, Goble A, Lord A, Murphy L, Quail MA, Rutter S, Squares R, Squares S, Woodward J, Parkhill J and Temple LM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Bordetella avium is a pathogen of poultry and is phylogenetically distinct from Bordetella bronchiseptica, Bordetella pertussis, and Bordetella parapertussis, which are other species in the Bordetella genus that infect mammals. In order to understand the evolutionary relatedness of Bordetella species and further the understanding of pathogenesis, we obtained the complete genome sequence of B. avium strain 197N, a pathogenic strain that has been extensively studied. With 3,732,255 base pairs of DNA and 3,417 predicted coding sequences, it has the smallest genome and gene complement of the sequenced bordetellae. In this study, the presence or absence of previously reported virulence factors from B. avium was confirmed, and the genetic bases for growth characteristics were elucidated. Over 1,100 genes present in B. avium but not in B. bronchiseptica were identified, and most were predicted to encode surface or secreted proteins that are likely to define an organism adapted to the avian rather than the mammalian respiratory tracts. These include genes coding for the synthesis of a polysaccharide capsule, hemagglutinins, a type I secretion system adjacent to two very large genes for secreted proteins, and unique genes for both lipopolysaccharide and fimbrial biogenesis. Three apparently complete prophages are also present. The BvgAS virulence regulatory system appears to have polymorphisms at a poly(C) tract that is involved in phase variation in other bordetellae. A number of putative iron-regulated outer membrane proteins were predicted from the sequence, and this regulation was confirmed experimentally for five of these.

    Funded by: NIDCR NIH HHS: T32 DE007034

    Journal of bacteriology 2006;188;16;6002-15

  • The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome.

    Sebaihia M, Wren BW, Mullany P, Fairweather NF, Minton N, Stabler R, Thomson NR, Roberts AP, Cerdeño-Tárraga AM, Wang H, Holden MT, Wright A, Churcher C, Quail MA, Baker S, Bason N, Brooks K, Chillingworth T, Cronin A, Davis P, Dowd L, Fraser A, Feltwell T, Hance Z, Holroyd S, Jagels K, Moule S, Mungall K, Price C, Rabbinowitsch E, Sharp S, Simmonds M, Stevens K, Unwin L, Whithead S, Dupuy B, Dougan G, Barrell B and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    We determined the complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain. Our analysis indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons. These mobile elements are putatively responsible for the acquisition by C. difficile of an extensive array of genes involved in antimicrobial resistance, virulence, host interaction and the production of surface structures. The metabolic capabilities encoded in the genome show multiple adaptations for survival and growth within the gut environment. The extreme genome variability was confirmed by whole-genome microarray analysis; it may reflect the organism's niche in the gut and should provide information on the evolution of virulence in this organism.

    Funded by: Wellcome Trust

    Nature genetics 2006;38;7;779-86

  • Where there's muck there's microbes.

    Seth-Smith H and Bentley S

    Nature reviews. Microbiology 2006;4;9;646-7

  • Microdeletion encompassing MAPT at chromosome 17q21.3 is associated with developmental delay and learning disability.

    Shaw-Smith C, Pittman AM, Willatt L, Martin H, Rickman L, Gribble S, Curley R, Cumming S, Dunn C, Kalaitzopoulos D, Porter K, Prigmore E, Krepischi-Santos AC, Varela MC, Koiffmann CP, Lees AJ, Rosenberg C, Firth HV, de Silva R and Carter NP

    University of Cambridge Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK. css@sanger.ac.uk

    Recently, the application of array-based comparative genomic hybridization (array CGH) has improved rates of detection of chromosomal imbalances in individuals with mental retardation and dysmorphic features. Here, we describe three individuals with learning disability and a heterozygous deletion at chromosome 17q21.3, detected in each case by array CGH. FISH analysis demonstrated that the deletions occurred as de novo events in each individual and were between 500 kb and 650 kb in size. A recently described 900-kb inversion that suppresses recombination between ancestral H1 and H2 haplotypes encompasses the deletion. We show that, in each trio, the parent of origin of the deleted chromosome 17 carries at least one H2 chromosome. This region of 17q21.3 shows complex genomic architecture with well-described low-copy repeats (LCRs). The orientation of LCRs flanking the deleted segment in inversion heterozygotes is likely to facilitate the generation of this microdeletion by means of non-allelic homologous recombination.

    Funded by: Medical Research Council: G0501560, G0501560(76517); Wellcome Trust

    Nature genetics 2006;38;9;1032-7

  • Genomic anatomy of the Tyrp1 (brown) deletion complex.

    Smyth IM, Wilming L, Lee AW, Taylor MS, Gautier P, Barlow K, Wallis J, Martin S, Glithero R, Phillimore B, Pelan S, Andrew R, Holt K, Taylor R, McLaren S, Burton J, Bailey J, Sims S, Squares J, Plumb B, Joy A, Gibson R, Gilbert J, Hart E, Laird G, Loveland J, Mudge J, Steward C, Swarbreck D, Harrow J, North P, Leaves N, Greystrong J, Coppola M, Manjunath S, Campbell M, Smith M, Strachan G, Tofts C, Boal E, Cobley V, Hunter G, Kimberley C, Thomas D, Cave-Berry L, Weston P, Botcherby MR, White S, Edgar R, Cross SH, Irvani M, Hummerich H, Simpson EH, Johnson D, Hunsicker PR, Little PF, Hubbard T, Campbell RD, Rogers J and Jackson IJ

    Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom.

    Chromosome deletions in the mouse have proven invaluable in the dissection of gene function. The brown deletion complex comprises >28 independent genome rearrangements, which have been used to identify several functional loci on chromosome 4 required for normal embryonic and postnatal development. We have constructed a 172-bacterial artificial chromosome contig that spans this 22-megabase (Mb) interval and have produced a contiguous, finished, and manually annotated sequence from these clones. The deletion complex is strikingly gene-poor, containing only 52 protein-coding genes (of which only 39 are supported by human homologues) and has several further notable genomic features, including several segments of >1 Mb, apparently devoid of a coding sequence. We have used sequence polymorphisms to finely map the deletion breakpoints and identify strong candidate genes for the known phenotypes that map to this region, including three lethal loci (l4Rn1, l4Rn2, and l4Rn3) and the fitness mutant brown-associated fitness (baf). We have also characterized misexpression of the basonuclin homologue, Bnc2, associated with the inversion-mediated coat color mutant white-based brown (B(w)). This study provides a molecular insight into the basis of several characterized mouse mutants, which will allow further dissection of this region by targeted or chemical mutagenesis.

    Funded by: Medical Research Council: MC_U123160651, MC_U127561112; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2006;103;10;3704-9

  • The Genetics of Type 2 Diabetes

    Stevenson C, Barroso I and Wareham N

    Nutritional Genomics: Impact on Health and Disease 2006;Chapter 13;222-65

  • Staking claims in the biotechnology Klondike.

    Sulston J

    The Human Genetics Commission, London, England. jspc@sanger.ac.uk

    Bulletin of the World Health Organization 2006;84;5;412-3

  • Mutations in FRMD7, a newly identified member of the FERM family, cause X-linked idiopathic congenital nystagmus.

    Tarpey P, Thomas S, Sarvananthan N, Mallya U, Lisgo S, Talbot CJ, Roberts EO, Awan M, Surendran M, McLean RJ, Reinecke RD, Langmann A, Lindner S, Koch M, Jain S, Woodruff G, Gale RP, Bastawrous A, Degg C, Droutsas K, Asproudis I, Zubcov AA, Pieh C, Veal CD, Machado RD, Backhouse OC, Baumber L, Constantinescu CS, Brodsky MC, Hunter DG, Hertle RW, Read RJ, Edkins S, O'Meara S, Parker A, Stevens C, Teague J, Wooster R, Futreal PA, Trembath RC, Stratton MR, Raymond FL and Gottlob I

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Idiopathic congenital nystagmus is characterized by involuntary, periodic, predominantly horizontal oscillations of both eyes. We identified 22 mutations in FRMD7 in 26 families with X-linked idiopathic congenital nystagmus. Screening of 42 singleton cases of idiopathic congenital nystagmus (28 male, 14 females) yielded three mutations (7%). We found restricted expression of FRMD7 in human embryonic brain and developing neural retina, suggesting a specific role in the control of eye movement and gaze stability.

    Funded by: Medical Research Council: G9826762; Wellcome Trust: 050211

    Nature genetics 2006;38;11;1242-4

  • Human chromosome 11 DNA sequence and analysis including novel gene identification.

    Taylor TD, Noguchi H, Totoki Y, Toyoda A, Kuroki Y, Dewar K, Lloyd C, Itoh T, Takeda T, Kim DW, She X, Barlow KF, Bloom T, Bruford E, Chang JL, Cuomo CA, Eichler E, FitzGerald MG, Jaffe DB, LaButti K, Nicol R, Park HS, Seaman C, Sougnez C, Yang X, Zimmer AR, Zody MC, Birren BW, Nusbaum C, Fujiyama A, Hattori M, Rogers J, Lander ES and Sakaki Y

    RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan. taylor@gsc.riken.jp

    Chromosome 11, although average in size, is one of the most gene- and disease-rich chromosomes in the human genome. Initial gene annotation indicates an average gene density of 11.6 genes per megabase, including 1,524 protein-coding genes, some of which were identified using novel methods, and 765 pseudogenes. One-quarter of the protein-coding genes shows overlap with other genes. Of the 856 olfactory receptor genes in the human genome, more than 40% are located in 28 single- and multi-gene clusters along this chromosome. Out of the 171 disorders currently attributed to the chromosome, 86 remain for which the underlying molecular basis is not yet known, including several mendelian traits, cancer and susceptibility loci. The high-quality data presented here--nearly 134.5 million base pairs representing 99.8% coverage of the euchromatic sequence--provide scientists with a solid foundation for understanding the genetic basis of these disorders and other biological phenomena.

    Funded by: Medical Research Council: G0000107; Wellcome Trust

    Nature 2006;440;7083;497-500

  • Expression of mammalian GPCRs in C. elegans generates novel behavioural responses to human ligands.

    Teng MS, Dekkers MP, Ng BL, Rademakers S, Jansen G, Fraser AG and McCafferty J

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mst@sanger.ac.uk

    Background: G-protein-coupled receptors (GPCRs) play a crucial role in many biological processes and represent a major class of drug targets. However, purification of GPCRs for biochemical study is difficult and current methods of studying receptor-ligand interactions involve in vitro systems. Caenorhabditis elegans is a soil-dwelling, bacteria-feeding nematode that uses GPCRs expressed in chemosensory neurons to detect bacteria and environmental compounds, making this an ideal system for studying in vivo GPCR-ligand interactions. We sought to test this by functionally expressing two medically important mammalian GPCRs, somatostatin receptor 2 (Sstr2) and chemokine receptor 5 (CCR5) in the gustatory neurons of C. elegans.

    Results: Expression of Sstr2 and CCR5 in gustatory neurons allows C. elegans to specifically detect and respond to somatostatin and MIP-1alpha respectively in a robust avoidance assay. We demonstrate that mammalian heterologous GPCRs can signal via different endogenous Galpha subunits in C. elegans, depending on which cells they are expressed in. Furthermore, pre-exposure of GPCR transgenic animals to its ligand leads to receptor desensitisation and behavioural adaptation to subsequent ligand exposure, providing further evidence of integration of the mammalian GPCRs into the C. elegans sensory signalling machinery. In structure-function studies using a panel of somatostatin-14 analogues, we identified key residues involved in the interaction of somatostatin-14 with Sstr2.

    Conclusion: Our results illustrate a remarkable evolutionary plasticity in interactions between mammalian GPCRs and C. elegans signalling machinery, spanning 800 million years of evolution. This in vivo system, which imparts novel avoidance behaviour on C. elegans, thus provides a simple means of studying and screening interaction of GPCRs with extracellular agonists, antagonists and intracellular binding partners.

    Funded by: Wellcome Trust

    BMC biology 2006;4;22

  • The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Thomson NR, Howard S, Wren BW, Holden MT, Crossman L, Challis GL, Churcher C, Mungall K, Brooks K, Chillingworth T, Feltwell T, Abdellah Z, Hauser H, Jagels K, Maddison M, Moule S, Sanders M, Whitehead S, Quail MA, Dougan G, Parkhill J and Prentice MB

    The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom. nrt@sanger.ac.uk

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common themes in the genome evolution of other human enteropathogens.

    Funded by: Wellcome Trust

    PLoS genetics 2006;2;12;e206

  • Genetic analysis of completely sequenced disease-associated MHC haplotypes identifies shuffling of segments in recent human history.

    Traherne JA, Horton R, Roberts AN, Miretti MM, Hurles ME, Stewart CA, Ashurst JL, Atrazhev AM, Coggill P, Palmer S, Almeida J, Sims S, Wilming LG, Rogers J, de Jong PJ, Carrington M, Elliott JF, Sawcer S, Todd JA, Trowsdale J and Beck S

    Department of Pathology, Immunology Division, University of Cambridge, Cambridge, United Kingdom.

    The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.

    Funded by: NCI NIH HHS: N01-CO-12400; Wellcome Trust: 048880

    PLoS genetics 2006;2;1;e9

  • Assaying chromosomal inversions by single-molecule haplotyping.

    Turner DJ, Shendure J, Porreca G, Church G, Green P, Tyler-Smith C and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Inversions are an important form of structural variation, but they are difficult to characterize, as their breakpoints often fall within inverted repeats. We have developed a method called 'haplotype fusion' in which an inversion breakpoint is genotyped by performing fusion PCR on single molecules of human genomic DNA. Fusing single-copy sequences bracketing an inversion breakpoint generates orientation-specific PCR products, exemplified by a genotyping assay for the int22 hemophilia A inversion on Xq28. Furthermore, we demonstrated that inversion events with breakpoints embedded within long (>100 kb) inverted repeats can be genotyped by haplotype-fusion PCR followed by bead-based single-molecule haplotyping on repeat-specific markers bracketing the inversion breakpoint. We illustrate this method by genotyping a Yp paracentric inversion sponsored by >300-kb-long inverted repeats. The generality of our methods to survey for and genotype chromosomal inversions should help our understanding of the contribution of inversions to genomic variation, inherited diseases and cancer.

    Funded by: Wellcome Trust

    Nature methods 2006;3;6;439-45

  • The rise and fall of the ape Y chromosome?

    Tyler-Smith C, Howe K and Santos FR

    Nature genetics 2006;38;2;141-3

  • Loss of TSLC1 causes male infertility due to a defect at the spermatid stage of spermatogenesis.

    van der Weyden L, Arends MJ, Chausiaux OE, Ellis PJ, Lange UC, Surani MA, Affara N, Murakami Y, Adams DJ and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Tumor suppressor of lung cancer 1 (TSLC1), also known as SgIGSF, IGSF4, and SynCAM, is strongly expressed in spermatogenic cells undergoing the early and late phases of spermatogenesis (spermatogonia to zygotene spermatocytes and elongating spermatids to spermiation). Using embryonic stem cell technology to generate a null mutation of Tslc1 in mice, we found that Tslc1 null male mice were infertile. Tslc1 null adult testes showed that spermatogenesis had arrested at the spermatid stage, with degenerating and apoptotic spermatids sloughing off into the lumen. In adult mice, Tslc1 null round spermatids showed evidence of normal differentiation (an acrosomal cap and F-actin polarization indistinguishable from that of wild-type spermatids); however, the surviving spermatozoa were immature, malformed, found at very low levels in the epididymis, and rarely motile. Analysis of the first wave of spermatogenesis in Tslc1 null mice showed a delay in maturation by day 22 and degeneration of round spermatids by day 28. Expression profiling of the testes revealed that Tslc1 null mice showed increases in the expression levels of genes involved in apoptosis, adhesion, and the cytoskeleton. Taken together, these data show that Tslc1 is essential for normal spermatogenesis in mice.

    Molecular and cellular biology 2006;26;9;3595-609

  • Functional knockout of the matrilin-3 gene causes premature chondrocyte maturation to hypertrophy and increases bone mineral density and osteoarthritis.

    van der Weyden L, Wei L, Luo J, Yang X, Birk DE, Adams DJ, Bradley A and Chen Q

    Mouse Genomics Lab, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Mutations in the gene encoding matrilin-3 (MATN3), a noncollagenous extracellular matrix protein, have been reported in a variety of skeletal diseases, including multiple epiphyseal dysplasia, which is characterized by irregular ossification of the epiphyses and early-onset osteoarthritis, spondylo-epimetaphyseal dysplasia, and idiopathic hand osteoarthritis. To assess the role of matrilin-3 in the pathogenesis of these diseases, we generated Matn3 functional knockout mice using embryonic stem cell technology. In the embryonic growth plate of the developing long bones, Matn3 null chondrocytes prematurely became prehypertrophic and hypertrophic, forming an expanded zone of hypertrophy. This expansion was attenuated during the perinatal period, and Matn3 homozygous null mice were viable and showed no gross skeletal malformations at birth. However, by 18 weeks of age, Matn3 null mice had a significantly higher total body bone mineral density than Matn1 null mice or wild-type littermates. Aged Matn3 null mice were much more predisposed to develop severe osteoarthritis than their wild-type littermates. Here, we show that matrilin-3 plays a role in modulating chondrocyte differentiation during embryonic development, in controlling bone mineral density in adulthood, and in preventing osteoarthritis during aging. The lack of Matn3 does not lead to postnatal chondrodysplasia but accounts for higher incidence of osteoarthritis.

    Funded by: NIAMS NIH HHS: R01 AR044745

    The American journal of pathology 2006;169;2;515-27

  • Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands.

    Vernikos GS and Parkhill J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SA, UK. gsv@sanger.ac.uk

    Motivation: There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and utilize different low and high order indices to capture compositional deviation from the genome backbone; the superiority of the latter over the former has been shown elsewhere. However even high order k-mers may be poor estimators of HGT, when insufficient information is available, e.g. in short sliding windows. Most of the current HGT prediction methods require pre-existing annotation, which may restrict their application on newly sequenced genomes.

    Results: We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable order motif distributions and captures more reliably the local composition of a sequence compared with fixed-order methods. For optimal localization of the boundaries of each predicted region, a second order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a well-studied prokaryote in terms of HGT events, and we show that the IVOMs outperform state-of-the-art low and high order motif methods predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events.

    Availability: The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter

    Contact: gsv@sanger.ac.uk

    Supplementary data are available at Bioinformatics online.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2006;22;18;2196-203

  • Nonrandom distribution of Burkholderia pseudomallei clones in relation to geographical location and virulence.

    Vesaratchavest M, Tumapa S, Day NP, Wuthiekanun V, Chierakul W, Holden MT, White NJ, Currie BJ, Spratt BG, Feil EJ and Peacock SJ

    Faculty of Tropical Medicine, Mahidol University, 420/6 Rajvithi Road, Bangkok 10400, Thailand, and Churchill Hospital, Oxford, UK.

    Burkholderia pseudomallei is a soil-dwelling saprophyte and the causative agent of melioidosis, a life-threatening human infection. Most cases are reported from northeast Thailand and northern Australia. Using multilocus sequence typing (MLST), we have compared (i) soil and invasive isolates from northeast Thailand and (ii) invasive isolates from Thailand and Australia. A total of 266 Thai B. pseudomallei isolates were characterized (83 soil and 183 invasive). These corresponded to 123 sequence types (STs), the most abundant being ST70 (n=21), ST167 (n=15), ST54 (n=12), and ST58 (n=11). Two clusters of related STs (clonal complexes) were identified; the larger clonal complex (CC48) did not conform to a simple pattern of radial expansion from an assumed ancestor, while a second (CC70) corresponded to a simple radial expansion from ST70. Despite the large number of STs, overall nucleotide diversity was low. Of the Thai isolates, those isolated from patients with melioidosis were overrepresented in the 10 largest clones (P<0.0001). There was a significant difference in the classification index between environmental and disease isolates (P<0.001), confirming that genotypes were not distributed randomly between the two samples. MLST profiles for 158 isolates from Australia (mainly disease associated) contained a number of STs (96) similar to that seen with the Thai invasive isolates, but no ST was found in both populations. There were also differences in diversity and allele frequency distribution between the two populations. This analysis reveals strong genetic differentiation on the basis of geographical isolation and a significant differentiation on the basis of virulence potential.

    Funded by: Wellcome Trust

    Journal of clinical microbiology 2006;44;7;2553-7

  • Single Nucleotide Polymorphism Analysis by Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry

    Whittaker P, Bumpstead S, Downes K, Ghori J and Deloukas P

    Cell Biology 2006;3;Chapter 48;463–470

  • Schistosoma mansoni (Platyhelminthes, Trematoda) nuclear receptors: sixteen new members and a novel subfamily.

    Wu W, Niles EG, El-Sayed N, Berriman M and LoVerde PT

    Department of Microbiology and Immunology, and Center for Microbial Pathogenesis, School of Medicine and Biomedical Science, State University of New York, Buffalo, NY 14214, USA.

    Nuclear receptors (NRs) are important transcriptional modulators in metazoans. Sixteen new NRs were identified in the Platyhelminth trematode, Schistosoma mansoni. Three were found to possess novel tandem DNA-binding domains that identify a new subfamily of NR. Two NRs are homologues of the thyroid hormone receptor that previously were thought to be restricted to chordates. This study brings the total number of identified NR in S. mansoni to 21. Phylogenetic and comparative genomic analyses demonstrate that S. mansoni NRs share an evolutionary lineage with that of arthropods and vertebrates. Phylogenic analysis shows that more than half of the S. mansoni nuclear receptors evolved from a second gene duplication. As the second gene duplication of NRs was thought to be specific to vertebrates, our data challenge the current theory of NR evolution.

    Funded by: NIAID NIH HHS: AI046762, U01 AI48828

    Gene 2006;366;2;303-15

  • Spread of an inactive form of caspase-12 in humans is due to recent positive selection.

    Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, Sabeti P, Chen Y, Stalker J, Huckle E, Burton J, Leonard S, Rogers J and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, United Kingdom.

    The human caspase-12 gene is polymorphic for the presence or absence of a stop codon, which results in the occurrence of both active (ancestral) and inactive (derived) forms of the gene in the population. It has been shown elsewhere that carriers of the inactive gene are more resistant to severe sepsis. We have now investigated whether the inactive form has spread because of neutral drift or positive selection. We determined its distribution in a worldwide sample of 52 populations and resequenced the gene in 77 individuals from the HapMap Yoruba, Han Chinese, and European populations. There is strong evidence of positive selection from low diversity, skewed allele-frequency spectra, and the predominance of a single haplotype. We suggest that the inactive form of the gene arose in Africa approximately 100-500 thousand years ago (KYA) and was initially neutral or almost neutral but that positive selection beginning approximately 60-100 KYA drove it to near fixation. We further propose that its selective advantage was sepsis resistance in populations that experienced more infectious diseases as population sizes and densities increased.

    Funded by: Wellcome Trust

    American journal of human genetics 2006;78;4;659-70

  • Male demography in East Asia: a north-south contrast in human population expansion times.

    Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, Xu J, Du R, Fu S, Li P, Hurles ME, Yang H and Tyler-Smith C

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The human population has increased greatly in size in the last 100,000 years, but the initial stimuli to growth, the times when expansion started, and their variation between different parts of the world are poorly understood. We have investigated male demography in East Asia, applying a Bayesian full-likelihood analysis to data from 988 men representing 27 populations from China, Mongolia, Korea, and Japan typed with 45 binary and 16 STR markers from the Y chromosome. According to our analysis, the northern populations examined all started to expand in number between 34 (18-68) and 22 (12-39) thousand years ago (KYA), before the last glacial maximum at 21-18 KYA, while the southern populations all started to expand between 18 (6-47) and 12 (1-45) KYA, but then grew faster. We suggest that the northern populations expanded earlier because they could exploit the abundant megafauna of the "Mammoth Steppe," while the southern populations could increase in number only when a warmer and more stable climate led to more plentiful plant resources such as tubers.

    Funded by: Wellcome Trust

    Genetics 2006;172;4;2431-9

  • The genome of Rhizobium leguminosarum has recognizable core and accessory components.

    Young JP, Crossman LC, Johnston AW, Thomson NR, Ghazoui ZF, Hull KH, Wexler M, Curson AR, Todd JD, Poole PS, Mauchline TH, East AK, Quail MA, Churcher C, Arrowsmith C, Cherevach I, Chillingworth T, Clarke K, Cronin A, Davis P, Fraser A, Hance Z, Hauser H, Jagels K, Moule S, Mungall K, Norbertczak H, Rabbinowitsch E, Sanders M, Simmonds M, Whitehead S and Parkhill J

    Department of Biology, University of York, York, UK. jpy1@york.ac.uk.

    Background: Rhizobium leguminosarum is an alpha-proteobacterial N2-fixing symbiont of legumes that has been the subject of more than a thousand publications. Genes for the symbiotic interaction with plants are well studied, but the adaptations that allow survival and growth in the soil environment are poorly understood. We have sequenced the genome of R. leguminosarum biovar viciae strain 3841.

    Results: The 7.75 Mb genome comprises a circular chromosome and six circular plasmids, with 61% G+C overall. All three rRNA operons and 52 tRNA genes are on the chromosome; essential protein-encoding genes are largely chromosomal, but most functional classes occur on plasmids as well. Of the 7,263 protein-encoding genes, 2,056 had orthologs in each of three related genomes (Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti), and these genes were over-represented in the chromosome and had above average G+C. Most supported the rRNA-based phylogeny, confirming A. tumefaciens to be the closest among these relatives, but 347 genes were incompatible with this phylogeny; these were scattered throughout the genome but were over-represented on the plasmids. An unexpectedly large number of genes were shared by all three rhizobia but were missing from A. tumefaciens.

    Conclusion: Overall, the genome can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands. The accessory genome has a different nucleotide composition from the core despite a long history of coexistence.

    Funded by: Wellcome Trust

    Genome biology 2006;7;4;R34

  • A deficiency in the region homologous to human 17q21.33-q23.2 causes heart defects in mice.

    Yu YE, Morishima M, Pao A, Wang DY, Wen XY, Baldini A and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA. yuejin.yn@roswellpark.org

    Several constitutional chromosomal rearrangements occur on human chromosome 17. Patients who carry constitutional deletions of 17q21.3-q24 exhibit distinct phenotypic features. Within the deletion interval, there is a genomic segment that is bounded by the myeloperoxidase and homeobox B1 genes. This genomic segment is syntenically conserved on mouse chromosome 11 and is bounded by the mouse homologs of the same genes (Mpo and HoxB1). To attain functional information about this syntenic segment in mice, we have generated a 6.9-Mb deletion [Df(11)18], the reciprocal duplication [Dp(11)18] between Mpo and Chad (the chondroadherin gene), and a 1.8-Mb deletion between Chad and HoxB1. Phenotypic analyses of the mutant mouse lines showed that the Dp(11)18/Dp(11)18 genotype was responsible for embryonic or adolescent lethality, whereas the Df(11)18/+ genotype was responsible for heart defects. The cardiovascular phenotype of the Df(11)18/+ fetuses was similar to that of patients who carried the deletions of 17q21.3-q24. Since heart defects were not detectable in Df(11)18/Dp(11)18 mice, the haplo-insufficiency of one or more genes located between Mpo and Chad may be responsible for the abnormal cardiovascular phenotype. Therefore, we have identified a new dosage-sensitive genomic region that may be critical for normal heart development in both mice and humans.

    Funded by: Wellcome Trust

    Genetics 2006;173;1;297-307

  • DNA sequence of human chromosome 17 and analysis of rearrangement in the human lineage.

    Zody MC, Garber M, Adams DJ, Sharpe T, Harrow J, Lupski JR, Nicholson C, Searle SM, Wilming L, Young SK, Abouelleil A, Allen NR, Bi W, Bloom T, Borowsky ML, Bugalter BE, Butler J, Chang JL, Chen CK, Cook A, Corum B, Cuomo CA, de Jong PJ, DeCaprio D, Dewar K, FitzGerald M, Gilbert J, Gibson R, Gnerre S, Goldstein S, Grafham DV, Grocock R, Hafez N, Hagopian DS, Hart E, Norman CH, Humphray S, Jaffe DB, Jones M, Kamal M, Khodiyar VK, LaButti K, Laird G, Lehoczky J, Liu X, Lokyitsang T, Loveland J, Lui A, Macdonald P, Major JE, Matthews L, Mauceli E, McCarroll SA, Mihalev AH, Mudge J, Nguyen C, Nicol R, O'Leary SB, Osoegawa K, Schwartz DC, Shaw-Smith C, Stankiewicz P, Steward C, Swarbreck D, Venkataraman V, Whittaker CA, Yang X, Zimmer AR, Bradley A, Hubbard T, Birren BW, Rogers J, Lander ES and Nusbaum C

    Broad Institute of MIT and Harvard, 7 Cambridge Center, Massachusetts 02142, USA.

    Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome, mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome. It is also enriched in segmental duplications, ranking third in density among the autosomes. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution, the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.

    Funded by: Medical Research Council: G0000107; Wellcome Trust: 077187

    Nature 2006;440;7087;1045-9
