Sanger Institute - Publications 2002

Number of papers published in 2002: 34

  • Induced mitotic recombination: a switch in time.

    Adams DJ and Bradley A

    Nature genetics 2002;30;1;6-7

  • Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2).

    Bentley SD, Chater KF, Cerdeño-Tárraga AM, Challis GL, Thomson NR, James KD, Harris DE, Quail MA, Kieser H, Harper D, Bateman A, Brown S, Chandra G, Chen CW, Collins M, Cronin A, Fraser A, Goble A, Hidalgo J, Hornsby T, Howarth S, Huang CH, Kieser T, Larke L, Murphy L, Oliver K, O'Neil S, Rabbinowitsch E, Rajandream MA, Rutherford K, Rutter S, Seeger K, Saunders D, Sharp S, Squares R, Squares S, Taylor K, Warren T, Wietzorrek A, Woodward J, Barrell BG, Parkhill J and Hopwood DA

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. sdb@sanger.ac.uk

    Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

    Nature 2002;417;6885;141-7

  • The architecture of variant surface glycoprotein gene expression sites in Trypanosoma brucei.

    Berriman M, Hall N, Sheader K, Bringaud F, Tiwari B, Isobe T, Bowman S, Corton C, Clark L, Cross GA, Hoek M, Zanders T, Berberof M, Borst P and Rudenko G

    The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Trypanosoma brucei evades the immune system by switching between Variant Surface Glycoprotein (VSG) genes. The active VSG gene is transcribed in one of approximately 20 telomeric expression sites (ESs). It has been postulated that ES polymorphism plays a role in host adaptation. To gain more insight into ES architecture, we have determined the complete sequence of Bacterial Artificial Chromosomes (BACs) containing DNA from three ESs and their flanking regions. There was variation in the order and number of ES-associated genes (ESAGs). ESAGs 6 and 7, encoding transferrin receptor subunits, are the only ESAGs with functional copies in every ES that has been sequenced until now. A BAC clone containing the VO2 ES sequences comprised approximately half of a 330 kb 'intermediate' chromosome. The extensive similarity between this intermediate chromosome and the left telomere of T. brucei 927 chromosome I suggests that this previously uncharacterised intermediate size class of chromosomes could have arisen from breakage of megabase chromosomes. Unexpected conservation of sequences, including pseudogenes, indicates that the multiple ESs could have arisen through a relatively recent amplification of a single ES.

    Funded by: NIAID NIH HHS: AI21729; Wellcome Trust: 095161

    Molecular and biochemical parasitology 2002;122;2;131-40

  • Databases and tools for browsing genomes.

    Birney E, Clamp M and Hubbard T

    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom. birney@ebi.ac.uk

    To maximize the value of genome sequences they need to be integrated with other types of biological data and with each other. The entire collection of data then needs to be made available in a way that is easy to view and mine for complex relationships. The recently determined vertebrate genome sequences of human and mouse are so large that building the infrastructure to manage these datasets is a major challenge. This article reviews the database systems and tools for analysis that have so far been developed to address this.

    Annual review of genomics and human genetics 2002;3;293-310

  • A global analysis of Caenorhabditis elegans operons.

    Blumenthal T, Evans D, Link CD, Guffanti A, Lawson D, Thierry-Mieg J, Thierry-Mieg D, Chiu WL, Duke K, Kiraly M and Kim SK

    Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Box B121, 4200 E. 9th Avenue, Denver, Colorado 80262, USA. tom.blumenthal@uchsc.edu

    The nematode worm Caenorhabditis elegans and its relatives are unique among animals in having operons. Operons are regulated multigene transcription units, in which polycistronic pre-messenger RNA (pre-mRNA coding for multiple peptides) is processed to monocistronic mRNAs. This occurs by 3' end formation and trans-splicing using the specialized SL2 small nuclear ribonucleoprotein particle for downstream mRNAs. Previously, the correlation between downstream location in an operon and SL2 trans-splicing has been strong, but anecdotal. Although only 28 operons have been reported, the complete sequence of the C. elegans genome reveals numerous gene clusters. To determine how many of these clusters represent operons, we probed full-genome microarrays for SL2-containing mRNAs. We found significant enrichment for about 1,200 genes, including most of a group of several hundred genes represented by complementary DNAs that contain SL2 sequence. Analysis of their genomic arrangements indicates that >90% are downstream genes, falling in 790 distinct operons. Our evidence indicates that the genome contains at least 1,000 operons, 2-8 genes long, that contain about 15% of all C. elegans genes. Numerous examples of co-transcription of genes encoding functionally related proteins are evident. Inspection of the operon list should reveal previously unknown functional relationships.

    Nature 2002;417;6891;851-4

  • Mining the mouse genome.

    Bradley A

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Nature 2002;420;6915;512-4

  • Human genome. HapMap launched with pledges of $100 million.

    Couzin J

    Science (New York, N.Y.) 2002;298;5595;941-2

  • Mutations of the BRAF gene in human cancer.

    Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, Teague J, Woffendin H, Garnett MJ, Bottomley W, Davis N, Dicks E, Ewing R, Floyd Y, Gray K, Hall S, Hawes R, Hughes J, Kosmidou V, Menzies A, Mould C, Parker A, Stevens C, Watt S, Hooper S, Wilson R, Jayatilake H, Gusterson BA, Cooper C, Shipley J, Hargrave D, Pritchard-Jones K, Maitland N, Chenevix-Trench G, Riggins GJ, Bigner DD, Palmieri G, Cossu A, Flanagan A, Nicholson A, Ho JW, Leung SY, Yuen ST, Weber BL, Seigler HF, Darrow TL, Paterson H, Marais R, Marshall CJ, Wooster R, Stratton MR and Futreal PA

    Cancer Genome Project, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Cancers arise owing to the accumulation of mutations in critical genes that alter normal programmes of cell proliferation, differentiation and death. As the first stage of a systematic genome-wide screen for these genes, we have prioritized for analysis signalling pathways in which at least one gene is mutated in human cancer. The RAS-RAF-MEK-ERK-MAP kinase pathway mediates cellular responses to growth signals. RAS is mutated to an oncogenic form in about 15% of human cancer. The three RAF genes code for cytoplasmic serine/threonine kinases that are regulated by binding RAS. Here we report BRAF somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers. All mutations are within the kinase domain, with a single substitution (V599E) accounting for 80%. Mutated BRAF proteins have elevated kinase activity and are transforming in NIH3T3 cells. Furthermore, RAS function is not required for the growth of cancer cell lines with the V599E mutation. As BRAF is a serine/threonine kinase that is commonly activated by somatic point mutation in human cancer, it may provide new therapeutic opportunities in malignant melanoma.

    Nature 2002;417;6892;949-54

  • A first-generation linkage disequilibrium map of human chromosome 22.

    Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, Dibling T, Tinsley E, Kirby S, Carter D, Papaspyridonos M, Livingstone S, Ganske R, Lõhmussaar E, Zernant J, Tõnisson N, Remm M, Mägi R, Puurand T, Vilo J, Kurg A, Rice K, Deloukas P, Mott R, Metspalu A, Bentley DR, Cardon LR and Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    DNA sequence variants in specific genes or regions of the human genome are responsible for a variety of phenotypes such as disease risk or variable drug response. These variants can be investigated directly, or through their non-random associations with neighbouring markers (called linkage disequilibrium (LD)). Here we report measurement of LD along the complete sequence of human chromosome 22. Duplicate genotyping and analysis of 1,504 markers in Centre d'Etude du Polymorphisme Humain (CEPH) reference families at a median spacing of 15 kilobases (kb) reveals a highly variable pattern of LD along the chromosome, in which extensive regions of nearly complete LD up to 804 kb in length are interspersed with regions of little or no detectable LD. The LD patterns are replicated in a panel of unrelated UK Caucasians. There is a strong correlation between high LD and low recombination frequency in the extant genetic map, suggesting that historical and contemporary recombination rates are similar. This study demonstrates the feasibility of developing genome-wide maps of LD.

    Nature 2002;418;6897;544-8
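
    The abstract above describes linkage disequilibrium as the non-random association of neighbouring markers. As a rough illustration only, the sketch below computes three standard pairwise LD summaries (D, |D'| and r^2) for two biallelic markers. It assumes phased haplotype counts are already available, which is a simplification: the study itself genotyped reference families. The function name `ld_statistics` and the example counts are hypothetical and do not come from the paper.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic markers.

    Arguments are counts of the four possible haplotypes
    (alleles A/a at marker 1, B/b at marker 2). Hypothetical helper.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A
    p_B = (n_AB + n_aB) / total      # frequency of allele B

    # D: excess of the A-B haplotype over the independence expectation.
    D = p_AB - p_A * p_B

    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the two markers.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Complete LD: only two of the four haplotypes are ever observed.
complete = ld_statistics(50, 0, 0, 50)
# No LD: all four haplotypes occur at the frequencies expected by chance.
none = ld_statistics(25, 25, 25, 25)
```

    With these made-up counts, the first call gives |D'| and r^2 of 1 (complete LD) and the second gives 0 for both (no detectable LD), the two extremes the abstract reports along chromosome 22.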

  • Computational detection and location of transcription start sites in mammalian genomic DNA.

    Down TA and Hubbard TJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. td2@sanger.ac.uk

    Transcription, the process whereby RNA copies are made from sections of the DNA genome, is directed by promoter regions. These define the transcription start site, and also the set of cellular conditions under which the promoter is active. At least in more complex species, it appears to be common for genes to have several different transcription start sites, which may be active under different conditions. Eukaryotic promoters are complex and fairly diffuse structures, which have proven hard to detect in silico. We show that a novel hybrid machine-learning method is able to build useful models of promoters for >50% of human transcription start sites. We estimate specificity to be >70%, and demonstrate good positional accuracy. Based on the structure of our learned models, we conclude that a signal resembling the well known TATA box, together with flanking regions of C-G enrichment, are the most important sequence-based signals marking sites of transcriptional initiation at a large class of typical promoters.

    Genome research 2002;12;3;458-61

  • Genome sequence of the human malaria parasite Plasmodium falciparum.

    Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM and Barrell B

    The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. gardner@tigr.org

    The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

    Funded by: NIAID NIH HHS: R01 AI028398; Wellcome Trust: 061524

    Nature 2002;419;6906;498-511

  • Sequence and analysis of chromosome 2 of Dictyostelium discoideum.

    Glöckner G, Eichinger L, Szafranski K, Pachebat JA, Bankier AT, Dear PH, Lehmann R, Baumgart C, Parra G, Abril JF, Guigó R, Kumpf K, Tunggal B, Cox E, Quail MA, Platzer M, Rosenthal A, Noegel AA and Dictyostelium Genome Sequencing Consortium

    IMB Jena, Department of Genome Analysis, Beutenbergstr. 11, 07745 Jena, Germany. gernot@imb-jena.de

    The genome of the lower eukaryote Dictyostelium discoideum comprises six chromosomes. Here we report the sequence of the largest, chromosome 2, which at 8 megabases (Mb) represents about 25% of the genome. Despite an A + T content of nearly 80%, the chromosome codes for 2,799 predicted protein coding genes and 73 transfer RNA genes. This gene density, about 1 gene per 2.6 kilobases (kb), is surpassed only by Saccharomyces cerevisiae (one per 2 kb) and is similar to that of Schizosaccharomyces pombe (one per 2.5 kb). If we assume that the other chromosomes have a similar gene density, we can expect around 11,000 genes in the D. discoideum genome. A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes. This analysis strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom, placing it close to the base of metazoan evolution.

    Nature 2002;418;6893;79-85

  • A physical map of the mouse genome.

    Gregory SG, Sekhon M, Schein J, Zhao S, Osoegawa K, Scott CE, Evans RS, Burridge PW, Cox TV, Fox CA, Hutton RD, Mullenger IR, Phillips KJ, Smith J, Stalker J, Threadgold GJ, Birney E, Wylie K, Chinwalla A, Wallis J, Hillier L, Carter J, Gaige T, Jaeger S, Kremitzki C, Layman D, Maas J, McGrane R, Mead K, Walker R, Jones S, Smith M, Asano J, Bosdet I, Chan S, Chittaranjan S, Chiu R, Fjell C, Fuhrmann D, Girn N, Gray C, Guin R, Hsiao L, Krzywinski M, Kutsche R, Lee SS, Mathewson C, McLeavy C, Messervier S, Ness S, Pandoh P, Prabhu AL, Saeedi P, Smailus D, Spence L, Stott J, Taylor S, Terpstra W, Tsai M, Vardy J, Wye N, Yang G, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Shvartsbeyn A, Gebregeorgis E, Krol M, Russell D, Overton L, Malek JA, Holmes M, Heaney M, Shetty J, Feldblyum T, Nierman WC, Catanese JJ, Hubbard T, Waterston RH, Rogers J, de Jong PJ, Fraser CM, Marra M, McPherson JD and Bentley DR

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    A physical map of a genome is an essential guide for navigation, allowing the location of any gene or other landmark in the chromosomal DNA. We have constructed a physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers. The mouse contigs were aligned to the human genome sequence on the basis of 51,486 homology matches, thus enabling use of the conserved synteny (correspondence between chromosome blocks) of the two genomes to accelerate construction of the mouse map. The map provides a framework for assembly of whole-genome shotgun sequence data, and a tile path of clones for generation of the reference sequence. Definition of the human-mouse alignment at this level of resolution enables identification of a mouse clone that corresponds to almost any position in the human genome. The human sequence may be used to facilitate construction of other mammalian genome maps using the same strategy.

    Funded by: NHGRI NIH HHS: U01 HG002137-03

    Nature 2002;418;6899;743-50

  • Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13.

    Hall N, Pain A, Berriman M, Churcher C, Harris B, Harris D, Mungall K, Bowman S, Atkin R, Baker S, Barron A, Brooks K, Buckee CO, Burrows C, Cherevach I, Chillingworth C, Chillingworth T, Christodoulou Z, Clark L, Clark R, Corton C, Cronin A, Davies R, Davis P, Dear P, Dearden F, Doggett J, Feltwell T, Goble A, Goodhead I, Gwilliam R, Hamlin N, Hance Z, Harper D, Hauser H, Hornsby T, Holroyd S, Horrocks P, Humphray S, Jagels K, James KD, Johnson D, Kerhornou A, Knights A, Konfortov B, Kyes S, Larke N, Lawson D, Lennard N, Line A, Maddison M, McLean J, Mooney P, Moule S, Murphy L, Oliver K, Ormond D, Price C, Quail MA, Rabbinowitsch E, Rajandream MA, Rutter S, Rutherford KM, Sanders M, Simmonds M, Seeger K, Sharp S, Smith R, Squares R, Squares S, Stevens K, Taylor K, Tivey A, Unwin L, Whitehead S, Woodward J, Sulston JE, Craig A, Newbold C and Barrell BG

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. nh1@sanger.ac.uk

    Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7--these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.

    Nature 2002;419;6906;527-31

  • The genome sequence of the malaria mosquito Anopheles gambiae.

    Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH and Hoffman SL

    Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA. robert.holt@celera.com

    Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

    Funded by: NIAID NIH HHS: R01AI44273, U01AI48846, U01AI50687

    Science (New York, N.Y.) 2002;298;5591;129-49

  • QuickTree: building huge Neighbour-Joining trees of protein sequences.

    Howe K, Bateman A and Durbin R

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. klh@sanger.ac.uk

    We have written a fast implementation of the popular Neighbor-Joining tree building algorithm. QuickTree allows the reconstruction of phylogenies for very large protein families (including the largest Pfam alignment containing 27000 HIV GP120 glycoprotein sequences) that would be infeasible using other popular methods.

    Bioinformatics (Oxford, England) 2002;18;11;1546-7
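
    For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal sketch of textbook Neighbor-Joining. It is the plain cubic-time version and makes no attempt to reproduce QuickTree's optimised implementation. The function name `neighbor_joining`, the internal node labels and the five-taxon distance matrix are illustrative assumptions.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor-Joining on a symmetric distance matrix.

    names: list of taxon labels; dist: list of lists of pairwise distances.
    Returns a list of (node, neighbour, branch_length) edges of the unrooted tree.
    """
    nodes = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"   # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k != a and k != b:
                d_uk = 0.5 * (d[a][k] + d[b][k] - d[a][b])
                d[u][k] = d_uk
                d[k][u] = d_uk
        nodes = [k for k in nodes if k != a and k != b] + [u]
    # Connect the last two nodes.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Toy additive distance matrix for five taxa (illustrative values only).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree_edges = neighbor_joining(taxa, matrix)
total_length = sum(length for _, _, length in tree_edges)
```

    Each round scans all remaining pairs, so the sketch scales cubically with the number of sequences; families with tens of thousands of members, as mentioned in the abstract, would need a far more careful implementation.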

  • GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.

    Howe KL, Chothia T and Durbin R

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    We describe a method (implemented in a program, GAZE) for assembling arbitrary evidence for individual gene components (features) into predictions of complete gene structures. Our system is generic in that both the features themselves, and the model of gene structure against which potential assemblies are validated and scored, are external to the system and supplied by the user. GAZE uses a dynamic programming algorithm to obtain the highest scoring gene structure according to the model and posterior probabilities that each input feature is part of a gene. A novel pruning strategy ensures that the algorithm has a run-time effectively linear in sequence length. To demonstrate the flexibility of our system in the incorporation of additional evidence into the gene prediction process, we show how it can be used to both represent nonstandard gene structures (in the form of trans-spliced genes in Caenorhabditis elegans), and make use of similarity information (in the form of Expressed Sequence Tag alignments), while requiring no change to the underlying software. GAZE is available at http://www.sanger.ac.uk/Software/analysis/GAZE.

    Genome research 2002;12;9;1418-27

  • The Ensembl genome database project.

    Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, Stalker J, Stupka E, Ureta-Vidal A, Vastrik I and Clamp M

    The Wellcome Trust Sanger Institute and European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

    Nucleic acids research 2002;30;1;38-41

  • MaxBench: evaluation of sequence and structure comparison methods.

    Leplae R and Hubbard TJ

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. lp1@sanger.ac.uk

    Summary: MaxBench is a web-based system available for evaluating the results of sequence and structure comparison methods, based on the SCOP protein domain classification. The system makes it easy for developers to both compare the overall performance of their methods to standard algorithms and investigate the results of individual comparisons.

    Availability: http://www.sanger.ac.uk/Users/lp1/MaxBench/

    Bioinformatics (Oxford, England) 2002;18;3;494-5

  • SCOP database in 2002: refinements accommodate structural genomics.

    Lo Conte L, Brenner SE, Hubbard TJ, Chothia C and Murzin AG

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK. loredana@mrc-lmb.cam.ac.uk

    The SCOP (Structural Classification of Proteins) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. Protein domains in SCOP are grouped into species and hierarchically classified into families, superfamilies, folds and classes. Recently, we introduced a new set of features with the aim of standardizing access to the database, and providing a solid basis to manage the increasing number of experimental structures expected from structural genomics projects. These features include: a new set of identifiers, which uniquely identify each entry in the hierarchy; a compact representation of protein domain classification; a new set of parseable files, which fully describe all domains in SCOP and the hierarchy itself. These new features are reflected in the ASTRAL compendium. The SCOP search engine has also been updated, and a set of links to external resources added at the level of domain entries. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.

    Nucleic acids research 2002;30;1;264-7

  • The transcriptional program of meiosis and sporulation in fission yeast.

    Mata J, Lyne R, Burns G and Bähler J

    The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Sexual reproduction requires meiosis to produce haploid gametes, which in turn can fuse to regenerate a diploid organism. We have studied the transcriptional program that drives this developmental process in Schizosaccharomyces pombe using DNA microarrays. Here we show that hundreds of genes are regulated in successive waves of transcription that correlate with major biological events of meiosis and sporulation. Each wave is associated with specific promoter motifs. Clusters of neighboring genes (mostly close to telomeres) are co-expressed early in the process, which reflects a more global control of these genes. We find that two Atf-like transcription factors are essential for the expression of late genes and formation of spores, and identify dozens of potential Atf target genes. Comparison with the meiotic program of the distantly related Saccharomyces cerevisiae reveals an unexpectedly small shared meiotic transcriptome, suggesting that the transcriptional regulation of meiosis evolved independently in both species.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Nature genetics 2002;32;1;143-7

  • Low-penetrance susceptibility to breast cancer due to CHEK2*1100delC in noncarriers of BRCA1 or BRCA2 mutations.

    Meijers-Heijboer H, van den Ouweland A, Klijn J, Wasielewski M, de Snoo A, Oldenburg R, Hollestelle A, Houben M, Crepin E, van Veghel-Plandsoen M, Elstrodt F, van Duijn C, Bartels C, Meijers C, Schutte M, McGuffog L, Thompson D, Easton D, Sodha N, Seal S, Barfoot R, Mangion J, Chang-Claude J, Eccles D, Eeles R, Evans DG, Houlston R, Murday V, Narod S, Peretz T, Peto J, Phelan C, Zhang HX, Szabo C, Devilee P, Goldgar D, Futreal PA, Nathanson KL, Weber B, Rahman N, Stratton MR and CHEK2-Breast Cancer Consortium

    Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands.

    Mutations in BRCA1 and BRCA2 confer a high risk of breast and ovarian cancer, but account for only a small fraction of breast cancer susceptibility. To find additional genes conferring susceptibility to breast cancer, we analyzed CHEK2 (also known as CHK2), which encodes a cell-cycle checkpoint kinase that is implicated in DNA repair processes involving BRCA1 and p53 (refs 3,4,5). We show that CHEK2*1100delC, a truncating variant that abrogates the kinase activity, has a frequency of 1.1% in healthy individuals. However, this variant is present in 5.1% of individuals with breast cancer from 718 families that do not carry mutations in BRCA1 or BRCA2 (P = 0.00000003), including 13.5% of individuals from families with male breast cancer (P = 0.00015). We estimate that the CHEK2*1100delC variant results in an approximately twofold increase of breast cancer risk in women and a tenfold increase of risk in men. By contrast, the variant confers no increased cancer risk in carriers of BRCA1 or BRCA2 mutations. This suggests that the biological mechanisms underlying the elevated risk of breast cancer in CHEK2 mutation carriers are already subverted in carriers of BRCA1 or BRCA2 mutations, which is consistent with participation of the encoded proteins in the same pathway.

    Nature genetics 2002;31;1;55-9

  • Comparative ab initio prediction of gene structures using pair HMMs.

    Meyer IM and Durbin R

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. im1@sanger.ac.uk

    We present a novel comparative method for the ab initio prediction of protein coding genes in eukaryotic genomes. The method simultaneously predicts the gene structures of two un-annotated input DNA sequences which are homologous to each other and retrieves the subsequences which are conserved between the two DNA sequences. It is capable of predicting partial, complete and multiple genes and can align pairs of genes which differ by events of exon-fusion or exon-splitting. The method employs a probabilistic pair hidden Markov model. We generate annotations using our model with two different algorithms: the Viterbi algorithm in its linear memory implementation and a new heuristic algorithm, called the stepping stone, for which both memory and time requirements scale linearly with the sequence length. We have implemented the model in a computer program called DOUBLESCAN. In this article, we introduce the method and confirm the validity of the approach on a test set of 80 pairs of orthologous DNA sequences from mouse and human. More information can be found at: http://www.sanger.ac.uk/Software/analysis/doublescan/

    Bioinformatics (Oxford, England) 2002;18;10;1309-18
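
    The abstract above relies on the Viterbi algorithm to find the most probable annotation under a pair hidden Markov model. The sketch below shows only the ordinary single-sequence Viterbi recursion on a toy two-state model, as a reminder of the underlying idea. It is not the pair HMM of DOUBLESCAN, it stores full back-pointers rather than using the linear-memory variant, and it does not implement the stepping stone heuristic. The state names and all probabilities are invented for illustration.

```python
import math


def viterbi(seq, states, start, trans, emit):
    """Most probable state path for seq under a simple HMM (log space)."""
    # Initialise with the first symbol.
    v = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # one back-pointer table per subsequent symbol
    for sym in seq[1:]:
        new_v, ptr = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            prev = max(states, key=lambda p: v[p] + math.log(trans[p][s]))
            new_v[s] = v[prev] + math.log(trans[prev][s]) + math.log(emit[s][sym])
            ptr[s] = prev
        v = new_v
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy model (assumed values): "exon" favours G/C, "intergenic" favours A/T.
states = ["intergenic", "exon"]
start = {"intergenic": 0.5, "exon": 0.5}
trans = {
    "intergenic": {"intergenic": 0.9, "exon": 0.1},
    "exon": {"intergenic": 0.1, "exon": 0.9},
}
emit = {
    "intergenic": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "exon": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
path = viterbi("AAAAGGGG", states, start, trans, emit)
```

    On this toy input the recovered path labels the A-rich half as intergenic and the G-rich half as exon. A pair HMM extends the same recursion to two sequences at once, which is where the memory and time concerns discussed in the abstract arise.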

  • The significance of performance ranking in CASP--response to Marti-Renom et al.

    Moult J, Fidelis K, Zemla A, Hubbard T and Tramontano A

    Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville 20850, USA. jmoult@tunc.org

    Structure (London, England : 1993) 2002;10;3;291-2; discussion 292-3

  • Initial sequencing and comparative analysis of the mouse genome.

    Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC and Lander ES

    Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA. waterston@gs.washington.edu

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

    Funded by: NHGRI NIH HHS: U54 HG003273

    Nature 2002;420;6915;520-62

  • Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs.

    Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, Yamanaka I, Kiyosawa H, Yagi K, Tomaru Y, Hasegawa Y, Nogami A, Schönbach C, Gojobori T, Baldarelli R, Hill DP, Bult C, Hume DA, Quackenbush J, Schriml LM, Kanapin A, Matsuda H, Batalov S, Beisel KW, Blake JA, Bradt D, Brusic V, Chothia C, Corbani LE, Cousins S, Dalla E, Dragani TA, Fletcher CF, Forrest A, Frazer KS, Gaasterland T, Gariboldi M, Gissi C, Godzik A, Gough J, Grimmond S, Gustincich S, Hirokawa N, Jackson IJ, Jarvis ED, Kanai A, Kawaji H, Kawasawa Y, Kedzierski RM, King BL, Konagaya A, Kurochkin IV, Lee Y, Lenhard B, Lyons PA, Maglott DR, Maltais L, Marchionni L, McKenzie L, Miki H, Nagashima T, Numata K, Okido T, Pavan WJ, Pertea G, Pesole G, Petrovsky N, Pillai R, Pontius JU, Qi D, Ramachandran S, Ravasi T, Reed JC, Reed DJ, Reid J, Ring BZ, Ringwald M, Sandelin A, Schneider C, Semple CA, Setou M, Shimada K, Sultana R, Takenaka Y, Taylor MS, Teasdale RD, Tomita M, Verardo R, Wagner L, Wahlestedt C, Wang Y, Watanabe Y, Wells C, Wilming LG, Wynshaw-Boris A, Yanagisawa M, Yang I, Yang L, Yuan Z, Zavolan M, Zhu Y, Zimmer A, Carninci P, Hayatsu N, Hirozane-Kishikawa T, Konno H, Nakamura M, Sakazume N, Sato K, Shiraki T, Waki K, Kawai J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Imotani K, Ishii Y, Itoh M, Kagawa I, Miyazaki A, Sakai K, Sasaki D, Shibata K, Shinagawa A, Yasunishi A, Yoshino M, Waterston R, Lander ES, Rogers J, Birney E, Hayashizaki Y, FANTOM Consortium and RIKEN Genome Exploration Research Group Phase I & II Team

    Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.

    Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

    Nature 2002;420;6915;563-73

  • Mutation of TBCE causes hypoparathyroidism-retardation-dysmorphism and autosomal recessive Kenny-Caffey syndrome.

    Parvari R, Hershkovitz E, Grossman N, Gorodischer R, Loeys B, Zecic A, Mortier G, Gregory S, Sharony R, Kambouris M, Sakati N, Meyer BF, Al Aqeel AI, Al Humaidan AK, Al Zanhrani F, Al Swaid A, Al Othman J, Diaz GA, Weiner R, Khan KT, Gordon R, Gelb BD and HRD/Autosomal Recessive Kenny-Caffey Syndrome Consortium

    Department of Developmental Molecular Genetics, Soroka Medical Center and Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva 84105, Israel.

    The syndrome of congenital hypoparathyroidism, mental retardation, facial dysmorphism and extreme growth failure (HRD or Sanjad-Sakati syndrome; OMIM 241410) is an autosomal recessive disorder reported almost exclusively in Middle Eastern populations. A similar syndrome with the additional features of osteosclerosis and recurrent bacterial infections has been classified as autosomal recessive Kenny-Caffey syndrome (AR-KCS; OMIM 244460). Both traits have previously been mapped to chromosome 1q43-44 (refs 5,6) and, despite the observed clinical variability, share an ancestral haplotype, suggesting a common founder mutation. We describe refinement of the critical region to an interval of roughly 230 kb and identification of deletion and truncation mutations of TBCE in affected individuals. The gene TBCE encodes one of several chaperone proteins required for the proper folding of alpha-tubulin subunits and the formation of alpha-beta-tubulin heterodimers. Analysis of diseased fibroblasts and lymphoblastoid cells showed lower microtubule density at the microtubule-organizing center (MTOC) and perturbed microtubule polarity in diseased cells. Immunofluorescence and ultrastructural studies showed disturbances in subcellular organelles that require microtubules for membrane trafficking, such as the Golgi and late endosomal compartments. These findings demonstrate that HRD and AR-KCS are chaperone diseases caused by a genetic defect in the tubulin assembly pathway, and establish a potential connection between tubulin physiology and the development of the parathyroid.

    Nature genetics 2002;32;3;448-52

  • Restricting genome data won't stop bioterrorism.

    Read TD and Parkhill J

    Nature 2002;417;6887;379

  • p53 mutant mice that display early ageing-associated phenotypes.

    Tyner SD, Venkatachalam S, Choi J, Jones S, Ghebranious N, Igelmann H, Lu X, Soron G, Cooper B, Brayton C, Park SH, Thompson T, Karsenty G, Bradley A and Donehower LA

    Cell and Molecular Biology Program, Baylor College of Medicine, Houston, TX 77030, USA.

    The p53 tumour suppressor is activated by numerous stressors to induce apoptosis, cell cycle arrest, or senescence. To study the biological effects of altered p53 function, we generated mice with a deletion mutation in the first six exons of the p53 gene that express a truncated RNA capable of encoding a carboxy-terminal p53 fragment. This mutation confers phenotypes consistent with activated p53 rather than inactivated p53. Mutant (p53+/m) mice exhibit enhanced resistance to spontaneous tumours compared with wild-type (p53+/+) littermates. As p53+/m mice age, they display an early onset of phenotypes associated with ageing. These include reduced longevity, osteoporosis, generalized organ atrophy and a diminished stress tolerance. A second line of transgenic mice containing a temperature-sensitive mutant allele of p53 also exhibits early ageing phenotypes. These data suggest that p53 has a role in regulating organismal ageing.

    Nature 2002;415;6867;45-53

  • Tools for targeted manipulation of the mouse genome.

    van der Weyden L, Adams DJ and Bradley A

    The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, United Kingdom.

    In the postgenomic era the mouse will be central to the challenge of ascribing a function to the 40,000 or so genes that constitute our genome. In this review, we summarize some of the classic and modern approaches that have fueled the recent dramatic explosion in mouse genetics. Together with the sequencing of the mouse genome, these tools will have a profound effect on our ability to generate new and more accurate mouse models and thus provide a powerful insight into the function of human genes during the processes of both normal development and disease.

    Physiological genomics 2002;11;3;133-64

  • Cancer: stuck at first base.

    van der Weyden L, Jonkers J and Bradley A

    Nature 2002;419;6903;127-8

  • The mosaic structure of variation in the laboratory mouse genome.

    Wade CM, Kulbokas EJ, Kirby AW, Zody MC, Mullikin JC, Lander ES, Lindblad-Toh K and Daly MJ

    Whitehead Institute for Biomedical Research and Whitehead/MIT Center for Genome Research, 9 Cambridge Center, Cambridge, Massachusetts 02139, USA.

    Most inbred laboratory mouse strains are known to have originated from a mixed but limited founder population in a few laboratories. However, the effect of this breeding history on patterns of genetic variation among these strains and the implications for their use are not well understood. Here we present an analysis of the fine structure of variation in the mouse genome, using single nucleotide polymorphisms (SNPs). When the recently assembled genome sequence from the C57BL/6J strain is aligned with sample sequence from other strains, we observe long segments of either extremely high (approximately 40 SNPs per 10 kb) or extremely low (approximately 0.5 SNPs per 10 kb) polymorphism rates. In all strain-to-strain comparisons examined, only one-third of the genome falls into long regions (averaging >1 Mb) of a high SNP rate, consistent with estimated divergence rates between Mus musculus domesticus and either M. m. musculus or M. m. castaneus. These data suggest that the genomes of these inbred strains are mosaics with the vast majority of segments derived from domesticus and musculus sources. These observations have important implications for the design and interpretation of positional cloning experiments.

    Nature 2002;420;6915;574-8

  • The genome sequence of Schizosaccharomyces pombe.

    Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S, Basham D, Bowman S, Brooks K, Brown D, Brown S, Chillingworth T, Churcher C, Collins M, Connor R, Cronin A, Davis P, Feltwell T, Fraser A, Gentles S, Goble A, Hamlin N, Harris D, Hidalgo J, Hodgson G, Holroyd S, Hornsby T, Howarth S, Huckle EJ, Hunt S, Jagels K, James K, Jones L, Jones M, Leather S, McDonald S, McLean J, Mooney P, Moule S, Mungall K, Murphy L, Niblett D, Odell C, Oliver K, O'Neil S, Pearson D, Quail MA, Rabbinowitsch E, Rutherford K, Rutter S, Saunders D, Seeger K, Sharp S, Skelton J, Simmonds M, Squares R, Squares S, Stevens K, Taylor K, Taylor RG, Tivey A, Walsh S, Warren T, Whitehead S, Woodward J, Volckaert G, Aert R, Robben J, Grymonprez B, Weltjens I, Vanstreels E, Rieger M, Schäfer M, Müller-Auer S, Gabel C, Fuchs M, Düsterhöft A, Fritzc C, Holzer E, Moestl D, Hilbert H, Borzym K, Langer I, Beck A, Lehrach H, Reinhardt R, Pohl TM, Eger P, Zimmermann W, Wedler H, Wambutt R, Purnelle B, Goffeau A, Cadieu E, Dréano S, Gloux S, Lelaure V, Mottier S, Galibert F, Aves SJ, Xiang Z, Hunt C, Moore K, Hurst SM, Lucas M, Rochet M, Gaillardin C, Tallada VA, Garzon A, Thode G, Daga RR, Cruzado L, Jimenez J, Sánchez M, del Rey F, Benito J, Domínguez A, Revuelta JL, Moreno S, Armstrong J, Forsburg SL, Cerutti L, Lowe T, McCombie WR, Paulsen I, Potashkin J, Shpakovski GV, Ussery D, Barrell BG, Nurse P and Cerrutti L

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

    Nature 2002;415;6874;871-80

  • The PASTA domain: a beta-lactam-binding domain.

    Yeats C, Finn RD and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK CB10 1SA. cay@sanger.ac.uk

    The PASTA domain (for penicillin-binding protein and serine/threonine kinase associated domain) is found in the high molecular weight penicillin-binding proteins and eukaryotic-like serine/threonine kinases of a range of pathogens. We describe this previously uncharacterized domain and infer that it binds beta-lactam antibiotics and their peptidoglycan analogues. We postulate that PknB-like kinases are key regulators of cell-wall biosynthesis. The essential function of these enzymes suggests an additional pathway for the action of beta-lactam antibiotics.

    Trends in biochemical sciences 2002;27;9;438
