Sanger Institute - Publications 2000

Number of papers published in 2000: 23

  • The genome sequence of Drosophila melanogaster.

    Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Sidén-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM and Venter JC

    Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P41 HG000739, P50-HG00750, U54 HG003273

    Science (New York, N.Y.) 2000;287;5461;2185-95

  • From sequence to chromosome: the tip of the X chromosome of D. melanogaster.

    Benos PV, Gatt MK, Ashburner M, Murphy L, Harris D, Barrell B, Ferraz C, Vidal S, Brun C, Demailles J, Cadieu E, Dreano S, Gloux S, Lelaure V, Mottier S, Galibert F, Borkova D, Minana B, Kafatos FC, Louis C, Sidén-Kiamos I, Bolshakov S, Papagiannakis G, Spanos L, Cox S, Madueño E, de Pablos B, Modolell J, Peter A, Schöttler P, Werner M, Mourkioti F, Beinert N, Dowe G, Schäfer U, Jäckle H, Bucheton A, Callister DM, Campbell LA, Darlamitsou A, Henderson NS, McMillan PJ, Salles C, Tait EA, Valenti P, Saunders RD and Glover DM

    The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Hall, Cambridge CB10 1SD, UK.

    One of the rewards of having a Drosophila melanogaster whole-genome sequence will be the potential to understand the molecular bases for structural features of chromosomes that have been a long-standing puzzle. Analysis of 2.6 megabases of sequence from the tip of the X chromosome of Drosophila identifies 273 genes. Cloned DNAs from the characteristic bulbous structure at the tip of the X chromosome in the region of the broad complex display an unusual pattern of in situ hybridization. Sequence analysis revealed that this region comprises 154 kilobases of DNA flanked by 1.2-kilobase inverted repeats, each composed of a 350-base-pair satellite-related element. Thus, some aspects of chromosome structure appear to be revealed directly within the DNA sequence itself.

    Science (New York, N.Y.) 2000;287;5461;2220-2

  • Distinct element analysis of bulk crushing: effect of particle properties and loading rate.

    Couroyer C, Ning Z and Ghadiri M

    Powder Technology 2000;109;241–254

  • Domains in gene silencing and cell differentiation proteins: the novel PAZ domain and redefinition of the Piwi domain.

    Cerutti L, Mian N and Bateman A

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridge, UK.

    Trends in biochemical sciences 2000;25;10;481-2

  • Functional genomic analysis of C. elegans chromosome I by systematic RNA interference.

    Fraser AG, Kamath RS, Zipperlen P, Martinez-Campos M, Sohrmann M and Ahringer J

    Wellcome/CRC Institute, University of Cambridge, UK.

    Complete genomic sequence is known for two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and it will soon be known for humans. However, biological function has been assigned to only a small proportion of the predicted genes in any animal. Here we have used RNA-mediated interference (RNAi) to target nearly 90% of predicted genes on C. elegans chromosome I by feeding worms with bacteria that express double-stranded RNA. We have assigned function to 13.9% of the genes analysed, increasing the number of sequenced genes with known phenotypes on chromosome I from 70 to 378. Although most genes with sterile or embryonic lethal RNAi phenotypes are involved in basal cell metabolism, many genes giving post-embryonic phenotypes have conserved sequences but unknown function. In addition, conserved genes are significantly more likely to have an RNAi phenotype than are genes with no conservation. We have constructed a reusable library of bacterial clones that will permit unlimited RNAi screens in the future; this should help develop a more complete view of the relationships between the genome, gene function and the environment.

    Funded by: Wellcome Trust: 054523

    Nature 2000;408;6810;325-30

  • Functional genomic analysis of cell division in C. elegans using RNAi of genes on chromosome III.

    Gönczy P, Echeverri C, Oegema K, Coulson A, Jones SJ, Copley RR, Duperon J, Oegema J, Brehm M, Cassin E, Hannak E, Kirkham M, Pichler S, Flohrs K, Goessen A, Leidel S, Alleaume AM, Martin C, Ozlü N, Bork P and Hyman AA

    Max-Planck-Institute for Cell Biology and Genetics, Dresden, Germany.

    Genome sequencing projects generate a wealth of information; however, the ultimate goal of such projects is to accelerate the identification of the biological function of genes. This creates a need for comprehensive studies to fill the gap between sequence and function. Here we report the results of a functional genomic screen to identify genes required for cell division in Caenorhabditis elegans. We inhibited the expression of approximately 96% of the approximately 2,300 predicted open reading frames on chromosome III using RNA-mediated interference (RNAi). By using an in vivo time-lapse differential interference contrast microscopy assay, we identified 133 genes (approximately 6%) necessary for distinct cellular processes in early embryos. Our results indicate that these genes represent most of the genes on chromosome III that are required for proper cell division in C. elegans embryos. The complete data set, including sample time-lapse recordings, has been deposited in an open access database. We found that approximately 47% of the genes associated with a differential interference contrast phenotype have clear orthologues in other eukaryotes, indicating that this screen provides putative gene functions for other species as well.

    Nature 2000;408;6810;331-6

  • Analysis of vertebrate SCL loci identifies conserved enhancers.

    Göttgens B, Barton LM, Gilbert JG, Bench AJ, Sanchez MJ, Bahn S, Mistry S, Grafham D, McMurray A, Vaudin M, Amaya E, Bentley DR, Green AR and Sinclair AM

    University of Cambridge, Department of Haematology, MRC Centre, Hills Road, Cambridge CB2 2QH, UK.

    The SCL gene encodes a highly conserved bHLH transcription factor with a pivotal role in hemopoiesis and vasculogenesis. We have sequenced and analyzed 320 kb of genomic DNA composing the SCL loci from human, mouse, and chicken. Long-range sequence comparisons demonstrated multiple peaks of human/mouse homology, a subset of which corresponded precisely with known SCL enhancers. Comparisons between mammalian and chicken sequences identified some, but not all, SCL enhancers. Moreover, one peak of human/mouse homology (+23 region), which did not correspond to a known enhancer, showed significant homology to an analogous region of the chicken SCL locus. A transgenic Xenopus reporter assay was established and demonstrated that the +23 region contained a new neural enhancer. This combination of long-range comparative sequence analysis with a high-throughput transgenic bioassay provides a powerful strategy for identifying and characterizing developmentally important enhancers.

    Nature biotechnology 2000;18;2;181-6

  • Entering the post-genomic era of malaria research.

    Horrocks P, Bowman S, Kyes S, Waters AP and Craig A

    Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, England.

    The sequencing of the genome of Plasmodium falciparum promises to revolutionize the way in which malaria research will be carried out. Beyond simple gene discovery, the genome sequence will facilitate the comprehensive determination of the parasite's gene expression during its developmental phases, pathology, and in response to environmental variables, such as drug treatment and host genetic background. This article reviews the current status of the P. falciparum genome sequencing project and the unique insights it has generated. We also summarize the application of bioinformatics and analytical tools that have been developed for functional genomics. The aim of these activities is the rational, information-based identification of new therapeutic strategies and targets, based on a thorough insight into the biology of Plasmodium spp.

    Bulletin of the World Health Organization 2000;78;12;1424-37

  • Open annotation offers a democratic solution to genome sequencing.

    Hubbard T and Birney E

    Nature 2000;403;6772;825

  • SCOP: a structural classification of proteins database.

    Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG and Chothia C

    MRC Laboratory of Molecular Biology, Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK.

    The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships of known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and distant evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database so far. The sequences of proteins in SCOP provide the basis of the ASTRAL sequence libraries that can be used as a source of data to calibrate sequence search algorithms and for the generation of statistics on, or selections of, protein structures. Links can be made from SCOP to PDB-ISL: a library containing sequences homologous to proteins of known structure. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL

    Nucleic acids research 2000;28;1;257-9

  • Cancer predisposition caused by elevated mitotic recombination in Bloom mice.

    Luo G, Santoro IM, McDaniel LD, Nishijima I, Mills M, Youssoufian H, Vogel H, Schultz RA and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.

    Bloom syndrome is a disorder associated with genomic instability that causes affected people to be prone to cancer. Bloom cell lines show increased sister chromatid exchange, yet are proficient in the repair of various DNA lesions. The underlying cause of this disease is mutation of a gene encoding a RECQ DNA helicase. Using embryonic stem cell technology, we have generated viable Bloom mice that are prone to a wide variety of cancers. Cell lines from these mice show elevations in the rates of mitotic recombination. We demonstrate that the increased rate of loss of heterozygosity (LOH) resulting from mitotic recombination in vivo constitutes the underlying mechanism causing tumour susceptibility in these mice.

    Nature genetics 2000;26;4;424-9

  • Attrition of granular solids in a shear cell.

    Ghadiri M, Ning Z, Kenter SJ and Puik E

    Chemical Engineering Science 2000;55;5445–5456

  • An SNP map of human chromosome 22.

    Mullikin JC, Hunt SE, Cole CG, Mortimore BJ, Rice CM, Burton J, Matthews LH, Pavitt R, Plumb RW, Sims SK, Ainscough RM, Attwood J, Bailey JM, Barlow K, Bruskiewich RM, Butcher PN, Carter NP, Chen Y, Clee CM, Coggill PC, Davies J, Davies RM, Dawson E, Francis MD, Joy AA, Lamble RG, Langford CF, Macarthy J, Mall V, Moreland A, Overton-Larty EK, Ross MT, Smith LC, Steward CA, Sulston JE, Tinsley EJ, Turney KJ, Willey DL, Wilson GD, McMurray AA, Dunham I, Rogers J and Bentley DR

    The Sanger Centre, Hinxton, Cambridge, UK.

    The human genome sequence will provide a reference for measuring DNA sequence variation in human populations. Sequence variants are responsible for the genetic component of individuality, including complex characteristics such as disease susceptibility and drug response. Most sequence variants are single nucleotide polymorphisms (SNPs), where two alternate bases occur at one position. Comparison of any two genomes reveals around 1 SNP per kilobase. A sufficiently dense map of SNPs would allow the detection of sequence variants responsible for particular characteristics on the basis that they are associated with a specific SNP allele. Here we have evaluated large-scale sequencing approaches to obtaining SNPs, and have constructed a map of 2,730 SNPs on human chromosome 22. Most of the SNPs are within 25 kilobases of a transcribed exon, and are valuable for association studies. We have scaled up the process, detecting over 65,000 SNPs in the genome as part of The SNP Consortium programme, which is on target to build a map of 1 SNP every 5 kilobases that is integrated with the human genome sequence and that is freely available in the public domain.

    Nature 2000;407;6803;516-20

  • In defense of complete genomes.

    Parkhill J

    Nature biotechnology 2000;18;5;493-4

  • Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491.

    Parkhill J, Achtman M, James KD, Bentley SD, Churcher C, Klee SR, Morelli G, Basham D, Brown D, Chillingworth T, Davies RM, Davis P, Devlin K, Feltwell T, Hamlin N, Holroyd S, Jagels K, Leather S, Moule S, Mungall K, Quail MA, Rajandream MA, Rutherford KM, Simmonds M, Skelton J, Whitehead S, Spratt BG and Barrell BG

    The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Neisseria meningitidis causes bacterial meningitis and is therefore responsible for considerable morbidity and mortality in both the developed and the developing world. Meningococci are opportunistic pathogens that colonize the nasopharynges and oropharynges of asymptomatic carriers. For reasons that are still mostly unknown, they occasionally gain access to the blood, and subsequently to the cerebrospinal fluid, to cause septicaemia and meningitis. N. meningitidis strains are divided into a number of serogroups on the basis of the immunochemistry of their capsular polysaccharides; serogroup A strains are responsible for major epidemics and pandemics of meningococcal disease, and therefore most of the morbidity and mortality associated with this disease. Here we have determined the complete genome sequence of a serogroup A strain of Neisseria meningitidis, Z2491. The sequence is 2,184,406 base pairs in length, with an overall G+C content of 51.8%, and contains 2,121 predicted coding sequences. The most notable feature of the genome is the presence of many hundreds of repetitive elements, ranging from short repeats, positioned either singly or in large multiple arrays, to insertion sequences and gene duplications of one kilobase or more. Many of these repeats appear to be involved in genome fluidity and antigenic variation in this important human pathogen.

    Nature 2000;404;6777;502-6

  • The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences.

    Parkhill J, Wren BW, Mungall K, Ketley JM, Churcher C, Basham D, Chillingworth T, Davies RM, Feltwell T, Holroyd S, Jagels K, Karlyshev AV, Moule S, Pallen MJ, Penn CW, Quail MA, Rajandream MA, Rutherford KM, van Vliet AH, Whitehead S and Barrell BG

    The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Campylobacter jejuni, from the delta-epsilon group of proteobacteria, is a microaerophilic, Gram-negative, flagellate, spiral bacterium, properties it shares with the related gastric pathogen Helicobacter pylori. It is the leading cause of bacterial food-borne diarrhoeal disease throughout the world. In addition, infection with C. jejuni is the most frequent antecedent to a form of neuromuscular paralysis known as Guillain-Barré syndrome. Here we report the genome sequence of C. jejuni NCTC11168. C. jejuni has a circular chromosome of 1,641,481 base pairs (30.6% G+C) which is predicted to encode 1,654 proteins and 54 stable RNA species. The genome is unusual in that there are virtually no insertion sequences or phage-associated sequences and very few repeat sequences. One of the most striking findings in the genome was the presence of hypervariable sequences. These short homopolymeric runs of nucleotides were commonly found in genes encoding the biosynthesis or modification of surface structures, or in closely linked genes of unknown function. The apparently high rate of variation of these homopolymeric tracts may be important in the survival strategy of C. jejuni.

    Nature 2000;403;6770;665-8

  • A browser for expression data.

    Pocock MR and Hubbard TJ

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Summary: We have written a fully extensible Java application for visually browsing expression data, and clusters of genes or experimental conditions calculated from that data. The application requires a run-time environment for Java2.

    Availability: http://www.

    Bioinformatics (Oxford, England) 2000;16;4;402-3

  • Artemis: sequence visualization and annotation.

    Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA and Barrell B

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Summary: Artemis is a DNA sequence visualization and annotation tool that allows the results of any analysis or sets of analyses to be viewed in the context of the sequence and its six-frame translation. Artemis is especially useful in analysing the compact genomes of bacteria, archaea and lower eukaryotes, and will cope with sequences of any size from small genes to whole genomes. It is implemented in Java, and can be run on any suitable platform. Sequences and annotation can be read and written directly in EMBL, GenBank and GFF format.

    Availability: Artemis is available under the GNU General Public License from

    Bioinformatics (Oxford, England) 2000;16;10;944-5

  • LMNA, encoding lamin A/C, is mutated in partial lipodystrophy.

    Shackleton S, Lloyd DJ, Jackson SN, Evans R, Niermeijer MF, Singh BM, Schmidt H, Brabant G, Kumar S, Durrington PN, Gregory S, O'Rahilly S and Trembath RC

    Division of Medical Genetics, Departments of Medicine and Genetics, University of Leicester, Leicester, UK.

    The lipodystrophies are a group of disorders characterized by the absence or reduction of subcutaneous adipose tissue. Partial lipodystrophy (PLD; MIM 151660) is an inherited condition in which a regional (trunk and limbs) loss of fat occurs during the peri-pubertal phase. Additionally, variable degrees of resistance to insulin action, together with a hyperlipidaemic state, may occur and simulate the metabolic features commonly associated with predisposition to atherosclerotic disease. The PLD locus has been mapped to chromosome 1q with no evidence of genetic heterogeneity. We, and others, have refined the location to a 5.3-cM interval between markers D1S305 and D1S1600 (refs 5, 6). Through a positional cloning approach we have identified five different missense mutations in LMNA among ten kindreds and three individuals with PLD. The protein product of LMNA is lamin A/C, which is a component of the nuclear envelope. Heterozygous mutations in LMNA have recently been identified in kindreds with the variant form of muscular dystrophy (MD) known as autosomal dominant Emery-Dreifuss MD (EDMD-AD; ref. 7) and dilated cardiomyopathy and conduction-system disease (CMD1A). As LMNA is ubiquitously expressed, the finding of site-specific amino acid substitutions in PLD, EDMD-AD and CMD1A reveals distinct functional domains of the lamin A/C protein required for the maintenance and integrity of different cell types.

    Nature genetics 2000;24;2;153-6

  • A competition-based score measurement for speaker verification.

    Gu Y and Thomas T

    Proceedings of EUSIPCO 2000, Tenth European Signal Processing Conference, Tampere, Finland 2000

  • Advances on HMM-based text-dependent speaker verification.

    Gu Y and Thomas T

    Proceedings of ICSLP 2000

  • Competition-based score analysis for utterance verification in name recognition.

    Gu Y and Thomas T

    Proceedings of ICSLP 2000


    Gu Y, Jongebloed H, Iskra D, den Os E and Boves L

    Proceedings of ICSLP 2000;2;450-453