Sanger Institute - Publications 1997

Number of papers published in 1997: 13

  • The nucleotide sequence of Saccharomyces cerevisiae chromosome XIII.

    Bowman S, Churcher C, Badcock K, Brown D, Chillingworth T, Connor R, Dedman K, Devlin K, Gentles S, Hamlin N, Hunt S, Jagels K, Lye G, Moule S, Odell C, Pearson D, Rajandream M, Rice P, Skelton J, Walsh S, Whitehead S and Barrell B

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Systematic sequencing of the genome of Saccharomyces cerevisiae has revealed thousands of new predicted genes and allowed analysis of long-range features of chromosomal organization. Generally, genes and predicted genes seem to be distributed evenly throughout the genome, having no overall preference for DNA strand. Apart from the smaller chromosomes, which can have substantially lower gene density in their telomeric regions, there is a consistent average of one open reading frame (ORF) approximately every two kilobases. However, one of the most surprising findings for a eukaryote with approximately 6,000 genes was the amount of apparent redundancy in its genome. This redundancy occurs both between individual ORFs and over more extensive chromosome regions, which have been duplicated preserving gene order and orientation. Here we report the entire nucleotide sequence of chromosome XIII, the sixth-largest S. cerevisiae chromosome, and demonstrate that its features and organization are consistent with those observed for other S. cerevisiae chromosomes. Analysis revealed 459 ORFs, 284 of which have not been identified previously. Both intra- and interchromosomal duplications of regions of this chromosome have occurred.

    Funded by: Wellcome Trust

    Nature 1997;387;6632 Suppl;90-3

  • Population statistics of protein structures: lessons from structural classifications.

    Brenner SE, Chothia C and Hubbard TJ

    Structural Biology Centre, National Institute for Bioscience and Human-Technology, Ibaraki, Japan.

    Structural classifications aid the interpretation of proteins by describing degrees of structural and evolutionary relatedness. They have also recently revealed strikingly skewed distributions at all levels; for example, a small number of folds are far more common than others, and just a few superfamilies are known to have diverged widely. The classifications also provide an indication of the total number of superfamilies in nature.

    Funded by: Wellcome Trust

    Current opinion in structural biology 1997;7;3;369-76

  • The nucleotide sequence of Saccharomyces cerevisiae chromosome XVI.

    Bussey H, Storms RK, Ahmed A, Albermann K, Allen E, Ansorge W, Araujo R, Aparicio A, Barrell B, Badcock K, Benes V, Botstein D, Bowman S, Brückner M, Carpenter J, Cherry JM, Chung E, Churcher C, Coster F, Davis K, Davis RW, Dietrich FS, Delius H, DiPaolo T, Hani J et al.

    Department of Biology, McGill University, Montreal, Canada.

    The nucleotide sequence of the 948,061 base pairs of chromosome XVI has been determined, completing the sequence of the yeast genome. Chromosome XVI was the last yeast chromosome identified, and some of the genes mapped early to it, such as GAL4, PEP4 and RAD1 (ref. 2), have played important roles in the development of yeast biology. The architecture of this final chromosome seems to be typical of the large yeast chromosomes, and shows large duplications with other yeast chromosomes. Chromosome XVI contains 487 potential protein-encoding genes, 17 tRNA genes and two small nuclear RNA genes; 27% of the genes have significant similarities to human gene products, and 48% are new and of unknown biological function. Systematic efforts to explore gene function have begun.

    Funded by: Wellcome Trust

    Nature 1997;387;6632 Suppl;103-5

  • The nucleotide sequence of Saccharomyces cerevisiae chromosome IX.

    Churcher C, Bowman S, Badcock K, Bankier A, Brown D, Chillingworth T, Connor R, Devlin K, Gentles S, Hamlin N, Harris D, Horsnell T, Hunt S, Jagels K, Jones M, Lye G, Moule S, Odell C, Pearson D, Rajandream M, Rice P, Rowley N, Skelton J, Smith V, Barrell B et al.

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Large-scale systematic sequencing has generally depended on the availability of an ordered library of large-insert bacterial or viral genomic clones for the organism under study. The generation of these large-insert libraries, and the location of each clone on a genome map, is a laborious and time-consuming process. In an effort to overcome these problems, several groups have successfully demonstrated the viability of the whole-genome random 'shotgun' method in large-scale sequencing of both viruses and prokaryotes. Here we report the sequence of Saccharomyces cerevisiae chromosome IX, determined in part by a whole-chromosome 'shotgun', and describe the particular difficulties encountered in the random 'shotgun' sequencing of an entire eukaryotic chromosome. Analysis of this sequence shows that chromosome IX contains 221 open reading frames (ORFs), of which approximately 30% have been sequenced previously. This chromosome shows features typical of a small Saccharomyces cerevisiae chromosome.

    Funded by: Wellcome Trust

    Nature 1997;387;6632 Suppl;84-7

  • The relationship between chromosome structure and function at a human telomeric region.

    Flint J, Thomas K, Micklem G, Raynham H, Clark K, Doggett NA, King A and Higgs DR

    MRC Molecular Haematology Unit, John Radcliffe Hospital, Headington, Oxford, UK.

    We have sequenced a contiguous 284,495-bp segment of DNA extending from the terminal (TTAGGG)n repeats of the short arm of chromosome 16, providing a full description of the transition from telomeric through subtelomeric DNA to sequences that are unique to the chromosome. To complement and extend analysis of the primary sequence, we have characterized mRNA transcripts, patterns of DNA methylation and DNase I sensitivity. Together with previous data these studies describe in detail the structural and functional organization of a human telomeric region.

    Funded by: Wellcome Trust

    Nature genetics 1997;15;3;252-7

  • New horizons in sequence analysis.

    Hubbard TJ

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    An ever-increasing number of protein sequences are being compared, partly because of the availability of full sets of protein sequences from several completed genome-sequencing projects. The resulting problem of scale has shifted the emphasis of sequence analysis method development from sensitivity and flexibility, which rely on manual intervention and interpretation, to the automatic generation of results of known reliability.

    Funded by: Wellcome Trust

    Current opinion in structural biology 1997;7;2;190-3

  • The nucleotide sequence of Saccharomyces cerevisiae chromosome IV.

    Jacq C, Alt-Mörbe J, Andre B, Arnold W, Bahr A, Ballesta JP, Bargues M, Baron L, Becker A, Biteau N, Blöcker H, Blugeon C, Boskovic J, Brandt P, Brückner M, Buitrago MJ, Coster F, Delaveau T, del Rey F, Dujon B, Eide LG, Garcia-Cantalejo JM, Goffeau A, Gomez-Peris A, Zaccaria P et al.

    Laboratoire de Génétique Moléculaire, URA 1302 du CNRS, Ecole Normale Supérieure, Paris, France.

    The complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome IV has been determined. Apart from chromosome XII, which contains the 1-2 Mb rDNA cluster, chromosome IV is the longest S. cerevisiae chromosome. It was split into three parts, which were sequenced by a consortium from the European Community, the Sanger Centre, and groups from St Louis and Stanford in the United States. The sequence of 1,531,974 base pairs contains 796 predicted or known genes, 318 (39.9%) of which have been previously identified. Of the 478 new genes, 225 (28.3%) are homologous to previously identified genes and 253 (32%) have unknown functions or correspond to spurious open reading frames (ORFs). On average there is one gene approximately every two kilobases. Superimposed on alternating regional variations in G+C composition, there is a large central domain with a lower G+C content that contains all the yeast transposon (Ty) elements and most of the tRNA genes. Chromosome IV shares with chromosomes II, V, XII, XIII and XV some long clustered duplications which partly explain its origin.

    Nature 1997;387;6632 Suppl;75-8
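
    The gene counts and percentages quoted in the chromosome IV abstract above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: every number is taken from the abstract, while the constant and function names are hypothetical and chosen for this example.

```python
# Figures quoted in the chromosome IV abstract; all names here are hypothetical.
SEQUENCE_BP = 1_531_974          # length of the reported sequence in base pairs
TOTAL_GENES = 796                # predicted or known genes
PREVIOUSLY_IDENTIFIED = 318      # genes identified before this work
NEW_WITH_HOMOLOGUES = 225        # new genes homologous to known genes
NEW_UNKNOWN_OR_SPURIOUS = 253    # new genes of unknown function or spurious ORFs


def percent_of_total(count: int, total: int = TOTAL_GENES) -> float:
    """Return count as a percentage of all predicted or known genes."""
    return 100.0 * count / total


def mean_gene_spacing_bp(sequence_bp: int = SEQUENCE_BP,
                         genes: int = TOTAL_GENES) -> float:
    """Return the average number of base pairs per gene."""
    return sequence_bp / genes


# The two categories of new genes should add up to the 478 quoted in the text.
new_genes = NEW_WITH_HOMOLOGUES + NEW_UNKNOWN_OR_SPURIOUS

if __name__ == "__main__":
    print(f"new genes: {new_genes}")
    print(f"previously identified: {percent_of_total(PREVIOUSLY_IDENTIFIED):.1f}%")
    print(f"new with homologues: {percent_of_total(NEW_WITH_HOMOLOGUES):.1f}%")
    print(f"new, unknown or spurious: {percent_of_total(NEW_UNKNOWN_OR_SPURIOUS):.1f}%")
    print(f"mean spacing: {mean_gene_spacing_bp():.0f} bp per gene")
```

    The mean spacing comes out a little under 2,000 bp per gene, in line with the abstract's "one gene approximately every two kilobases".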

  • Instability of highly expanded CAG repeats in mice transgenic for the Huntington's disease mutation.

    Mangiarini L, Sathasivam K, Mahal A, Mott R, Seller M and Bates GP

    Division of Medical and Molecular Genetics, Guy's Hospital, London, UK.

    Six inherited neurodegenerative diseases are caused by a CAG/polyglutamine expansion, including spinal and bulbar muscular atrophy (SBMA), Huntington's disease (HD), spinocerebellar ataxia type 1 (SCA1), dentatorubral pallidoluysian atrophy (DRPLA), Machado-Joseph disease (MJD or SCA3) and SCA2. Normal and expanded HD allele sizes of 6-39 and 35-121 repeats have been reported, and the allele distributions for the other diseases are comparable. Intergenerational instability has been described in all cases, and repeats tend to be more unstable on paternal transmission. This may present as larger increases on paternal inheritance as in HD, or as a tendency to increase on male and decrease on female transmission as in SCA1 (ref. 15). Somatic repeat instability is also apparent and appears most pronounced in the CNS. The major exception is the cerebellum, which in HD, DRPLA, SCA1 and MJD has a smaller repeat relative to the other brain regions tested. Of non-CNS tissues, instability was observed in blood, liver, kidney and colon. A mouse model of CAG repeat instability would be helpful in unravelling its molecular basis although an absence of CAG repeat instability in transgenic mice has so far been reported. These studies include (CAG) in the androgen receptor cDNA, (CAG) in the HD cDNA, (CAG) in the SCA1 cDNA, (CAG) in the SCA3 cDNA and as an isolated (CAG) tract.

    Nature genetics 1997;15;2;197-200

  • Critical assessment of methods of protein structure prediction (CASP): round II.

    Moult J, Hubbard T, Bryant SH, Fidelis K and Pedersen JT

    Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville 20850, USA.

    Proteins 1997;Suppl 1;2-6

  • Intermediate sequences increase the detection of homology between sequences.

    Park J, Teichmann SA, Hubbard T and Chothia C

    Cambridge Centre for Protein Engineering, UK.

    Two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through a third sequence that is suitably intermediate between the two. High scores, for a sequence match between the first and third sequences and between the second and the third sequences, imply that the first and second sequences are related even though their own match score is low. We have tested the usefulness of this idea using a database that contains the sequences of 971 protein domains whose structures are known and whose residue identities with each other are some 40% or less (PDB40D). On the basis of sequence and structural information, 2143 pairs of these sequences are known to have an evolutionary relationship. FASTA, in an all-against-all comparison of the sequences in the database, detected 320 (15%) of these relationships as well as three false positives (i.e. 1% error rate). Using intermediate sequences found by FASTA matches of PDB40D sequences to those in the large non-redundant OWL database we could detect 550 evolutionary relationships with an error rate of 1%. This means the intermediate sequence procedure increases the ability to recognise the evolutionary relationships amongst the PDB40D sequences by 70%.

    Funded by: Wellcome Trust

    Journal of molecular biology 1997;273;1;349-54
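
    The intermediate-sequence idea described in the abstract above can be sketched as a two-hop lookup over pairwise match scores. This is a minimal illustration and not the authors' implementation: the function names, the toy score table and the threshold are all hypothetical, whereas the study itself used FASTA scores against the OWL database. The final lines recompute the reported gain from the abstract's own counts (320 relationships found directly, 550 found with intermediates).

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical table of pairwise match scores keyed by an unordered pair of names.
Scores = Dict[FrozenSet[str], float]


def score(scores: Scores, a: str, b: str) -> float:
    """Return the recorded score for the pair (a, b), or 0.0 if none exists."""
    return scores.get(frozenset((a, b)), 0.0)


def related_via_intermediate(scores: Scores, a: str, b: str,
                             candidates: Iterable[str],
                             threshold: float) -> bool:
    """Infer that a and b are related from a direct or two-hop match.

    The pair is accepted if its own score reaches the threshold, or if some
    third sequence scores at or above the threshold against both a and b.
    """
    if score(scores, a, b) >= threshold:
        return True
    return any(
        score(scores, a, c) >= threshold and score(scores, b, c) >= threshold
        for c in candidates
        if c not in (a, b)
    )


# Toy data: A and B match each other weakly, but both match C strongly.
scores: Scores = {
    frozenset(("A", "B")): 30.0,
    frozenset(("A", "C")): 120.0,
    frozenset(("B", "C")): 110.0,
}

# Relative gain implied by the counts quoted in the abstract.
direct_hits = 320
hits_with_intermediates = 550
gain = (hits_with_intermediates - direct_hits) / direct_hits

if __name__ == "__main__":
    print(related_via_intermediate(scores, "A", "B", {"C"}, threshold=100.0))
    print(f"relative gain: {gain:.0%}")
```

    With these counts the relative gain is 230/320, or about 72%, which the abstract rounds to 70%.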

  • Detection of protein-ligand NOEs with small, weakly binding ligands by combined relaxation and diffusion filtering.

    Ponstingl H and Otting G

    Journal of biomolecular NMR 1997;9;4;441-4

  • Distinct element simulation of impact breakage of lactose agglomerates.

    Ning Z, Boerefijn R, Ghadiri M and Thornton C

    Advanced Powder Technology 1997

  • Numerical criteria for the evaluation of ab initio predictions of protein structure.

    Zemla A, Venclovas C, Reinhardt A, Fidelis K and Hubbard TJ

    Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, CA 94551, USA.

    As part of the CASP2 protein structure prediction experiment, a set of numerical criteria were defined for the evaluation of "ab initio" predictions. The evaluation package comprises a series of electronic submission formats, a submission validator, evaluation software, and a series of scripts to summarize the results for the CASP2 meeting and for presentation via the World Wide Web (WWW). The evaluation package is accessible for use on new predictions via WWW so that results can be compared to those submitted to CASP2. With further input from the community, the evaluation criteria are expected to evolve into a comprehensive set of measures capturing the overall quality of a prediction as well as critical detail essential for further development of prediction methods. We discuss present measures, limitations of the current criteria, and possible improvements.

    Funded by: Wellcome Trust

    Proteins 1997;Suppl 1;140-50