Sanger Institute - Publications 1998

Number of papers published in 1998: 58

  • Structure and distribution of pentapeptide repeats in bacteria.

    Bateman A, Murzin AG and Teichmann SA

    Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom.

    We report the discovery of a novel family of proteins in which each member contains tandem pentapeptide (five residue) repeats, described by the motif A(D/N)LXX. Members of this family are both membrane bound and cytoplasmic. The function of these repeats is uncertain, but they may have a targeting or structural function rather than enzymatic activity. This family is most common in cyanobacteria, suggesting a function related to cyanobacterial-specific metabolism. Although no experimental information is available for the structure of this family, it is predicted that the tandem pentapeptide repeats will form a right-handed beta-helical structure. A structural model of the pentapeptide repeats is presented.

    Funded by: Wellcome Trust

    Protein science : a publication of the Protein Society 1998;7;6;1477-80
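
    The motif A(D/N)LXX in the abstract above reads as: alanine, then aspartate or asparagine, then leucine, then any two residues. As a rough illustration only, the sketch below scans a protein sequence for runs of consecutive units using a regular expression. The function name, the threshold of three consecutive units, and the strict matching are assumptions made for this example; they are not taken from the paper, and real family members are likely to contain degenerate repeats that a strict pattern would miss.

```python
import re

# One repeat unit of the motif A(D/N)LXX; "." stands for any residue (X).
UNIT = r"A[DN]L.."


def find_tandem_repeats(seq, min_units=3):
    """Return (start, end, n_units) for each run of >= min_units consecutive units.

    min_units is a hypothetical threshold chosen for illustration.
    """
    pattern = re.compile(rf"(?:{UNIT}){{{min_units},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 5)
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Toy sequence: two flanking residues, three motif units, two trailing residues.
    toy = "MK" + "ADLSG" + "ANLTR" + "ADLQQ" + "GG"
    print(find_tandem_repeats(toy))
```

    Positions are 0-based and end-exclusive, following Python slicing conventions.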

  • Genome-scale DNA sequencing: where are we?

    Beck S and Sterk P

    Sanger Centre, Hinxton, Cambridge, UK.

    Current opinion in biotechnology 1998;9;1;116-21

  • A detailed physical and transcriptional map of the region of chromosome 20 that is deleted in myeloproliferative disorders and refinement of the common deleted region.

    Bench AJ, Aldred MA, Humphray SJ, Champion KM, Gilbert JG, Asimakopoulos FA, Deloukas P, Gwilliam R, Bentley DR and Green AR

    Department of Haematology, University of Cambridge, United Kingdom.

    Acquired deletions of the long arm of chromosome 20 are the most common chromosomal abnormality seen in polycythemia vera and are also associated with other myeloid malignancies. Such deletions are believed to mark the site of one or more tumor suppressor genes, loss of which perturbs normal hematopoiesis. A common deleted region (CDR) has previously been identified on 20q. We have now constructed the most detailed physical map of this region to date--a YAC contig that encompasses the entire CDR and spans 23 cM (11 Mb). This contig contains 140 DNA markers and 65 unique expressed sequences. Our data represent a first step toward a complete transcriptional map of the CDR. The high marker density within the physical map permitted two complementary approaches to reducing the size of the CDR. Microsatellite PCR refined the centromeric boundary of the CDR to D20S465 and was used to search for homozygous deletions in 28 patients using 32 markers. No such deletions were detected. Genetic changes on the remaining chromosome 20 may therefore be too small to be detected or may occur in a subpopulation of cells.

    Genomics 1998;49;3;351-62

  • Coordination of human genome sequencing via a consensus framework map.

    Bentley DR, Pruitt KD, Deloukas P, Schuler GD and Ostell J

    Sanger Centre, Hinxton, Cambridge, UK.

    Trends in genetics : TIG 1998;14;10;381-4

  • A new candidate region for the positional cloning of the XLP gene.

    Bolino A, Yin L, Seri M, Cusano R, Cinti R, Coffey A, Brooksbank R, Howell G, Bentley D, Davis JR, Lanyi A, Huang D, Stark M, Creaven M, Bjørkhaug L, Heitzmann F, Lamartine J, Gaudi S, Sylla BS, Lenoir GM, Castagnola E, Giacchino R, Porta G, Franco B, Zollo M, Sumegi J and Romeo G

    International XLP Consortium, Laboratorio di Genetica Molecolare, Istituto Gaslini, Genova, Quarto, Italy.

    X-linked lymphoproliferative disease (XLP) is an inherited immunodeficiency characterised by selective susceptibility to Epstein-Barr virus and frequent association with malignant lymphomas chiefly located in the ileocecal region, liver, kidney and CNS. Taking advantage of a large bacterial clone contig, we obtained a genomic sequence of 197620 bp encompassing a deletion (XLP-D) of 116 kb in an XLP family, whose breakpoints were identified. The study of potential exons from this region in 40 unrelated XLP patients did not reveal any mutation. To define the critical region for XLP and investigate the role of the XLP-D deletion, detailed haplotypes in a region of approximately 20 cM were reconstructed in a total of 87 individuals from 7 families with recurrence of XLP. Two recombination events in a North American family and a new microdeletion (XLP-G) in an Italian family indicate that the XLP gene maps in the interval between DXS1001 and DXS8057, approximately 800 kb centromeric to the previously reported familial microdeletion XLP-D.

    Funded by: NIAID NIH HHS: 1RO1 AI33532OIA3; Telethon: E.0633, TGM06S01

    European journal of human genetics : EJHG 1998;6;5;509-17

  • UHX1 and PCTK1: precise characterisation and localisation within a gene-rich region in Xp11.23 and evaluation as candidate genes for retinal diseases mapped to Xp21.1-p11.2.

    Brandau O, Nyakatura G, Jedele KB, Platzer M, Achatz H, Ross M, Murken J, Rosenthal A and Meindl A

    Abteilung für Medizinische Genetik, Kinderpoliklinik der Universität, München, Germany.

    The gene for ubiquitin hydrolase on the X chromosome (UHX1), cloned and mapped to Xp21.2-p11.2, is a candidate gene for retinal diseases. We used fine mapping techniques to localise UHX1 between markers DXS1266 and DXS337, where congenital stationary night blindness (XICSNB) and retinitis pigmentosa type 2 (RP2) are also located. Reevaluation of the UHX1 gene structure demonstrated five new exons, for a total of 21 exons and a predicted protein product of 963 amino acids. Evaluation of patients revealed no UHX1 mutations using SSCP (10 CSNB1 and 20 XLRP) or deletion screening with cDNA hybridisation (13 CSNB1 and 43 XLRP). Likewise, no aberrations were found in the nearby PCTAIRE1 (PCTK1) gene in 13 CSNB1 and 43 XLRP patients by deletion screening. Thus mutations of UHX1, and probably PCTK1, do not appear to cause common X-linked eye diseases. UHX1's role in patients with mental retardation may be appropriate for further investigations into UHX1 function.

    European journal of human genetics : EJHG 1998;6;5;459-66

  • Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.

    Brenner SE, Chothia C and Hubbard TJ

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, United Kingdom.

    Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation tested the programs BLAST [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). J. Mol. Biol. 215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448], and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of SSEARCH and FASTA are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by BLAST and WU-BLAST2 exaggerate significance by orders of magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20-30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 1998;95;11;6073-8

  • Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for genome mapping, sequencing, and comparative genomics.

    Brosch R, Gordon SV, Billault A, Garnier T, Eiglmeier K, Soravito C, Barrell BG and Cole ST

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, Paris, France.

    The bacterial artificial chromosome (BAC) cloning system is capable of stably propagating large, complex DNA inserts in Escherichia coli. As part of the Mycobacterium tuberculosis H37Rv genome sequencing project, a BAC library was constructed in the pBeloBAC11 vector and used for genome mapping, confirmation of sequence assembly, and sequencing. The library contains about 5,000 BAC clones, with inserts ranging in size from 25 to 104 kb, representing theoretically a 70-fold coverage of the M. tuberculosis genome (4.4 Mb). A total of 840 sequences from the T7 and SP6 termini of 420 BACs were determined and compared to those of a partial genomic database. These sequences showed excellent correlation between the estimated sizes and positions of the BAC clones and the sizes and positions of previously sequenced cosmids and the resulting contigs. Many BAC clones represent linking clones between sequenced cosmids, allowing full coverage of the H37Rv chromosome, and they are now being shotgun sequenced in the framework of the H37Rv sequencing project. Also, no chimeric, deleted, or rearranged BAC clones were detected, which was of major importance for the correct mapping and assembly of the H37Rv sequence. The minimal overlapping set contains 68 unique BAC clones and spans the whole H37Rv chromosome with the exception of a single gap of approximately 150 kb. As a postgenomic application, the canonical BAC set was used in a comparative study to reveal chromosomal polymorphisms between M. tuberculosis, M. bovis, and M. bovis BCG Pasteur, and a novel 12.7-kb segment present in M. tuberculosis but absent from M. bovis and M. bovis BCG was characterized. This region contains a set of genes whose products show low similarity to proteins involved in polysaccharide biosynthesis. The H37Rv BAC library therefore provides us with a powerful tool both for the generation and confirmation of sequence data as well as for comparative genomics and other postgenomic applications. It represents a major resource for present and future M. tuberculosis research projects.

    Funded by: Wellcome Trust

    Infection and immunity 1998;66;5;2221-9
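
    The "70-fold coverage" figure in the abstract above can be sanity-checked with the usual library coverage estimate: number of clones times mean insert size, divided by genome size. The abstract gives only the insert range (25 to 104 kb), not the mean, so the sketch below assumes the midpoint of that range (64.5 kb) as a stand-in. That value is an assumption for illustration, not a number reported by the authors.

```python
def fold_coverage(n_clones, mean_insert_kb, genome_kb):
    """Theoretical fold coverage of a clone library: total cloned DNA / genome size."""
    return n_clones * mean_insert_kb / genome_kb


if __name__ == "__main__":
    n_clones = 5000                     # "about 5,000 BAC clones" (from the abstract)
    genome_kb = 4400                    # 4.4 Mb genome (from the abstract)
    mean_insert_kb = (25 + 104) / 2     # ASSUMPTION: midpoint of the 25-104 kb range
    print(round(fold_coverage(n_clones, mean_insert_kb, genome_kb), 1))
```

    Under that assumption the estimate comes out a little above 70-fold, which is in line with the approximate figure quoted in the abstract.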

  • Genetic analysis of meiotic recombination in humans by use of sperm typing: reduced recombination within a heterozygous paracentric inversion of chromosome 9q32-q34.3.

    Brown GM, Leversha M, Hulten M, Ferguson-Smith MA, Affara NA and Furlong RA

    Department of Pathology, University of Cambridge, Cambridge, United Kingdom.

    To investigate patterns of genetic recombination within a heterozygous paracentric inversion of chromosome 9 (46XY inv[9] [q32q34.3]), we performed sperm typing using a series of polymorphic microsatellite markers spanning the inversion region. For comparison, two donors with cytogenetically normal chromosomes 9, one of whom was heterozygous for a pericentric chromosome 2 inversion (46XY inv[2] [p11q13]), were also tested. Linkage analysis was performed by use of the multilocus linkage-analysis program SPERM, and also CRI-MAP, which was adapted for sperm-typing data. Analysis of the controls generated a marker order in agreement with previously published data and revealed no significant interchromosomal effects of the inv(2) on recombination on chromosome 9. FISH employing cosmids containing appropriate chromosome 9 markers was used to localize the inversion breakpoint of inv(9). Analysis of inv(9) sperm was performed by use of a set of microsatellite markers that mapped centromeric to, telomeric to, and within the inversion breakpoints. Three distinct patterns of recombination across the region were observed. Proximal to the centromeric breakpoint, recombination was similar to normal levels. Distal to the telomeric breakpoint, there was an increase in recombination found in the inversion patient. Finally, within the inversion, recombination was dramatically reduced, but several apparent double recombinants were found. A putative model explaining these data is proposed.

    Funded by: Wellcome Trust

    American journal of human genetics 1998;62;6;1484-92

  • Fulminant jejuno-ileitis following ablation of enteric glia in adult transgenic mice.

    Bush TG, Savidge TC, Freeman TC, Cox HJ, Campbell EA, Mucke L, Johnson MH and Sofroniew MV

    Medical Research Council Cambridge Centre for Brain Repair, Department of Anatomy, University of Cambridge, United Kingdom.

    To investigate the roles of astroglial cells, we targeted their ablation genetically. Transgenic mice were generated expressing herpes simplex virus thymidine kinase from the mouse glial fibrillary acidic protein (GFAP) promoter. In adult transgenic mice, 2 weeks of subcutaneous treatment with the antiviral agent ganciclovir preferentially ablated transgene-expressing, GFAP-positive glia from the jejunum and ileum, causing a fulminating and fatal jejuno-ileitis. This pathology was independent of bacterial overgrowth and was characterized by increased myeloperoxidase activity, moderate degeneration of myenteric neurons, and intraluminal hemorrhage. These findings demonstrate that enteric glia play an essential role in maintaining the integrity of the bowel and suggest that their loss or dysfunction may contribute to the cellular mechanisms of inflammatory bowel disease.

    Funded by: Wellcome Trust

    Cell 1998;93;2;189-201

  • Genome sequence of the nematode C. elegans: a platform for investigating biology.

    C. elegans Sequencing Consortium

    The 97-megabase genomic sequence of the nematode Caenorhabditis elegans reveals over 19,000 genes. More than 40 percent of the predicted protein products find significant matches in other organisms. There is a variety of repeated sequences, both local and dispersed. The distinctive distribution of some repeats and highly conserved genes provides evidence for a regional organization of the chromosomes.

    Science (New York, N.Y.) 1998;282;5396;2012-8

  • Genetic nomenclature for Trypanosoma and Leishmania.

    Clayton C, Adams M, Almeida R, Baltz T, Barrett M, Bastien P, Belli S, Beverley S, Biteau N, Blackwell J, Blaineau C, Boshart M, Bringaud F, Cross G, Cruz A, Degrave W, Donelson J, El-Sayed N, Fu G, Ersfeld K, Gibson W, Gull K, Ivens A, Kelly J, Vanhamme L et al.

    Zentrum für Molekulare Biologie, Heidelberg, Germany.

    Funded by: Wellcome Trust

    Molecular and biochemical parasitology 1998;97;1-2;221-4

  • Host response to EBV infection in X-linked lymphoproliferative disease results from mutations in an SH2-domain encoding gene.

    Coffey AJ, Brooksbank RA, Brandau O, Oohashi T, Howell GR, Bye JM, Cahn AP, Durham J, Heath P, Wray P, Pavitt R, Wilkinson J, Leversha M, Huckle E, Shaw-Smith CJ, Dunham A, Rhodes S, Schuster V, Porta G, Yin L, Serafini P, Sylla B, Zollo M, Franco B, Bolino A, Seri M, Lanyi A, Davis JR, Webster D, Harris A, Lenoir G, de St Basile G, Jones A, Behloradsky BH, Achatz H, Murken J, Fassler R, Sumegi J, Romeo G, Vaudin M, Ross MT, Meindl A and Bentley DR

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    X-linked lymphoproliferative syndrome (XLP or Duncan disease) is characterized by extreme sensitivity to Epstein-Barr virus (EBV), resulting in a complex phenotype manifested by severe or fatal infectious mononucleosis, acquired hypogammaglobulinemia and malignant lymphoma. We have identified a gene, SH2D1A, that is mutated in XLP patients and encodes a novel protein composed of a single SH2 domain. SH2D1A is expressed in many tissues involved in the immune system. The identification of SH2D1A will allow the determination of its mechanism of action as a possible regulator of the EBV-induced immune response.

    Funded by: NIAID NIH HHS: 1 R01 AI33532-OIA3; Telethon: E.0440, TGM06S01; Wellcome Trust

    Nature genetics 1998;20;2;129-35

  • Analysis of the genome of Mycobacterium tuberculosis H37Rv.

    Cole ST and Barrell BG

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, Paris, France.

    The powerful combination of genomics and bioinformatics is providing a wealth of information about Mycobacterium tuberculosis, the aetiological agent of human tuberculosis, that will facilitate the conception and development of new therapies. The starting point for genome sequencing was the integrated map of the 4.4 Mb circular chromosome of the widely used, virulent reference strain, M. tuberculosis H37Rv. Cosmids and bacterial artificial chromosomes were selected from ordered libraries and subjected to systematic shotgun sequence analysis. This approach simplified sequence assembly as the genome is rich in repetitive DNA. In common with most bacteria, > 90% of the potential coding capacity is used, and probable or tentative functions could be attributed to > 70% of the genes. The potential biological roles of two of the principal driving forces in genome dynamics, insertion sequence elements and polymorphic multigene families are discussed.

    Funded by: Wellcome Trust

    Novartis Foundation symposium 1998;217;160-72; discussion 172-7

  • Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.

    Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Krogh A, McLean J, Moule S, Murphy L, Oliver K, Osborne J, Quail MA, Rajandream MA, Rogers J, Rutter S, Seeger K, Skelton J, Squares R, Squares S, Sulston JE, Taylor K, Whitehead S and Barrell BG

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, UK.

    Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

    Funded by: NIAID NIH HHS: Z01 AI000783-11; Wellcome Trust

    Nature 1998;393;6685;537-44

  • Sequence assembly with CAFTOOLS.

    Dear S, Durbin R, Hillier L, Marth G, Thierry-Mieg J and Mott R

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    Large-scale genomic sequencing requires a software infrastructure to support and integrate applications that are not directly compatible. We describe a suite of software tools built around the Common Assembly Format (CAF), a comprehensive representation of a sequence assembly as a text file. These tools form the backbone of sequencing informatics at the Sanger Centre and the Genome Sequencing Center. The CAF format is intentionally flexible, and our Perl and C libraries, which parse and manipulate it, provide powerful tools for creating new applications as well as wrappers to incorporate other software. The tools are available free by anonymous FTP from

    Genome research 1998;8;3;260-7

  • A physical map of 30,000 human genes.

    Deloukas P, Schuler GD, Gyapay G, Beasley EM, Soderlund C, Rodriguez-Tomé P, Hui L, Matise TC, McKusick KB, Beckmann JS, Bentolila S, Bihoreau M, Birren BB, Browne J, Butler A, Castle AB, Chiannilkulchai N, Clee C, Day PJ, Dehejia A, Dibling T, Drouot N, Duprat S, Fizames C, Fox S, Gelling S, Green L, Harrison P, Hocking R, Holloway E, Hunt S, Keil S, Lijnzaad P, Louis-Dit-Sully C, Ma J, Mendis A, Miller J, Morissette J, Muselet D, Nusbaum HC, Peck A, Rozen S, Simon D, Slonim DK, Staples R, Stein LD, Stewart EA, Suchard MA, Thangarajah T, Vega-Czarny N, Webber C, Wu X, Hudson J, Auffray C, Nomura N, Sikela JM, Polymeropoulos MH, James MR, Lander ES, Hudson TJ, Myers RM, Cox DR, Weissenbach J, Boguski MS and Bentley DR

    Sanger Centre, Hinxton Hall, Hinxton, Cambridge CB10 1SA UK.

    A map of 30,181 human gene-based markers was assembled and integrated with the current genetic map by radiation hybrid mapping. The new gene map contains nearly twice as many genes as the previous release, includes most genes that encode proteins of known function, and is twofold to threefold more accurate than the previous version. A redesigned, more informative and functional World Wide Web site provides the mapping information and associated data and annotations. This resource constitutes an important infrastructure and tool for the study of complex genetic traits, the positional cloning of disease genes, the cross-referencing of mammalian genomes, and validated human transcribed sequences for large-scale studies of gene expression.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 1998;282;5389;744-6

  • Expression profiling of single cells using 3 prime end amplification (TPEA) PCR.

    Dixon AK, Richardson PJ, Lee K, Carter NP and Freeman TC

    The Sanger Centre, The Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QJ, UK.

    The ability to relate the physiological status of individual cells to the complement of genes they express is limited by current methodological approaches for performing these analyses. We report here the development of a robust and reproducible method for amplifying 3' sequences of mRNA derived from single cells and demonstrate that the amplified cDNA, derived from individual human lymphoblastoma cells, can be used for the expression profiling of up to 40 different genes per cell. In addition, we show that 3 prime end amplification (TPEA) PCR can be used to enable the detection of both high and low abundance mRNA species in samples harvested from live neurons in rat brain slices. This procedure will facilitate the study of complex tissue function at the cellular level.

    Funded by: Wellcome Trust

    Nucleic acids research 1998;26;19;4426-31

  • Data disclosure in the human genome project.

    Dunham I

    Molecular medicine today 1998;4;8;335-6

  • Base qualities help sequencing software.

    Durbin R and Dear S

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Genome research 1998;8;3;161-2

  • An oligo-screening strategy to fill gaps found during shotgun sequencing projects.

    Flint J, Sims M, Clark K, Staden R and Thomas K

    Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, England.

    During the course of projects to sequence human and nematode cosmids we encountered great difficulties generating contiguous sequence in regions with repetitive DNA (Alu repeats in humans and tandem or inverted repeats in the nematode). We have developed a simple and efficient strategy to fill gaps. By screening M13, plasmid or phagemid libraries with oligonucleotides flanking the gap, clones are identified that contiguate the cosmid sequence. Our method has been integrated into the GAP4 sequence assembly program. The strategy reduces both time and costs in large scale sequencing projects.

    DNA sequence : the journal of DNA sequencing and mapping 1998;8;4;241-5

  • The Saccharomyces cerevisiae early secretion mutant tip20 is synthetic lethal with mutants in yeast coatomer and the SNARE proteins Sec22p and Ufe1p.

    Frigerio G

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Tip20p is an 80 kDa cytoplasmic protein bound to the cytoplasmic surface of the endoplasmic reticulum (ER) by interaction with the type II integral membrane protein Sec20p. Both proteins are required for vesicular transport between the ER and Golgi complex. Recently, sec20-1 was found to be defective in retrograde transport. A collection of temperature-sensitive tip20 mutants are shown to be lethal in combination with ufe1-1, a target SNARE of the ER and ret2-1, yeast delta-COP. A subset of tip20 mutants was found to be lethal in combination with sec20-1, sec21-1, sec22-3 and sec27-1. Since all pairwise combinations of a tip20 mutant, sec20-1, and ufe1-1 are lethal, Tip20p and Sec20p might be part of the docking complex for Golgi-derived retrograde transport vesicles. Since carboxy-terminal tip20 truncations are lethal in combination with mutants in three coatomer subunits, Tip20p might be involved in binding or uncoating of COPI coated retrograde transport vesicles.

    Funded by: Wellcome Trust

    Yeast (Chichester, England) 1998;14;7;633-46

  • The pufferfish SLP-1 gene, a new member of the SCL/TAL-1 family of transcription factors.

    Göttgens B, Gilbert JG, Barton LM, Aparicio S, Hawker K, Mistry S, Vaudin M, King A, Bentley D, Elgar G and Green AR

    Department of Haematology, MRC Centre, University of Cambridge, United Kingdom.

    The SCL/TAL-1 gene encodes a basic helix-loop-helix (bHLH) transcription factor essential for the development of all hemopoietic lineages and also acts as a T-cell oncogene. Four related genes have been described in mammals (LYL-1, TAL-2, NSCL1, and NSCL2), all of which exhibit a high degree of sequence similarity to SCL/TAL-1 in the bHLH domain and two of which (LYL-1 and TAL-2) have also been implicated in the pathogenesis of T-cell acute lymphoblastic leukemia. In this study we describe the identification and characterization of a pufferfish gene termed SLP-1, which represents a new member of this gene family. The genomic structure and sequence of SLP-1 suggest that it forms a subfamily with SCL/TAL-1 and LYL-1 and is most closely related to SCL/TAL-1. However, unlike SCL/TAL-1, SLP-1 is widely expressed. Sequence analysis of a whole cosmid containing SLP-1 shows that SLP-1 is flanked upstream by a zinc finger gene and a fork-head-domain gene and downstream by a heme-oxygenase and a RING finger gene.

    Funded by: Wellcome Trust

    Genomics 1998;48;1;52-62

  • Statement on the rapid release of genomic DNA sequence.

    Guyer M

    National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.

    Genome research 1998;8;5;413

  • TAPASIN, DAXX, RGL2, HKE2 and four new genes (BING 1, 3 to 5) form a dense cluster at the centromeric end of the MHC.

    Herberg JA, Beck S and Trowsdale J

    Human Immunogenetics Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London, U.K.

    TAPASIN, a gene recently shown to be required for antigen presentation through MHC class I molecules, is located 180 kbp centromeric of HLA-DP in a region linked to several diseases, and associated with altered developmental phenotypes in the mouse. We present the genomic analysis of a 70 kbp gene-dense segment flanking the TAPASIN locus, including sequence, structure and preliminary characterisation of seven additional genes. BING1 is a Zn finger gene containing a POZ motif. BING3 is similar to myosin regulatory light chain. BING4 shows homologies only to hypothetical yeast and Caenorhabditis elegans proteins. BING5 is found within an intron of BING4 on the complementary strand, and encodes a molecule with no homologies to database proteins. Another three genes were identified whose full sequence was not previously known; namely, RGL2, DAXX (BING2) and HKE2. RGL2 encodes an effector of Ras, homologous to the mouse RalGDS protein, Rlf. DAXX encodes an effector of Fas that stimulates apoptosis through the Jun kinase (JNK) pathway. The location of DAXX is of interest given the linkage of autoimmune disease to the MHC and to apoptosis.

    Funded by: Wellcome Trust

    Journal of molecular biology 1998;277;4;839-57

  • Genomic structure and domain organisation of the human Bak gene.

    Herberg JA, Phillips S, Beck S, Jones T, Sheer D, Wu JJ, Prochazka V, Barr PJ, Kiefer MC and Trowsdale J

    Imperial Cancer Research Fund Laboratories, 44 Lincoln's Inn Fields, London, UK.

    The Bcl-2 homologue, Bak, is a potent inducer of apoptosis. FISH data presented here located the gene to 6p21.3. Mapping was consistent with its location centromeric of the HSET locus and approximately 400kb from the MHC. The construction of a contig of genomic clones across the locus facilitated the sequencing of a PAC containing the gene. Comparison of the gene structure to functional and physical domains revealed a good agreement between the physical structure and the intron-exon organisation. The position of a single intron was conserved in comparison to other members of the Bcl-2 family, namely Bax, CED-9, Bcl-X and Bcl-2, but all other introns were displaced, consistent with a divergent phylogeny.

    Funded by: Cancer Research UK: A3585; Wellcome Trust

    Gene 1998;211;1;87-94

  • Genomic analysis of the Tapasin gene, located close to the TAP loci in the MHC.

    Herberg JA, Sgouros J, Jones T, Copeman J, Humphray SJ, Sheer D, Cresswell P, Beck S and Trowsdale J

    Imperial Cancer Research Fund Laboratories, London, GB.

    The Tapasin molecule is a member of the immunoglobulin (Ig) superfamily required for the association of TAP transporters and MHC class I heterodimers in the endoplasmic reticulum. In this study, the Tapasin gene was precisely mapped in relation to the MHC. The gene was centromeric of the HLA-DP locus between the HSET and HKE1.5 genes and within 500 kbp of the TAP1 and TAP2 genes. A homologous mouse EST was mapped to a syntenic position on chromosome 17, centromeric of the H-2 K locus. Similarly, the rat Tapasin gene was shown to be in an equivalent location with respect to the RT1.A locus. The localization of Tapasin, TAP, LMP and class I genes within such a short distance of each other on the chromosome implies some regulatory or functional significance. We determined the Tapasin gene sequence for comparison of its structure to that of other Ig superfamily members, such as MHC class I genes. The IgC domain was encoded by a separate exon. However, the positions of the other introns were not characteristic of other Ig superfamily genes, indicating that Tapasin has a distinct phylogeny.

    Funded by: Cancer Research UK: A3585; Wellcome Trust

    European journal of immunology 1998;28;2;459-67

  • Dynamic programming alignment accuracy.

    Holmes I and Durbin R

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, England.

    Algorithms for generating alignments of biological sequences have inherent statistical limitations when it comes to the accuracy of the alignments they produce. Using simulations, we measure the accuracy of the standard global dynamic programming method and show that it can be reasonably well modelled by an "edge wander" approximation to the distribution of the optimal scoring path around the correct path in the vicinity of a gap. We also give a table from which accuracy values can be predicted for commonly used scoring schemes and sequence divergences (the PAM and BLOSUM series). Finally we describe how to calculate the expected accuracy of a given alignment, and show how this can be used to construct an optimal accuracy alignment algorithm which generates significantly more accurate alignments than standard dynamic programming methods in simulated experiments.

    Journal of computational biology : a journal of computational molecular cell biology 1998;5;3;493-504
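
    The "standard global dynamic programming method" whose accuracy this abstract measures can be sketched as follows. This is a minimal illustration only: the match, mismatch, and linear gap scores are assumed placeholder values, not the PAM or BLOSUM schemes studied in the paper, and the function returns only the optimal score rather than the alignment path.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions (simple match/mismatch
    scores with a linear gap penalty), not taken from the paper.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # against the first j residues of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first i residues of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

    Keeping only two rows of the table is enough for the score; recovering the alignment itself would need the full table or a traceback structure.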

  • Large-scale sequence comparisons reveal unusually high levels of variation in the HLA-DQB1 locus in the class II region of the human MHC.

    Horton R, Niblett D, Milne S, Palmer S, Tubby B, Trowsdale J and Beck S

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Comparison of genomic sequences flanking the HLA-DQB1 locus in the human MHC class II region reveals local sequence variation of up to 10%, which is the highest level of sequence variation found in the human genome so far. The variation is haplotype-specific and extends far beyond the transcriptional unit of the DQB1 gene, suggesting hitch-hiking along with functionally selected alleles as the most likely mechanism. All major insertions/deletions (indels) were found to be of retroviral origin and in the immediate upstream region of DQB1. Possible cis-acting effects of these indels on the transcriptional regulation of DQB1 are discussed.

    Funded by: Wellcome Trust

    Journal of molecular biology 1998;282;1;71-97

  • SCOP, Structural Classification of Proteins database: applications to evaluation of the effectiveness of sequence alignment methods and statistics of protein structural data.

    Hubbard TJ, Ailey B, Brenner SE, Murzin AG and Chothia C

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, England.

    The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships of all known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database, so far. The database can be used as a source of data to calibrate sequence search algorithms and for the generation of population statistics on protein structures. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL http://scop.

    Acta crystallographica. Section D, Biological crystallography 1998;54;Pt 6 Pt 1;1147-54

  • Characterization of the human synaptogyrin gene family.

    Kedra D, Pan HQ, Seroussi E, Fransson I, Guilbaud C, Collins JE, Dunham I, Blennow E, Roe BA, Piehl F and Dumanski JP

    Department of Molecular Medicine, Karolinska Hospital, Stockholm, Sweden.

    Genomic sequencing was combined with searches of databases for identification of active genes on human chromosome 22. A cosmid from 22q13, located in the telomeric vicinity of the PDGFB (platelet-derived growth factor B-chain) gene, was fully sequenced. Using an expressed sequence tag-based approach we characterized human (SYNGR1) and mouse (Syngr1) orthologs of the previously cloned rat synaptogyrin gene (RATSYNGR1). The human SYNGR1 gene reveals three (SYNGR1a, SYNGR1b, SYNGR1c) alternative transcript forms of 4.5, 1.3 and 0.9 kb, respectively. The transcription of SYNGR1 starts from two different promoters, and leads to predicted proteins with different N- and C-terminal ends. The most abundant SYNGR1a transcript, the 4.5-kb form, which corresponds to RATSYNGR1, is highly expressed in neurons of the central nervous system and at much lower levels in other tissues, as determined by in situ hybridization histochemistry. The levels of SYNGR1b and SYNGR1c transcripts are low and limited to heart, skeletal muscle, ovary and fetal liver. We also characterized two additional members of this novel synaptogyrin gene family in human (SYNGR2 and SYNGR3), and one in mouse (Syngr2). The human SYNGR2 gene transcript of 1.6 kb is expressed at high levels in all tissues, except brain. The 2.2-kb SYNGR3 transcript was detected in brain and placenta only. The human SYNGR2 and SYNGR3 genes were mapped by fluorescence in situ hybridization to 17qtel and 16ptel, respectively. The human SYNGR2 gene has a processed pseudogene localized in 15q11. All predicted synaptogyrin proteins contain four strongly conserved transmembrane domains, which is consistent with the M-shaped topology. The C-terminal polypeptide ends are variable in length, display a low degree of sequence similarity between family members, and are therefore likely to convey the functional specificity of each protein.

    Funded by: Wellcome Trust

    Human genetics 1998;103;2;131-41

  • Detection of gains and losses in 18 meningiomas by comparative genomic hybridization.

    Khan J, Parsa NZ, Harada T, Meltzer PS and Carter NP

    Laboratory of Cancer Genetics, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-4470, USA.

    Comparative genomic hybridization (CGH) was used to examine gains and losses in 18 meningioma tumors that had been previously analyzed for loss of heterozygosity (LOH) at 22q12. Partial or complete losses were seen by CGH in only 9 of 18 cases on chromosome 22. This compares with 11 of 18 losses of single or more loci by LOH. The discrepancy in these results is probably explained by the increased sensitivity of LOH by using microsatellite markers that are able to detect small deletions, whereas losses on the order of 10-15 megabases are required for confident identification by CGH. There was no consistent pattern of gains or losses by CGH, including those tumors that lacked LOH at 22q12. In one tumor of interest in which CGH and LOH studies failed to demonstrate loss on chromosome 22, CGH identified an area of amplification at 17q22-23.

    Funded by: Wellcome Trust

    Cancer genetics and cytogenetics 1998;103;2;95-100

  • Identification of an ATP-sensitive potassium channel current in rat striatal cholinergic interneurones.

    Lee K, Dixon AK, Freeman TC and Richardson PJ

    Parke Davis Neuroscience Research Centre, Cambridge University Forvie Site, UK.

    1. Whole-cell patch-clamp recordings were made from rat striatal cholinergic interneurones in slices of brain tissue in vitro. In the absence of ATP in the electrode solution, these neurones were found to gradually hyperpolarize through the induction of an outward current at -60 mV. This outward current and the resultant hyperpolarization were blocked by the sulphonylureas tolbutamide and glibenclamide and by the photorelease of caged ATP within neurones. 2. This ATP-sensitive outward current was not observed when 2 mM ATP was present in the electrode solution. Under these conditions, 500 microM diazoxide was found to induce an outward current that was blocked by tolbutamide. 3. Using permeabilized patch recordings, neurones were shown to hyperpolarize in response to glucose deprivation or metabolic poisoning with sodium azide (NaN3). The resultant hyperpolarization was blocked by tolbutamide. 4. In cell-attached recordings, metabolic inhibition with 1 mM NaN3 revealed the presence of a tolbutamide-sensitive channel exhibiting a unitary conductance of 44.1 pS. 5. Reverse transcription followed by the polymerase chain reaction using cytoplasm from single cholinergic interneurones demonstrated the expression of the ATP-sensitive potassium (KATP) channel subunits Kir6.1 and SUR1 but not Kir6.2 or SUR2. 6. It is concluded that cholinergic interneurones within the rat striatum exhibit a KATP channel current and that this channel is formed from Kir6.1 and SUR1 subunits.

    The Journal of physiology 1998;510 ( Pt 2);441-53

  • GLASS: a tool to visualize protein structure prediction data in three dimensions and evaluate their consistency.

    Leplae R, Hubbard T and Tramontano A

    Istituto di Ricerche di Biologia Molecolare, P. Angeletti, Pomezia, Italy.

    When a protein sequence does not share any significant sequence similarity with a protein of known structure, homology modeling cannot be applied. However, many novel and interesting methods, such as secondary structure prediction, fold recognition, and prediction of long-range interactions, are being developed and have been shown to be reasonably successful in predicting protein structures from sequence data and evolutionary information. The a priori evaluation of the correctness of a prediction obtained by one of these methods is however often problematic. Consequently, it is important to use all available information provided by as many different methods as possible and all the available experimental data about the protein of interest, since the consistency of the results is indicative of the reliability of the prediction. Hence the need has arisen for suitable tools able to compare results provided by different methods and evaluate their consistency. We have therefore constructed GLASS, a general platform to read, visualize, compare, and evaluate prediction results from many different sources and to project these prediction results into three dimensions. In addition, GLASS allows the comparison of selected parameters calculated for a model with the distribution observed in real protein structures, thus providing an easy way to test new methods for evaluating the likelihood of different structural models. GLASS can be considered as a "workbench" for structural predictions useful to both experimentalists and theoreticians.

    Funded by: Wellcome Trust

    Proteins 1998;30;4;339-51

  • Transmission disequilibrium analysis of a triplet repeat within the hKCa3 gene using family trios with schizophrenia.

    Li T, Hu X, Chandy KG, Fantino E, Kalman K, Gutman G, Gargus JJ, Freeman B, Murray RM, Dawson E, Liu X, Bruinvels AT, Sham PC and Collier DA

    Department of Psychological Medicine and Centre for Social, Genetic and Developmental Psychiatry, The Institute of Psychiatry, De Crespigny Park, Denmark Hill, SE5 8AF, United Kingdom.

    hKCa3 is a neuronal small conductance calcium-activated potassium channel which contains a polyglutamine tract, encoded by a polymorphic CAG repeat in the gene. Since an association between longer alleles of the CAG repeat and schizophrenia has been reported, we performed haplotype-based haplotype relative risk (HHRR) and transmission disequilibrium (TDT) in 97 family trios with schizophrenia from SW China. We found no evidence for an excess of longer CAG repeats in the patients, and the ETDT test was not significant for either allele-wise (P = 0.31) or genotype-wise analysis (P = 0.18). However, there was a deficit of transmission of the (CAG)20 repeat allele to affected offspring when this allele was considered individually by TDT (P = 0.012; not corrected for multiple testing). These data do not support a role for larger alleles at the hKCa3 locus in psychosis in Chinese subjects.

    Biochemical and biophysical research communications 1998;251;2;662-5

  • Short-insert libraries as a method of problem solving in genome sequencing.

    McMurray AA, Sulston JE and Quail MA

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the sequencing process) form complex folded secondary structures that are inaccessible to the enzymes. The solution, however, is simply to break them up. Specifically, the offending fragments are sonicated heavily and recloned, as much smaller fragments, into pUC vector. The sequences obtained from the resulting library can subsequently be assembled, free from the effects of secondary structure, to produce high-quality, complete sequence. Because of the success and simplicity of this procedure, we have begun to use it for the sequencing of all regions in which standard primer walking has been at all difficult.

    Genome research 1998;8;5;562-6

  • Mutations in a gene encoding a novel protein tyrosine phosphatase cause progressive myoclonus epilepsy.

    Minassian BA, Lee JR, Herbrick JA, Huizenga J, Soder S, Mungall AJ, Dunham I, Gardner R, Fong CY, Carpenter S, Jardim L, Satishchandra P, Andermann E, Snead OC, Lopes-Cendes I, Tsui LC, Delgado-Escueta AV, Rouleau GA and Scherer SW

    Department of Genetics, The Hospital for Sick Children, University of Toronto, Ontario, Canada.

    Lafora's disease (LD; OMIM 254780) is an autosomal recessive form of progressive myoclonus epilepsy characterized by seizures and cumulative neurological deterioration. Onset occurs during late childhood and usually results in death within ten years of the first symptoms. With few exceptions, patients follow a homogeneous clinical course despite the existence of genetic heterogeneity. Biopsy of various tissues, including brain, revealed characteristic polyglucosan inclusions called Lafora bodies, which suggested LD might be a generalized storage disease. Using a positional cloning approach, we have identified at chromosome 6q24 a novel gene, EPM2A, that encodes a protein with consensus amino acid sequence indicative of a protein tyrosine phosphatase (PTP). mRNA transcripts representing alternatively spliced forms of EPM2A were found in every tissue examined, including brain. Six distinct DNA sequence variations in EPM2A in nine families, and one homozygous microdeletion in another family, have been found to cosegregate with LD. These mutations are predicted to cause deleterious effects in the putative protein product, named laforin, resulting in LD.

    Funded by: NINDS NIH HHS: 5P01-NS21908; Wellcome Trust

    Nature genetics 1998;20;2;171-4

  • Alternative splicing, genomic structure, and fine chromosome localization of REV3L.

    Morelli C, Mungall AJ, Negrini M, Barbanti-Brodano G and Croce CM

    Department of Diagnostic and Experimental Medicine, Section of Microbiology, University of Ferrara, Ferrara (Italy).

    We have localized a human homolog, REV3L, of the Saccharomyces cerevisiae REV3 gene on chromosome region 6q21. The full-length cDNA consists of 10,919 nucleotides, with a putative open reading frame of 9,159 bp for a predicted protein of 3,053 amino acids. The gene contains 33 exons in about 200 kb of genomic DNA. In contrast to the previously reported sequence, an additional exon and an alternative splicing site are demonstrated.

    Cytogenetics and cell genetics 1998;83;1-2;18-20

  • Inversin, a novel gene in the vertebrate left-right axis pathway, is partially deleted in the inv mouse.

    Morgan D, Turnpenny L, Goodship J, Dai W, Majumder K, Matthews L, Gardner A, Schuster G, Vien L, Harrison W, Elder FF, Penman-Splitt M, Overbeek P and Strachan T

    Department of Human Genetics, University of Newcastle upon Tyne, UK.

    Visceral left-right asymmetry occurs in all vertebrates, but the inversion of embryo turning (inv) mouse, which resulted following a random transgene insertion, is the only model in which these asymmetries are consistently reversed. We report positional cloning of the gene underlying this recessive phenotype. Although transgene insertion was accompanied by neighbouring deletion and duplication events, our YAC phenotype rescue studies indicate that the mutant phenotype results from the deletion. After extensively characterizing the 47-kb deleted region and flanking sequences from the wild-type mouse genome, we found evidence for only one gene sequence in the deleted region. We determined the full-length 5.5-kb cDNA sequence and identified 16 exons, of which exons 3-11 were eliminated by the deletion, causing a frameshift. The novel gene specifies a 1062-aa product with tandem ankyrin-like repeat sequences. Characterization of complementing and non-complementing YAC transgenic families revealed that correction of the inv mutant phenotype was concordant with integration and intact expression of this novel gene, which we have named inversin (Invs).

    Funded by: Wellcome Trust

    Nature genetics 1998;20;2;149-56

  • Trace alignment and some of its applications.

    Mott R

    Informatics Group, Sanger Centre, Cambridge, UK.

    MOTIVATION: Extra useful information can be extracted from a DNA chromatogram trace, over that contained in the base-called DNA sequence. Many sequencing applications can benefit from examination of these traces. RESULTS: An algorithm, based on dynamic programming, for aligning a DNA chromatogram to a DNA sequence is described and implemented. Its applications to vector clipping, EST alignment and mutation detection are discussed.

    Bioinformatics (Oxford, England) 1998;14;1;92-7

  • Fine-mapping, genomic organization, and transcript analysis of the human ubiquitin-conjugating enzyme gene UBE2L3.

    Moynihan TP, Cole CG, Dunham I, O'Neil L, Markham AF and Robinson PA

    Clinical Sciences Building, St. James's University Hospital, Leeds, LS9 7TF, United Kingdom.

    The human UBE2L3 gene encodes the ubiquitin-conjugating enzyme UbcH7, demonstrated to participate in the ubiquitination of p53, c-Fos, and NF-kappaB in vitro. We report the fine-mapping of this four-exon gene to chromosome 22q11.2. We have constructed a comprehensive genomic clone contig across this gene, demonstrating that the gene lies adjacent to the microsatellite marker D22S446 and spans approximately 57 kb. Four mRNA species are transcribed from this gene, differing in the length of their 3' UTR. Sequence comparison of the UBE2L3 cDNA with its murine homologue reveals a remarkably high degree of sequence conservation within the 3'UTR.

    Funded by: Wellcome Trust

    Genomics 1998;51;1;124-7

  • Estimation of distances and map construction using radiation hybrids.

    Newell W, Beck S, Lehrach H and Lyall A

    Oxford Molecular Group, The Medawar Centre, Oxford OX4 4GA, UK.

    A method of estimating distances between pairs of genetic markers is described that directly uses their observed joint frequency distribution in a panel of radiation hybrids (RHs). The distance measure is based on the strength of association between marker pairs, which is high for close markers and decays with distance. These distances are then submitted to a previous method that generates linear coordinates for the markers directly from the intermarker distance matrix. This method of map building from RH data is simpler than others, because it uses only the observed joint frequency distributions of markers in the panel, and does not attempt to model unobserved quantities such as the retention of different sized fragments that contain the markers. It also incorporates directly the observed variation in retention of different markers, without needing a model for differential fragment retention dependent on chromosomal location, which is generally not known. Only small, precise distances are used in map construction, thereby reducing any effects of different fragment retention frequencies and local variations in X-ray sensitivity. The method is tested by simulation, and known marker distances and locations are successfully recovered from RH raw data. The method is also applied to publicly available data sets related to the recent transcript map of the human genome.

    Genome research 1998;8;5;493-508

  • Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.

    Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T and Chothia C

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK.

    The sequences of related proteins can diverge beyond the point where their relationship can be recognised by pairwise sequence comparisons. In attempts to overcome this limitation, methods have been developed that use as a query, not a single sequence, but sets of related sequences or a representation of the characteristics shared by related sequences. Here we describe an assessment of three of these methods: the SAM-T98 implementation of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence search (ISS) procedure. We determined the extent to which these procedures can detect evolutionary relationships between the members of the sequence database PDBD40-J. This database, derived from the structural classification of proteins (SCOP), contains the sequences of proteins of known structure whose sequence identities with each other are 40% or less. The evolutionary relationships that exist between those that have low sequence identities were found by the examination of their structural details and, in many cases, their functional features. For nine false positive predictions out of a possible 432,680, i.e. at a false positive rate of about 1/50,000, SAM-T98 found 35% of the true homologous relationships in PDBD40-J, whilst PSI-BLAST found 30% and ISS found 25%. Overall, this is about twice the number of PDBD40-J relations that can be detected by the pairwise comparison procedures FASTA (17%) and GAP-BLAST (15%). For distantly related sequences in PDBD40-J, those pairs whose sequence identity is less than 30%, SAM-T98 and PSI-BLAST detect three times the number of relationships found by the pairwise methods.

    Journal of molecular biology 1998;284;4;1201-10
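
    The false positive rate quoted in this abstract can be checked with a line of arithmetic. The sketch below only reproduces the two figures given in the text (nine false positives out of a possible 432,680); the function name is a hypothetical helper introduced for illustration.

```python
def one_in_n(false_positives, possible_false_positives):
    """Express a false positive rate as "one in N" (hypothetical helper)."""
    return possible_false_positives / false_positives


# Figures taken from the abstract: 9 false positives out of 432,680 possible.
n = one_in_n(9, 432_680)
print(f"about 1 in {n:,.0f}")  # prints "about 1 in 48,076"
```

    The result, roughly 1 in 48,000, is what the abstract rounds to "about 1/50,000".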

  • SPEM: a parser for EMBL style flat file database entries.

    Pocock MR, Hubbard T and Birney E

    Informatics, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Summary: We present a set of Perl modules for the flexible and robust parsing and editing of EMBL/SWISS-PROT databases.

    Availability: The Web page at uk/Software/PerlModule/ provides information about downloading the SPEM and PrEMBL modules, and provides links to documentation and example code.

    Bioinformatics (Oxford, England) 1998;14;9;823-4

  • Using neural networks for prediction of the subcellular location of proteins.

    Reinhardt A and Hubbard T

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Neural networks have been trained to predict the subcellular location of proteins in prokaryotic or eukaryotic cells from their amino acid composition. For three possible subcellular locations in prokaryotic organisms a prediction accuracy of 81% can be achieved. Assigning a reliability index, 33% of the predictions can be made with an accuracy of 91%. For eukaryotic proteins (excluding plant sequences) an overall prediction accuracy of 66% for four locations was achieved, with 33% of the sequences being predicted with an accuracy of 82% or better. With the subcellular location restricting a protein's possible function, this method should be a useful tool for the systematic analysis of genome data and is available via a server on the world wide web.

    Funded by: Wellcome Trust

    Nucleic acids research 1998;26;9;2230-6
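
    The abstract states that the networks predict location "from their amino acid composition". A minimal sketch of that input representation is shown below. It assumes the composition is the fraction of each of the 20 standard residues; the network architecture and training are not described in the abstract and are not reproduced here, and the example sequence is made up.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions for a protein sequence.

    Non-standard characters (e.g. X, B, Z, gaps) are ignored; this handling
    is an assumption made for illustration.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical sequence, used only to show the shape of the feature vector.
features = amino_acid_composition("MKTAYIAKQR")
print(len(features), round(sum(features), 6))
```

    A fixed-length vector of this kind can be fed to a classifier regardless of protein length, which is what makes composition a convenient input for this task.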

  • Deletion mapping on chromosome 10p and definition of a critical region for the second DiGeorge syndrome locus (DGS2).

    Schuffenhauer S, Lichtner P, Peykar-Derakhshandeh P, Murken J, Haas OA, Back E, Wolff G, Zabel B, Barisic I, Rauch A, Borochowitz Z, Dallapiccola B, Ross M and Meitinger T

    Abteilung Medizinische Genetik, Ludwig-Maximilians-Universität München, Germany.

    DiGeorge syndrome (DGS) is a developmental field defect, characterised by absent/hypoplastic thymus and parathyroid, and conotruncal heart defects, with haploinsufficiency loci at 22q (DGS1) and 10p (DGS2). We performed fluorescence in situ hybridisations (FISH) and polymerase chain reaction (PCR) analyses in 12 patients with 10p deletions, nine of them with features of DGS, and in a familial translocation 10p;14q associated with midline defects. The critical DGS2 region is defined by two DGS patients, and maps within a 1 cM interval including D10S547 and D10S585. The other seven DGS patients are hemizygous for both loci. The breakpoint of the reciprocal translocation 10p;14q maps at a distance of at least 12 cM distal to the critical DGS2 region. Interstitial and terminal deletions described are in the range of 10-50 cM and enable the tentative mapping of loci for ptosis and hearing loss, features which are not part of the DGS clinical spectrum.

    European journal of human genetics : EJHG 1998;6;3;213-25

  • Gene number in an invertebrate chordate, Ciona intestinalis.

    Simmen MW, Leitgeb S, Clark VH, Jones SJ and Bird A

    Institute of Cell and Molecular Biology, University of Edinburgh, King's Buildings, Edinburgh EH9 3JR, United Kingdom.

    Gene number can be considered a pragmatic measure of biological complexity, but reliable data is scarce. Estimates for vertebrates are 50-100,000 genes per haploid genome, whereas invertebrate estimates fall below 25,000. We wished to test the hypothesis that the origin of vertebrates coincided with extensive gene creation. A prediction is that gene number will differ sharply between invertebrate and vertebrate members of the chordate phylum. A gene number estimation method requiring limited sequence sampling of genomic DNA was developed and validated by using data for Caenorhabditis elegans. Using the method, we estimated that the invertebrate chordate Ciona intestinalis has 15,500 protein-coding genes (+/-3,700). This number is significantly lower than gene numbers of vertebrate chordates, but similar to those of invertebrates in distantly related phyla. The data indicate that evolution of vertebrates was accompanied by a dramatic increase in protein-coding capacity of the genome.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 1998;95;8;4437-40

  • Z extensions to the RHMAPPER package.

    Soderlund C, Lau T and Deloukas P

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    MOTIVATION: Extensions have been made to the RHMAPPER-1.1 package. One set of extensions computes the totally linked markers and uses the results as input to the salient RHMAPPER functions. The second set of extensions uses TKperl to provide an interactive interface for ease of querying the database and displaying maps. AVAILABILITY: The extensions can be obtained via Supplementary information: The User's Manual can be viewed from http:/

    Bioinformatics (Oxford, England) 1998;14;6;538-9

  • Pfam: multiple sequence alignments and HMM-profiles of protein domains.

    Sonnhammer EL, Eddy SR, Birney E, Bateman A and Durbin R

    Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, Building 38A, Room 8N805, National Institutes of Health, Bethesda, MD 20894, USA.

    Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.0 of Pfam contains 527 manually verified families which are available for browsing and on-line searching via the World Wide Web in the UK at and in the US at http://genome.wustl.edu/Pfam/. Pfam 2.0 matches one or more domains in 50% of Swissprot-34 sequences, and 25% of a large sample of predicted proteins from the Caenorhabditis elegans genome.

    Funded by: NHGRI NIH HHS: R01-HG01363; Wellcome Trust

    Nucleic acids research 1998;26;1;320-2

  • DNA sequence and structure of the mouse RING3 gene: identification of variant RING3 transcripts.

    Thorpe KL and Beck S

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, England.

    Funded by: Wellcome Trust

    Immunogenetics 1998;48;1;82-6

  • Physical map of human 6p21.2-6p21.3: region flanking the centromeric end of the major histocompatibility complex.

    Tripodis N, Mason R, Humphray SJ, Davies AF, Herberg JA, Trowsdale J, Nizetic D, Senger G and Ragoussis J

    Division of Medical and Molecular Genetics, United Medical and Dental School of Guy's and St. Thomas', Guy's Hospital, London SE1 9RT, UK.

    We have physically mapped and cloned a 2.5-Mb chromosomal segment flanking the centromeric end of the major histocompatibility complex (MHC). We characterized in detail 27 YACs, 144 cosmids, 51 PACs, and 5 BACs, which will facilitate the complete genomic sequencing of this region of chromosome 6. The contig contains the genes encoding CSBP, p21, HSU09564 serine kinase, ZNF76, TCP-11, RPS10, HMGI(Y), BAK, and the human homolog of Tctex-7 (HSET). The GLO1 gene was mapped further centromeric in the 6p21.2-6p21.1 region toward TCTE-1. The gene order of the GLO1-HMGI(Y) segment in respect to the centromere is similar to the gene order in the mouse t-chromosome distal inversion, indicating that there is conservation in gene content but not gene order between humans and mice in this region. The close linkage of the BAK and CSBP genes to the MHC is of interest because of their possible involvement in autoimmune disease.

    Genome research 1998;8;6;631-43

  • Characterization of SCML1, a new gene in Xp22, with homology to developmental polycomb genes.

    van de Vosse E, Walpole SM, Nicolaou A, van der Bent P, Cahn A, Vaudin M, Ross MT, Durham J, Pavitt R, Wilkinson J, Grafham D, Bergen AA, van Ommen GJ, Yates JR, den Dunnen JT and Trump D

    MGC-Department of Human Genetics, Leiden University, Al Leiden, The Netherlands.

    Using exon trapping, we have identified a new human gene in Xp22 encoding a 3-kb mRNA. Expression of this RNA is detectable in a range of tissues but is most pronounced in skeletal muscle and heart. The gene, designated "sex comb on midleg-like-1" (SCML1), maps 14 kb centromeric of marker DXS418, between DXS418 and DXS7994, and is transcribed from telomere to centromere. SCML1 spans 18 kb of genomic DNA, consists of six exons, and has a 624-bp open reading frame. The predicted 27-kDa SCML1 protein contains two domains that each have a high homology to two Drosophila transcriptional repressors of the polycomb group (PcG) genes and their homologues in mouse and human. PcG genes are known to be involved in the regulation of homeotic genes, and the mammalian homologues of the PcG genes repress the expression of Hox genes. SCML1 appears to be a new human member of this gene group and may play an important role in the control of embryonal development.

    Funded by: Wellcome Trust

    Genomics 1998;49;1;96-102

  • Localization of HuC (ELAVL3) to chromosome 19p13.2 by fluorescence in situ hybridization utilizing a novel tyramide labeling technique.

    Van Tine BA, Knops JF, Butler A, Deloukas P, Shaw GM and King PH

    Department of Pathology, University of Alabama at Birmingham, Birmingham, Alabama, 35294, USA.

    HuC is a neural-specific member of the Elav family of RNA-binding proteins. This highly conserved gene family plays a crucial role in neurogenesis, and HuC (HGMW-approved symbol ELAVL3) is expressed at an early stage of neural development. Using a novel tyramide fluorescence in situ hybridization (T-FISH) technique, we localized HuC to chromosome 19p13.2. This localization was confirmed by radiation hybrid mapping and coincides with that of HuR (HGMW-approved symbol ELAVL1), another elav family member. Dual T-FISH analysis with HuC and HuR probes, however, indicated distinct loci, with HuC being centromeric to HuR. This study demonstrates the utility of T-FISH in colocalizing two genes on the same chromosomal preparation using only biotinylated probes.

    Funded by: NIAID NIH HHS: NIHUO1-AI32775; Wellcome Trust

    Genomics 1998;53;3;296-9

  • Sequencing and analysis of genes involved in the biosynthesis of a vancomycin group antibiotic.

    van Wageningen AM, Kirkpatrick PN, Williams DH, Harris BR, Kershaw JK, Lennard NJ, Jones M, Jones SJ and Solenberg PJ

    Cambridge Centre for Molecular Recognition, Department of Chemistry, Cambridge, UK.

    Background: The emergence of resistance to vancomycin, the drug of choice against methicillin-resistant Staphylococcus aureus, in enterococci has increased the need for new antibiotics. As chemical modification of the antibiotic structure is not trivial, we have initiated studies towards enzymatic modification by sequencing the DNA coding for the biosynthesis of chloroeremomycin (also known as A82846B and LY264826).

    Results: Analysis of 72 kilobases of genomic DNA from Amycolatopsis orientalis, the organism that produces chloroeremomycin, revealed the presence of 39 putative genes, including those coding for the biosynthesis of the antibiotic. Translation and subsequent comparison with known proteins in public databases identified enzymes responsible for the biosynthesis of the heptapeptide backbone and 4-epi-vancosamine, as well as those for chlorination and oxidation reactions involved in the biosynthesis of chloroeremomycin.

    Conclusions: The genes responsible for the biosynthesis of chloroeremomycin have been identified, and selective expression of these genes could lead to the synthesis of new potent glycopeptide antibiotics.

    Funded by: Wellcome Trust

    Chemistry & biology 1998;5;3;155-62

  • The Human Genome Project: reaching the finish line.

    Waterston R and Sulston JE

    Genome Sequencing Center, Washington University School of Medicine, St. Louis, MO 63108, USA.

    Science (New York, N.Y.) 1998;282;5386;53-4

  • Automated sequence preprocessing in a large-scale sequencing environment.

    Wendl MC, Dear S, Hodgson D and Hillier L

    Genome Sequencing Center, Washington University, St. Louis, Missouri 63108 USA.

    A software system for transforming fragments from four-color fluorescence-based gel electrophoresis experiments into assembled sequence is described. It has been developed for large-scale processing of all trace data, including shotgun and finishing reads, regardless of clone origin. Design considerations are discussed in detail, as are programming implementation and graphic tools. The importance of input validation, record tracking, and use of base quality values is emphasized. Several quality analysis metrics are proposed and applied to sample results from recently sequenced clones. Such quantities prove to be a valuable aid in evaluating modifications of sequencing protocol. The system is in full production use at both the Genome Sequencing Center and the Sanger Centre, for which combined production is approximately 100,000 sequencing reads per week.

    Funded by: NHGRI NIH HHS: HG00956, HG01458

    Genome research 1998;8;9;975-84

  • Physical mapping of the CA6, ENO1, and SLC2A5 (GLUT5) genes and reassignment of SLC2A5 to 1p36.2.

    White PS, Jensen SJ, Rajalingam V, Stairs D, Sulman EP, Maris JM, Biegel JA, Wooster R and Brodeur GM

    Oncology, The Children's Hospital of Philadelphia, Philadelphia, PA (USA).

    Several human malignancies frequently exhibit deletions or rearrangements of the distal short arm of chromosome 1 (1p36), and a number of genetic diseases also map to this region. The carbonic anhydrase (CA6) and alpha-enolase (ENO1) genes, previously mapped to 1p36, were physically linked in yeast- and P1-artificial chromosome (YAC and PAC) contigs. PACs from the contig were mapped to 1p36.2 by fluorescence in situ hybridization. The ESTs D1S2068, D1S274E, D1S3275, and stSG4370 were also placed in the same contig. The physical map was integrated with the genetic map of chromosome 1 by assignment of genetic markers D1S160, D1S1615, and D1S503 to the contig. Sequencing of the EST clone representing D1S274E indicated that it was derived from the same transcript as D1S2068E and corresponded to the SLC2A5 (GLUT5) gene, previously assigned to 1p31. Reassignment of SLC2A5 to 1p36.2 was confirmed by somatic cell and radiation hybrid mapping panels and was consistent with previous EST mapping data. Sequencing of the EST clone for D1S274E revealed the presence of intronic sequences, suggesting that the clone was derived from an unprocessed message. The presence of unprocessed and/or alternatively spliced EST clones has potential ramifications for EST-based genomic projects. This information should facilitate the mapping of tumor suppressor and genetic disease loci that have been localized to this region.

    Funded by: NCI NIH HHS: CA39771; Wellcome Trust

    Cytogenetics and cell genetics 1998;81;1;60-4

  • Toward a complete human genome sequence.

    No authors listed

    We have begun a joint program as part of a coordinated international effort to determine a complete human genome sequence. Our strategy is to map large-insert bacterial clones and to sequence each clone by a random shotgun approach followed by directed finishing. As of September 1998, we have identified the map positions of bacterial clones covering approximately 860 Mb for sequencing and completed >98 Mb (approximately 3.3%) of the human genome sequence. Our progress and sequencing data can be accessed via the World Wide Web ( or

    Genome research 1998;8;11;1097-108
