Sanger Institute - Publications 1996

Number of papers published in 1996: 36

  • DNA sequencing of the MHC class II region and the chromosome 6 sequencing effort at the Sanger Centre.

    Abdulla S, Alderton RP, Glynne RJ, Gut IG, Hosking LK, Jackson A, Kelly A, Newell WR, Radley E, Sanseau P, Thorpe KL, Trowsdale J and Beck S

    DNA Sequencing Laboratory, Imperial Cancer Research Fund, London, UK.

    The human Major Histocompatibility Complex (MHC) is located on the short arm of chromosome 6 (6p21.3) and spans about 4 Mb. According to different gene families the MHC is subdivided into a class I, class II and class III region and many of its gene products are associated with the immune system and the susceptibility to various diseases. To date, we have sequenced about 40% (400 kb) of the class II region between HLA-DP and HLA-DQ and a coordinated effort to sequence the entire MHC is well underway. Analysis of the sequence revealed several novel genes and provides new insights into the molecular organisation and evolution of the MHC. All our data are publicly available via the MHC database (MHCDB) which allows rapid access, retrieval and display in the context of other MHC associated data. MHCDB is available online at http://www.hgmp.mrc.ac.uk/ and, together with all our sequences, also via anonymous ftp (ftp.icnet.uk/icrf-public).

    DNA sequence : the journal of DNA sequencing and mapping 1996;7;1;5-7

  • Naturally occurring nucleosome positioning signals in human exons and introns.

    Baldi P, Brunak S, Chauvin Y and Krogh A

    Division of Biology, California Institute of Technology, Pasadena 91125, USA.

    We describe the structural implications of a periodic pattern found in human exons and introns by hidden Markov models. We show that exons (besides the reading frame) have a specific sequential structure in the form of a pattern with triplet consensus non-T(A/T)G, and a minimal periodicity of roughly ten nucleotides. The periodic pattern is also present in intron sequences, although the strength per nucleotide is weaker. Using two independent profile methods based on triplet bendability parameters from DNase I experiments and nucleosome positioning data, we show that the pattern in multiple alignments of internal exon and intron sequences corresponds to a periodic "in phase" bending potential towards the major groove of the DNA. The nucleosome positioning data show that the consensus triplets (and their complements) have a preference for locations on a bent double helix where the major groove faces inward and is compressed. The in-phase triplets are located adjacent to GCC/GGC triplets known to have the strongest bias in their positioning on the nucleosome. Analysis of mRNA sequences encoding proteins with known tertiary structure excludes the possibility that the pattern is a consequence of the previously well-known periodicity caused by the encoding of alpha-helices in proteins. Finally, we discuss the relation between the bending potential of coding and non-coding regions and its impact on the translational positioning of nucleosomes and the recognition of genes by the transcriptional machinery.

    Funded by: NLM NIH HHS: R43 LM05780; Wellcome Trust

    Journal of molecular biology 1996;263;4;503-10

  • Organisation and functions of class II genes and molecules.

    Beck S, Belich M, Gruneberg U, Jackson A, Kelly A, Sanseau P, Sanderson F, Trowsdale J and Van Ham M

    DNA Sequencing Laboratory, Imperial Cancer Research Fund, Holborn, London.

    The class II region of the human MHC contains all of the known class II genes, as well as antigen processing components and only one gene not obviously associated with the immune system, RING3. As an approach to understanding linkage disequilibrium and recombination in relation to polymorphism of the region we are cloning and sequencing the class II region. To date, the sequence of the DP-DQ region has almost been completed (see Report by S. Beck). Several sets of genes implicated in the immune system, especially in antigen processing and presentation, are clustered together in the MHC: class I (HLA-A, B, C etc.), class II (DR, DQ, DP, DN, DO, DM), LMP2 and 7, TAP1 and 2, TNF, C2, C4, Bf and Hsp70. This situation has provoked speculation that the MHC behaves as a gene cluster in which allelic products of polymorphic genes are maintained on a haplotype so as to co-ordinate T cell repertoire development and deployment. The high levels of linkage disequilibrium across the region are consistent with this idea. Functions of the genes in the MHC are being investigated as a step towards gaining insight into antigen processing and presentation as well as understanding MHC-disease associations. We are concentrating on the functions of the class II-related genes, DM and DN/DO, as well as the TAP/LMP cluster.

    DNA sequence : the journal of DNA sequencing and mapping 1996;7;1;21-3

  • PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

    Birney E, Thompson JD and Gibson TJ

    European Molecular Biology Laboratory, Heidelberg, Germany.

    DNA translation frames can be disrupted for several reasons, including: (i) errors in sequence determination; (ii) RNA processing, such as intron removal and guide RNA editing; (iii) less commonly, polymerase frameshifting during transcription or ribosomal frameshifting during translation. Frameshifts frequently confound computational activities involving homologous sequences, such as database searches and inferences on structure, function or phylogeny made from multiple alignments. A dynamic alignment algorithm is reported here which compares a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frameshifting. The algorithm has been incorporated into a new package, WiseTools, for comparison of biological sequences. A protein profile can be compared against either a DNA sequence or a protein sequence. The program PairWise may be used interactively for alignment of any two sequence inputs. SearchWise can perform combinations of searches through DNA or protein databases by a protein profile or DNA sequence. Routine application of the programs has revealed a set of database entries with frameshifts caused by errors in sequence determination.

    Nucleic acids research 1996;24;14;2730-9
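
    As a rough illustration of the three translation frames mentioned in the abstract above, the following Python sketch translates one DNA strand in each forward frame and shows how a single inserted base moves the downstream protein sequence into a different frame. It is not the PairWise or SearchWise algorithm; the function name `translate_frames` and the example sequences are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(dna):
    """Translate one DNA strand in forward frames 0, 1 and 2 (hypothetical helper)."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        # Unknown codons (for example those containing N) are shown as X.
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


intact = "ATGGCCAAATGA"                  # made-up sequence
shifted = intact[:3] + "G" + intact[3:]  # one extra G inserted after the start codon
print(translate_frames(intact))
print(translate_frames(shifted))
```

    In this toy case the residues that follow the start codon in frame 0 of the intact sequence reappear in frame 1 of the shifted sequence, which is the kind of frame switch a frameshift-tolerant alignment has to follow.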

  • The Caenorhabditis elegans genome project. C. elegans Genome Consortium.

    Coulson A

    Sanger Centre, Cambridge, U.K.

    Funded by: Wellcome Trust

    Biochemical Society transactions 1996;24;1;289-91

  • Molecular cloning of tissue-specific transcripts of a transketolase-related gene: implications for the evolution of new vertebrate genes.

    Coy JF, Dübel S, Kioschis P, Thomas K, Micklem G, Delius H and Poustka A

    Deutsches Krebsforschungszentrum, Heidelberg, Germany.

    As part of a systematic search for differentially expressed genes, we have isolated a novel transketolase-related gene (TKR) (HGMW-approved symbol TKT), located between the green color vision pigment gene (GCP) and the ABP-280 filamin gene (FLN1) in Xq28. Transcripts encoding tissue-specific protein isoforms could be isolated. Comparison with known transketolases (TK) demonstrated a TKR-specific deletion mutating one thiamine binding site. Genomic sequencing of the TKR gene revealed the presence of a pseudoexon as well as the acquisition of a tissue-specific spliced exon compared to TK. Since it has been postulated that the vertebrate genome arose by two cycles of tetraploidization from a cephalochordate genome, this could represent an example of the modulation of the function of a preexisting transketolase gene by gene duplication. Thiamine deficiency is closely involved with two neurological disorders, Beriberi and Wernicke-Korsakoff syndromes, and in both of these conditions TK with altered activity are found. We discuss the possible involvement of TKR in explaining the observed variant transketolase forms.

    Funded by: Wellcome Trust

    Genomics 1996;32;3;309-16

  • Tissue distribution of adenosine receptor mRNAs in the rat.

    Dixon AK, Gubitz AK, Sirinathsinghji DJ, Richardson PJ and Freeman TC

    Department of Pharmacology, University of Cambridge.

    1. A degree of ambiguity and uncertainty exists concerning the distribution of mRNAs encoding the four cloned adenosine receptors. In order to consolidate and extend current understanding in this area, the expression of the adenosine receptors has been examined in the rat by use of in situ hybridisation and the reverse transcription-polymerase chain reaction (RT-PCR). 2. In accordance with earlier studies, in situ hybridisation revealed that the adenosine A1 receptor was widely expressed in the brain, whereas A2A receptor mRNA was restricted to the striatum, nucleus accumbens and olfactory tubercle. In addition, A1 receptor mRNA was detected in large striatal cholinergic interneurones; 26% of these neurones were also found to express the A2A receptor gene. Central levels of mRNAs encoding adenosine A2B and A3 receptors were, however, below the detection limits of in situ hybridisation. 3. The more sensitive technique of RT-PCR was then employed to investigate the distribution of adenosine receptor mRNAs in the central nervous system (CNS) and a wide range of peripheral tissues. As a result, many novel sites of adenosine receptor gene expression were identified. A1 receptor expression has now been found in the heart, aorta, liver, kidney, eye and bladder. These observations are largely consistent with previous functional data. A2A receptor mRNA was detected in all brain regions tested, demonstrating that expression of this receptor is not restricted to the basal ganglia. In the periphery A2A receptor mRNA was also found to be more widely distributed than generally recognised. The ubiquitous distribution of the A2B receptor is shown for the first time: A2B mRNA was detected at various levels in all rat tissues studied. Expression of the gene encoding the adenosine A3 receptor was also found to be widespread in the rat, with message detected throughout the CNS and in many peripheral tissues. This pattern of expression is similar to that observed in man and sheep, which had previously been perceived to possess distinct patterns of A3 receptor gene expression in comparison to the rat. 4. In summary, this work has comprehensively studied the expression of all the cloned adenosine receptors in the rat, and in so doing, resolves some of the uncertainty over where these receptors might act to control physiological processes mediated by adenosine.

    Funded by: Wellcome Trust

    British journal of pharmacology 1996;118;6;1461-8

  • Refined mapping and YAC contig construction of the X-linked cleft palate and ankyloglossia locus (CPX) including the proximal X-Y homology breakpoint within Xq21.3.

    Forbes SA, Brennan L, Richardson M, Coffey A, Cole CG, Gregory SG, Bentley DR, Mumm S, Moore GE and Stanier P

    Institute of Obstetrics and Gynaecology, Queen Charlotte's Hospital, London, United Kingdom.

    The gene for X-linked cleft palate (CPX) has previously been mapped in an Icelandic kindred between the unordered proximal markers DXS1002/DXS349/DXS95 and the distal marker DXYS1X, which maps to the proximal end of the X-Y homology region in Xq21.3. Using six sequence-tagged sites (STSs) within the region, a total of 91 yeast artificial chromosome (YAC) clones were isolated and overlapped in a single contig that spans approximately 3.1 Mb between DXS1002 and DXYS1X. The order of microsatellite and STS markers in this contig was established as DXS1002-DXS1168-DXS349-DXS95-DXS364-DXS1196-DXS262-DXS110-DXS1066-(DXS1169, DXS1222)-DXS472-DXS1217-DXYS1X. A long-range restriction map of this region was created using eight nonchimeric, overlapping YAC clones. Analysis of newly positioned polymorphic markers in recombinant individuals from the Icelandic family has enabled us to identify DXS1196 and DXS1217 as the flanking markers for CPX. The maximum physical distance containing the CPX gene has been estimated to be 2.0 Mb, which is spanned by a minimum set of five nonchimeric YAC clones. In addition, YAC end clone and STS analyses have pinpointed the location of the proximal boundary of the X-Y homology region within the map.

    Funded by: NIAMS NIH HHS: 5 T32 AR07033-21; Wellcome Trust

    Genomics 1996;31;1;36-43

  • Localization of the human kinesin light chain gene (KNS2) to chromosome 14q32.3 by fluorescence in situ hybridization.

    Goedert M, Marsh S and Carter N

    MRC Laboratory of Molecular Biology, Cambridge CB2 2QH, United Kingdom.

    Genomics 1996;32;1;173-5

  • Life with 6000 genes.

    Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H and Oliver SG

    Université Catholique de Louvain, Unité de Biochimie Physiologique, Place Croix du Sud, 2/20, 1348 Louvain-la-Neuve, Belgium.

    The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 1996;274;5287;546, 563-7

  • The C. elegans expression pattern database: a beginning.

    Hope IA, Albertson DG, Martinelli SD, Lynch AS, Sonnhammer E and Durbin R

    Department of Biology, University of Leeds, UK. i.a.hope@leeds.ac.uk

    Funded by: Wellcome Trust

    Trends in genetics : TIG 1996;12;9;370-1

  • Hidden Markov models for sequence analysis: extension and analysis of the basic method.

    Hughey R and Krogh A

    University of California, Santa Cruz 95064, USA.

    Hidden Markov models (HMMs) are a highly effective means of modeling a family of unaligned sequences or a common motif within a set of unaligned sequences. The trained HMM can then be used for discrimination or multiple alignment. The basic mathematical description of an HMM and its expectation-maximization training procedure is relatively straightforward. In this paper, we review the mathematical extensions and heuristics that move the method from the theoretical to the practical. We then experimentally analyze the effectiveness of model regularization, dynamic model modification and optimization strategies. Finally it is demonstrated on the SH2 domain how a domain can be found from unaligned sequences using a special model type. The experimental work was completed with the aid of the Sequence Alignment and Modeling software suite.

    Computer applications in the biosciences : CABIOS 1996;12;2;95-107
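
    To make the idea of scoring a sequence against a trained HMM more concrete, the sketch below implements the standard forward recursion for a small discrete HMM. It is a generic textbook illustration, not code from the Sequence Alignment and Modeling suite; the two state names and all probabilities are invented, and a model used for protein families would be far larger.

```python
from itertools import product


def forward_likelihood(seq, start, trans, emit):
    """Return P(seq | model) for a discrete HMM using the forward recursion."""
    if not seq:
        return 1.0
    states = list(start)
    # alpha[s] = probability of the prefix seen so far, ending in state s
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: emit[t][symbol] * sum(alpha[s] * trans[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())


# Toy two-state model; every number below is made up for illustration.
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

print(forward_likelihood("ATAT", START, TRANS, EMIT))
print(forward_likelihood("ATGC", START, TRANS, EMIT))

# Sanity check: likelihoods over all sequences of a fixed length sum to 1.
total = sum(
    forward_likelihood("".join(s), START, TRANS, EMIT)
    for s in product("ACGT", repeat=3)
)
print(total)
```

    Comparing such likelihoods between a family model and a background model is one simple way to use a trained HMM for discrimination.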

  • Identification and characterization of NF1-related loci on human chromosomes 22, 14 and 2.

    Hulsebos TJ, Bijleveld EH, Riegman PH, Smink LJ and Dunham I

    Institute of Human Genetics, University of Amsterdam, Faculty of Medicine, The Netherlands. T.J.HULSEBOS@AMC.UVA.NL

    Neurofibromatosis type 1 (NF1) is a frequent hereditary disorder. The disease is characterized by a very high mutation rate (up to 1/10000 gametes per generation). NF1-related loci in the human genome have been implicated in the high mutation rate by hypothesizing that these carry disease-causing mutations, which can be transferred to the functional NF1 gene on chromosome arm 17q by interchromosomal gene conversion. To test this hypothesis, we want to identify and characterize the NF1-related loci in the human genome. In this study, we have localized an NF1-related locus in the most centromeric region of the long arm of chromosome 22. We demonstrate that this locus contains sequences homologous to cDNAs that include the GAP-related domain of the functional NF1 gene. However, the GAP-related domain itself is not represented in this locus. In addition, cosmids specific to this locus reveal, by in situ hybridization, NF1-related loci in the pericentromeric region of chromosome arm 14q and in chromosomal band 2q21. These cosmids will enable us to determine whether identified disease-causing mutations are present at the chromosome 22-associated NF1-related locus.

    Funded by: Wellcome Trust

    Human genetics 1996;98;1;7-11

  • Characterization of a second human clathrin heavy chain polypeptide gene (CLH-22) from chromosome 22q11.

    Kedra D, Peyrard M, Fransson I, Collins JE, Dunham I, Roe BA and Dumanski JP

    Department of Molecular Medicine, Karolinska Hospital, Stockholm, Sweden.

    We report cloning and characterization of the second human clathrin heavy chain polypeptide gene (CLH-22) localized to chromosome 22q11. Hence H. sapiens is the first species for which two clathrin heavy chain genes have been reported. We provide 5470 bp cDNA sequence covering the entire open reading frame of the CLH-22 gene. The predicted polypeptide is composed of 1640 amino acids. Its 6 kb transcript is expressed in all of 16 tested human tissues, suggesting it is a housekeeping gene. Skeletal muscle, testis and heart show significantly higher expression levels. Compared to the previously characterized human clathrin heavy chain gene localized on chromosome 17 (CLH-17), CLH-22 shows different transcript size and expression profile in human tissues. Northern analysis of CLH-22 suggests that several alternatively spliced transcripts exist. A presumably single, 171 bp long alternatively spliced exon has been characterized. Amino acid sequence comparison between CLH-22 and CLH-17 shows an overall identity and similarity of 84.7 and 91.1%, respectively. At the nucleic acid level, identity between open reading frames of both genes is 74.3%. Sequence comparison with previously cloned genes in other species suggests that counterparts of the CLH-17 gene have been cloned in B. taurus and R. norvegicus, whereas presumptive mammalian homologues of the CLH-22 gene are yet to be characterized. Our Northern and Southern blot analyses of meningiomas clearly suggest the CLH-22 gene may be involved in the tumor development and can be considered as a candidate for a tumor suppressor.

    Funded by: Wellcome Trust

    Human molecular genetics 1996;5;5;625-31

  • A bacterial artificial chromosome-based framework contig map of human chromosome 22q.

    Kim UJ, Shizuya H, Kang HL, Choi SS, Garrett CL, Smink LJ, Birren BW, Korenberg JR, Dunham I and Simon MI

    Division of Biology, California Institute of Technology, Pasadena, 91125, USA.

    We have constructed a physical map of human chromosome 22q using bacterial artificial chromosome (BAC) clones. The map consists of 613 chromosome 22-specific BAC clones that have been localized and assembled into contigs using 452 landmarks, 346 of which were previously ordered and mapped to specific regions of the q arm of the chromosome by means of chromosome 22-specific yeast artificial chromosome clones. The BAC-based map provides immediate access to clones that are stable and convenient for direct genome analysis. The approach to rapidly developing marker-specific BAC contigs is relatively straightforward and can be extended to generate scaffold BAC contig maps of the rest of the chromosomes. These contigs will provide substrates for sequencing the entire human genome. We discuss how to efficiently close contig gaps using the end sequences of BAC clone inserts.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 1996;93;13;6297-301

  • Transposon Tc1-derived, sequence-tagged sites in Caenorhabditis elegans as markers for gene mapping.

    Korswagen HC, Durbin RM, Smits MT and Plasterk RH

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    We present an approach to map large numbers of Tc1 transposon insertions in the genome of Caenorhabditis elegans. Strains have been described that contain up to 500 polymorphic Tc1 insertions. From these we have cloned and shotgun sequenced over 2000 Tc1 flanks, resulting in an estimated set of 400 or more distinct Tc1 insertion alleles. Alignment of these sequences revealed a weak Tc1 insertion site consensus sequence that was symmetric around the invariant TA target site and reads CAYATATRTG. The Tc1 flanking sequences were compared with 40 Mbp of a C. elegans genome sequence. We found 151 insertions within the sequenced area, a density of approximately 1 Tc1 insertion in every 265 kb. As the rest of the C. elegans genome sequence is obtained, remaining Tc1 alleles will fall into place. These mapped Tc1 insertions can serve two functions: (i) insertions in or near genes can be used to isolate deletion derivatives that have that gene mutated; and (ii) they represent a dense collection of polymorphic sequence-tagged sites. We demonstrate a strategy to use these Tc1 sequence-tagged sites in fine-mapping mutations.

    Funded by: NCRR NIH HHS: R01 RR10082-02; Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 1996;93;25;14680-5
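
    Two figures in the abstract above lend themselves to a quick check: the symmetry of the consensus CAYATATRTG and the quoted insertion density. The sketch below assumes the usual IUPAC reading of the ambiguity codes (Y = C or T, R = A or G); the helper names are hypothetical and the test sequence is made up.

```python
import re

CONSENSUS = "CAYATATRTG"  # from the abstract; Y = C or T, R = A or G

# Complement table extended with the two IUPAC ambiguity codes used here.
COMPLEMENT = str.maketrans("ACGTRY", "TGCAYR")
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def reverse_complement(seq):
    """Reverse complement of a sequence written with A, C, G, T, R and Y."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Turn an IUPAC consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC_REGEX[base] for base in consensus))


# A site that equals its own reverse complement is symmetric around its centre.
is_symmetric = reverse_complement(CONSENSUS) == CONSENSUS
pattern = consensus_to_regex(CONSENSUS)
kb_per_insertion = 40_000 / 151  # 40 Mbp compared, 151 insertions found

print(is_symmetric)
print(bool(pattern.fullmatch("CACATATGTG")))
print(round(kb_per_insertion))
```

    The consensus reads the same on both strands, and 40 Mbp divided by 151 insertions gives roughly one insertion per 265 kb, in line with the density stated in the abstract.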

  • A radiation hybrid map spanning the entire human X chromosome integrating YACs, genes, and STS markers.

    Kumlien J, Grigoriev A, Roest Crollius H, Ross M, Goodfellow PN and Lehrach H

    Department of Genome Analysis, Imperial Cancer Research Fund, P.O. Box 123, 44 Lincoln's Inn Fields, London WC2A 3PX, UK.

    We present a radiation hybrid (RH) map of human Chromosome (Chr) X, using 50 markers on 72 radiation hybrids. The markers, obtained from the consensus map, form a grid spanning the entire chromosome. To check the RH map, the marker order was determined by analysis of presence or absence of retained human DNA fragments in the RHs; the comparison with the consensus showed a similar order. Any STSs, microsatellites, genes, and clones can be positioned and ordered relative to the marker grid. This approach integrates genetic, physical, and large-scale clone mapping and is used to link YAC contigs containing data from various experimental sources.

    Mammalian genome : official journal of the International Mammalian Genome Society 1996;7;10;758-66

  • Chromosome-specific paints from a high-resolution flow karyotype of the dog.

    Langford CF, Fischer PE, Binns MM, Holmes NG and Carter NP

    Sanger Centre, Hinxton, Cambridge, UK.

    Using peripheral blood lymphocyte cultures and dual-laser flow cytometry, we have routinely obtained high-resolution bivariate flow karyotypes of the dog in which 32 peaks are resolved. To allow the identification of the chromosome types in each peak, chromosomes were flow sorted, amplified and labelled by polymerase chain reaction with partially degenerate primers and hybridized onto metaphase spreads of a male dog. The chromosome paints from 22 of the 32 peaks each hybridized to single homologue pairs and eight peaks each hybridized to two pairs. Paints from the remaining two peaks hybridized to only one homologue each in the male metaphase spread, thus corresponding to the sex chromosomes X and Y. All of the 38 pairs of autosomes and the two sex chromosomes of the dog could be accounted for in these painting experiments. The positions of chromosomes 1-21 were assigned to the flow karyotype (only chromosomes 1-21 have as yet been officially designated). The high-resolution flow karyotype and the chromosome paints will facilitate further standardization of the dog karyotype. The ability to sort sufficient quantities of dog chromosomes for the production of chromosome-specific DNA libraries has the potential to accelerate the physical and genetic mapping of the dog genome.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 1996;4;2;115-23

  • Comparative analysis and genomic structure of the tuberous sclerosis 2 (TSC2) gene in human and pufferfish.

    Maheshwar MM, Sandford R, Nellist M, Cheadle JP, Sgotto B, Vaudin M and Sampson JR

    Institute of Medical Genetics, University of Wales College of Medicine, Cardiff, UK.

    Germ-line mutations of the TSC2 tumour suppressor gene have been identified in humans with tuberous sclerosis and in the Eker rat. Tuberin, the human TSC2 gene product, has a small region of homology with rap1GAP and stimulates rap1 GTPase activity in vitro, suggesting that one of its cellular roles is to function as a GTPase activating protein (GAP). We have undertaken a comparative analysis of the TSC2 gene in human and the pufferfish, Fugu rubripes. In addition to the GAP domain, three other regions of the proteins are highly conserved (peptide sequence similarity > 80%). These regions are likely to represent further functional domains. To facilitate analysis of mutations within these domains we have determined the genomic structure of the human TSC2 gene. It comprises 41 exons, including exon 31 which was absent from the originally described spliceoform of the human TSC2 transcript and was identified following exon prediction from Fugu genomic sequence. These findings support the proposal of the Fugu genome as a tool for human gene analysis.

    Human molecular genetics 1996;5;1;131-7

  • Physical mapping of chromosome 6: a strategy for the rapid generation of sequence-ready contigs.

    Mungall AJ, Edwards CA, Ranby SA, Humphray SJ, Heathcott RW, Clee CM, East CL, Holloway E, Butler AP, Langford CF, Gwilliam R, Rice KM, Maslen GL, Carter NP, Ross MT, Deloukas P, Bentley DR and Dunham I

    Sanger Centre, Hinxton, Cambridge, UK.

    The development of radiation hybrid (RH) mapping (Cox et al., 1990) and the availability of large numbers of STS markers, together with extensive bacterial clone resources provided a means to accelerate the process of mapping a human chromosome and preparing bacterial clone contigs ready to sequence. Our aim is to construct physical clone maps covering those regions of chromosome 6 that are not currently extensively mapped, and use these to determine the DNA sequence of the whole chromosome. We report here a strategy which initially involves establishing a high density framework map using RH mapping. The framework markers are then used for the identification of bacterial genomic clones covering the chromosome. The bacterial clones are analysed by restriction enzyme fingerprinting and STS-content analysis to identify sequence-ready contigs. Contig gap closure will also be performed by clone walking.

    DNA sequence : the journal of DNA sequencing and mapping 1996;7;1;47-9

  • MHCDB: database of the human MHC (release 2).

    Newell WR, Trowsdale J and Beck S

    Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, UK.

    The second release of the human major histocompatibility complex (MHC) database is now publicly available. It contains an updated physical map and considerably more genomic sequence. cDNA sequences of all current alleles are accessible as individual sequence entries. The variability of different genes is displayed graphically as static and dynamic images accessible from the database. Known disease-serotype associations have also been incorporated, together with data from the MHCPEP database of eluted peptides.

    Immunogenetics 1996;45;1;6-8

  • Analysis of the complete DNA sequence of murine cytomegalovirus.

    Rawlinson WD, Farrell HE and Barrell BG

    Laboratory of Molecular Biology, Cambridge, United Kingdom. WDR@westmed.wh.su.edu.au

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80).

    Funded by: Wellcome Trust

    Journal of virology 1996;70;12;8833-49

  • Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments.

    Riis SK and Krogh A

    Electronics Institute, Technical University of Denmark, Lyngby, Denmark.

    The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures alpha-helix, beta-strand, and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and the characteristic periodicity in alpha-helices. Since these single-structure networks all have fewer than 600 adjustable weights, overfitting is avoided. To obtain a three-state prediction of alpha-helix, beta-strand, or coil, ensembles of single-structure networks are combined with another neural network. This method gives an overall prediction accuracy of 66.3% when using 7-fold cross-validation on a database of 126 nonhomologous globular proteins. Applying the method to multiple sequence alignments of homologous proteins increases the prediction accuracy significantly to 71.3% with corresponding Matthews correlation coefficients C alpha = 0.59, C beta = 0.52, and Cc = 0.50. More than 72% of the residues in the database are predicted with an accuracy of 80%. It is shown that the network outputs can be interpreted as estimated probabilities of correct prediction, and, therefore, these numbers indicate which residues are predicted with high confidence.

    Journal of computational biology : a journal of computational molecular cell biology 1996;3;1;163-83
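
    The per-class correlation coefficients quoted in the abstract above are Matthews correlation coefficients, computed for one secondary-structure class against the other two. The sketch below shows the standard formula on a made-up eight-residue example; the function name and the label strings are assumptions for illustration, not data from the paper.

```python
from math import sqrt


def matthews_cc(true_labels, predicted_labels, positive):
    """Matthews correlation coefficient for one class versus the rest."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up eight-residue example: H = helix, E = strand, C = coil.
observed = "HHHHCCEE"
predicted = "HHHCCCEE"
for state in "HEC":
    print(state, matthews_cc(observed, predicted, state))
```

    Unlike plain accuracy, the coefficient penalises both over-prediction and under-prediction of a class, which is why it is often reported alongside the overall three-state accuracy.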

  • Characterization of DRP2, a novel human dystrophin homologue.

    Roberts RG, Freeman TC, Kendall E, Vetrie DL, Dixon AK, Shaw-Smith C, Bone Q and Bobrow M

    Division of Medical and Molecular Genetics, UMDS, London, UK.

    The currently recognised dystrophin protein family comprises the archetype, dystrophin, its close relative, utrophin or dystrophin-related protein (DRP), and a distantly related protein known as the 87K tyrosine kinase substrate. During the course of a phylogenetic study of sequences encoding the characteristic C-terminal domains of dystrophin-related proteins, we identified an unexpected novel class of vertebrate dystrophin-related sequences. We term this class dystrophin-related protein 2 (DRP2), and suggest that utrophin/DRP be renamed DRP1 to simplify future nomenclature. DRP2 is a relatively small protein, encoded in man by a 45 kb gene localized to Xq22. It is expressed principally in the brain and spinal cord, and is similar in overall structure to the Dp116 dystrophin isoform. The discovery of a novel relative of dystrophin substantially broadens the scope for study of this interesting group of proteins and their associated glycoprotein complexes.

    Nature genetics 1996;13;2;223-6

  • An integrated YAC map of the human X chromosome.

    Roest Crollius H, Ross MT, Grigoriev A, Knights CJ, Holloway E, Misfud J, Li K, Playford M, Gregory SG, Humphray SJ, Coffey AJ, See CG, Marsh S, Vatcheva R, Kumlien J, Labella T, Lam V, Rak KH, Todd K, Mott R, Graeser D, Rappold G, Zehetner G, Poustka A, Bentley DR, Monaco AP and Lehrach H

    Max-Planck-Institut für Molekulare Genetik, Berlin, Germany. roest@mpimg-berlin-dahlem.mpg.de

    The human X chromosome is associated with a large number of disease phenotypes, principally because of its unique mode of inheritance that tends to reveal all recessive disorders in males. With the longer term goal of identifying and characterizing most of these genes, we have adopted a chromosome-wide strategy to establish a YAC contig map. We have performed >3250 inter-Alu-PCR product hybridizations to identify overlaps between YAC clones. Positional information associated with many of these YAC clones has been derived from our Reference Library Database and a variety of other public sources. We have constructed a YAC contig map of the X chromosome covering 125 Mb of DNA in 25 contigs and containing 906 YAC clones. These contigs have been verified extensively by FISH and by gel and hybridization fingerprinting techniques. This independently derived map exceeds the coverage of recently reported X chromosome maps built as part of whole-genome YAC maps.

    Genome research 1996;6;10;943-55

  • Long-range map of a 3.5-Mb region in Xp11.23-22 with a sequence-ready map from a 1.1-Mb gene-rich interval.

    Schindelhauer D, Hellebrand H, Grimm L, Bader I, Meitinger T, Wehnert M, Ross M and Meindl A

    Abteilung für Pädiatrische Genetik, Kinderpoliklinik der Universität München, Germany.

    Most of the yeast artificial chromosomes (YACs) isolated from the Xp11.23-22 region have shown instability and chimerism and are not a reliable resource for determining physical distances. We therefore constructed a long-range pulsed-field gel electrophoresis map that encompasses approximately 3.5 Mb of genomic DNA between the loci TIMP and DXS146 including a CpG-rich region around the WASP and TFE-3 gene loci. A combined YAC-cosmid contig was constructed along the genomic map and was used for fine-mapping of 15 polymorphic microsatellites and 30 expressed sequence tags (ESTs) or sequence transcribed sites (STSs), revealing the following order: tel-(SYN-TIMP)-(DXS426-ELK1)-ZNF(CA)n-L1-DXS1367-ZNF81-ZNF21-DXS6616-(HB3-OATL1pseudogenes-DXS6950)-DXS6949-DXS6941-DXS7464E(MG61)-GW1E(EBP)-DXS7927E(MG81)-RBM-DXS722-DXS7467E(MG21)-DXS1011E-WASP-DXS6940-DXS7466E(MG44)-GF1-DXS226-DXS1126-DXS1240-HB1-DXS7469E-(DXS6665-DXS1470)-TFE3-DXS7468E-SYP-DXS1208-HB2E-DXS573-DXS1331-DXS6666-DXS1039-DXS1426-DXS1416-DXS7647-DXS8222-DXS6850-DXS255-CIC-5-DXS146-cen. A sequence-ready map was constructed for an 1100-kb gene-rich interval flanked by the markers HB3 and DXS1039, from which six novel ESTs/STSs were isolated, thus increasing the number of markers used in this interval to thirty. This precise ordering is a prerequisite for the construction of a transcription map of this region that contains numerous disease loci, including those for several forms of retinal degeneration and mental retardation. In addition, the map provides the base to delineate the corresponding syntenic region in the mouse, where the mutants scurfy and tattered are localized.

    Genome research 1996;6;11;1056-69

  • Genome maps 7. The human transcript map. Wall chart.

    Schuler GD, Boguski MS, Hudson TJ, Hui L, Ma J, Castle AB, Wu X, Silva J, Nusbaum HC, Birren BB, Slonim DK, Rozen S, Stein LD, Page D, Lander ES, Stewart EA, Aggarwal A, Bajorek E, Brady S, Chu S, Fang N, Hadley D, Harris M, Hussain S, Hudson JR et al.

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

    Science (New York, N.Y.) 1996;274;5287;547-62

  • A gene map of the human genome.

    Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, White RE, Rodriguez-Tomé P, Aggarwal A, Bajorek E, Bentolila S, Birren BB, Butler A, Castle AB, Chiannilkulchai N, Chu A, Clee C, Cowles S, Day PJ, Dibling T, Drouot N, Dunham I, Duprat S, East C, Edwards C, Fan JB, Fang N, Fizames C, Garrett C, Green L, Hadley D, Harris M, Harrison P, Brady S, Hicks A, Holloway E, Hui L, Hussain S, Louis-Dit-Sully C, Ma J, MacGilvery A, Mader C, Maratukulam A, Matise TC, McKusick KB, Morissette J, Mungall A, Muselet D, Nusbaum HC, Page DC, Peck A, Perkins S, Piercy M, Qin F, Quackenbush J, Ranby S, Reif T, Rozen S, Sanders C, She X, Silva J, Slonim DK, Soderlund C, Sun WL, Tabar P, Thangarajah T, Vega-Czarny N, Vollrath D, Voyticky S, Wilmer T, Wu X, Adams MD, Auffray C, Walter NA, Brandon R, Dehejia A, Goodfellow PN, Houlgatte R, Hudson JR, Ide SE, Iorio KR, Lee WY, Seki N, Nagase T, Ishikawa K, Nomura N, Phillips C, Polymeropoulos MH, Sandusky M, Schmitt K, Berry R, Swanson K, Torres R, Venter JC, Sikela JM, Beckmann JS, Weissenbach J, Myers RM, Cox DR, James MR, Bentley D, Deloukas P, Lander ES and Hudson TJ

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.

    The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing genetic and physical maps with the nucleotide and protein sequence databases in a fashion that should speed the discovery of genes underlying inherited human disease. The integrated resource is available through a site on the World Wide Web at http://www.ncbi.nlm.nih.gov/SCIENCE96/.

    Funded by: NHGRI NIH HHS: HG00098, HG00206, HG00835; Wellcome Trust; ...

    Science (New York, N.Y.) 1996;274;5287;540-6

  • The t(X;1)(p11.2;q21.2) translocation in papillary renal cell carcinoma fuses a novel gene PRCC to the TFE3 transcription factor gene.

    Sidhar SK, Clark J, Gill S, Hamoudi R, Crew AJ, Gwilliam R, Ross M, Linehan WM, Birdsall S, Shipley J and Cooper CS

    Molecular Carcinogenesis Section, Institute of Cancer Research, Haddow Laboratories, Belmont, UK.

    The specific chromosomal translocation t(X;1)(p11.2;q21.2) has been observed in human papillary renal cell carcinomas. In this study we demonstrated that this translocation results in the fusion of a novel gene designated PRCC at 1q21.2 to the TFE3 gene at Xp11.2. TFE3 encodes a member of the basic helix-loop-helix (bHLH) family of transcription factors originally identified by its ability to bind to microE3 elements in the immunoglobulin heavy chain intronic enhancer. The translocation is predicted to result in the fusion of the N-terminal region of the PRCC protein, which includes a proline-rich domain, to the entire TFE3 protein. Notably the generation of the chimaeric PRCC-TFE3 gene appears to be accompanied by complete loss of normal TFE3 transcripts. This work establishes that the disruption of transcriptional control by chromosomal translocation is important in the development of kidney carcinoma in addition to its previously established role in the aetiology of sarcomas and leukaemias.

    Funded by: Wellcome Trust

    Human molecular genetics 1996;5;9;1333-8

  • Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology.

    Sjölander K, Karplus K, Brown M, Hughey R, Krogh A, Mian IS and Haussler D

    Baskin Center for Computer Engineering and Information Sciences, University of California at Santa Cruz 95064, USA. kimmen@cse.ucsc.edu

    We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model or other statistical model. These estimates give a statistical model greater generalization capacity, so that remotely related family members can be more reliably recognized by the model. This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.

    Funded by: NIGMS NIH HHS: GM17129

    Computer applications in the biosciences : CABIOS 1996;12;4;327-45
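
The abstract above describes combining observed amino acid frequencies with a mixture of Dirichlet densities to estimate the expected probabilities at a profile position. A minimal Python sketch of the standard posterior-mean form of such an estimate follows. It is an illustration under stated assumptions and does not reproduce the paper's own derivation or its corrected formula: the function names, the two-letter toy alphabet, and all parameter values are hypothetical, and every mixture weight is assumed to be strictly positive.

```python
from math import exp, lgamma, log


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def dirichlet_mixture_estimate(counts, mix_weights, alphas):
    """Posterior-mean probabilities for one column of observed counts.

    counts      -- observed count for each letter of the alphabet
    mix_weights -- prior weight of each Dirichlet component (each > 0)
    alphas      -- one pseudocount vector per component
    """
    # Log posterior weight of each component given the counts. The
    # multinomial coefficient is the same for all components and cancels.
    log_post = []
    for q, alpha in zip(mix_weights, alphas):
        updated = [a + n for a, n in zip(alpha, counts)]
        log_post.append(log(q) + log_beta(updated) - log_beta(alpha))

    # Normalise in log space to avoid underflow with large counts.
    peak = max(log_post)
    weights = [exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    weights = [w / norm for w in weights]

    # Weighted average of the per-component pseudocount estimates.
    total = sum(counts)
    probs = [0.0] * len(counts)
    for w, alpha in zip(weights, alphas):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            probs[i] += w * (n + a) / denom
    return probs


# Toy example on a two-letter alphabet, invented for illustration only.
estimate = dirichlet_mixture_estimate(
    counts=[3, 1],
    mix_weights=[0.5, 0.5],
    alphas=[[1.0, 3.0], [3.0, 1.0]],
)
```

With no observations the estimate falls back to the weighted mean of the component priors, and as counts accumulate it moves towards the observed frequencies. This behaviour is what lets a model built from few sequences generalise to remotely related family members.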

  • A member of the MAP kinase phosphatase gene family in mouse containing a complex trinucleotide repeat in the coding region.

    Theodosiou AM, Rodrigues NR, Nesbit MA, Ambrose HJ, Paterson H, McLellan-Arnold E, Boyd Y, Leversha MA, Owen N, Blake DJ, Ashworth A and Davies KE

    Biochemistry Department, University of Oxford, UK.

    We have identified a novel mouse gene encoding a protein that shows high homology to the dual-specificity tyrosine/threonine phosphatase family of proteins. The gene encodes a 5 kb transcript which is expressed predominantly in brain and lung and contains a translated complex trinucleotide repeat within the coding region. Using interspecific mouse backcross analysis, the gene has been localised to distal mouse chromosome 7. In human, homologous sequences are located in the syntenic region on distal chromosome 11p as well as on chromosome 10q11.2 and 10q22. The presence of a CG-rich trinucleotide repeat in the coding region provides a target for mutation which might result in loss of function or altered properties of this phosphatase.

    Human molecular genetics 1996;5;5;675-84

  • Yeasties and beasties: 7 years of genome sequencing.

    Thomas K

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, UK.

    The Saccharomyces cerevisiae genome sequencing project was the first of many projects aimed at sequencing the entire genomes of model organisms. Since its initiation in 1989, there have been numerous debates about the validity of genome sequencing, especially with reference to the model organisms. Seven years on, I hope to satisfy some of the critics by demonstrating that, as a consequence of the mass of data now becoming available from such projects, and the beginning of the major collaborative effort to sequence the human genome, we are now entering an exciting and dynamic time for those involved not only in genome sequencing, but also in all areas of the biological sciences.

    FEBS letters 1996;396;1;1-6

  • Phylogeny and structure of the RING3 gene.

    Thorpe KL, Abdulla S, Kaufman J, Trowsdale J and Beck S

    DNA Sequencing Laboratory, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX, UK.

    Immunogenetics 1996;44;5;391-6

  • The imprint of somatic hypermutation on the repertoire of human germline V genes.

    Tomlinson IM, Walter G, Jones PT, Dear PH, Sonnhammer EL and Winter G

    MRC Centre for Protein Engineering, Cambridge, UK.

    In the human immune system, antibodies with high affinities for antigen are created in two stages. A diverse primary repertoire of antibody structures is produced by the combinatorial rearrangement of germline V gene segments and antibodies are selected from this repertoire by binding to the antigen. Their affinities are then improved by somatic hypermutation and further rounds of selection. We have dissected the sequence diversity created at each stage in response to a wide range of antigens. In the primary repertoire, diversity is focused at the centre of the binding site. With somatic hypermutation, diversity spreads to regions at the periphery of the binding site that are highly conserved in the primary repertoire. We propose that evolution has favoured this complementarity as an efficient strategy for searching sequence space and that the germline V gene families evolved to exploit the diversity created by somatic hypermutation.

    Funded by: Wellcome Trust

    Journal of molecular biology 1996;256;5;813-17

  • An Xp22.1-p22.2 YAC contig encompassing the disease loci for RS, KFSD, CLS, HYP and RP15: refined localization of RS.

    Van de Vosse E, Bergen AA, Meershoek EJ, Oosterwijk JC, Gregory S, Bakker B, Weissenbach J, Coffey AJ, van Ommen GJ and Den Dunnen JT

    MGC-Department of Human Genetics, Leiden University, The Netherlands.

    To facilitate the positional cloning of the genes involved in retinoschisis (RS), keratosis follicularis spinulosa decalvans (KFSD), Coffin-Lowry syndrome (CLS), X-linked hypophosphatemic rickets (XLH, locus name HYP) and X-linked dominant cone-rod degeneration (locus name RP15), we have extended the molecular map of the Xp22 region. Screening of several YAC libraries allowed us to identify 156 YACs, 52 of which localize between markers DXS414 (P90) and DXS451 (kQST80H1). Analysis of their marker content facilitated the construction of a YAC contig from the region spanning (in this order): DXS414 - DXS987 - DXS207 - DXS1053 - DXS197 - DXS43 - DXS1195 - DXS418 - DXS999 - PDHA1 - DXS7161 - DXS443 - DXS7592 - DXS1229 - DXS365 - DXS7101 - DXS7593 - DXS1052 - DXS274 - DXS989 - DXS451. The region between DXS414 and DXS451 covers about 4.5-5 Mb. Two additional markers (DXS7593 and DXS7592) were placed in the region, thereby increasing the genetic resolution. Using the deduced marker order, the analysis of key recombinants in families segregating RS allowed us to refine the critical region for RS to 0.6 Mb, between DXS418 and DXS7161.

    European journal of human genetics : EJHG 1996;4;2;101-4

  • The Saccharomyces cerevisiae genome on the World Wide Web.

    Walsh S and Barrell B

    Sanger Centre, Hinxton Hall, Hinxton, Cambridge, UK. svw@sanger.ac.uk

    Funded by: Wellcome Trust

    Trends in genetics : TIG 1996;12;7;276-7
