Sanger Institute - Publications 2000

Number of papers published in 2000: 80

  • The genome sequence of Drosophila melanogaster.

    Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Sidén-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM and Venter JC

    Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P50-HG00750

    Science (New York, N.Y.) 2000;287;5461;2185-95

  • InterPro--an integrated documentation resource for protein families, domains and functional sites.

    Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ, Zdobnov EM and InterPro Consortium

    EMBL Outstation--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    MOTIVATION: InterPro is a new integrated documentation resource for protein families, domains and functional sites, developed initially as a means of rationalising the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. RESULTS: Merged annotations from PRINTS, PROSITE and Pfam form the InterPro core. Each combined InterPro entry includes functional descriptions and literature references, and links are made back to the relevant parent database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc. Merged and individual entries (i.e. those that have no counterpart in the companion resources) are assigned unique accession numbers. Release 1.2 of InterPro (June 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification (PTMs) encoded by 6581 different regular expressions, profiles, fingerprints and Hidden Markov Models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1000000 hits from 264333 different proteins out of 384572 in SWISS-PROT and TrEMBL).

    Bioinformatics (Oxford, England) 2000;16;12;1145-50
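
    The abstract above mentions that some InterPro signatures are encoded as regular expressions (PROSITE-style patterns). As an illustration only, the following minimal sketch converts a pattern written in the standard PROSITE syntax into a Python regular expression. The function name prosite_to_regex is hypothetical and is not part of InterPro or PROSITE; rare cases such as anchors inside square brackets are not handled.

```python
import re

# Single-pass character mapping from PROSITE pattern syntax to regex syntax.
# Assumption: the pattern uses only the common PROSITE elements listed here.
_PROSITE_TO_REGEX = str.maketrans({
    "-": "",    # element separator carries no meaning in a regex
    "x": ".",   # x matches any residue
    "{": "[^",  # {ABC} means any residue except A, B or C
    "}": "]",
    "(": "{",   # x(2,4) means two to four repeats
    ")": "}",
    "<": "^",   # pattern anchored at the N-terminus
    ">": "$",   # pattern anchored at the C-terminus
})


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string."""
    # PROSITE patterns conventionally end with a period; drop it first.
    return pattern.rstrip(".").translate(_PROSITE_TO_REGEX)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    match = re.search(regex, "AANGSAA")
    print(match.start() if match else None)
```

    Using str.translate keeps the conversion to one pass, so a brace produced by rewriting "(" is never re-interpreted as a PROSITE exclusion brace.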

  • Searching databases to find protein domain organization.

    Bateman A and Birney E

    Sanger Centre, Hinxton, United Kingdom.

    Advances in protein chemistry 2000;54;137-57

  • The structure of a LysM domain from E. coli membrane-bound lytic murein transglycosylase D (MltD).

    Bateman A and Bycroft M

    The Sanger Centre, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    The LysM domain is a widespread protein module. It was originally identified in enzymes that degrade bacterial cell walls but is also present in many other bacterial proteins. Several proteins that contain the domain, such as Staphylococcal IgG binding proteins and Escherichia coli intimin, are involved in bacterial pathogenesis. LysM domains are also found in some eukaryotic proteins, apparently as a result of horizontal gene transfer from bacteria. The available evidence suggests that the LysM domain is a general peptidoglycan-binding module. We have determined the structure of this domain from E. coli membrane-bound lytic murein transglycosylase D. The LysM domain has a βααβ secondary structure with the two helices packing onto the same side of an anti-parallel β sheet. The structure shows no similarity to other bacterial cell surface domains. A potential binding site in a shallow groove on the surface of the protein has been identified.

    Journal of molecular biology 2000;299;4;1113-9

  • The Pfam protein families database.

    Bateman A, Birney E, Durbin R, Eddy SR, Howe KL and Sonnhammer EL

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. agb@sanger.ac.uk

    Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgr.ki.se/Pfam/ and in the US at http://pfam.wustl.edu/. The latest version (4.3) of Pfam contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9. For complete genomes Pfam currently matches up to half of the proteins. Genomic DNA can be directly searched against the Pfam library using the Wise2 package.

    Nucleic acids research 2000;28;1;263-6

  • The human major histocompatibility complex: lessons from the DNA sequence.

    Beck S and Trowsdale J

    The Sanger Centre, Wellcome Trust Genome Campus, University of Cambridge, Cambridge CB10 1SA, United Kingdom. beck@sanger.ac.uk

    The entire 3.6-Mbp DNA sequence of a human major histocompatibility complex, derived from a composite of DNA clones from different haplotypes, was completed in 1999, primarily through the work of four main groups. At that time, it was the longest contiguous human DNA sequence to have been determined. The sequence is of extremely high quality and accuracy. In this review, we discuss how the DNA sequence has facilitated our understanding of the biology and genetics of the major histocompatibility complex. We suggest some ways in which the sequence may be exploited in the future to explore the relationship between the extraordinary polymorphism of the region and its association with both autoimmune and infectious diseases.

    Annual review of genomics and human genetics 2000;1;117-37

  • From sequence to chromosome: the tip of the X chromosome of D. melanogaster.

    Benos PV, Gatt MK, Ashburner M, Murphy L, Harris D, Barrell B, Ferraz C, Vidal S, Brun C, Demailles J, Cadieu E, Dreano S, Gloux S, Lelaure V, Mottier S, Galibert F, Borkova D, Minana B, Kafatos FC, Louis C, Sidén-Kiamos I, Bolshakov S, Papagiannakis G, Spanos L, Cox S, Madueño E, de Pablos B, Modolell J, Peter A, Schöttler P, Werner M, Mourkioti F, Beinert N, Dowe G, Schäfer U, Jäckle H, Bucheton A, Callister DM, Campbell LA, Darlamitsou A, Henderson NS, McMillan PJ, Salles C, Tait EA, Valenti P, Saunders RD and Glover DM

    The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Hall, Cambridge CB10 1SD, UK.

    One of the rewards of having a Drosophila melanogaster whole-genome sequence will be the potential to understand the molecular bases for structural features of chromosomes that have been a long-standing puzzle. Analysis of 2.6 megabases of sequence from the tip of the X chromosome of Drosophila identifies 273 genes. Cloned DNAs from the characteristic bulbous structure at the tip of the X chromosome in the region of the broad complex display an unusual pattern of in situ hybridization. Sequence analysis revealed that this region comprises 154 kilobases of DNA flanked by 1.2 kilobases of inverted repeats, each composed of a 350-base pair satellite-related element. Thus, some aspects of chromosome structure appear to be revealed directly within the DNA sequence itself.

    Science (New York, N.Y.) 2000;287;5461;2220-2

  • The Human Genome Project--an overview.

    Bentley DR

    The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The human genome sequence will underpin human biology and medicine in the next century, providing a single, essential reference to all genetic information. The international program to determine the complete DNA sequence (3,000 million bases) is well underway. As of January 2000, 50% of the sequence is available in the public domain. A comprehensive working draft is expected this year, and the entire sequence is projected to be finished in 2003. DNA sequencing is carried out on mapped, overlapping bacterial clones of 150-200 kb. The working draft comprises assembled unfinished sequence and is released immediately in the public domain. The draft sequence of each clone is then completed, by closing any remaining gaps and resolving any ambiguities, before the entire sequence is checked, annotated, and submitted to the public databases. The sequence of each clone is finished to an accuracy of >99.99%. The availability of a reference sequence of the genome provides the basis for studying the nature of sequence variation, particularly single nucleotide polymorphisms (SNPs), in human populations. SNP typing is a powerful tool for genetic analysis, and will enable us to uncover the association of loci at specific sites in the genome with many disease traits. SNPs occur at a frequency of approximately 1 SNP/kb throughout the genome when the sequence of any two individuals is compared. Programs to detect and map SNPs in the human genome are underway with the aim of establishing a SNP map of the genome during the next two years. The human genome sequence will provide a complete description of all the genes. Annotation of the sequence with the gene structures is achieved by a combination of computational analysis (predictive and homology-based) and experimental confirmation by cDNA sequencing. Detecting homologies between newly defined gene products and proteins of known function helps to postulate biochemical functions for them, which can then be tested. Establishing the association of specific genes with disease phenotypes by mutation screening, particularly for monogenic disorders, provides further assistance in defining the functions of some gene products, as well as helping to establish the cause of the disease. As our knowledge of gene sequences and sequence variation in populations increases, we will pinpoint more and more of the genes and proteins that are important in common, complex diseases. A more detailed understanding of the function of the human genome will be achieved as we identify sequences that control gene expression. Given the availability of gene sequences, the expression status of genes in particular tissues can be monitored in parallel. By comparing corresponding genomic sequences in different species (for example: man, mouse, chicken, and zebrafish), regions that have been highly conserved during evolution can be identified, many of which reflect conserved functions such as gene regulation. These approaches promise to greatly accelerate our interpretation of the human genome sequence.

    Medicinal research reviews 2000;20;3;189-96

  • Decoding the human genome sequence.

    Bentley DR

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. drb@sanger.ac.uk

    The year 2000 is marked by the production of the sequence of the human genome. A 'working draft' of high quality sequence covering 90% of the genome has been determined and a quarter is in finished form, including the first two completed chromosomes. All sequence data from the project is made freely available to the community via the Internet, for further analysis and exploitation. The challenge which lies ahead is to decipher the information. Knowledge of the human genome sequence will enable us to understand how the genetic information determines the development, structure and function of the human body. We will be able to explore how variations within our DNA sequence cause disease, how they affect our interaction with our environment and ultimately to develop new and effective ways to improve human health.

    Human molecular genetics 2000;9;16;2353-8

  • Using GeneWise in the Drosophila annotation experiment.

    Birney E and Durbin R

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. Birney@ebi.ac.uk

    The GeneWise method for combining gene prediction and homology searches was applied to the 2.9-Mb region from Drosophila melanogaster. The results from the Genome Annotation Assessment Project (GASP) showed that GeneWise provided reasonably accurate gene predictions. Further investigation indicates that many of the incorrect gene predictions from GeneWise were due to transposons with valid protein-coding genes and the remaining cases are pseudogenes or possible annotation oversights.

    Genome research 2000;10;4;547-8

  • Assessing the impact of Plasmodium falciparum genome sequencing.

    Bowman S and Horrocks P

    Pathogen Sequencing Unit, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. sharen@sanger.ac.uk

    With the publication of the complete sequences for chromosomes 2 and 3 and the increasing availability of shotgun sequence covering most of its genome, Plasmodium falciparum biology is entering its post-genomic era. Analysis of the results generated to date has identified higher-order organisation of gene families involved in parasite pathology, provided information regarding the unique biology of this parasite and allowed the identification of potential chemotherapeutic drug targets. Continuing efforts to complete the P. falciparum genome and the availability of sequences from other protozoan parasites will facilitate a broader understanding of their biology, particularly with respect to their pathogenicity.

    Microbes and infection / Institut Pasteur 2000;2;12;1479-87

  • The third human FER-1-like protein is highly similar to dysferlin.

    Britton S, Freeman T, Vafiadaki E, Keers S, Harrison R, Bushby K and Bashir R

    Molecular Genetics Unit, School of Biochemistry and Genetics, University of Newcastle upon Tyne, Newcastle upon Tyne, England, NE1 7RU, United Kingdom.

    Dysferlin, the protein product of the gene mutated in patients with an autosomal recessive limb-girdle muscular dystrophy type 2B (LGMD2B) and a distal muscular dystrophy, Miyoshi myopathy, is homologous to a Caenorhabditis elegans spermatogenesis factor, FER-1. Analysis of fer-1 mutants and of sequence predictions of the FER-1 and dysferlin ORFs has predicted a role in membrane fusion. Otoferlin, another human FER-1-like protein (ferlin), has recently been shown to be responsible for autosomal recessive nonsyndromic deafness (DFNB9). In this report we describe the third human ferlin gene, FER1L3, which maps to chromosome 10q23.3. Expression analysis of the orthologous mouse gene shows ubiquitous expression but predominant expression in the eye, esophagus, and salivary gland. All the ferlins are characterized by sequences corresponding to multiple C2 domains that share the highest level of homology with the C2A domain of rat synaptotagmin III. They are predicted to be Type II transmembrane proteins, with the majority of the protein facing the cytoplasm anchored by the C-terminal transmembrane domain. Sequence and predicted structural comparisons have highlighted the high degree of similarity of dysferlin and FER1L3, which have sequences corresponding to six C2 domains and which share more than 60% amino acid sequence identity.

    Genomics 2000;68;3;313-21

  • Refinement of an ovarian cancer tumour suppressor gene locus on chromosome arm 22q and mutation analysis of CYP2D6, SREBP2 and NAGA.

    Bryan EJ, Thomas NA, Palmer K, Dawson E, Englefield P and Campbell IG

    VBCRC, Cancer Genetics Laboratory, Peter MacCallum Cancer Institute, East Melbourne, Victoria, Australia.

    Loss of heterozygosity on chromosome 22q was detected in 53% of 123 ovarian carcinomas, suggesting the presence of at least one tumour suppressor gene. We have refined the location of one possible tumour suppressor gene to the region between the microsatellite markers D22S299 and CYP2D. Located within this region are the genes SREBP2 (sterol regulatory element binding protein 2) and NAGA (N-acetyl-alpha-D-galactosaminidase). Investigation of the coding exons of these genes by single stranded conformational polymorphism/heteroduplex analysis failed to identify any somatic genetic alterations in 57 ovarian tumours which exhibited LOH on 22q13. The CYP2D gene locus straddles the distal boundary of the candidate region. Germline variants of the active CYP2D6 gene with differing abilities to metabolise specific substrates have been implicated in the development of various cancers. Comparison of the frequency of the two common germline mutations among 258 ovarian tumours and 231 non-cancer controls did not reveal any significant differences between the two groups. This suggests that the known polymorphic variants of CYP2D6 are not involved in ovarian cancer predisposition. We also conclude that neither NAGA nor SREBP2 is likely to be mutated in ovarian carcinomas.

    International journal of cancer. Journal international du cancer 2000;87;6;798-802

  • Analysis of canonical and non-canonical splice sites in mammalian genomes.

    Burset M, Seledtsov IA and Solovyev VV

    Informatics Division, The Sanger Centre, Hinxton, Cambridge, CB10 1SA, UK.

    A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (with a maximum size of 0.05%). Studying these groups we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. They can be classified after corrections as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors that were corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors that corrected to AT-AC), one case that was produced from a non-existent intron, seven cases that were found in HTG that were deposited to GenBank and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites. We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.

    Nucleic acids research 2000;28;21;4364-75
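
    The classification of splice site pairs described in the abstract above (GT-AG, GC-AG, AT-AC and other) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names and the toy intron sequences are hypothetical, and it assumes each input is a full intron sequence on the coding strand, so the first two bases are the donor dinucleotide and the last two are the acceptor dinucleotide.

```python
from collections import Counter

# The three named classes from the abstract; everything else is "other".
NAMED_PAIRS = {"GT-AG", "GC-AG", "AT-AC"}


def classify_splice_pair(intron: str) -> str:
    """Return the donor-acceptor class of an intron sequence."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must contain at least four bases")
    pair = f"{seq[:2]}-{seq[-2:]}"
    return pair if pair in NAMED_PAIRS else "other"


def tally_splice_pairs(introns):
    """Count how many introns fall into each splice pair class."""
    return Counter(classify_splice_pair(seq) for seq in introns)


if __name__ == "__main__":
    # Toy sequences invented for illustration, not real data.
    toy_introns = ["GTAAGTCCCTTTCAG", "gcaagtttttacag", "ATATCCTTTTAC", "GGAAAACTTTTG"]
    for name, count in sorted(tally_splice_pairs(toy_introns).items()):
        print(name, count)
```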

  • A 6-Mb high-resolution physical and transcription map encompassing the hereditary prostate cancer 1 (HPC1) region.

    Carpten JD, Makalowska I, Robbins CM, Scott N, Sood R, Connors TD, Bonner TI, Smith JR, Faruque MU, Stephan DA, Pinkett H, Morgenbesser SD, Su K, Graham C, Gregory SG, Williams H, McDonald L, Baxevanis AD, Klingler KW, Landes GM and Trent JM

    Cancer Genetics Branch, Bethesda, Maryland 20892, USA. jdc@nhgri.nih.gov

    Several hereditary disease loci have been genetically mapped to the chromosome 1q24-q31 interval, including the hereditary prostate cancer 1 (HPC1) locus. Here, we report the construction of a 20-Mb yeast artificial chromosome contig and a high-resolution 6-Mb sequence-ready bacterial artificial chromosome (BAC)/P1-derived artificial chromosome (PAC) contig of 1q25 by sequence and computational analysis, STS content mapping, and chromosome walking. One hundred thirty-six new STSs, including 10 novel simple sequence repeat polymorphisms that are being used for genetic refinement of multiple disease loci, have been generated from this contig and are shown to map to the 1q25 interval. The integrity of the 6-Mb BAC/PAC contig has been confirmed by restriction fingerprinting, and this contig is being used as a template for human chromosome 1 genome sequencing. A transcription mapping effort has resulted in the precise localization of 18 known genes and 31 ESTs by database searching, exon trapping, direct cDNA hybridization, and sample sequencing of BACs from the 1q25 contig. An additional 11 known genes and ESTs have been placed within the larger 1q24-q31 interval. These transcription units represent candidate genes for multiple hereditary diseases, including HPC1.

    Genomics 2000;64;1;1-14

  • Controlling the end of the cell cycle.

    Cerutti L and Simanis V

    Wellcome Trust Genome Campus, The Sanger Centre, Hinxton, CB10 1SA, UK. lmc@sanger.ac.uk.

    The past year has seen significant advances in our understanding of how the events which occur at the end of mitosis, such as cytokinesis and the inactivation of mitotic cyclin-dependent kinases, are triggered, and also how they are prevented from occurring prematurely or inappropriately. This control is achieved through a combination of temporally ordered proteolytic events and changes in the subcellular localisation of proteins. These studies have also revealed that the nucleolus and spindle pole bodies play a key role in this regulation.

    Current opinion in genetics & development 2000;10;1;65-9

  • Domains in gene silencing and cell differentiation proteins: the novel PAZ domain and redefinition of the Piwi domain.

    Cerutti L, Mian N and Bateman A

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridge, UK.

    Trends in biochemical sciences 2000;25;10;481-2

  • Nematode functional genomics.

    Coulson A and Kuwabara P

    School of Biological Sciences, University of Manchester, UK.

    Yeast (Chichester, England) 2000;17;1;43-7

  • CpG island libraries from human chromosomes 18 and 22: landmarks for novel genes.

    Cross SH, Clark VH, Simmen MW, Bickmore WA, Maroon H, Langford CF, Carter NP and Bird AP

    Institute of Cell and Molecular Biology, University of Edinburgh, Darwin Building, King's Buildings, Mayfield Road, Edinburgh, EH9 3JR, UK. Sally.Cross@hgu.mrc.ac.uk

    CpG islands are found at the 5' end of approximately 60% of human genes and so are important genomic landmarks. They are concentrated in early-replicating, highly acetylated gene-rich regions. With respect to CpG island content, human Chrs 18 and 22 are very different from each other: Chr 18 appears to be CpG island poor, whereas Chr 22 appears to be CpG island rich. We have constructed and validated CpG island libraries from flow-sorted Chrs 18 and 22 and used these to estimate the difference in number of CpG islands found on these two chromosomes. These libraries contain normalized collections of sequences from the 5' end of genes. Clones from the libraries were sequenced and compared with the sequence databases; one third matched ESTs, thus anchoring these ESTs at the 5' end of their gene. However, it was striking that many clones either had no match or matched only existing CpG island clones. This suggests that a significant proportion of 5' gene sequences are absent from databases, presumably either because they are difficult to clone or the gene is poorly expressed and/or has a restricted expression pattern. This point should be taken into consideration if the currently available libraries are those used for the elucidation of complete, as opposed to partial, gene sequences. The Chr 18 and 22 CpG island libraries are a sequence resource for the isolation of such 5' gene sequences from specific human chromosomes.

    Mammalian genome : official journal of the International Mammalian Genome Society 2000;11;5;373-83

  • ProtEST: protein multiple sequence alignments from expressed sequence tags.

    Cuff JA, Birney E, Clamp ME and Barton GJ

    European Molecular Biology Laboratory Outstation, European Bioinformatics Institute, Cambridge, UK.

    Motivation: An automatic sequence searching method (ProtEST) is described which constructs multiple protein sequence alignments from protein sequences and translated expressed sequence tags (ESTs). ProtEST is more effective than a simple TBLASTN search of the query against the EST database, as the sequences are automatically clustered, assembled, made non-redundant, checked for sequence errors, translated into protein and then aligned and displayed.

    Results: A ProtEST search found a non-redundant, translated, error- and length-corrected EST sequence for > 58% of sequences when single sequences from 1407 Pfam-A seed alignments were used as the probe. The resulting alignments of translated EST sequences contained on average > 10 sequences. In a cross-validated test of protein secondary structure prediction, alignments from the new procedure led to an improvement of 3.4% average Q3 prediction accuracy over single sequences.

    Availability: The ProtEST method is available as an Internet World Wide Web service at http://barton.ebi.ac.uk/servers/protest.html. The Wise2 package for protein and genomic comparisons and the ProtESTWise script can be found at http://www.sanger.ac.uk/Software/Wise2

    Contact: geoff@ebi.ac.uk

    Bioinformatics (Oxford, England) 2000;16;2;111-6

  • Genomics - the new rock and roll?

    Dunham I

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridge, UK.

    The end of the beginning of the Human Genome Project came on 26 June, when the working draft, or first assembly, was announced. Here, Ian Dunham, who led the group at the Sanger Centre that produced the first complete sequence of a human chromosome, reflects on how it felt to be with the genome project from the beginning.

    Trends in genetics : TIG 2000;16;10;456-61

  • The extent of linkage disequilibrium in four populations with distinct demographic histories.

    Dunning AM, Durocher F, Healey CS, Teare MD, McBride SE, Carlomagno F, Xu CF, Dawson E, Rhodes S, Ueda S, Lai E, Luben RN, Van Rensburg EJ, Mannermaa A, Kataja V, Rennart G, Dunham I, Purvis I, Easton D and Ponder BA

    CRC Department of Oncology, University of Cambridge, Cambridge CB1 8RN, United Kingdom. alisond@srl.cam.ac.uk

    The design and feasibility of whole-genome-association studies are critically dependent on the extent of linkage disequilibrium (LD) between markers. Although there has been extensive theoretical discussion of this, few empirical data exist. The authors have determined the extent of LD among 38 biallelic markers with minor allele frequencies >0.1, since these are most comparable to the common disease-susceptibility polymorphisms that association studies aim to detect. The markers come from three chromosomal regions (1,335 kb on chromosome 13q12-13, 380 kb on chromosome 19q13.2, and 120 kb on chromosome 22q13.3) which have been extensively mapped. These markers were examined in approximately 1,600 individuals from four populations, all of European origin but with different demographic histories: Afrikaners, Ashkenazim, Finns, and East Anglian British. There are few differences, either in allele frequencies or in LD, among the populations studied. A similar inverse relationship was found between LD and distance in each genomic region and in each population. Mean D' is 0.68 for marker pairs <5 kb apart and is 0.24 for pairs separated by 10-20 kb, and the level of LD is not different from that seen in unlinked marker pairs separated by >500 kb. However, only 50% of marker pairs at distances <5 kb display sufficient LD (delta > 0.3) to be useful in association studies. Results of the present study, if representative of the whole genome, suggest that a whole-genome scan searching for common disease-susceptibility alleles would require markers spaced ≤5 kb apart.

    American journal of human genetics 2000;67;6;1544-54
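
    The D' statistic reported in the abstract above is conventionally computed with Lewontin's normalisation of the disequilibrium coefficient D. The sketch below shows that standard calculation for two biallelic markers; it is an assumption that the study used exactly this form, and the function name and example frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Return |D'| for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first marker
    p_b:  frequency of allele B at the second marker
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two markers were independent.
    d = p_ab - p_a * p_b
    # Largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a marker is monomorphic")
    return abs(d) / d_max


if __name__ == "__main__":
    # Invented frequencies for illustration only.
    print(round(d_prime(0.40, 0.5, 0.5), 3))
```

    Dividing by the maximum attainable value keeps the result between 0 and 1 regardless of allele frequencies, which is what makes D' comparable across marker pairs.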

  • MHC-linked olfactory receptor loci exhibit polymorphism and contribute to extended HLA/OR-haplotypes.

    Ehlers A, Beck S, Forbes SA, Trowsdale J, Volz A, Younger R and Ziegler A

    Institut für Immungenetik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, 14050 Berlin, Germany.

    Clusters of olfactory receptor (OR) genes are found on most human chromosomes. They are one of the largest mammalian multigene families. Here, we report a systematic study of polymorphism of OR genes belonging to the largest fully sequenced OR cluster. The cluster contains 36 OR genes, of which two belong to the vomeronasal 1 (V1-OR) family. The cluster is divided into a major and a minor region at the telomeric end of the HLA complex on chromosome 6. These OR genes could be involved in MHC-related mate preferences. The polymorphism screen was carried out with 13 genes from the HLA-linked OR cluster and three genes from chromosomes 7, 17, and 19 as controls. Ten human cell lines, representing 18 different chromosome 6s, were analyzed. They were from various ethnic origins and exhibited different HLA haplotypes. All OR genes tested, including those not linked to the HLA complex, were polymorphic. These polymorphisms were dispersed along the coding region and resulted in up to seven alleles for a given OR gene. Three polymorphisms resulted either in stop codons (genes hs6M1-4P, hs6M1-17) or in a 16-bp deletion (gene hs6M1-19P), possibly leading to lack of ligand recognition by the respective receptors in the cell line donors. In total, 13 HLA-linked OR haplotypes could be defined. Therefore, allelic variation appears to be a general feature of human OR genes.

    Genome research 2000;10;12;1968-78

  • A novel type of RNase III family proteins in eukaryotes.

    Filippov V, Solovyev V, Filippova M and Gill SS

    Department of Entomology, Graduate Programs in Biochemistry and Molecular Biology, and Genetics, University of California, Riverside, CA, USA.

    The RNase III family of double-stranded RNA-specific endonucleases is characterized by the presence of a highly conserved 9 amino acid stretch in their catalytic center known as the RNase III signature motif. We isolated the drosha gene, a new member of this family in Drosophila melanogaster. Characterization of this gene revealed the presence of two RNase III signature motifs in its sequence, which may indicate that it is capable of forming an active catalytic center as a monomer. The drosha protein also contains an 825 amino acid N-terminus with an unknown function. A search for the known homologues of the drosha protein revealed that it is similar to two adjacent annotated genes identified during C. elegans genome sequencing. Analysis of the genomic region of these genes by the Fgenesh program and sequencing of the EST cDNA clone derived from it revealed that this region encodes only one gene. This newly identified gene in the nematode genome shares high similarity with Drosophila drosha throughout its entire protein sequence. A potential drosha homologue is also found among the deposited human cDNA sequences. A comparison of these drosha proteins to other members of the RNase III family indicates that they form a new group of proteins within this family.

    Funded by: NIAID NIH HHS: AI32572

    Gene 2000;245;1;213-21

  • Functional genomic analysis of C. elegans chromosome I by systematic RNA interference.

    Fraser AG, Kamath RS, Zipperlen P, Martinez-Campos M, Sohrmann M and Ahringer J

    Wellcome/CRC Institute, University of Cambridge, UK.

    Complete genomic sequence is known for two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and it will soon be known for humans. However, biological function has been assigned to only a small proportion of the predicted genes in any animal. Here we have used RNA-mediated interference (RNAi) to target nearly 90% of predicted genes on C. elegans chromosome I by feeding worms with bacteria that express double-stranded RNA. We have assigned function to 13.9% of the genes analysed, increasing the number of sequenced genes with known phenotypes on chromosome I from 70 to 378. Although most genes with sterile or embryonic lethal RNAi phenotypes are involved in basal cell metabolism, many genes giving post-embryonic phenotypes have conserved sequences but unknown function. In addition, conserved genes are significantly more likely to have an RNAi phenotype than are genes with no conservation. We have constructed a reusable library of bacterial clones that will permit unlimited RNAi screens in the future; this should help develop a more complete view of the relationships between the genome, gene function and the environment.

    Funded by: Wellcome Trust: 054523

    Nature 2000;408;6810;325-30

  • High throughput gene expression screening: its emerging role in drug discovery.

    Freeman T

    Gene Expression Group, The Sanger Centre, Hinxton Hall, Cambridge, UK.

    Genetic makeup and the environment influence the health and welfare of an individual. At both the tissue and cellular level, physiological function can be correlated with the transcription of genes, whose protein products contribute to and influence the activity of biological systems. In order to understand these processes, it is therefore essential to determine the temporal and spatial patterns of gene expression, and, with particular relevance to drug discovery, to define changes that occur during development of disease or treatment with therapeutic agents.

    Medicinal research reviews 2000;20;3;197-202

  • Functional genomic analysis of cell division in C. elegans using RNAi of genes on chromosome III.

    Gönczy P, Echeverri C, Oegema K, Coulson A, Jones SJ, Copley RR, Duperon J, Oegema J, Brehm M, Cassin E, Hannak E, Kirkham M, Pichler S, Flohrs K, Goessen A, Leidel S, Alleaume AM, Martin C, Ozlü N, Bork P and Hyman AA

    Max-Planck-Institute for Cell Biology and Genetics, Dresden, Germany. Pierre.Gonczy@isrec.unil.ch

    Genome sequencing projects generate a wealth of information; however, the ultimate goal of such projects is to accelerate the identification of the biological function of genes. This creates a need for comprehensive studies to fill the gap between sequence and function. Here we report the results of a functional genomic screen to identify genes required for cell division in Caenorhabditis elegans. We inhibited the expression of approximately 96% of the approximately 2,300 predicted open reading frames on chromosome III using RNA-mediated interference (RNAi). By using an in vivo time-lapse differential interference contrast microscopy assay, we identified 133 genes (approximately 6%) necessary for distinct cellular processes in early embryos. Our results indicate that these genes represent most of the genes on chromosome III that are required for proper cell division in C. elegans embryos. The complete data set, including sample time-lapse recordings, has been deposited in an open access database. We found that approximately 47% of the genes associated with a differential interference contrast phenotype have clear orthologues in other eukaryotes, indicating that this screen provides putative gene functions for other species as well.

    Nature 2000;408;6810;331-6

  • Analysis of vertebrate SCL loci identifies conserved enhancers.

    Göttgens B, Barton LM, Gilbert JG, Bench AJ, Sanchez MJ, Bahn S, Mistry S, Grafham D, McMurray A, Vaudin M, Amaya E, Bentley DR, Green AR and Sinclair AM

    University of Cambridge, Department of Haematology, MRC Centre, Hills Road, Cambridge CB2 2QH, UK.

    The SCL gene encodes a highly conserved bHLH transcription factor with a pivotal role in hemopoiesis and vasculogenesis. We have sequenced and analyzed 320 kb of genomic DNA composing the SCL loci from human, mouse, and chicken. Long-range sequence comparisons demonstrated multiple peaks of human/mouse homology, a subset of which corresponded precisely with known SCL enhancers. Comparisons between mammalian and chicken sequences identified some, but not all, SCL enhancers. Moreover, one peak of human/mouse homology (+23 region), which did not correspond to a known enhancer, showed significant homology to an analogous region of the chicken SCL locus. A transgenic Xenopus reporter assay was established and demonstrated that the +23 region contained a new neural enhancer. This combination of long-range comparative sequence analysis with a high-throughput transgenic bioassay provides a powerful strategy for identifying and characterizing developmentally important enhancers.

    Nature biotechnology 2000;18;2;181-6

  • An integrated map of human 6q22.3-q24 including a 3-Mb high-resolution BAC/PAC contig encompassing a QTL for fetal hemoglobin.

    Game L, Close J, Stephens P, Mitchell J, Best S, Rochette J, Louis-dit-Sully C, Riley J, See CG, Sanseau P, Kearney L, Bethel G, Humphray S, Dunham I, Mungall A and Thein SL

    MRC Molecular Haematology Unit, Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Headington, OX3 9DS, United Kingdom.

    Genetic studies have previously assigned a quantitative trait locus (QTL) for hemoglobin F and F cells to a region of approximately 4 Mb between the markers D6S408 and D6S292 on chromosome 6q23. An initial yeast artificial chromosome contig of 13 clones spanning this region was generated. Further linkage analysis of an extended kindred refined the candidate interval to 1-2 cM, and key recombination events now place the QTL within a region of <800 kb. We describe a high-resolution bacterial clone contig spanning 3 Mb covering this critical region. The map consists of 223 bacterial artificial chromosome (BAC) and 100 P1 artificial chromosome (PAC) clones ordered by sequence-tagged site (STS) content and restriction fragment fingerprinting with a minimum tiling path of 22 BACs and 1 PAC. A total of 194 STSs map to this interval of 3 Mb, giving an average marker resolution of approximately one per 15 kb. About half of the markers were novel and were isolated in the present study, including three CA repeats and 13 single nucleotide polymorphisms. Altogether 24 expressed sequence tags, 6 of which are unique genes, have been mapped to the contig.

    Genomics 2000;64;3;264-76

  • An imprinted locus associated with transient neonatal diabetes mellitus.

    Gardner RJ, Mackay DJ, Mungall AJ, Polychronakos C, Siebert R, Shield JP, Temple IK and Robinson DO

    Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury, Wiltshire SP2 8BJ, UK.

    Recently, we reported the localization of a gene for transient neonatal diabetes mellitus (TNDM), a rare form of childhood diabetes, to an approximately 5.4 Mb region of chromosome 6q24. We have also shown that TNDM is associated with both paternal uniparental disomy (UPD) of chromosome 6 and paternal duplications of the critical region. The sequencing of P1-derived artificial chromosome clones from within the region of interest has allowed us to further localize the gene and to investigate the methylation status of the region. The gene is now known to reside in a 300-400 kb region of 6q24 which contains several CpG islands. At one island we have demonstrated differential DNA methylation between patients with paternal UPD of chromosome 6 and normal controls. In addition, two patients with TNDM, in whom neither paternal UPD of chromosome 6 nor duplication of 6q24 has been found, show a DNA methylation pattern identical to that of patients with paternal UPD of chromosome 6. Control individuals show a hemizygous methylation pattern. These results show that TNDM can be associated with a methylation change and identify a novel methylation imprint on chromosome 6 associated with TNDM.

    Human molecular genetics 2000;9;4;589-96

  • Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10q.

    Guy J, Spalluto C, McMurray A, Hearn T, Crosier M, Viggiano L, Miolla V, Archidiacono N, Rocchi M, Scott C, Lee PA, Sulston J, Rogers J, Bentley D and Jackson MS

    Human Genetics Unit, School of Biochemistry and Genetics, University of Newcastle upon Tyne, UK.

    The organization of centromeric heterochromatin has been established in a number of eucaryotes but remains poorly defined in human. Here we present 1025 kb of contiguous human genomic sequence which links pericentromeric satellites to the RET proto-oncogene in 10q11.2 and is presumed to span the transition from centric heterochromatin to euchromatin on this chromosome arm. Two distinct domains can be defined within the sequence. The proximal approximately 240 kb consists of arrays of satellites and other tandem repeats separated by tracts of complex sequence which have evolved by pericentromeric-directed duplication. Analysis of 32 human paralogues of these sequences indicates that most terminate at or within repeat arrays, implicating these repeats in the interchromosomal duplication process. Corroborative PCR-based analyses establish a genome-wide correlation between the distribution of these paralogues and the distribution of satellite families present in 10q11. In contrast, the distal approximately 780 kb contains few tandem repeats and is largely chromosome specific. However, a minimum of three independent intrachromosomal duplication events have resulted in >370 kb of this sequence sharing >90% identity with sequences on 10p. Using computer-based analyses and RT-PCR we confirm the presence of three genes within the sequence, ZNF11/33B, KIAA0187 and RET, in addition to five transcripts of unknown structure. All of these transcribed sequences map distal to the satellite arrays. The boundary between satellite-rich interchromosomally duplicated DNA and chromosome-specific DNA therefore appears to define a transition from pericentromeric heterochromatin to euchromatin on the long arm of this chromosome.

    Funded by: Telethon: E.0672, E.0962

    Human molecular genetics 2000;9;13;2029-42

  • A novel bacterial pathogen, Microbacterium nematophilum, induces morphological change in the nematode C. elegans.

    Hodgkin J, Kuwabara PE and Corneliussen B

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK. jah@bioch.ox.ac.uk

    The Dar (deformed anal region) phenotype, characterized by a distinctive swollen tail, was first detected in a variant strain of Caenorhabditis elegans which appeared spontaneously in 1986 during routine genetic crosses [1,2]. Dar isolates were initially analysed as morphological mutants, but we report here that two independent isolates carry an unusual bacterial infection different from those previously described [3], which is the cause of the Dar phenotype. The infectious agent is a new species of coryneform bacterium, named Microbacterium nematophilum n. sp., which fortuitously contaminated cultures of C. elegans. The bacteria adhere to the rectal and post-anal cuticle of susceptible nematodes, and induce substantial local swelling of the underlying hypodermal tissue. The swelling leads to constipation and slowed growth in the infected worms, but the infection is otherwise non-lethal. Certain mutants of C. elegans with altered surface antigenicity are resistant to infection. The induced deformation appears to be part of a survival strategy for the bacteria, as C. elegans are potentially their predators.

    Current biology : CB 2000;10;24;1615-8

  • Entering the post-genomic era of malaria research.

    Horrocks P, Bowman S, Kyes S, Waters AP and Craig A

    Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, England.

    The sequencing of the genome of Plasmodium falciparum promises to revolutionize the way in which malaria research will be carried out. Beyond simple gene discovery, the genome sequence will facilitate the comprehensive determination of the parasite's gene expression during its developmental phases, pathology, and in response to environmental variables, such as drug treatment and host genetic background. This article reviews the current status of the P. falciparum genome sequencing project and the unique insights it has generated. We also summarize the application of bioinformatics and analytical tools that have been developed for functional genomics. The aim of these activities is the rational, information-based identification of new therapeutic strategies and targets, based on a thorough insight into the biology of Plasmodium spp.

    Bulletin of the World Health Organization 2000;78;12;1424-37

  • Open annotation offers a democratic solution to genome sequencing.

    Hubbard T and Birney E

    Nature 2000;403;6772;825

  • Hurrah for genome projects!

    Ivens A

    Pathogen Sequencing Unit, Sanger Centre, Hinxton, UK. alicat@sanger.ac.uk

    Parasitology today (Personal ed.) 2000;16;8;317-20

  • Functional Websites for Parasite Genome Projects.

    Ivens I, Aslett I and Wood I

    Parasitology today (Personal ed.) 2000;16;3;93-94

  • The major and a minor class II beta-chain (B-LB) gene flank the Tapasin gene in the B-F/B-L region of the chicken major histocompatibility complex.

    Jacob JP, Milne S, Beck S and Kaufman J

    Institute of Animal Health, Compton, Berkshire, RG20 7NN, UK.

    We have identified the major histocompatibility complex class II beta-chain (B-LB) genes present in the B-F/B-L region of the B complex of nine well-characterized lines of chickens and have cleared up much of the confusion concerning numbers and location of B-LB genes in this region. By amplifying DNA sequences between adjacent genes, we found two B-LB genes that lie on either side of Tapasin. The dominantly expressed 'major' B-LB gene in all haplotypes lies between Tapasin and RING-3, and belongs to the B-LBII family of class II beta-chain genes. The poorly expressed 'minor' B-LB gene in all haplotypes lies between B-lec1 and Tapasin, and belongs either to the B-LBII family or to the previously unmapped B-LBVI family of class II beta-chain genes. The data suggest that the B-LBII and B-LBVI genes are two lineages of B-LB genes and we propose that they all be termed B-LB genes. The location of a third B-LB gene in the B12 haplotype (and possibly other haplotypes as well) has yet to be determined. The structural organization and expression of the class II beta-chain genes in the B-F/B-L region is similar to that of chicken class I (B-F) genes, one functional result of which is differential resistance to disease and response to vaccines.

    Immunogenetics 2000;51;2;138-47

  • Alfresco--a workbench for comparative genomic sequence analysis.

    Jareborg N and Durbin R

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom. niclas.jareborg@cgr.ki.se

    Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis; not many exist for overall view/control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various different existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented.

    Genome research 2000;10;8;1148-57

  • RNAi--prospects for a general technique for determining gene function.

    Kuwabara PE and Coulson A

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK CB10 1SA. pek@sanger.ac.uk

    Gene discovery programs centred around expressed sequence tag (EST) and genome sequencing projects have predictably led to an exponential surge in the number of parasite gene sequences deposited in public databases. To take advantage of this wealth of sequence information, it is essential to develop rapid methods for elucidating the biological function or mode of action of individual genes. Here, Patricia Kuwabara and Alan Coulson discuss the virtues of a powerful epigenetic gene disruption technique, RNA-mediated interference (RNAi), which was originally developed for the nematode Caenorhabditis elegans. It is anticipated that this technique will not only provide insights into gene function, but also help investigators to mine the genome for candidate drug intervention or vaccine development targets, some of which may not be readily apparent on the basis of sequence information alone.

    Parasitology today (Personal ed.) 2000;16;8;347-9

  • A C. elegans patched gene, ptc-1, functions in germ-line cytokinesis.

    Kuwabara PE, Lee MH, Schedl T and Jefferis GS

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. pek@sanger.ac.uk.

    Patched (Ptc), initially identified in Drosophila, defines a class of multipass membrane proteins that control cell fate and cell proliferation. Biochemical studies in vertebrates indicate that the membrane proteins Ptc and Smoothened (Smo) form a receptor complex that binds Hedgehog (Hh) morphogens. Smo transduces the Hh signal to downstream effectors. The Caenorhabditis elegans genome encodes two Ptc homologs and one related pseudogene but does not encode obvious Hh or Smo homologs. We have analyzed ptc-1 by RNAi and mutational deletion and find that it is an essential gene, although the absence of ptc-1 has no detectable effect on body patterning or proliferation. Therefore, the C. elegans ptc-1 gene is functional despite the lack of Hh and Smo homologs. We find that the activity and expression of ptc-1 is essentially confined to the germ line and its progenitors. ptc-1 null mutants are sterile with multinucleate germ cells arising from a probable cytokinesis defect. We have also identified a surprisingly large family of PTC-related proteins containing sterol-sensing domains, including homologs of Drosophila dispatched, in C. elegans and other phyla. These results suggest that the PTC superfamily has multiple functions in animal development.

    Funded by: NICHD NIH HHS: R01 HD25614

    Genes & development 2000;14;15;1933-44

  • SCOP: a structural classification of proteins database.

    Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG and Chothia C

    MRC Laboratory of Molecular Biology, Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK. loredana@mrc-lmb.cam.ac.uk

    The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships of known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and distant evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database so far. The sequences of proteins in SCOP provide the basis of the ASTRAL sequence libraries that can be used as a source of data to calibrate sequence search algorithms and for the generation of statistics on, or selections of, protein structures. Links can be made from SCOP to PDB-ISL: a library containing sequences homologous to proteins of known structure. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL http://scop.mrc-lmb.cam.ac.uk/scop/

    Nucleic acids research 2000;28;1;257-9

  • Sequence analysis of two cosmids from Schizosaccharomyces pombe chromosome III.

    Lucas M, Gwillam R, Lepingle A, Lyne M, Rajandream MA, Rochet M, Wood V and Gaillardin C

    Institut National Agronomique, Laboratoire de Génétique Moléculaire et Cellulaire, INRA, CNRS, Centre de Biotechnologies Agro-Industrielles, 78850 Thiverval-Grignon, France. claude@platon.grignon.inra.fr

    We report the complete sequence of two cosmids, SPCC895 (38457 bp insert, EMBL Accession No. AL035247) and SPCC1322 (42068 bp insert, EMBL Accession No. AL035259), localized on chromosome III of the Schizosaccharomyces pombe genome. Fourteen coding DNA sequences (CDSs) were identified in SPCC895 and 17 in SPCC1322. Two known genes were found in each cosmid: map2 and gms1 on SPCC895, encoding the mating type P-factor precursor and a UDP-galactose transporter, respectively, and bub1 and ade6 in SPCC1322, encoding a protein kinase and a phosphoribosylaminoimidazole carboxylase, respectively. The fission yeast K RNA gene has been localized to SPCC895. Three ribosomal proteins have been predicted among these two cosmids. Nine CDSs similar to known proteins were found on SPCC895, and seven on SPCC1322. They include putative genes for a uridylate kinase, a proteasome catalytic component, an ion transporter, a checkpoint protein, a translation initiation protein, a SNARE complex protein, a protein involved in cytoskeletal organization, a spindle pole body-associating protein, pre-mRNA splicing factor RNA helicase, a 3'-5' exonuclease for RNA 3' ss-tail, a UTP-glucose-1-phosphate uridylyltransferase, a leukotriene A(4) hydrolase, a member of the RanBP7-importin beta-Cse1p superfamily, a Ca(++)-calmodulin-dependent serine/threonine protein kinase and a prohibitin antiproliferative protein. One CDS is predicted to be an integral membrane protein. One CDS from SPCC895 is similar to a CDS of unknown function from Saccharomyces cerevisiae and three from SPCC1322 are similar to CDSs of unknown function from Candida albicans, S. cerevisiae and Sz. pombe, respectively. Finally, one CDS of SPCC895 and three of SPCC1322 correspond to orphan genes.

    Yeast (Chichester, England) 2000;16;16;1519-26

  • Sequence analysis of two cosmids from the right arm of the Schizosaccharomyces pombe chromosome II.

    Lucas M, Lyne M, Lepingle A, Rochet M and Gaillardin C

    Institut National Agronomique, Laboratoire de Génétique Moléculaire et Cellulaire, INRA, CNRS, Centre de Biotechnologies Agro-Industrielles, 78850 Thiverval-Grignon, France. mlucas@platon.grignon.inra.fr

    We report the complete sequence of two cosmids, SPBC19C7 (34815 bp insert, Accession No. AL023859) and SPBC15D4 (33203 bp insert, Accession No. AL031349), localized on chromosome II of the S. pombe genome. Twelve open reading frames (ORFs) were identified in SPBC19C7 and 16 in SPBC15D4. Two known genes were found on each cosmid: cyr1 and uve1 on SPBC19C7, encoding adenylate cyclase and a UV-endonuclease, respectively, and gpt and pho2 on SPBC15D4, encoding an N-acetylglucosamine-1-phosphate transferase and a 4-nitrophenylphosphatase, respectively. Five ORFs similar to known proteins were found on SPBC19C7, and six on SPBC15D4. They include putative genes for a ubiquitin protein ligase, a prolyl-tRNA synthetase, a tRNA splicing endonuclease, a voltage-gated chloride channel, a mannosyl transferase, a kinesin-like protein, a histone transcriptional regulator, an N-acetyltransferase, a cystathionine gamma-synthase and a TFIID subunit. Two ORF products of SPBC15D4 do not have clear homologues: one encodes a putative transcriptional regulator with a binuclear zinc domain and the other a protein with six transmembrane domains. Two ORFs from SPBC15D4 are similar to unknown ORFs, one from Saccharomyces cerevisiae and the other from Caenorhabditis elegans. Finally, two ORFs of SPBC19C7 and six of SPBC15D4 correspond to orphan genes. The frequent occurrence of introns and the short and degenerate intron-exon boundary consensus sequences significantly complicated ORF predictions. Two potential ORF-free regions spanning several kb were predicted, and a clustering of ORFs transcribed in the same orientation was observed.

    Yeast (Chichester, England) 2000;16;4;299-306

  • Mechanism of spreading of the highly related neurofibromatosis type 1 (NF1) pseudogenes on chromosomes 2, 14 and 22.

    Luijten M, Wang Y, Smith BT, Westerveld A, Smink LJ, Dunham I, Roe BA and Hulsebos TJ

    Department of Human Genetics, Academic Medical Center, University of Amsterdam, The Netherlands.

    Neurofibromatosis type 1 (NF1) is a frequent hereditary disorder that involves tissues derived from the embryonic neural crest. Besides the functional gene on chromosome arm 17q, NF1-related sequences (pseudogenes) are present on a number of chromosomes including 2, 12, 14, 15, 18, 21, and 22. We elucidated the complete nucleotide sequence of the NF1 pseudogene on chromosome 22. Only the middle part of the functional gene but not exons 21-27a, encoding the functionally important GAP-related domain of the NF1 protein, is represented in this pseudogene. In addition to the two known NF1 pseudogenes on chromosome 14 we identified two novel variants. A phylogenetic tree was constructed, from which we concluded that the NF1 pseudogenes on chromosomes 2, 14, and 22 are closely related to each other. Clones containing one of these pseudogenes cross-hybridised with the other pseudogenes in this subset, but did not reveal any in situ hybridisation with the functional NF1 gene or with NF1 pseudogenes on other chromosomes. This suggests that their hybridisation specificity is mainly determined by homologous sequences flanking the pseudogenes. Strong support for this concept was obtained by sequence analysis of the flanking regions, which revealed more than 95% homology. We hypothesise that during evolution this subset of NF1 pseudogenes initially arose by duplication and transposition of the middle part of the functional NF1 gene to chromosome 2. Subsequently, a much larger fragment, including flanking sequences, was duplicated and gave rise to the current NF1 pseudogene copies on chromosomes 14 and 22.

    Funded by: NHGRI NIH HHS: HG00313

    European journal of human genetics : EJHG 2000;8;3;209-14

  • Direct protein-protein interaction between the intracellular domain of TRA-2 and the transcription factor TRA-1A modulates feminizing activity in C. elegans.

    Lum DH, Kuwabara PE, Zarkower D and Spence AM

    Department of Molecular and Medical Genetics, University of Toronto, Toronto, M5S 1A8, Canada.

    In the nematode Caenorhabditis elegans, the zinc finger transcriptional regulator TRA-1A directs XX somatic cells to adopt female fates. The membrane protein TRA-2A indirectly activates TRA-1A by binding and inhibiting a masculinizing protein, FEM-3. Here we report that a part of the intracellular domain of TRA-2A, distinct from the FEM-3 binding region, directly binds TRA-1A. Overproduction of this TRA-1A-binding region has tra-1-dependent feminizing activity in somatic tissues, indicating that the interaction enhances TRA-1A activity. Consistent with this hypothesis, we show that tra-2(mx) mutations, which weakly masculinize somatic tissues, disrupt the TRA-2/TRA-1A interaction. Paradoxically, tra-2(mx) mutations feminize the XX germ line, as do tra-1 mutations mapping to the TRA-2 binding domain. We propose that these mutations render tra-2 insensitive to a negative regulator in the XX germ line, and we speculate that this regulator targets the TRA-2/TRA-1 complex. The intracellular domain of TRA-2A is likely to be produced as a soluble protein in vivo through proteolytic cleavage of TRA-2A or through translation of an XX germ line-specific mRNA. We further show that tagged derivatives of the intracellular domain of TRA-2 localize to the nucleus, supporting the hypothesis that this domain is capable of modulating TRA-1A activity in a manner reminiscent of Notch and Su(H).

    Genes & development 2000;14;24;3153-65

  • Cancer predisposition caused by elevated mitotic recombination in Bloom mice.

    Luo G, Santoro IM, McDaniel LD, Nishijima I, Mills M, Youssoufian H, Vogel H, Schultz RA and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.

    Bloom syndrome is a disorder associated with genomic instability that causes affected people to be prone to cancer. Bloom cell lines show increased sister chromatid exchange, yet are proficient in the repair of various DNA lesions. The disease is caused by mutations in a gene encoding a RECQ DNA helicase. Using embryonic stem cell technology, we have generated viable Bloom mice that are prone to a wide variety of cancers. Cell lines from these mice show elevations in the rates of mitotic recombination. We demonstrate that the increased rate of loss of heterozygosity (LOH) resulting from mitotic recombination in vivo constitutes the underlying mechanism causing tumour susceptibility in these mice.

    Nature genetics 2000;26;4;424-9

  • The PA domain: a protease-associated domain.

    Mahon P and Bateman A

    Department of Biochemistry, University of Cambridge, United Kingdom.

    We have identified a similarity between the apical domain of the human transferrin receptor and several other protein families. This domain is found associated with two different families of peptidases. Therefore, we term it the PA domain for protease-associated domain. The PA domain is found inserted within a loop of the peptidase domain of family M8/M33 zinc peptidases. The PA domain is also found in a vacuolar sorting receptor and a ring finger protein of unknown function that may be a cell surface receptor. The PA domain may mediate substrate determination of peptidases or form protein-protein interactions.

    Protein science : a publication of the Protein Society 2000;9;10;1930-4

  • Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi.

    McClelland M, Florea L, Sanderson K, Clifton SW, Parkhill J, Churcher C, Dougan G, Wilson RK and Miller W

    Sidney Kimmel Cancer Center, 10835 Altman Row, San Diego, CA 92121, USA.

    The Escherichia coli K-12 genome (ECO) was compared with the sampled genomes of the sibling species Salmonella enterica serovars Typhimurium, Typhi and Paratyphi A (collectively referred to as SAL) and the genome of the close outgroup Klebsiella pneumoniae (KPN). There are at least 160 locations where sequences of >400 bp are absent from ECO but present in the genomes of all three SAL and 394 locations where sequences are present in ECO but close homologs are absent in all SAL genomes. The 394 sequences in ECO that do not occur in SAL contain 1350 (30.6%) of the 4405 ECO genes. Of these, 1165 are missing from both SAL and KPN. Most of the 1165 genes are concentrated within 28 regions of 10-40 kb, which consist almost exclusively of such genes. Among these regions were six that included previously identified cryptic phage. A hypothetical ancestral state of genomic regions that differ between ECO and SAL can be inferred in some cases by reference to the genome structure in KPN and the more distant relative Yersinia pestis. However, many changes between ECO and SAL are concentrated in regions where all four genera have a different structure. The rate of gene insertion and deletion is sufficiently high in these regions that the ancestral state of the ECO/SAL lineage cannot be inferred from the present data. The sequencing of other closely related genomes, such as S. bongori or Citrobacter, may help in this regard.

    Funded by: NIAID NIH HHS: AI 34829-09; NLM NIH HHS: LM05110

    Nucleic acids research 2000;28;24;4974-86

  • Cloning and characterization of two overlapping genes in a subregion at 6q21 involved in replicative senescence and schizophrenia.

    Morelli C, Magnanini C, Mungall AJ, Negrini M and Barbanti-Brodano G

    Department of Experimental and Diagnostic Medicine, Section of Microbiology and Interdepartment Center for Biotechnology, University of Ferrara, I-44100, Ferrara, Italy.

    Two new genes were cloned from region 6q21 and characterized. One gene, C6orf4-6, expresses three mRNA isoforms diverging at the 5' and 3' ends, and encodes two protein isoforms that differ by nine amino acids at their amino terminus. The second gene, C6UAS, is transcribed in the antisense orientation from the complementary strand of C6orf4-6. C6UAS overlaps the second exon of C6orf4, where the start codon of protein isoform 1 is located. C6UAS has no apparent ORF and most likely represents a structural RNA gene that is transcribed but not translated. This feature and the antisense polarity of transcription suggest that C6UAS could play a regulatory role on the expression of C6orf4, as indicated by a significant decrease of endogenous C6orf4 expression after transfection of C6UAS cDNA in human fibroblasts. Neither C6UAS nor C6orf4-6 genes show any homology with known human genes. The two genes were cloned from a subregion at 6q21 containing a replicative senescence gene, a tumor suppressor gene and a gene involved in hereditary schizophrenia. In addition, the common fragile site FRA6F was mapped in the same region. Cloning and characterization of C6orf4-6 and C6UAS may help to clarify the structure and the functional role of this important region.

    Gene 2000;252;1-2;217-25

  • An SNP map of human chromosome 22.

    Mullikin JC, Hunt SE, Cole CG, Mortimore BJ, Rice CM, Burton J, Matthews LH, Pavitt R, Plumb RW, Sims SK, Ainscough RM, Attwood J, Bailey JM, Barlow K, Bruskiewich RM, Butcher PN, Carter NP, Chen Y, Clee CM, Coggill PC, Davies J, Davies RM, Dawson E, Francis MD, Joy AA, Lamble RG, Langford CF, Macarthy J, Mall V, Moreland A, Overton-Larty EK, Ross MT, Smith LC, Steward CA, Sulston JE, Tinsley EJ, Turney KJ, Willey DL, Wilson GD, McMurray AA, Dunham I, Rogers J and Bentley DR

    The Sanger Centre, Hinxton, Cambridge, UK.

    The human genome sequence will provide a reference for measuring DNA sequence variation in human populations. Sequence variants are responsible for the genetic component of individuality, including complex characteristics such as disease susceptibility and drug response. Most sequence variants are single nucleotide polymorphisms (SNPs), where two alternate bases occur at one position. Comparison of any two genomes reveals around 1 SNP per kilobase. A sufficiently dense map of SNPs would allow the detection of sequence variants responsible for particular characteristics on the basis that they are associated with a specific SNP allele. Here we have evaluated large-scale sequencing approaches to obtaining SNPs, and have constructed a map of 2,730 SNPs on human chromosome 22. Most of the SNPs are within 25 kilobases of a transcribed exon, and are valuable for association studies. We have scaled up the process, detecting over 65,000 SNPs in the genome as part of The SNP Consortium programme, which is on target to build a map of 1 SNP every 5 kilobases that is integrated with the human genome sequence and that is freely available in the public domain.

    Nature 2000;407;6803;516-20

  • Report of the Fourth International Chromosome 6 Workshop 1999. 10-12 June 1999. Cambridge, UK. Abstracts.

    Mungall AJ, Beck S, Cann HM, Dunham I, Trowsdale J and Ziegler A

    The Sanger Centre, Wellcome Trust Genome Campus, Cambridge, UK. ajm@sanger.ac.uk

    Cytogenetics and cell genetics 2000;88;3-4;173-96

  • Hydrophobic protein that copurifies with human brain acetylcholinesterase: amino acid sequence, genomic organization, and chromosomal localization.

    Navaratnam DS, Fernando FS, Priddle JD, Giles K, Clegg SM, Pappin DJ, Craig I and Smith AD

    Department of Pharmacology, University of Oxford, England. navaratn@mail.med.upenn.edu

    The mechanism of attachment of acetylcholinesterase (AChE) to neuronal membranes in interneuronal synapses is poorly understood. We have isolated, sequenced, and cloned a hydrophobic protein that copurifies with AChE from human caudate nucleus and that we propose forms a part of a complex of membrane proteins attached to this enzyme. It is a short protein of 136 amino acids and has a molecular mass of 18 kDa. The sequence contains stretches of both hydrophobic and hydrophilic amino acids and two cysteine residues. Analysis of the genomic sequence reveals that the coding region is divided among five short exons. Fluorescence in situ hybridization localizes the gene to chromosome 6p21.32-p21.2. Northern blot analysis shows that this gene is widely expressed in the brain with an expression pattern that parallels that of AChE.

    Journal of neurochemistry 2000;74;5;2146-53

  • In defense of complete genomes.

    Parkhill J

    Nature biotechnology 2000;18;5;493-4

  • Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491.

    Parkhill J, Achtman M, James KD, Bentley SD, Churcher C, Klee SR, Morelli G, Basham D, Brown D, Chillingworth T, Davies RM, Davis P, Devlin K, Feltwell T, Hamlin N, Holroyd S, Jagels K, Leather S, Moule S, Mungall K, Quail MA, Rajandream MA, Rutherford KM, Simmonds M, Skelton J, Whitehead S, Spratt BG and Barrell BG

    The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. parkhill@sanger.ac.uk

    Neisseria meningitidis causes bacterial meningitis and is therefore responsible for considerable morbidity and mortality in both the developed and the developing world. Meningococci are opportunistic pathogens that colonize the nasopharynges and oropharynges of asymptomatic carriers. For reasons that are still mostly unknown, they occasionally gain access to the blood, and subsequently to the cerebrospinal fluid, to cause septicaemia and meningitis. N. meningitidis strains are divided into a number of serogroups on the basis of the immunochemistry of their capsular polysaccharides; serogroup A strains are responsible for major epidemics and pandemics of meningococcal disease, and therefore most of the morbidity and mortality associated with this disease. Here we have determined the complete genome sequence of a serogroup A strain of Neisseria meningitidis, Z2491. The sequence is 2,184,406 base pairs in length, with an overall G+C content of 51.8%, and contains 2,121 predicted coding sequences. The most notable feature of the genome is the presence of many hundreds of repetitive elements, ranging from short repeats, positioned either singly or in large multiple arrays, to insertion sequences and gene duplications of one kilobase or more. Many of these repeats appear to be involved in genome fluidity and antigenic variation in this important human pathogen.

    Nature 2000;404;6777;502-6

  • The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences.

    Parkhill J, Wren BW, Mungall K, Ketley JM, Churcher C, Basham D, Chillingworth T, Davies RM, Feltwell T, Holroyd S, Jagels K, Karlyshev AV, Moule S, Pallen MJ, Penn CW, Quail MA, Rajandream MA, Rutherford KM, van Vliet AH, Whitehead S and Barrell BG

    The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Campylobacter jejuni, from the delta-epsilon group of proteobacteria, is a microaerophilic, Gram-negative, flagellate, spiral bacterium, properties it shares with the related gastric pathogen Helicobacter pylori. It is the leading cause of bacterial food-borne diarrhoeal disease throughout the world. In addition, infection with C. jejuni is the most frequent antecedent to a form of neuromuscular paralysis known as Guillain-Barré syndrome. Here we report the genome sequence of C. jejuni NCTC11168. C. jejuni has a circular chromosome of 1,641,481 base pairs (30.6% G+C) which is predicted to encode 1,654 proteins and 54 stable RNA species. The genome is unusual in that there are virtually no insertion sequences or phage-associated sequences and very few repeat sequences. One of the most striking findings in the genome was the presence of hypervariable sequences. These short homopolymeric runs of nucleotides were commonly found in genes encoding the biosynthesis or modification of surface structures, or in closely linked genes of unknown function. The apparently high rate of variation of these homopolymeric tracts may be important in the survival strategy of C. jejuni.

    Nature 2000;403;6770;665-8

  • A browser for expression data.

    Pocock MR and Hubbard TJ

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. mrp@sanger.ac.uk

    Summary: We have written a fully extensible Java application for visually browsing expression data, and clusters of genes or experimental conditions calculated from that data. The application requires a run-time environment for Java2.

    Availability: http://www.sanger.ac.uk/Users/mrp/java/ExpressionBrowser

    Bioinformatics (Oxford, England) 2000;16;4;402-3

  • EMBOSS: the European Molecular Biology Open Software Suite.

    Rice P, Longden I and Bleasby A

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK CB10 1SA.

    Trends in genetics : TIG 2000;16;6;276-7

  • Correlating physiology with gene expression in striatal cholinergic neurones.

    Richardson PJ, Dixon AK, Lee K, Bell MI, Cox PJ, Williams R, Pinnock RD and Freeman TC

    Department of Pharmacology, University of Cambridge, Sanger Centre, England, UK. pjr1001@cam.ac.uk

    The expression of 34 transmitter-related genes has been examined in the cholinergic neurones of rat striatal brain slices, with the aim of correlating gene expression with functional activity. The mRNAs encoding types I, II/IIA, and III alpha subunits of the voltage-sensitive sodium channels were detected, suggesting the presence of these three types of sodium channel. Similarly, mRNAs encoding all four alpha-amino-3-hydroxy-5-methylisoxazole-4-propionate (AMPA)-type glutamate receptor subunits and the NR1 and NR2A, 2B, and 2D subunits of the NMDA-type glutamate receptors were detected, suggesting that various combinations of these subunits mediate the cellular response to synaptically released glutamate. Other mRNAs detected included the NK1 and NK3 tachykinin receptors, all four known adenosine receptors, and the GABA-synthesising enzyme glutamate decarboxylase. Subpopulations of these cholinergic neurones have been identified on the basis of the expression of the NK3 tachykinin receptor in 5% and the trkC neurotrophin receptor in 12% of the cells investigated.

    Journal of neurochemistry 2000;74;2;839-46

  • Rapid detection of DNA sequence variants by conformation-sensitive capillary electrophoresis.

    Rozycka M, Collins N, Stratton MR and Wooster R

    Section of Molecular Carcinogenesis, Section of Cancer Genetics, The Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, United Kingdom.

    The identification of novel sequence variants, which may be either disease-causing mutations or silent polymorphisms, in large numbers of samples is becoming the rate-limiting step in associating diseases with specific genes. This is particularly true in light of the imminent arrival of the complete reference sequence of the human genome. A number of techniques have been developed to analyze DNA samples for sequence variants rapidly. We describe a new technique, capillary-based conformation-sensitive gel electrophoresis (capillary CSGE) that transfers mutation detection from acrylamide gel to capillary electrophoresis. Capillary CSGE was able to detect 7/7 short insertion/deletions and 16/22 base substitutions in a series of random single-nucleotide polymorphisms and known variants in the lipoprotein lipase and BRCA2 genes. This technique has the potential to screen many megabases of DNA in a single day.

    Genomics 2000;70;1;34-40

  • Artemis: sequence visualization and annotation.

    Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA and Barrell B

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. kmr@sanger.ac.uk

    Summary: Artemis is a DNA sequence visualization and annotation tool that allows the results of any analysis or sets of analyses to be viewed in the context of the sequence and its six-frame translation. Artemis is especially useful in analysing the compact genomes of bacteria, archaea and lower eukaryotes, and will cope with sequences of any size from small genes to whole genomes. It is implemented in Java, and can be run on any suitable platform. Sequences and annotation can be read and written directly in EMBL, GenBank and GFF format.

    Availability: Artemis is available under the GNU General Public License from http://www.sanger.ac.uk/Software/Artemis

    Bioinformatics (Oxford, England) 2000;16;10;944-5

  • Ab initio gene finding in Drosophila genomic DNA.

    Salamov AA and Solovyev VV

    The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK.

    Ab initio gene identification in the genomic sequence of Drosophila melanogaster was obtained using Fgenes (human gene predictor) and Fgenesh programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use information about cDNA/EST in most predictions to model a real situation for finding new genes because information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction on different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes, protein products of which have clear homologs in protein databases, predictions were recomputed by Fgenesh+ program. The first annotation serves as the optimal computational description of new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting the PCR primers for experimental work for gene structure verification. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons and 89% including overlapping exons with 49% false positives. Optimizing accuracy of prediction, we designed a gene identification scheme using Fgenesh, which provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on std1 set and specificity on std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives. However, exact gene prediction (especially at the gene level) needs additional improvement using gene prediction algorithms. The program was also tested for predicting genes of human Chromosome 22 (the last variant of Fgenesh can analyze the whole chromosome sequence). This analysis has demonstrated that 88% of manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.

    Genome research 2000;10;4;516-22

  • A preliminary gene map for the Van der Woude syndrome critical region derived from 900 kb of genomic sequence at 1q32-q41.

    Schutte BC, Bjork BC, Coppage KB, Malik MI, Gregory SG, Scott DJ, Brentzell LM, Watanabe Y, Dixon MJ and Murray JC

    Department of Pediatrics, University of Iowa, Iowa City, Iowa 52242 USA.

    Van der Woude syndrome (VWS) is a common form of syndromic cleft lip and palate and accounts for approximately 2% of all cleft lip and palate cases. Distinguishing characteristics include cleft lip with or without cleft palate, isolated cleft palate, bilateral lip pits, hypodontia, normal intelligence, and an autosomal-dominant mode of transmission with a high degree of penetrance. Previously, the VWS locus was mapped to a 1.6-cM region in 1q32-q41 between D1S491 and D1S205, and a 4.4-Mb contig of YAC clones of this region was constructed. In the current investigation, gene-based and anonymous STSs were developed from the existing physical map and were then used to construct a contig of sequence-ready bacterial clones across the entire VWS critical region. All STSs and BAC clones were shared with the Sanger Centre, which developed a contig of PAC clones over the same region. A subset of 11 clones from both contigs was selected for high-throughput sequence analysis across the approximately 1.1-Mb region; all but two of these clones have been sequenced completely. Over 900 kb of genomic sequence, including the 350-kb VWS critical region, were analyzed and revealed novel polymorphisms, including an 8-kb deletion/insertion, and revealed 4 known genes, 11 novel genes, 9 putative genes, and 3 pseudogenes. The positional candidates LAMB3, G0S2, HIRF6, and HSD11 were excluded as the VWS gene by mutation analysis. A preliminary gene map for the VWS critical region is as follows: [see text] 41-TEL. The data provided here will help lead to the identification of the VWS gene, and this study provides a model for how laboratories that have a regional interest in the human genome can contribute to the sequencing efforts of the entire human genome.

    Funded by: NIDCR NIH HHS: P50-DE09170, P60-DE13076, R01-DE08559; ...

    Genome research 2000;10;1;81-94

  • Prevalence of small inversions in yeast gene order evolution.

    Seoighe C, Federspiel N, Jones T, Hansen N, Bivolarovic V, Surzycki R, Tamse R, Komp C, Huizar L, Davis RW, Scherer S, Tait E, Shaw DJ, Harris D, Murphy L, Oliver K, Taylor K, Rajandream MA, Barrell BG and Wolfe KH

    Department of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland; Stanford DNA Sequencing and Technology Center, 855 California Avenue, Palo Alto, CA 94304, USA.

    Gene order evolution in two eukaryotes was studied by comparing the Saccharomyces cerevisiae genome sequence to extensive new data from whole-genome shotgun and cosmid sequencing of Candida albicans. Gene order is substantially different between these two yeasts, with only 9% of gene pairs that are adjacent in one species being conserved as adjacent in the other. Inversion of small segments of DNA, less than 10 genes long, has been a major cause of rearrangement, which means that even where a pair of genes has been conserved as adjacent, the transcriptional orientations of the two genes relative to one another are often different. We estimate that about 1,100 single-gene inversions have occurred since the divergence between these species. Other genes that are adjacent in one species are in the same neighborhood in the other, but their precise arrangement has been disrupted, probably by multiple successive multigene inversions. We estimate that gene adjacencies have been broken as frequently by local rearrangements as by chromosomal translocations or long-distance transpositions. A bias toward small inversions has been suggested by other studies on animals and plants and may be general among eukaryotes.

    Proceedings of the National Academy of Sciences of the United States of America 2000;97;26;14433-7

  • LMNA, encoding lamin A/C, is mutated in partial lipodystrophy.

    Shackleton S, Lloyd DJ, Jackson SN, Evans R, Niermeijer MF, Singh BM, Schmidt H, Brabant G, Kumar S, Durrington PN, Gregory S, O'Rahilly S and Trembath RC

    Division of Medical Genetics, Departments of Medicine and Genetics, University of Leicester, Leicester, UK.

    The lipodystrophies are a group of disorders characterized by the absence or reduction of subcutaneous adipose tissue. Partial lipodystrophy (PLD; MIM 151660) is an inherited condition in which a regional (trunk and limbs) loss of fat occurs during the peri-pubertal phase. Additionally, variable degrees of resistance to insulin action, together with a hyperlipidaemic state, may occur and simulate the metabolic features commonly associated with predisposition to atherosclerotic disease. The PLD locus has been mapped to chromosome 1q with no evidence of genetic heterogeneity. We, and others, have refined the location to a 5.3-cM interval between markers D1S305 and D1S1600 (refs 5, 6). Through a positional cloning approach we have identified five different missense mutations in LMNA among ten kindreds and three individuals with PLD. The protein product of LMNA is lamin A/C, which is a component of the nuclear envelope. Heterozygous mutations in LMNA have recently been identified in kindreds with the variant form of muscular dystrophy (MD) known as autosomal dominant Emery-Dreifuss MD (EDMD-AD; ref. 7) and dilated cardiomyopathy and conduction-system disease (CMD1A). As LMNA is ubiquitously expressed, the finding of site-specific amino acid substitutions in PLD, EDMD-AD and CMD1A reveals distinct functional domains of the lamin A/C protein required for the maintenance and integrity of different cell types.

    Nature genetics 2000;24;2;153-6

  • Improved method for detecting differentially expressed genes using cDNA indexing.

    Shaw-Smith CJ, Coffey AJ, Huckle E, Durham J, Campbell EA, Freeman TC, Walters JR and Bentley DR

    Gastroenterology Section, Imperial College School of Medicine, Hammersmith Hospital, London, England, UK.

    In cDNA indexing, differentially expressed genes are identified by the display of specific, corresponding subsets of cDNA. Subdivision of the cDNA population is achieved by the sequence-specific ligation of adapters to the overhangs created by class IIS restriction enzymes. However, inadequate specificity of ligation leads to redundancy between different adapter subsets. We evaluate the incidence of mismatches between adapters and class IIS restriction fragments during ligation and describe a modified set of conditions that improves ligation specificity. The improved protocol reduces redundancy between amplified cDNA subsets, which leads to a lower number of bands per lane of the differential display gel, and therefore simplifies analysis. We confirm the validity of this revised protocol by identifying five differentially expressed genes in mouse duodenum and ileum.

    BioTechniques 2000;28;5;958-64

  • Characterisation of a novel murine intestinal serine protease, DISP.

    Shaw-Smith CJ, Coffey AJ, Leversha M, Freeman TC, Bentley DR and Walters JR

    Division of Medicine, Imperial College School of Medicine, Hammersmith Hospital, London, UK.

    A putative novel murine serine protease, DISP, was identified by cDNA indexing and shown to be expressed primarily in distal gut. FISH analysis showed it to be localised to mouse chromosome 17A3. A possible human homologue for DISP has been identified. DISP is a novel member of clan SA/family S1 of the serine proteases, at present of unknown function.

    Biochimica et biophysica acta 2000;1490;1-2;131-6

  • Large deletions at the t(9;22) breakpoint are common and may identify a poor-prognosis subgroup of patients with chronic myeloid leukemia.

    Sinclair PB, Nacheva EP, Leversha M, Telford N, Chang J, Reid A, Bench A, Champion K, Huntly B and Green AR

    University of Cambridge, Department of Hematology, MRC Centre, Cambridge, United Kingdom.

    The hallmark of chronic myeloid leukemia (CML) is the BCR-ABL fusion gene, which is usually formed as a result of the t(9;22) translocation. Patients with CML show considerable heterogeneity both in their presenting clinical features and in the time taken for evolution to blast crisis. In this study, metaphase fluorescence in situ hybridization showed that a substantial minority of patients with CML had large deletions adjacent to the translocation breakpoint on the derivative 9 chromosome, on the additional partner chromosome in variant translocations, or on both. The deletions spanned up to several megabases, had variable breakpoints, and could be detected by microsatellite polymerase chain reaction in unfractionated bone marrow and purified peripheral blood granulocytes. The deletions were likely to occur early and possibly at the time of the Philadelphia (Ph) chromosome translocation: deletions were detected at diagnosis in 11 patients, were found in all Ph-positive metaphases, and were more prevalent in patients with variant Ph chromosomes. Kaplan-Meier analysis showed a median survival time of 36 months in patients with a deletion; patients without a detectable deletion survived > 90 months. The survival-time difference was significant on log-rank analysis (P = .006). Multivariate analysis demonstrated that the prognostic importance of deletion status was independent of age, sex, percentage of peripheral blood blasts, and platelet count. Our data therefore suggest that an apparently simple, balanced translocation may result not only in the generation of a dominantly acting fusion oncogene but also in the loss of one or more genes that influence disease progression. (Blood. 2000;95:738-743)

    Blood 2000;95;3;738-43

  • Contigs built with fingerprints, markers, and FPC V4.7.

    Soderlund C, Humphray S, Dunham A and French L

    Clemson University Genomic Institute, Clemson, South Carolina 29634-5808, USA. cari@cs.clemson.edu

    Contigs have been assembled, and over 2800 clones selected for sequencing for human chromosomes 9, 10 and 13. Using the FPC (FingerPrinted Contig) software, the contigs are assembled with markers and complete digest fingerprints, and the contigs are ordered and localised by a global framework. Publicly available resources have been used, such as the 1998 International Gene Map for the framework and the GSC Human BAC fingerprint database for the majority of the fingerprints. Additional markers and fingerprints are generated in-house to supplement this data. To support the scale up of building maps, FPC V4.7 has been extended to use markers with the fingerprints for assembly of contigs, new clones and markers can be automatically added to existing contigs, and poorly assembled contigs are marked accordingly. To test the automatic assembly, a simulated complete digest of 110 Mb of concatenated human sequence was used to create datasets with varying coverage, length of clones, and types of error. When no error was introduced and a tolerance of 7 was used in assembly, the largest contig with no false positive overlaps has 9534 clones with 37 out-of-order clones, that is, the starting coordinates of adjacent clones are in the wrong order. This paper describes the new features in FPC, the scenario for building the maps of chromosomes 9, 10 and 13, and the results from the simulation.

    Genome research 2000;10;11;1772-87

  • Proteolysis in Caenorhabditis elegans sex determination: cleavage of TRA-2A by TRA-3.

    Sokol SB and Kuwabara PE

    Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB2 2QH, UK.

    The Caenorhabditis elegans tra-3 gene promotes female development in XX hermaphrodites and encodes an atypical calpain regulatory protease lacking calcium-binding EF hands. We report that despite the absence of EF hands, TRA-3 has calcium-dependent proteolytic activity and its proteolytic domain is essential for in vivo function. We show that the membrane protein TRA-2A, which promotes XX female development by repressing the masculinizing protein FEM-3, is a TRA-3 substrate. Cleavage of TRA-2A by TRA-3 generates a peptide predicted to have feminizing activity. These results indicate that proteolysis regulated by calcium may control some aspects of sexual cell fate in C. elegans.

    Genes & development 2000;14;8;901-6

  • BTL-II: a polymorphic locus with homology to the butyrophilin gene family, located at the border of the major histocompatibility complex class II and class III regions in human and mouse.

    Stammers M, Rowen L, Rhodes D, Trowsdale J and Beck S

    Sanger Centre, Wellcome Trust Genome Campus, Cambridge, UK.

    Comparison of human and mouse genomic sequence at the border of the major histocompatibility complex (MHC) class II and class III regions revealed a locus encoding six exons with homology to the butyrophilin gene family and the location of a previously described gene, testis-specific basic protein (TSBP). We named the new locus BTL-II, for butyrophilin-like MHC class II associated. The six discernable exons of the BTL-II locus encode a small hydrophobic amino acid sequence (which may be a signal peptide), two immunoglobulin domains, a small 7-amino acid, heptad repeat-like exon, and a further two immunoglobulin domains. In mouse, an additional butyrophilin-like gene (NG10) is situated adjacent to BTL-II. Expression studies of the BTL-II locus in mouse showed that it is expressed in a range of gut tissues. We demonstrate that like many other genes from the MHC, BTL-II is polymorphic in a selection of diverse HLA haplotypes. In the light of the newly discovered locus, we revisit and discuss the possible origin of the butyrophilin gene family.

    Immunogenetics 2000;51;4-5;373-82

  • Delta-like and gtl2 are reciprocally expressed, differentially methylated linked imprinted genes on mouse chromosome 12.

    Takada S, Tevendale M, Baker J, Georgiades P, Campbell E, Freeman T, Johnson MH, Paulsen M and Ferguson-Smith AC

    Department of Anatomy, University of Cambridge, Downing Street, CB2 3DY, Cambridge, UK.

    The distal portion of mouse chromosome 12 is imprinted. To date, however, Gtl2 is the only imprinted gene identified on chromosome 12. Gtl2 encodes multiple alternatively spliced transcripts with no apparent open reading frame. Using conceptuses with maternal or paternal uniparental disomy for chromosome 12 (UPD12), we found that Gtl2 is expressed from the maternal allele and methylated at the 5' end of the silent paternal allele. A reciprocally imprinted gene, Delta-like (Dlk), with homology to genes involved in the Notch signalling pathway was identified 80kb upstream of Gtl2. Dlk was expressed exclusively from the paternal allele in both the embryo and placenta, but the CpG-island promoter of Dlk was completely unmethylated on both parental alleles. Rather, a paternally methylated region was identified in the last exon of the active Dlk allele. The proximity, reciprocal imprinting and methylation in this domain are reminiscent of the co-ordinately regulated Igf2-H19 imprinted domain on mouse chromosome 7. Like H19 and Igf2, Gtl2 and Dlk were found to be co-expressed in the same tissues throughout development, though not after birth. These results have implications for the regulation, function and evolution of imprinted domains.

    Current biology : CB 2000;10;18;1135-8

  • A study of var gene transcription in vitro using universal var gene primers.

    Taylor HM, Kyes SA, Harris D, Kriek N and Newbold CI

    Institute of Molecular Medicine, Nuffield Department of Medicine, John Radcliffe Hospital, Headington, Oxford, UK. htaylor@nimr.mrc.ac.uk

    The polymorphic multigene family, var, encodes the variant antigen, Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), present on the surface of erythrocytes infected with the human malaria parasite, P. falciparum. PfEMP1 has been implicated in the pathology of malaria through its ability to bind to host endothelial receptors and uninfected erythrocytes. Understanding the relationship between host pathology, immune response and parasite variation is crucial, but requires a method of reliably detecting and differentiating all possible var genes. Several primer pairs used to date are biased and limited in their detection capacity. Here we describe a set of PCR primers that amplify the majority of var genes in the laboratory isolates 3D7 and A4, and appear to work equally well on all isolates tested. We use these universal primers to examine the relationship between var gene transcription as assessed by reverse transcriptase-PCR (RT-PCR) and that measured by Northern analysis of parasite RNA. Phenotypically selected young parasites have multiple transcripts detected by RT-PCR, but the full-length transcript appears to be homogeneous. In addition, we demonstrate that the choice of primers used for RT-PCR is crucial in data interpretation.

    Molecular and biochemical parasitology 2000;105;1;13-23

  • Arrangement of the ILT gene cluster: a common null allele of the ILT6 gene results from a 6.7-kbp deletion.

    Torkar M, Haude A, Milne S, Beck S, Trowsdale J and Wilson MJ

    Department of Pathology, Division of Immunology, University of Cambridge, Cambridge, GB.

    The leukocyte receptor cluster (LRC) is a highly polymorphic region of human chromosome 19q13.4 that encompasses at least 24 members of the immunoglobulin superfamily (Ig-SF). The centromeric end of the LRC contains eight Ig-SF loci, namely LAIR1 and seven ILT genes. All ILT genes conform to prototypic ILT gene structures. ILT6 is the only member of the ILT family that lacks a transmembrane and cytoplasmic domain. Close examination of the ILT6 genomic sequence reveals high similarity of this locus with the organization of activating ILT genes. However, the ILT6 transcript runs through the putative splice site of exon 8 that encodes an extracellular stalk region, leading to a premature in-frame stop codon. Downstream of exon 8 are three pseudo exons that are not included in any of the known ILT6 transcripts, but share high homology to the equivalent region in activating ILT loci, suggesting that these genes have evolved from a common ancestral sequence. Comparison of two haplotypes over this region revealed a remarkable polymorphism with respect to the ILT6 gene which lacks exons 1-7 in one allele, reminiscent of the presence/absence variation displayed by the closely related and genetically linked KIR loci. Detailed sequence analysis of the two LAIR/ILT clusters suggests that the two complexes may have evolved from an inverted duplication.

    European journal of immunology 2000;30;12;3655-62

  • Construction of a high-resolution 2.5-Mb transcript map of the human 6p21.2-6p21.3 region immediately centromeric of the major histocompatibility complex.

    Tripodis N, Palmer S, Phillips S, Milne S, Beck S and Ragoussis J

    Genomics Laboratory, Division of Medical and Molecular Genetics, Guy's Campus, GKT School of Medicine, King's College London SE1 9RT, UK.

    We have constructed a 2.5-Mb physical and transcription map that spans the human 6p21.2-6p21.3 region and includes the centromeric end of the MHC, using a combination of techniques. In total 88 transcription units including exons, cDNAs, and cDNA contigs were characterized and 60 were confidently positioned on the physical map. These include a number of genes encoding nuclear and splicing factors (Ndr kinase, HSU09564, HSRP20); cell cycle, DNA packaging, and apoptosis related [p21, HMGI(Y), BAK]; immune response (CSBP, SAPK4); transcription activators and zinc finger-containing genes (TEF-5, ZNF76); embryogenesis related (Csa-19); cell signaling (DIPP); structural (HSET), and other genes (TULP1, HSPRARD, DEF-6, EO6811, cyclophilin), as well as a number of RP genes and pseudogenes (RPS10, RPS12-like, RPL12-like, RPL35-like). Furthermore, several novel genes (a Br140-like, a G2S-like, a FBN2-like, a ZNF-like, and B1/KIAA0229) have been identified, as well as cDNAs and cDNA contigs. The detailed map of the gene content of this chromosomal segment provides a number of candidate genes, which may be involved in several biological processes that have been associated with this region, such as spermatogenesis, development, embryogenesis, and neoplasia. The data provide useful tools for synteny studies between mice and humans, for genome structure analysis, gene density comparisons, and studies of nucleotide composition, of different isochores and Giemsa light and Giemsa dark bands.

    Genome research 2000;10;4;454-72

  • Large-scale chromatin organization of the major histocompatibility complex and other regions of human chromosome 6 and its response to interferon in interphase nuclei.

    Volpi EV, Chevret E, Jones T, Vatcheva R, Williamson J, Beck S, Campbell RD, Goldsworthy M, Powis SH, Ragoussis J, Trowsdale J and Sheer D

    Human Cytogenetics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX, UK.

    The large-scale chromatin organization of the major histocompatibility complex and other regions of chromosome 6 was studied by three-dimensional image analysis in human cell types with major differences in transcriptional activity. Entire gene clusters were visualized by fluorescence in situ hybridization with multiple locus-specific probes. Individual genomic regions showed distinct configurations in relation to the chromosome 6 territory. Large chromatin loops containing several megabases of DNA were observed extending outwards from the surface of the domain defined by the specific chromosome 6 paint. The frequency with which a genomic region was observed on an external chromatin loop was cell type dependent and appeared to be related to the number of active genes in that region. Transcriptional up-regulation of genes in the major histocompatibility complex by interferon-gamma led to an increase in the frequency with which this large gene cluster was found on an external chromatin loop. Our data are consistent with an association between large-scale chromatin organization of specific genomic regions and their transcriptional status.

    Funded by: Cancer Research UK: A3585

    Journal of cell science 2000;113 (Pt 9);1565-76

  • Plasticity in the organization and sequences of human KIR/ILT gene families.

    Wilson MJ, Torkar M, Haude A, Milne S, Jones T, Sheer D, Beck S and Trowsdale J

    Immunology Division, Department of Pathology, Tennis Court Road, Cambridge CB2 1QP, United Kingdom.

    The approximately 1-Mb leukocyte receptor complex at 19q13.4 is a key polymorphic immunoregion containing all of the natural killer-receptor KIR and related ILT genes. When the organization of the leukocyte receptor complex was compared from two haplotypes, the gene content in the KIR region varied dramatically, with framework loci flanking regions of widely variable gene content. The ILT genes were more stable in number except for ILT6, which was present only in one haplotype. Analysis of Alu repeats and comparison of KIR gene sequences, which are over 90% identical, are consistent with a recent origin. KIR genesis was followed by extensive duplication/deletion as well as intergenic sequence exchange, reminiscent of MHC class I genes, which provide KIR ligands.

    Funded by: Cancer Research UK: A3585

    Proceedings of the National Academy of Sciences of the United States of America 2000;97;9;4778-83

  • The chemokine TECK is expressed by thymic and intestinal epithelial cells and attracts double- and single-positive thymocytes expressing the TECK receptor CCR9.

    Wurbel MA, Philippe JM, Nguyen C, Victorero G, Freeman T, Wooding P, Miazek A, Mattei MG, Malissen M, Jordan BR, Malissen B, Carrier A and Naquet P

    Centre d'Immunologie INSERM-CNRS de Marseille-Luminy, Marseille, France.

    Chemokines are key regulators of migration in lymphoid tissues. In the thymus, maturing thymocytes move from the outer capsule to the inner medulla and thereby interact with different types of stromal cells that control their maturation and selection. In the process of searching for molecules specifically expressed at different stages of mouse thymic differentiation, we have characterized the cDNA coding for the thymus-expressed chemokine (TECK) and its receptor CCR9. The TECK receptor gene was isolated and shown to be localized on mouse chromosome 9F1-F4. Thymic dendritic cells were initially thought to be a prevalent source of TECK. In contrast, our results indicate that thymic epithelial cells constitute the predominant source of TECK. Consistent with the latter distribution, the TECK receptor is highly expressed by double-positive thymocytes, and TECK can chemoattract both double-positive and single-positive thymocytes. The TECK transcript is also abundantly expressed in the epithelial cells lining the small intestine. In conclusion, the interplay of TECK and its receptor CCR9 is likely to have a significant role in the recruitment of developing thymocytes to discrete compartments of the thymus.

    European journal of immunology 2000;30;1;262-71

  • Analysis of 114 kb of DNA sequence from fission yeast chromosome 2 immediately centromere-distal to his5.

    Xiang Z, Moore K, Wood V, Rajandream MA, Barrell BG, Skelton J, Churcher CM, Lyne MH, Devlin K, Gwilliam R, Rutherford KM and Aves SJ

    School of Biological Sciences, University of Exeter, Washington Singer Laboratories, Perry Road, Exeter EX4 4QG, UK.

    One hundred and fourteen kilobase pairs (kb) of contiguous genomic sequence have been determined immediately distal to the his5 genetic marker located about 0.9 Mb from the centromere on the long arm of Schizosaccharomyces pombe chromosome 2. The sequence is contained in overlapping cosmid clones c16H5, c12D12, c24C6 and c19G7, of which 20 kb are identical to previously reported sequence from clone c21H7. The remaining 93 781 bp of sequence contains 10 known genes (cdc14, cdm1, cps1, gpa1, msh2, pck2, rip1, rps30-2, sad1 and ubl1), 32 open reading frames (ORFs) capable of coding for proteins of at least 100 amino acid residues in length, one 5S rRNA gene, one tRNA(Pro) gene, one lone Tf1-type long terminal repeat (LTR) and one lone Tf2-type LTR. There is a density of one protein-coding gene per 2.2 kb and 22 of the 42 ORFs (52%) incorporate one or more introns. Twenty-one of the novel ORFs show sequence similarities which suggest functions of their products, including a cyclin C, a MADS box transcription factor, mad2-like protein, telomere binding protein, topoisomerase II-associated protein, ATP-dependent DEAH box RNA helicase, G10 protein, ubiquitin-activating e1-like enzyme, nucleoporin, prolyl-tRNA synthetase, peptidylprolyl isomerase, delta-1-pyrroline-5-carboxylate dehydrogenase, protein transport protein, coatomer epsilon, TCP-1 chaperonin, beta-subunit of 6-phosphofructokinase, aminodeoxychorismate lyase, a phosphate transport protein and a thioredoxin.

    Yeast (Chichester, England) 2000;16;15;1405-11

  • The mating-type region of Schizosaccharomyces pombe h(-S) 972: sequencing and analysis of 69 kb including the expressed mat1 locus.

    Xiang Z, Wood V, Rajandream MA, Barrell BG, Moore K, Hunt C and Aves SJ

    School of Biological Sciences, University of Exeter, Washington Singer Laboratories, Perry Road, Exeter EX4 4QG, UK.

    The sequence has been determined of 68 897 bp of genomic DNA including the expressed mat1 mating-type locus from Schizosaccharomyces pombe h(-S) strain 972. The DNA sequence, located on the long arm of fission yeast chromosome II and contained in two cosmid clones, was analysed to reveal one autonomously replicating sequence, two retrotransposon long terminal repeats (LTRs), one tRNA(Gly) gene and 33 open reading frames (ORFs), of which 15 contain introns. Nine of these ORFs code for previously described genes (trt1, rpl10, rps21, nif1, sui1 (psu1), matMi, matMc, let1 and rpa4), one of which (trt1) contains 15 introns, the highest number yet recorded in a gene of S. pombe. Of the remaining 24 ORFs, sequence similarity suggests that the function of 13 of the encoded proteins may be predicted and these include four mitochondrial proteins, two transport proteins, two signalling molecules, a component of serine palmitoyltransferase, a homologue of 3-methyladenine DNA glycosylase, a multifunctional alcohol dehydrogenase, a killer toxin sensitivity factor and an acetyl transferase. Six deduced sequences appear to be related to proteins of unknown function in Saccharomyces cerevisiae or S. pombe and the remaining five are hypothetical proteins.

    Yeast (Chichester, England) 2000;16;11;1061-7

  • Polymorphisms in olfactory receptor genes: a cautionary note.

    Ziegler A, Ehlers A, Forbes S, Trowsdale J, Volz A, Younger R and Beck S

    Institut für Immungenetik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, Berlin, Germany. andreas.ziegler@charite.de

    The hundreds of human olfactory receptor (OR) genes are organized into clusters occurring on nearly every chromosome. Although their sequences are not always closely related, they share stretches of considerable similarity, both at the amino acid and nucleotide levels. We demonstrate here that an HLA complex-linked OR sequence, FAT11, for which recently a number of alleles have been claimed within the Hutterites, contains sequences derived from two closely related, linked OR genes, hs6M1-12 and hs6M1-16. Instead of indicating a difference between alleles of a given locus, two of the polymorphisms described for FAT11 (at amino acids 48 and 220 of the deduced protein sequence, respectively) may in fact reflect distinct sequences of hs6M1-12 and a further, closely related HLA-linked OR locus, hs6M1-13P. As a consequence, recombination rates in Hutterites in the region telomeric of HLA-G may have to be reconsidered.

    Human immunology 2000;61;12;1281-4
