Sanger Institute - Publications 2001

Number of papers published in 2001: 90

  • Allele loss on chromosome 1p36 in epithelial ovarian cancers.

    Alvarez AA, Lambers AR, Lancaster JM, Maxwell GL, Ali S, Gumbs C, Berchuck A and Futreal PA

    Department of Obstetrics and Gynecology/Division of Gynecologic Oncology, Duke University Medical Center, Durham, North Carolina, 27710, USA.

    Objectives: Prior studies have shown that allelic loss on chromosome 1p36 occurs frequently in ovarian as well as several other types of cancer. This suggests that inactivation of gene(s) in this region may play a role in the pathogenesis of these cancers. The aim of this study was to further delineate the region of loss on chromosome 1p36 in ovarian cancers and to identify associated patient or tumor characteristics.

    Methods: Paired normal/cancer DNA samples from 75 ovarian cancers (21 early stage I/II and 54 advanced stage III/IV) were analyzed using microsatellite markers.

    Results: Forty-nine of 75 (65%) ovarian cancers had loss of at least one marker. The marker demonstrating the most frequent loss was D1S1597, which was lost in 29/57 (51%) informative cases. Allele loss on 1p36 was significantly more common in poorly differentiated ovarian cancers (73%) relative to well or moderately differentiated cases (48%) (P = 0.03). Evidence was obtained for two common regions of deletion: one flanked by D1S1646/D1S244 and another more proximally by D1S244/D1S228.

    Conclusion: These findings further delineate regions on chromosome 1p36 proposed to contain tumor suppressor gene(s) that may play a role in the development and/or progression of epithelial ovarian carcinoma. Allele loss on 1p36 is associated with poor histologic grade.

    Gynecologic oncology 2001;82;1;94-8

  • The InterPro database, an integrated documentation resource for protein families, domains and functional sites.

    Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ and Zdobnov EM

    EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at Questions can be emailed to

    Nucleic acids research 2001;29;1;37-40

  • The genomic structure and promoter region of the human parkin gene.

    Asakawa S, Tsunematsu Ki, Takayanagi A, Sasaki T, Shimizu A, Shintani A, Kawasaki K, Mungall AJ, Beck S, Minoshima S and Shimizu N

    Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan.

    Parkin has been identified as a causative gene of autosomal recessive juvenile parkinsonism (AR-JP). In this study, we determined the genomic structure of the Parkin gene and identified a core promoter region based on 1.4 Mb of DNA sequence. The 5'-flanking region contained no apparent TATA or CAAT box elements but several putative cis-elements for various transcription factors. GC- and CpG-rich regions were observed not only in the 5'-flanking sequence but also in the 5' part of the first intron of Parkin. We identified the exact transcription start site of Parkin. A core promoter region was determined by transfecting a series of deletion constructs with a dual luciferase reporter system into human neuroblastoma cells. Furthermore, we located a novel neighboring gene in head-to-head orientation with Parkin, separated from it by an interval of only 198 bp.

    Biochemical and biophysical research communications 2001;286;5;863-8

  • Fission yeast Pom1p kinase activity is cell cycle regulated and essential for cellular symmetry during growth and division.

    Bähler J and Nurse P

    Imperial Cancer Research Fund, Cell Cycle Laboratory, 44 Lincoln's Inn Fields, London WC2A 3PX, UK.

    Schizosaccharomyces pombe cells grow from both ends during most of interphase and divide symmetrically into two daughter cells. The pom1 gene, encoding a member of the Dyrk family of protein kinases, has been identified through a mutant showing abnormal cellular morphogenesis. Here we show that Pom1p kinase activity is cell cycle regulated in correlation with the state of cellular symmetry: the activity is high during symmetrical growth and division, but lower when cells grow at just one end. Point mutations in the catalytic domain lead to asymmetry during both cell growth and division, whilst cells overexpressing Pom1p form additional growing ends. Manipulations of kinase activity indicate a negative role for Pom1p in microtubule growth at cell ends. Pom1p is present in a large protein complex and requires its non-catalytic domain to localize to the cell periphery and its kinase activity to localize to cell ends. These data establish that Pom1p kinase activity plays an important role in generating cellular symmetry and suggest that there may be related roles of homologous protein kinases ubiquitously present in all eukaryotes.

    The EMBO journal 2001;20;5;1064-73

  • Regulation of the stem cell leukemia (SCL) gene: a tale of two fishes.

    Barton LM, Gottgens B, Gering M, Gilbert JG, Grafham D, Rogers J, Bentley D, Patient R and Green AR

    Department of Hematology, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 2XY, United Kingdom.

    The stem cell leukemia (SCL) gene encodes a tissue-specific basic helix-loop-helix (bHLH) protein with a pivotal role in hemopoiesis and vasculogenesis. Several enhancers have been identified within the murine SCL locus that direct reporter gene expression to subdomains of the normal SCL expression pattern, and long-range sequence comparisons of the human and murine SCL loci have identified additional candidate enhancers. To facilitate the characterization of regulatory elements, we have sequenced and analyzed 33 kb of the SCL genomic locus from the pufferfish Fugu rubripes, a species with a highly compact genome. Although the pattern of SCL expression is highly conserved from mammals to teleost fish, the genes flanking pufferfish SCL were unrelated to those known to flank both avian and mammalian SCL genes. These data suggest that SCL regulatory elements are confined to the region between the upstream and downstream flanking genes, a region of 65 kb in human and 8.5 kb in pufferfish. Consistent with this hypothesis, the entire 33-kb pufferfish SCL locus directed appropriate expression to hemopoietic and neural tissue in transgenic zebrafish embryos, as did a 10.4-kb fragment containing the SCL gene and extending to the 5' and 3' flanking genes. These results demonstrate the power of combining the compact genome of the pufferfish with the advantages that zebrafish provide for studies of gene regulation during development. Furthermore, the pufferfish SCL locus provides a powerful tool for the manipulation of hemopoiesis and vasculogenesis in vivo.

    Proceedings of the National Academy of Sciences of the United States of America 2001;98;12;6747-52

  • Genome acrobatics: understanding complex genomes.

    Beck S

    Head of Human Sequencing, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Drug discovery today 2001;6;23;1181-1182

  • From first base: the sequence of the tip of the X chromosome of Drosophila melanogaster, a comparison of two sequencing strategies.

    Benos PV, Gatt MK, Murphy L, Harris D, Barrell B, Ferraz C, Vidal S, Brun C, Demaille J, Cadieu E, Dreano S, Gloux S, Lelaure V, Mottier S, Galibert F, Borkova D, Miñana B, Kafatos FC, Bolshakov S, Sidén-Kiamos I, Papagiannakis G, Spanos L, Louis C, Madueño E, de Pablos B, Modolell J, Peter A, Schöttler P, Werner M, Mourkioti F, Beinert N, Dowe G, Schäfer U, Jäckle H, Bucheton A, Callister D, Campbell L, Henderson NS, McMillan PJ, Salles C, Tait E, Valenti P, Saunders RD, Billaud A, Pachter L, Glover DM and Ashburner M

    EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    We present the sequence of a contiguous 2.63 Mb of DNA extending from the tip of the X chromosome of Drosophila melanogaster. Within this sequence, we predict 277 protein-coding genes, of which 94 had been sequenced already in the course of studying the biology of their gene products, and examples of 12 different transposable elements. We show that an interval between bands 3A2 and 3C2, believed in the 1970s to show a correlation between the number of bands on the polytene chromosomes and the 20 genes identified by conventional genetics, is predicted to contain 45 genes from its DNA sequence. We have determined the insertion sites of P-elements from 111 mutant lines, about half of which are in a position likely to affect the expression of novel predicted genes, thus representing a resource for subsequent functional genomic analysis. We compare the European Drosophila Genome Project sequence with the corresponding part of the independently assembled and annotated Joint Sequence determined through "shotgun" sequencing. Discounting differences in the distribution of known transposable elements between the strains sequenced in the two projects, we detected three major sequence differences, two of which are probably explained by errors in assembly; the origin of the third major difference is unclear. In addition, there are eight sequence gaps within the Joint Sequence. At least six of these eight gaps are likely to be sites of transposable elements; the other two are complex. Of the 275 genes common to both projects, 60% are identical within 1% of their predicted amino-acid sequence and 31% show minor differences such as in choice of translation initiation or termination codons; the remaining 9% show major differences in interpretation.

    Genome research 2001;11;5;710-30

  • The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X.

    Bentley DR, Deloukas P, Dunham A, French L, Gregory SG, Humphray SJ, Mungall AJ, Ross MT, Carter NP, Dunham I, Scott CE, Ashcroft KJ, Atkinson AL, Aubin K, Beare DM, Bethel G, Brady N, Brook JC, Burford DC, Burrill WD, Burrows C, Butler AP, Carder C, Catanese JJ, Clee CM, Clegg SM, Cobley V, Coffey AJ, Cole CG, Collins JE, Conquer JS, Cooper RA, Culley KM, Dawson E, Dearden FL, Durbin RM, de Jong PJ, Dhami PD, Earthrowl ME, Edwards CA, Evans RS, Gillson CJ, Ghori J, Green L, Gwilliam R, Halls KS, Hammond S, Harper GL, Heathcott RW, Holden JL, Holloway E, Hopkins BL, Howard PJ, Howell GR, Huckle EJ, Hughes J, Hunt PJ, Hunt SE, Izmajlowicz M, Jones CA, Joseph SS, Laird G, Langford CF, Lehvaslaiho MH, Leversha MA, McCann OT, McDonald LM, McDowall J, Maslen GL, Mistry D, Moschonas NK, Neocleous V, Pearson DM, Phillips KJ, Porter KM, Prathalingam SR, Ramsey YH, Ranby SA, Rice CM, Rogers J, Rogers LJ, Sarafidou T, Scott DJ, Sharp GJ, Shaw-Smith CJ, Smink LJ, Soderlund C, Sotheran EC, Steingruber HE, Sulston JE, Taylor A, Taylor RG, Thorpe AA, Tinsley E, Warry GL, Whittaker A, Whittaker P, Williams SH, Wilmer TE, Wooster R and Wright CL

    The Sanger Centre, Hinxton, Cambridge, UK.

    We constructed maps for eight chromosomes (1, 6, 9, 10, 13, 20, X and (previously) 22), representing one-third of the genome, by building landmark maps, isolating bacterial clones and assembling contigs. By this approach, we could establish the long-range organization of the maps early in the project, and all contig extension, gap closure and problem-solving were simplified by being contained within local regions. The maps currently represent more than 94% of the euchromatic (gene-containing) regions of these chromosomes in 176 contigs, and contain 96% of the chromosome-specific markers in the human gene map. By measuring the remaining gaps, we can assess chromosome length and coverage in sequenced clones.

    Nature 2001;409;6822;942-3

  • Parasites are GO.

    Berriman M, Aslett M, Hall N and Ivens A

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Trends in parasitology 2001;17;10;463-4

  • Insulin signalling.

    Bevan P

    The Sanger Centre, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK.

    Journal of cell science 2001;114;Pt 8;1429-30

  • Mining the draft human genome.

    Birney E, Bateman A, Clamp ME and Hubbard TJ

    The European Bioinformatics Institute, Hinxton, Cambridge, UK.

    Now that the draft human genome sequence is available, everyone wants to be able to use it. However, we have perhaps become complacent about our ability to turn new genomes into lists of genes. The higher volume of data associated with a larger genome is accompanied by a much greater increase in complexity. We need to appreciate both the scale of the challenge of vertebrate genome analysis and the limitations of current gene prediction methods and understanding.

    Nature 2001;409;6822;827-8

  • A novel poly(A)-binding protein gene (PABPC5) maps to an X-specific subinterval in the Xq21.3/Yp11.2 homology block of the human sex chromosomes.

    Blanco P, Sargent CA, Boucher CA, Howell G, Ross M and Affara NA

    Human Molecular Genetics Group, Division of Cellular and Molecular Pathology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, United Kingdom.

    The gene-poor human-specific Xq21.3/Yp11.2 block of homology exhibits 99% nucleotide identity, with the exception of an internal X-specific region containing the marker DXS214. This paper describes the characterization of a novel gene (PABPC5) from this X-specific subinterval that belongs to the poly(A)-binding protein gene family. The genomic structure of PABPC5 covers 4061 bp of an uninterrupted open reading frame (ORF) and a 5'UTR spanning across two exons and associated with a CpG island; the potential 382-amino-acid protein contains four RNA recognition motif domains. PABPC5 has 73% nucleotide identity with PABPC4 over 1801 bp of the ORF. At the protein level, 60% identity and 75% similarity are obtained in the comparison with human PABPC4, as well as human, mouse, and Xenopus PABPC1. RT-PCR indicates that PABPC5 is expressed in fetal brain and in a range of adult tissues. Conservation of the PABPC5 ORF and genomic structure is shown in primates and rodents. The close proximity of this gene to translocation breakpoints associated with premature ovarian failure makes it a potential candidate for this condition.

    Genomics 2001;74;1;1-11

  • Identification and characterization of KLHL4, a novel human homologue of the Drosophila Kelch gene that maps within the X-linked cleft palate and Ankyloglossia (CPX) critical region.

    Braybrook C, Warry G, Howell G, Arnason A, Bjornsson A, Moore GE, Ross MT and Stanier P

    Institute of Reproductive and Developmental Biology, Imperial College, Hammersmith Campus, Du Cane Road, London, W12 0NN, United Kingdom.

    X-linked cleft palate (CPX) is a rare nonsyndromic form of orofacial clefting that is, unlike more common forms, inherited as a highly penetrant Mendelian trait. Linkage studies using a large Icelandic kindred localized the gene to Xq21.3, and a physical map defining a 2.0-Mb candidate region was subsequently constructed. Genomic sequence is now available for much of the critical region and has been surveyed for potential transcriptional units. Through this analysis, we have identified a novel human homologue of Kelch, KLHL4. The transcript represents an mRNA of approximately 3.6 kb and encodes a protein of 718 amino acids. Protein domain analysis reveals six tandem repeats (Kelch repeats) at the C-terminus and a POZ/BTB protein-binding domain toward the N-terminus, characteristic of Drosophila Kelch and other family members. KLHL4 consists of 11 exons spanning a genomic interval of approximately 150 kb. From EST sequences and RT-PCR analysis, there is evidence for the use of alternative 3' UTRs. The mRNA is expressed in a range of fetal tissues including tongue, palate, and mandible. Mutational analysis in affected CPX patients revealed one sequence alteration that was most likely to be a silent polymorphism.

    Genomics 2001;72;2;128-36

  • Physical and transcriptional mapping of the X-linked cleft palate and ankyloglossia (CPX) critical region.

    Braybrook C, Warry G, Howell G, Mandryko V, Arnason A, Bjornsson A, Ross MT, Moore GE and Stanier P

    Institute of Reproductive and Developmental Biology, Imperial College School of Medicine, London, UK.

    Cleft palate most commonly occurs as a sporadic multifactorial disorder with a clear but difficult to define genetic component. As a semi-dominant disorder, X-linked cleft palate (CPX) provides a useful model to investigate a congenital defect that is little influenced by non-genetic factors. By using an Icelandic kindred, CPX has been localised between DXS1196 and DXS1217 and mapped, in a 3-Mb yeast artificial chromosome contig, at Xq21.3. Markers generated from this physical map have now been used to construct a contig of P1 and bacterial artificial chromosome clones for genomic DNA sequencing. Genomic DNA sequence analysis has revealed two novel expressed genes and two pseudogenes in the order Cen-KLHL4-LAMRL5-CAPZA1P-CPXCR1-Tel. KLHL4 and CPXCR1 are widely expressed in fetal tissues, including the tongue, mandible and palate. DNA mutation screening of CPXCR1 has revealed several sequence variants present on all affected CPX chromosomes. However, these variants have also been detected at a lower frequency on unaffected chromosomes, indicating that they are polymorphisms that are unlikely to cause the CPX phenotype.

    Human genetics 2001;108;6;537-45

  • SpliceDB: database of canonical and non-canonical mammalian splice sites.

    Burset M, Seledtsov IA and Solovyev VV

    The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK and Softberry Inc., 108 Corporate Park Drive, Suite 120, White Plains, NY 10604, USA.

    A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous locations of splice junctions, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) is spread across many small groups, none larger than 0.05%. We paid particular attention to non-canonical splice sites, which comprise 3.73% of GenBank-annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides, we compared sequences of human non-canonical splice sites with sequences from the high-throughput genome sequencing project (HTG). Out of 171 human non-canonical, EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. After sequence analysis these can be classified as: 79 GC-AG pairs (one of which was an error corrected to GC-AG); 61 errors corrected to canonical GT-AG pairs; six AT-AC pairs (two of which were errors corrected to AT-AC); one case derived from a non-existent intron; seven cases found in HTG sequences that had been deposited in GenBank; and only two remaining cases of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.

    Nucleic acids research 2001;29;1;255-9

  • A clone-array pooled shotgun strategy for sequencing large genomes.

    Cai WW, Chen R, Gibbs RA and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

    A simplified strategy for sequencing large genomes is proposed. Clone-Array Pooled Shotgun Sequencing (CAPSS) is based on pooling rows and columns of arrayed genomic clones, for shotgun library construction. Random sequences are accumulated, and the data are processed by sequential comparison of rows and columns to assemble the sequence of clones at points of intersection. Compared with either a clone-by-clone approach or whole-genome shotgun sequencing, CAPSS requires relatively few library constructions and only minimal computational power for a complete genome assembly. The strategy is suitable for sequencing large genomes for which there are no sequence-ready maps, but for which relatively high resolution STS maps and highly redundant BAC libraries are available. It is immediately applicable to the sequencing of mouse, rat, zebrafish, and other important genomes, and can be managed in a cooperative fashion to take advantage of a distributed international DNA sequencing capacity.

    Genome research 2001;11;10;1619-23
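
    The row/column pooling idea described in the CAPSS abstract can be illustrated with a toy sketch. This is a hypothetical simplification, not the authors' pipeline: each clone is represented by its set of k-mers (a stand-in for shotgun reads), and the function names kmers, build_pools and deconvolute are invented for illustration. A k-mer observed in both row pool r and column pool c is attributed to the clone at that intersection.

```python
from __future__ import annotations

from itertools import product


def kmers(seq: str, k: int) -> set[str]:
    """Return all length-k substrings of seq (a toy stand-in for shotgun reads)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_pools(grid: list[list[str]], k: int) -> tuple[list[set[str]], list[set[str]]]:
    """Pool the k-mers of every clone in each row and in each column of the clone array."""
    n_rows, n_cols = len(grid), len(grid[0])
    row_pools: list[set[str]] = [set() for _ in range(n_rows)]
    col_pools: list[set[str]] = [set() for _ in range(n_cols)]
    for r, c in product(range(n_rows), range(n_cols)):
        clone_kmers = kmers(grid[r][c], k)
        row_pools[r] |= clone_kmers  # the clone contributes to its row pool...
        col_pools[c] |= clone_kmers  # ...and to its column pool
    return row_pools, col_pools


def deconvolute(row_pools: list[set[str]],
                col_pools: list[set[str]]) -> dict[tuple[int, int], set[str]]:
    """Attribute to clone (r, c) every k-mer seen in both row pool r and column pool c."""
    return {
        (r, c): row_pools[r] & col_pools[c]
        for r, c in product(range(len(row_pools)), range(len(col_pools)))
    }


if __name__ == "__main__":
    # Hypothetical 2 x 2 clone array in which no 4-mer is shared between clones.
    grid = [["ACGTACGA", "TTGCAATC"],
            ["GGCATTAC", "CATCGGTA"]]
    rows, cols = build_pools(grid, k=4)
    for (r, c), found in sorted(deconvolute(rows, cols).items()):
        # Prints True for every clone here, because the clones share no 4-mers.
        print(r, c, found == kmers(grid[r][c], 4))
```

    The sketch recovers each clone exactly only when k-mers are not shared between clones. A sequence present in one clone of row r and another clone of column c would be wrongly attributed to clone (r, c), which is one reason a real assembly would need further checks.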

  • An SSLP marker-anchored BAC framework map of the mouse genome.

    Cai WW, Chow CW, Damani S, Gregory SG, Marra M and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

    We have constructed a BAC framework map of the mouse genome consisting of 2,808 PCR-confirmed BAC clusters, using a previously described method. Fingerprints of BACs from selected clusters confirm the accuracy of the map. Combined with BAC fingerprint data, the framework map covers 37% of the mouse genome.

    Nature genetics 2001;29;2;133-4

  • Lost and found.

    Cerdeño-Tárraga A, Thomson N, Holden M, Sebaihia M and Parkhill J

    Trends in microbiology 2001;9;11;526-7

  • Integration of cytogenetic landmarks into the draft sequence of the human genome.

    Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, Furey TS, Kim UJ, Kuo WL, Olivier M, Conroy J, Kasprzyk A, Massa H, Yonescu R, Sait S, Thoreen C, Snijders A, Lemyre E, Bailey JA, Bruzel A, Burrill WD, Clegg SM, Collins S, Dhami P, Friedman C, Han CS, Herrick S, Lee J, Ligon AH, Lowry S, Morley M, Narasimhan S, Osoegawa K, Peng Z, Plajzer-Frick I, Quade BJ, Scott D, Sirotkin K, Thorpe AA, Gray JW, Hudson J, Pinkel D, Ried T, Rowen L, Shen-Ong GL, Strausberg RL, Birney E, Callen DF, Cheng JF, Cox DR, Doggett NA, Carter NP, Eichler EE, Haussler D, Korenberg JR, Morton CC, Albertson D, Schuler G, de Jong PJ, Trask BJ and BAC Resource Consortium

    Department of Pediatrics, University of Pennsylvania, The Children's Hospital of Philadelphia, 19104, USA.

    We have placed 7,600 cytogenetically defined landmarks on the draft sequence of the human genome to help with the characterization of genes altered by gross chromosomal aberrations that cause human disease. The landmarks are large-insert clones mapped to chromosome bands by fluorescence in situ hybridization. Each clone contains a sequence tag that is positioned on the genomic sequence. This genome-wide set of sequence-anchored clones allows structural and functional analyses of the genome. This resource represents the first comprehensive integration of cytogenetic, radiation hybrid, linkage and sequence maps of the human genome; provides an independent validation of the sequence map and framework for contig order and orientation; surveys the genome for large-scale duplications, which are likely to require special attention during sequence assembly; and allows a stringent assessment of sequence differences between the dark and light bands of chromosomes. It also provides insight into large-scale chromatin structure and the evolution of chromosomes and gene families and will accelerate our understanding of the molecular bases of human disease and cancer.

    Nature 2001;409;6822;953-8

  • The morning after.

    Clamp M

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Trends in genetics : TIG 2001;17;12;688-9

  • Gene expression microarray analysis in cancer biology, pharmacology, and drug development: progress and potential.

    Clarke PA, te Poele R, Wooster R and Workman P

    Cancer Research Campaign Centre for Cancer Therapeutics, E Block, Institute of Cancer Research, 15 Cotswold Road, SM2 5NG, Sutton, Surrey, UK.

    With the imminent completion of the Human Genome Project, biomedical research is being revolutionised by the ability to carry out investigations on a genome wide scale. This is particularly important in cancer, a disease that is caused by accumulating abnormalities in the sequence and expression of a number of critical genes. Gene expression microarray technology is gaining increasingly widespread use as a means to determine the expression of potentially all human genes at the level of messenger RNA. In this commentary, we review developments in gene expression microarray technology and illustrate the progress and potential of the methodology in cancer biology, pharmacology, and drug development. Important applications include: (a) development of a more global understanding of the gene expression abnormalities that contribute to malignant progression; (b) discovery of new diagnostic and prognostic indicators and biomarkers of therapeutic response; (c) identification and validation of new molecular targets for drug development; (d) provision of an improved understanding of the molecular mode of action during lead identification and optimisation, including structure-activity relationships for on-target versus off-target effects; (e) prediction of potential side-effects during preclinical development and toxicology studies; (f) confirmation of a molecular mode of action during hypothesis-testing clinical trials; (g) identification of genes involved in conferring drug sensitivity and resistance; and (h) prediction of patients most likely to benefit from the drug and use in general pharmacogenomic studies. As a result of further technological improvements and decreasing costs, the use of microarrays will become an essential and potentially routine tool for cancer and biomedical research.

    Biochemical pharmacology 2001;62;10;1311-36

  • Disruption of an imprinted gene cluster by a targeted chromosomal translocation in mice.

    Cleary MA, van Raamsdonk CD, Levorse J, Zheng B, Bradley A and Tilghman SM

    Howard Hughes Medical Institute and Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA.

    Genomic imprinting is an epigenetic process in which the activity of a gene is determined by its parent of origin. Mechanisms governing genomic imprinting are just beginning to be understood. However, the tendency of imprinted genes to exist in chromosomal clusters suggests a sharing of regulatory elements. To better understand imprinted gene clustering, we disrupted a cluster of imprinted genes on mouse distal chromosome 7 using the Cre/loxP recombination system. In mice carrying a site-specific translocation separating Cdkn1c and Kcnq1, imprinting of the genes retained on chromosome 7, including Kcnq1, Kcnq1ot1, Ascl2, H19 and Igf2, is unaffected, demonstrating that these genes are not regulated by elements near or telomeric to Cdkn1c. In contrast, expression and imprinting of the translocated Cdkn1c, Slc22a1l and Tssc3 on chromosome 11 are affected, consistent with the hypothesis that elements regulating both expression and imprinting of these genes lie within or proximal to Kcnq1. These data support the proposal that chromosomal abnormalities, including translocations, within KCNQ1 that are associated with the human disease Beckwith-Wiedemann syndrome (BWS) may disrupt CDKN1C expression. These results underscore the importance of gene clustering for the proper regulation of imprinted genes.

    Nature genetics 2001;29;1;78-82

  • Massive gene decay in the leprosy bacillus.

    Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, Honoré N, Garnier T, Churcher C, Harris D, Mungall K, Basham D, Brown D, Chillingworth T, Connor R, Davies RM, Devlin K, Duthoy S, Feltwell T, Fraser A, Hamlin N, Holroyd S, Hornsby T, Jagels K, Lacroix C, Maclean J, Moule S, Murphy L, Oliver K, Quail MA, Rajandream MA, Rutherford KM, Rutter S, Seeger K, Simon S, Simmonds M, Skelton J, Squares R, Squares S, Stevens K, Taylor K, Whitehead S, Woodward JR and Barrell BG

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, Paris, France.

    Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

    Nature 2001;409;6823;1007-11

  • Sequence, structure and pathology of the fully annotated terminal 2 Mb of the short arm of human chromosome 16.

    Daniels RJ, Peden JF, Lloyd C, Horsley SW, Clark K, Tufarelli C, Kearney L, Buckle VJ, Doggett NA, Flint J and Higgs DR

    MRC Molecular Haematology Unit, Weatherall Institute for Molecular Medicine, John Radcliffe Hospital, Oxford OX3 9DS, UK.

    We have sequenced 1949 kb from the terminal Giemsa light band of human chromosome 16p, enabling us to fully annotate the region extending from the telomeric repeats to the previously published tuberous sclerosis disease 2 (TSC2) and polycystic kidney disease 1 (PKD1) genes. This region can be subdivided into two GC-rich, Alu-rich domains and one GC-rich, Alu-poor domain. The entire region is extremely gene rich, containing 100 confirmed genes and 20 predicted genes. Many of the genes encode widely expressed proteins orchestrating basic cellular processes (e.g. DNA recombination, repair, transcription, RNA processing, signal transduction, intracellular signalling and mRNA translation). Others, such as the alpha globin genes (HBA1 and HBA2), PDIP and BAIAP3, are specialized tissue-restricted genes. Some of the genes have been previously implicated in the pathophysiology of important human genetic diseases (e.g. asthma, cataracts and the ATR-16 syndrome). Others are known disease genes for alpha thalassaemia, adult polycystic kidney disease and tuberous sclerosis. There is also linkage evidence for bipolar affective disorder, epilepsy and autism in this region. Sixty-three chromosomal deletions reported here and elsewhere allow us to interpret the results of removing progressively larger numbers of genes from this well defined human telomeric region.

    Human molecular genetics 2001;10;4;339-52

  • A SNP resource for human chromosome 22: extracting dense clusters of SNPs from the genomic sequence.

    Dawson E, Chen Y, Hunt S, Smink LJ, Hunt A, Rice K, Livingston S, Bumpstead S, Bruskiewich R, Sham P, Ganske R, Adams M, Kawasaki K, Shimizu N, Minoshima S, Roe B, Bentley D and Dunham I

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The recent publication of the complete sequence of human chromosome 22 provides a platform from which to investigate genomic sequence variation. We report the identification and characterization of 12,267 potential variants (SNPs and other small insertions/deletions) of human chromosome 22, discovered in the overlaps of 460 clones used for the chromosome sequencing. We found, on average, 1 potential variant every 1.07 kb and approximately 18% of the potential variants involve insertions/deletions. The SNPs have been positioned both relative to each other, and to genes, predicted genes, repeat sequences, other genetic markers, and the 2730 SNPs previously identified on the chromosome. A subset of the SNPs were verified experimentally using either PCR-RFLP or genomic Invader assays. These experiments confirmed 92% of the potential variants in a panel of 92 individuals. [Details of the SNPs and RFLP assays can be found at and in dbSNP.]

    Genome research 2001;11;1;170-8

  • HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families.

    de Bakker PI, Bateman A, Burke DF, Miguel RN, Mizuguchi K, Shi J, Shirai H and Blundell TL

    Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK.

    Summary: We describe an extension to the Homologous Structure Alignment Database (HOMSTRAD; Mizuguchi et al., Protein Sci., 7, 2469-2471, 1998a) to include homologous sequences derived from the protein families database Pfam (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000). HOMSTRAD is integrated with the server FUGUE (Shi et al., submitted, 2001) for recognition and alignment of homologues, benefiting from the combination of abundant sequence information and accurate structure-based alignments. Availability: The HOMSTRAD database is available at: Query sequences can be submitted to the homology recognition/alignment server FUGUE at:

    Bioinformatics (Oxford, England) 2001;17;8;748-9

  • Parasite genome initiatives.

    Degrave WM, Melville S, Ivens A and Aslett M

    DBBM - IOC/Fiocruz, Av. Brasil 4365, Manguinhos, 21045-900, Rio de Janeiro, Brazil.

    During 1993-1994, scientists from developing and developed countries planned and initiated a number of parasite genome projects, and several consortia for the mapping and sequencing of these medium-sized genomes were established, often based on already ongoing scientific collaborations. Financial and other support came from WHO/TDR, the Wellcome Trust and other funding agencies. Thus, the genomes of Plasmodium falciparum, Schistosoma mansoni, Trypanosoma cruzi, Leishmania major, Trypanosoma brucei, Brugia malayi and other pathogenic nematodes are now under study. Following an initial phase of network formation, mapping efforts and resource building (EST, GSS, phage, cosmid, BAC and YAC library constructions), sequencing began in gene discovery projects, soon extended to a small chromosome, and is now proceeding on a full genome scale. Proteomics, functional analysis, genetic manipulation and microarray analysis are ongoing to different degrees in the respective genome initiatives, and as funding for whole genome sequencing is secured, most of the participating laboratories, apart from the larger sequencing centres, are turning to post-genomics. Bioinformatics networks are being expanded, including in developing countries, for data mining, annotation and in-depth analysis.

    International journal for parasitology 2001;31;5-6;532-6

  • A superfamily of variant genes encoded in the subtelomeric region of Plasmodium vivax.

    del Portillo HA, Fernandez-Becerra C, Bowman S, Oliver K, Preuss M, Sanchez CP, Schneider NK, Villalobos JM, Rajandream MA, Harris D, Pereira da Silva LH, Barrell B and Lanzer M

    Departamento de Parasitologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, Av. Lineu Prestes 1374, São Paulo, SP 05508-900, Brazil.

    The malarial parasite Plasmodium vivax causes disease in humans, including chronic infections and recurrent relapses, but the course of infection is rarely fatal, unlike that caused by Plasmodium falciparum. To investigate differences in pathogenicity between P. vivax and P. falciparum, we have compared the subtelomeric domains in the DNA of these parasites. In P. falciparum, subtelomeric domains are conserved and contain ordered arrays of members of multigene families, such as var, rif and stevor, encoding virulence determinants of cytoadhesion and antigenic variation. Here we identify, through the analysis of a continuous 155,711-base-pair sequence of a P. vivax chromosome end, a multigene family called vir, which is specific to P. vivax. The vir genes are present at about 600-1,000 copies per haploid genome and encode proteins that are immunovariant in natural infections, indicating that they may have a functional role in establishing chronic infection through antigenic variation.

    Nature 2001;410;6830;839-42

  • Construction of transcript maps by somatic cell/radiation hybrid mapping. The human gene map.

    Deloukas P

    Sanger Centre, Cambridge, UK.

    Methods in molecular biology (Clifton, N.J.) 2001;175;155-68

  • Map integration. From a genetic map to a physical gene map and ultimately to the sequence map.

    Deloukas P

    Sanger Centre, Cambridge, UK.

    Methods in molecular biology (Clifton, N.J.) 2001;175;129-42

  • The DNA sequence and comparative analysis of human chromosome 20.

    Deloukas P, Matthews LH, Ashurst J, Burton J, Gilbert JG, Jones M, Stavrides G, Almeida JP, Babbage AK, Bagguley CL, Bailey J, Barlow KF, Bates KN, Beard LM, Beare DM, Beasley OP, Bird CP, Blakey SE, Bridgeman AM, Brown AJ, Buck D, Burrill W, Butler AP, Carder C, Carter NP, Chapman JC, Clamp M, Clark G, Clark LN, Clark SY, Clee CM, Clegg S, Cobley VE, Collier RE, Connor R, Corby NR, Coulson A, Coville GJ, Deadman R, Dhami P, Dunn M, Ellington AG, Frankland JA, Fraser A, French L, Garner P, Grafham DV, Griffiths C, Griffiths MN, Gwilliam R, Hall RE, Hammond S, Harley JL, Heath PD, Ho S, Holden JL, Howden PJ, Huckle E, Hunt AR, Hunt SE, Jekosch K, Johnson CM, Johnson D, Kay MP, Kimberley AM, King A, Knights A, Laird GK, Lawlor S, Lehvaslaiho MH, Leversha M, Lloyd C, Lloyd DM, Lovell JD, Marsh VL, Martin SL, McConnachie LJ, McLay K, McMurray AA, Milne S, Mistry D, Moore MJ, Mullikin JC, Nickerson T, Oliver K, Parker A, Patel R, Pearce TA, Peck AI, Phillimore BJ, Prathalingam SR, Plumb RW, Ramsay H, Rice CM, Ross MT, Scott CE, Sehra HK, Shownkeen R, Sims S, Skuce CD, Smith ML, Soderlund C, Steward CA, Sulston JE, Swann M, Sycamore N, Taylor R, Tee L, Thomas DW, Thorpe A, Tracey A, Tromans AC, Vaudin M, Wall M, Wallis JM, Whitehead SL, Whittaker P, Willey DL, Williams L, Williams SA, Wilming L, Wray PW, Hubbard T, Durbin RM, Bentley DR, Beck S and Rogers J

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.

    Nature 2001;414;6866;865-71

  • Whole genome comparison of Campylobacter jejuni human isolates using a low-cost microarray reveals extensive genetic diversity.

    Dorrell N, Mangan JA, Laing KG, Hinds J, Linton D, Al-Ghusein H, Barrell BG, Parkhill J, Stoker NG, Karlyshev AV, Butcher PD and Wren BW

    Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK.

    Campylobacter jejuni is the leading cause of bacterial food-borne diarrhoeal disease throughout the world, and yet is still a poorly understood pathogen. Whole genome microarray comparisons of 11 C. jejuni strains of diverse origin identified genes in up to 30 NCTC 11168 loci ranging from 0.7 to 18.7 kb that are either absent or highly divergent in these isolates. Many of these regions are associated with the biosynthesis of surface structures including flagella, lipo-oligosaccharide, and the newly identified capsule. Other strain-variable genes of known function include those responsible for iron acquisition, DNA restriction/modification, and sialylation. In fact, at least 21% of genes in the sequenced strain appear dispensable as they are absent or highly divergent in one or more of the isolates tested, thus defining 1300 C. jejuni core genes. Such core genes contribute mainly to metabolic, biosynthetic, cellular, and regulatory processes, but many virulence determinants are also conserved. Comparison of the capsule biosynthesis locus revealed conservation of all the genes in this region in strains with the same Penner serotype as strain NCTC 11168. By contrast, between 5 and 17 NCTC 11168 genes in this region are either absent or highly divergent in strains of a different serotype from the sequenced strain, providing further evidence that the capsule accounts for Penner serotype specificity. These studies reveal extensive genetic diversity among C. jejuni strains and pave the way toward identifying correlates of pathogenicity and developing improved epidemiological tools for this problematic pathogen.

    Funded by: Wellcome Trust: 062511

    Genome research 2001;11;10;1706-15

  • The decaying genome of Mycobacterium leprae.

    Eiglmeier K, Parkhill J, Honoré N, Garnier T, Tekaia F, Telenti A, Klatser P, James KD, Thomson NR, Wheeler PR, Churcher C, Harris D, Mungall K, Barrell BG and Cole ST

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 28 Rue du Docteur Roux, 75724 Paris, France.

    Everything that we need to know about Mycobacterium leprae, a close relative of the tubercle bacillus, is encrypted in its genome. Inspection of the 3.27 Mb genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus identified 1,605 genes encoding proteins and 50 genes for stable RNA species. Comparison with the genome sequence of Mycobacterium tuberculosis revealed an extreme case of reductive evolution, since less than half of the genome contains functional genes while inactivated or pseudogenes are highly abundant. The level of gene duplication was approximately 34% and, on classification of the proteins into families, the largest functional groups were found to be involved in the metabolism and modification of fatty acids and polyketides, transport of metabolites, cell envelope synthesis and gene regulation. Reductive evolution, gene decay and genome downsizing have eliminated entire metabolic pathways, together with their regulatory circuits and accessory functions, particularly those involved in catabolism. This may explain the unusually long generation time and account for our inability to culture the leprosy bacillus.

    Leprosy review 2001;72;4;387-98

  • A 6.9-Mb high-resolution BAC/PAC contig of human 4p15.3-p16.1, a candidate region for bipolar affective disorder.

    Evans KL, Le Hellard S, Morris SW, Lawson D, Whitton C, Semple CA, Fantes JA, Torrance HS, Malloy MP, Maule JC, Humphray SJ, Ross MT, Bentley DR, Muir WJ, Blackwood DH and Porteous DJ

    Medical Genetics Section, MRC Human Genetics Unit, University of Edinburgh, Molecular Medicine Centre, Crewe Road, Edinburgh, EH4 2XU, United Kingdom.

    Bipolar affective disorder (BPAD) is a complex disease with a significant genetic component and a population lifetime risk of 1%. Our previous work identified a region of human chromosome 4p that showed significant linkage to BPAD in a large pedigree. Here, we report the construction of an accurate, high-resolution physical map of 6.9 Mb of human chromosome 4p15.3-p16.1, which includes an 11-cM (5.8 Mb) critical region for BPAD. The map consists of 460 PAC and BAC clones ordered by a combination of STS content analysis and restriction fragment fingerprinting, with a single approximately 300-kb gap remaining. A total of 289 new and existing markers from a wide range of sources have been localized on the contig, giving an average marker resolution of 1 marker/23 kb. The STSs include 57 ESTs, 9 of which represent known genes. This contig is an essential preliminary to the identification of candidate genes that predispose to bipolar affective disorder, to the completion of the sequence of the region, and to the development of a high-density SNP map.

    Genomics 2001;71;3;315-23

  • Cancer and genomics.

    Futreal PA, Kasprzyk A, Birney E, Mullikin JC, Wooster R and Stratton MR

    Cancer Genome Project, Sanger Centre, Cambridge, UK.

    Identification of the genes that cause oncogenesis is a central aim of cancer research. We searched the proteins predicted from the draft human genome sequence for paralogues of known tumour suppressor genes, but no novel genes were identified. We then assessed whether it was possible to search directly for oncogenic sequence changes in cancer cells by comparing cancer genome sequences against the draft genome. Apparently chimaeric transcripts (from oncogenic fusion genes generated by chromosomal translocations, the ends of which mapped to different genomic locations) were detected to the same degree in both normal and neoplastic tissues, indicating a significant level of false positives. Our experiment underscores the limited amount and variable quality of DNA sequence from cancer cells that is currently available.

    Nature 2001;409;6822;850-2

  • Long-range comparison of human and mouse SCL loci: localized regions of sensitivity to restriction endonucleases correspond precisely with peaks of conserved noncoding sequences.

    Göttgens B, Gilbert JG, Barton LM, Grafham D, Rogers J, Bentley DR and Green AR

    The Wellcome Trust Centre for Molecular Mechanisms in Disease, Cambridge Institute for Medical Research, Addenbrooke's Hospital Site, Cambridge CB2 2XY, UK.

    Long-range comparative sequence analysis provides a powerful strategy for identifying conserved regulatory elements. The stem cell leukemia (SCL) gene encodes a bHLH transcription factor with a pivotal role in hemopoiesis and vasculogenesis, and it displays a highly conserved expression pattern. We present here a detailed sequence comparison of 193 kb of the human SCL locus to 234 kb of the mouse SCL locus. Four new genes have been identified together with an ancient mitochondrial insertion in the human locus. The SCL gene is flanked upstream by the SIL gene and downstream by the MAP17 gene in both species, but the gene order is not collinear downstream from MAP17. To facilitate rapid identification of candidate regulatory elements, we have developed a new sequence analysis tool (SynPlot) that automates the graphical display of large-scale sequence alignments. Unlike existing programs, SynPlot can display the locus features of more than one sequence, thereby indicating the position of homology peaks relative to the structure of all sequences in the alignment. In addition, high-resolution analysis of the chromatin structure of the mouse SCL gene permitted the accurate positioning of localized zones accessible to restriction endonucleases. Zones known to be associated with functional regulatory regions were found to correspond precisely with peaks of human/mouse homology, thus demonstrating that long-range human/mouse sequence comparisons allow accurate prediction of the extent of accessible DNA associated with active regulatory regions.

    Genome research 2001;11;1;87-97

  • The complex repeats of Dictyostelium discoideum.

    Glöckner G, Szafranski K, Winckler T, Dingermann T, Quail MA, Cox E, Eichinger L, Noegel AA and Rosenthal A

    IMB Jena, Department of Genome Analysis, D-07745 Jena, Germany.

    In the course of determining the sequence of the Dictyostelium discoideum genome we have characterized in detail the quantity and nature of interspersed repetitive elements present in this species. Several of the most abundant small complex repeats and transposons (DIRS-1; TRE3-A,B; TRE5-A; skipper; Tdd-4; H3R) have been described previously. In our analysis we have identified additional elements. Thus, we can now present a complete list of complex repetitive elements in D. discoideum. All elements together make up 10% of the genome. Some of the newly described elements belong to established classes (TRE3-C,D; TRE5-B,C; DGLT-A,P; Tdd-5). However, we have also defined two new classes of DNA transposable elements (DDT and thug) that have not been described thus far. Based on the amount of sequence attributable to each family, we calculated its minimum copy number; these range from <10 to >200 copies. Unique sequences adjacent to the element ends and truncation points in elements gave a measure for the fragmentation of the elements. Furthermore, we describe the diversity of single elements with regard to polymorphisms and conserved structures. All elements show insertion preference into loci in which other elements of the same family reside. The analysis of the complex repeats is a valuable data resource for the ongoing assembly of whole D. discoideum chromosomes.

    Genome research 2001;11;4;585-94

  • Genomics of Mycobacterium bovis.

    Gordon SV, Eiglmeier K, Garnier T, Brosch R, Parkhill J, Barrell B, Cole ST and Hewinson RG

    Veterinary Laboratories Agency, Woodham Lane, New Haw, Addlestone, Surrey, UK.

    The imminent completion of the genome sequence of Mycobacterium bovis will reveal the genetic blueprint for this most successful pathogen. Comparative analysis with the genome sequences of M. tuberculosis and M. bovis BCG promises to expose the genetic basis for the phenotypic differences between the tubercle bacilli, offering unparalleled insight into the virulence factors of the M. tuberculosis complex. Initial analysis of the sequence data has already revealed a novel deletion from M. bovis, as well as identifying variation in members of the PPE family of proteins. As the study of bacterial pathogenicity enters the postgenomic phase, the genome sequence of M. bovis promises to serve as a cornerstone of mycobacterial genetics.

    Tuberculosis (Edinburgh, Scotland) 2001;81;1-2;157-63

  • Sequencing bacterial artificial chromosomes.

    Harris DE and Murphy L

    The Sanger Centre, Cambridge, UK.

    Methods in molecular biology (Clifton, N.J.) 2001;175;217-34

  • Mutation and haplotype analysis of the CFTR gene in atypically mild cystic fibrosis patients from Northern Ireland.

    Hughes D, Dörk T, Stuhrmann M and Graham C

    Journal of medical genetics 2001;38;2;136-9

  • Subtelomeric sequence from the right arm of Schizosaccharomyces pombe chromosome I contains seven permease genes.

    Hunt C, Moore K, Xiang Z, Hurst SM, McDougall RC, Rajandream MA, Barrell BG, Gwilliam R, Wood V, Lyne MH and Aves SJ

    School of Biological Sciences, University of Exeter, Washington Singer Laboratories, Perry Road, Exeter EX4 4QG, UK.

    The sequence has been determined of 80 888 bp of contiguous subtelomeric DNA, including the isp5 gene, from the right arm of chromosome I of Schizosaccharomyces pombe; 27 open reading frames (ORFs) longer than 100 codons are present, giving a density of one gene per 3.0 kb. Seven of the predicted proteins are members of the major facilitator superfamily (MFS) of transport proteins, including four amino acid permease homologues, bringing this family of amino acid permease sequences to 17 in Sz. pombe, and a phylogenetic analysis is presented. Also encoded is an allantoate permease homologue, a sulphate permease homologue and a probable urea active transporter. Predicted non-membrane proteins include a 1-aminocyclopropane-1-carboxylate deaminase (ACC deaminase), a class III aminotransferase, serine acetyltransferase, protein-L-isoaspartate O-methyltransferase, alpha-glucosidase, alpha-galactosidase, esterase/lipase, oxidoreductase of the short-chain dehydrogenase/reductase (SDR) family, aldehyde dehydrogenase, formamidase, amidase, flavohaemoprotein, a putative translation initiation inhibitor and a protein with similarity to a filamentous fungal conidiation-specific protein. The remaining six ORFs are likely to encode proteins, either because they have sequence similarity with hypothetical proteins or because they are known to be transcribed. Introns are scarce in the sequenced region: only three ORFs contain introns, with only one having multiple introns. The sequenced region also contains a single Tf1 transposon long terminal repeat (LTR). The sequence is derived from cosmid clones c869, c922 and c1039 and has been submitted to the EMBL database under entries SPAC869 (Accession No. AL132779), SPAC922 (AL133522) and SPAC1039 (AL133521).

    Yeast (Chichester, England) 2001;18;4;355-61

  • Gene discovery in Plasmodium chabaudi by genome survey sequencing.

    Janssen CS, Barrett MP, Lawson D, Quail MA, Harris D, Bowman S, Phillips RS and Turner CM

    Division of Infection & Immunity, IBLS, University of Glasgow, G12 8QQ, Glasgow, UK.

    The first genome survey sequencing of the rodent malaria parasite Plasmodium chabaudi is presented here. In 766 sequences, 131 putative gene sequences have been identified by sequence similarity database searches. Further, 7 potential gene families, four of which have not previously been described, were discovered. These genes may be important in understanding the biology of malaria, as well as offering potential new drug targets. We have also identified a number of candidate minisatellite sequences that could be helpful in genetic studies. Genome survey sequencing in P. chabaudi is a productive strategy in further developing this in vivo model of malaria, in the context of the malaria genome projects.

    Molecular and biochemical parasitology 2001;113;2;251-60

  • Karyotyping mouse chromosomes by multiplex-FISH (M-FISH).

    Jentsch I, Adler ID, Carter NP and Speicher MR

    Institut für Anthropologie und Humangenetik, Ludwig-Maximilians-Universität München, Germany.

    Karyotyping of mouse chromosomes is a skillful art, which is laborious work even for experienced cytogeneticists. With the growing number of mouse models for human diseases, there is an increasing demand for automated mouse karyotyping systems. Here, such a karyotyping system for mouse chromosomes based on the multiplex-fluorescence in-situ hybridization (M-FISH) technology is shown. The system was tested on a number of individual mice with numerical and structural aberrations and its reproducibility and robustness verified. Mouse M-FISH should be a valuable tool for the analysis of chromosomal rearrangements in mice.

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 2001;9;3;211-4

  • Functional annotation of a full-length mouse cDNA collection.

    Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara A, Fukunishi Y, Konno H, Adachi J, Fukuda S, Aizawa K, Izawa M, Nishi K, Kiyosawa H, Kondo S, Yamanaka I, Saito T, Okazaki Y, Gojobori T, Bono H, Kasukawa T, Saito R, Kadota K, Matsuda H, Ashburner M, Batalov S, Casavant T, Fleischmann W, Gaasterland T, Gissi C, King B, Kochiwa H, Kuehl P, Lewis S, Matsuo Y, Nikaido I, Pesole G, Quackenbush J, Schriml LM, Staubli F, Suzuki R, Tomita M, Wagner L, Washio T, Sakai K, Okido T, Furuno M, Aono H, Baldarelli R, Barsh G, Blake J, Boffelli D, Bojunga N, Carninci P, de Bonaldo MF, Brownstein MJ, Bult C, Fletcher C, Fujita M, Gariboldi M, Gustincich S, Hill D, Hofmann M, Hume DA, Kamiya M, Lee NH, Lyons P, Marchionni L, Mashima J, Mazzarelli J, Mombaerts P, Nordone P, Ring B, Ringwald M, Rodriguez I, Sakamoto N, Sasaki H, Sato K, Schönbach C, Seya T, Shibata Y, Storch KF, Suzuki H, Toyo-oka K, Wang KH, Weitz C, Whittaker C, Wilming L, Wynshaw-Boris A, Yoshida K, Hasegawa Y, Kawaji H, Kohtsuki S, Hayashizaki Y and RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium

    Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, Yokohama Institute, Kanagawa, Japan.

    The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

    Nature 2001;409;6821;685-90

  • Detailed structure of Pneumocystis carinii chromosome ends.

    Keely SP, Wakefield AE, Cushion MT, Smulian AG, Hall N, Barrell BG and Stringer JR

    Department of Molecular Genetics, Biochemistry & Microbiology, University of Cincinnati, OH, USA.

    Funded by: FIC NIH HHS: TW01200-02; NIAID NIH HHS: R01AI36701, R01AI44651

    The Journal of eukaryotic microbiology 2001;Suppl;118S-120S

  • The use of functional genomics in C. elegans for studying human development and disease.

    Kuwabara PE and O'Neil N

    The Sanger Centre, Hinxton, UK.

    The 100 Mb Caenorhabditis elegans genome was the first animal genome to be sequenced in its entirety. Many reverse-genetics tools have been developed to mine the genome sequence and to bridge the gap between identifying a gene sequence and understanding its function. Here we discuss how C. elegans can contribute to the understanding of the function of genes involved in human development and disease.

    Journal of inherited metabolic disease 2001;24;2;127-38

  • It ain't over till it's ova: germline sex determination in C. elegans.

    Kuwabara PE and Perry MD

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, UK.

    Sex determination in most organisms involves a simple binary fate choice between male and female development; the outcome of this decision has profound effects on organismal biology, biochemistry and behaviour. In the nematode C. elegans, there is also a binary choice, either male or hermaphrodite. In C. elegans, distinct genetic pathways control somatic and germline sexual cell fate. Both pathways share a common set of globally acting regulatory genes; however, germline-specific regulatory genes also participate in the decision to make male or female gametes. The determination of sexual fate in the germline of the facultative hermaphrodite poses a special problem, because first sperm and then oocytes are produced. It has emerged that additional layers of post-transcriptional regulation have been imposed to modulate the activities of the global sex-determining genes, tra-2 and fem-3; the balance between these activities is crucial in controlling sexual cell fate in the hermaphrodite germline.

    BioEssays : news and reviews in molecular, cellular and developmental biology 2001;23;7;596-604

  • Initial sequencing and analysis of the human genome.

    Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J and International Human Genome Sequencing Consortium

    Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, Massachusetts 02142, USA.

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

    Nature 2001;409;6822;860-921

  • David Bentley discusses life after the Human Genome Project. Interviewed by Rebecca Lawrence.

    Lawrence R

    David Bentley, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Drug discovery today 2001;6;1;13-14

  • Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts.

    Lesk AM, Lo Conte L and Hubbard TJ

    Department of Haematology, University of Cambridge Clinical School, Cambridge Institute for Medical Research, Cambridge, United Kingdom.

    In the Novel Fold category, three types of predictions were assessed: three-dimensional structures, secondary structures, and residue-residue contacts. For predictions of three-dimensional models, CASP4 targets included 5 domains or structures with novel folds, and 13 on the borderline between Novel Fold and Fold Recognition categories. These elicited 1863 predictions of these and other targets by methods more general than comparative modeling or fold recognition techniques. The group of Bonneau, Tsai, Ruczinski, and Baker stood out as performing well with the greatest consistency. In many cases, several groups were able to predict fragments of the target correctly, often at a level somewhat larger than standard supersecondary structures, but were not able to assemble the fragments into a correct global topology. The methods of Bonneau, Tsai, Ruczinski, and Baker have been successful in addressing the fragment assembly problem for many, but not all, of the target structures.

    Proteins 2001;Suppl 5;98-118

  • A computational scan for U12-dependent introns in the human genome sequence.

    Levine A and Durbin R

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    U12-dependent introns are found in small numbers in most eukaryotic genomes, but their scarcity makes accurate characterisation of their properties challenging. A computational search for U12-dependent introns was performed using the draft version of the human genome sequence. Human expressed sequences confirmed 404 U12-dependent introns within the human genome, a 6-fold increase over the total number of non-redundant U12-dependent introns previously identified in all genomes. Although most of these introns had AT-AC or GT-AG terminal dinucleotides, small numbers of introns with a surprising diversity of termini were found, suggesting that many of the non-canonical introns found in the human genome may be variants of U12-dependent introns and, thus, spliced by the minor spliceosome. Comparisons with U2-dependent introns revealed that the U12-dependent intron set lacks the 'short intron' peak characteristic of U2-dependent introns. Analysis of this U12-dependent intron set confirmed reports of a biased distribution of U12-dependent introns in the genome and allowed the identification of several alternative splicing events as well as a surprising number of apparent splicing errors. This new larger reference set of U12-dependent introns will serve as a resource for future studies of both the properties and evolution of the U12 spliceosome.

    Nucleic acids research 2001;29;19;4006-13

  • Progress in sequencing the mouse genome.

    Lindblad-Toh K, Lander ES, McPherson JD, Waterston RH, Rodgers J and Birney E

    Genesis (New York, N.Y. : 2000) 2001;31;4;137-41

  • Tbx1 haploinsufficiency in the DiGeorge syndrome region causes aortic arch defects in mice.

    Lindsay EA, Vitelli F, Su H, Morishima M, Huynh T, Pramparo T, Jurecic V, Ogunrinu G, Sutherland HF, Scambler PJ, Bradley A and Baldini A

    Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030, USA.

    DiGeorge syndrome is characterized by cardiovascular, thymus and parathyroid defects and craniofacial anomalies, and is usually caused by a heterozygous deletion of chromosomal region 22q11.2 (del22q11) (ref. 1). A targeted, heterozygous deletion, named Df(16)1, encompassing around 1 megabase of the homologous region in mouse causes cardiovascular abnormalities characteristic of the human disease. Here we have used a combination of chromosome engineering and P1 artificial chromosome transgenesis to localize the haploinsufficient gene in the region, Tbx1. We show that Tbx1, a member of the T-box transcription factor family, is required for normal development of the pharyngeal arch arteries in a gene dosage-dependent manner. Deletion of one copy of Tbx1 affects the development of the fourth pharyngeal arch arteries, whereas homozygous mutation severely disrupts the pharyngeal arch artery system. Our data show that haploinsufficiency of Tbx1 is sufficient to generate at least one important component of the DiGeorge syndrome phenotype in mice, and demonstrate the suitability of the mouse for the genetic dissection of microdeletion syndromes.

    Nature 2001;410;6824;97-101

  • Independent regulation of initiation and maintenance phases of Hoxa3 expression in the vertebrate hindbrain involve auto- and cross-regulatory mechanisms.

    Manzanares M, Bel-Vialar S, Ariza-McNaughton L, Ferretti E, Marshall H, Maconochie MM, Blasi F and Krumlauf R

    Division of Developmental Neurobiology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.

    During development of the vertebrate hindbrain, Hox genes play multiple roles in the segmental processes that regulate anteroposterior (AP) patterning. Paralogous Hox genes, such as Hoxa3, Hoxb3 and Hoxd3, generally have very similar patterns of expression, and gene targeting experiments have shown that members of paralogy group 3 can functionally compensate for each other. Hence, distinct functions for individual members of this family may primarily depend upon differences in their expression domains. The earliest domains of expression of the Hoxa3 and Hoxb3 genes in hindbrain rhombomeric (r) segments are transiently regulated by kreisler, a conserved Maf b-Zip protein, but the mechanisms that maintain expression in later stages are unknown. In this study, we have compared the segmental expression and regulation of Hoxa3 and Hoxb3 in mouse and chick embryos to investigate how they are controlled after initial activation. We found that the patterns of Hoxa3 and Hoxb3 expression in r5 and r6 in later stages during mouse and chick hindbrain development were differentially regulated. Hoxa3 expression was maintained in r5 and r6, while Hoxb3 was downregulated. Regulatory comparisons of cis-elements from the chick and mouse Hoxa3 locus in both transgenic mouse and chick embryos have identified a conserved enhancer that mediates the late phase of Hoxa3 expression through a conserved auto/cross-regulatory loop. This block of similarity is also present in the human and horn shark loci, and contains two bipartite Hox/Pbx-binding sites that are necessary for its in vivo activity in the hindbrain. These HOX/PBC sites are positioned near a conserved kreisler-binding site (KrA) that is involved in activating early expression in r5 and r6, but their activity is independent of kreisler. This work demonstrates that separate elements are involved in initiating and maintaining Hoxa3 expression during hindbrain segmentation, and that it is regulated in a manner different from Hoxb3 in later stages. Together, these findings add further strength to the emerging importance of positive auto- and cross-regulatory interactions between Hox genes as a general mechanism for maintaining their correct spatial patterns in the vertebrate nervous system.

    Development (Cambridge, England) 2001;128;18;3595-607

  • Vanin genes are clustered (human 6q22-24 and mouse 10A2B1) and encode isoforms of pantetheinase ectoenzymes.

    Martin F, Malergue F, Pitari G, Philippe JM, Philips S, Chabret C, Granjeaud S, Mattei MG, Mungall AJ, Naquet P and Galland F

    Centre d'Immunologie de Marseille-Luminy, INSERM-CNRS-Université de la Méditerranée, France.

    The mouse Vanin-1 molecule plays a role in thymic reconstitution following damage by irradiation. We recently demonstrated that it is a membrane pantetheinase (EC 3.5.1.-). This molecule is the prototypic member of a larger Vanin family encoded by at least two mouse (Vanin-1 and Vanin-3) and three human (VNN1, VNN2, VNN3) orthologous genes. We now report (1) the structural characterization of the human and mouse Vanin genes and their organization in clusters on chromosomes 6q22-24 and 10A2B1, respectively; (2) the identification of the human VNN3 gene and the demonstration that the mouse Vanin-3 molecule is secreted by cells; and (3) that the Vanin genes encode different isoforms of the mammalian pantetheinase activity. Thus, the Vanin family represents a novel class of secreted or membrane-associated ectoenzymes. We discuss here their possible role in processes pertaining to tissue repair in the context of oxidative stress.

    Immunogenetics 2001;53;4;296-306

  • Localization of the gene for distal hereditary motor neuronopathy VII (dHMN-VII) to chromosome 2q14.

    McEntagart M, Norton N, Williams H, Teare MD, Dunstan M, Baker P, Houlden H, Reilly M, Wood N, Harper PS, Futreal PA, Williams N and Rahman N

    Institute of Medical Genetics, University Hospital of Wales, Cardiff CF4 4XW, United Kingdom.

    Distal hereditary motor neuronopathy type VII (dHMN-VII) is an autosomal dominant disorder characterized by distal muscular atrophy and vocal cord paralysis. We performed a genomewide linkage search in a large Welsh pedigree with dHMN-VII and established linkage to chromosome 2q14. Analyses of a second family with dHMN-VII confirmed the location of the gene and provided evidence for a founder mutation segregating in both pedigrees. The maximum three-point LOD score in the combined pedigree was 7.49 at D2S274. Expansion of a polyalanine tract in Engrailed-1, a transcription factor strongly expressed in the spinal cord, was excluded as the cause of dHMN-VII.

    American journal of human genetics 2001;68;5;1270-6

  • From mouse to man: generating megabase chromosome rearrangements.

    Mills AA and Bradley A

    Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.

    Experimental approaches for deciphering the function of human genes rely heavily on our ability to generate mutations in model organisms such as the mouse. However, because recessive mutations are masked by the wild-type allele in the diploid context, conventional mutagenesis and screening are often laborious and costly. Chromosome engineering combines the power of gene targeting in embryonic stem (ES) cells with Cre-loxP technology to create mice that are functionally haploid in discrete portions of the genome. Chromosome deletions, duplications and inversions can be tagged with visible markers, facilitating strain maintenance. These approaches allow for more refined mutagenesis screens that will greatly accelerate functional mouse genomics and generate mammalian models for developmental processes and cancer.

    Trends in genetics : TIG 2001;17;6;331-9

  • Live observation of fission yeast meiosis in recombination-deficient mutants: a study on achiasmate chromosome segregation.

    Molnar M, Bähler J, Kohli J and Hiraoka Y

    Institute of Cell Biology, University of Bern, Switzerland.

    Regular segregation of homologous chromosomes during meiotic divisions is essential for the generation of viable progeny. In recombination-proficient organisms, chromosome disjunction at meiosis I generally occurs by chiasma formation between the homologs (chiasmate meiosis). We have studied meiotic stages in living rec8 and rec7 mutant cells of fission yeast, with special attention to prophase and the first meiotic division. Both rec8 and rec7 are early recombination mutants, and in rec7 mutants, chromosome segregation at meiosis I occurs without any recombination (achiasmate meiosis). Both mutants showed distinct irregularities in nuclear prophase movements. Additionally, rec7 showed an extended first division of variable length, with single chromosomes changing back and forth between the cell poles. Two other early recombination-deficient mutants (rec14 and rec15) showed very similar phenotypes to rec7 during the first meiotic division, and the fidelity of achiasmate chromosome segregation slightly exceeded the expected random level. We discuss possible regulatory mechanisms of fission yeast to deal with achiasmate chromosome segregation.

    Journal of cell science 2001;114;Pt 15;2843-53

  • Critical assessment of methods of protein structure prediction (CASP): round IV.

    Moult J, Fidelis K, Zemla A and Hubbard T

    Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.

    Funded by: NIGMS NIH HHS: GM/DK61967; NLM NIH HHS: LM07085

    Proteins 2001;Suppl 5;2-7

  • A high-density transcript map of the human dominant optic atrophy OPA1 gene locus and re-evaluation of evidence for a founder haplotype.

    Murton NJ, French L, Toomes C, Joseph SS, Rehman I, Hopkins BL, Inglehearn CF and Churchill AJ

    Molecular Medicine Unit, St. James's University Hospital, Leeds, UK.

    Dominant optic atrophy (DOA, gene OPA1) is the commonest form of inherited optic atrophy. Linkage studies have shown that a locus for this disease lies in a 1.4-cM region at chromosome 3q28→q29 and have suggested a founder haplotype for as many as 95% of the linked families. To aid the identification of candidate genes for this disease, we have constructed a Bacterial Artificial Chromosome (BAC) contig covering approximately 3.3 Mb and encompassing the OPA1 critical region (flanking markers D3S3669 and D3S3562). This physical map corrects errors in the marker order reported in the literature, allowing the OPA1 critical region to be precisely defined. A reassessment of the founder effect in the light of the revised marker order suggests that it may not be as significant as had previously been suggested. A high-density transcript map was created by precisely mapping genes and expressed sequence tags (ESTs) from GeneMap'99 that had been loosely assigned to the region by radiation hybrid mapping. One known gene (KIAA0567 protein) and 15 ESTs were found to lie within the minimal disease region. Analysis of the sequence data already available from within the OPA1 critical region allowed the identification and mapping of a further 31 ESTs. The work presented in this study provides the basis for the characterisation of candidate genes and the ultimate identification of the gene mutated in DOA.

    Cytogenetics and cell genetics 2001;92;1-2;97-102

  • Prediction targets of CASP4.

    Murzin A and Hubbard TJ

    Centre for Protein Engineering, MRC Centre, Cambridge, UK.

    Proteins 2001;Suppl 5;8-12

  • CASP2 knowledge-based approach to distant homology recognition and fold prediction in CASP4.

    Murzin AG and Bateman A

    Centre for Protein Engineering, MRC Centre, Cambridge, United Kingdom.

    In 1996, in CASP2, we presented a semimanual approach to the prediction of protein structure that was aimed at the recognition of probable distant homology, where it existed, between a given target protein and a protein of known structure (Murzin and Bateman, Proteins 1997; Suppl 1:105-112). Central to our method was the knowledge of all known structural and probable evolutionary relationships among proteins of known structure classified in the SCOP database (Murzin et al., J Mol Biol 1995;247:536-540). It was demonstrated that a knowledge-based approach could compete successfully with the best computational methods of the time in the correct recognition of the target protein fold. Four years later, in CASP4, we have applied essentially the same knowledge-based approach to distant homology recognition, concentrating our effort on the improvement of the completeness and alignment accuracy of our models. The manifold increase in available sequence and structure data worked to our advantage, as did the experience and expertise obtained through the classification of these data. In particular, we were able to model most of our predictions from several distantly related structures rather than from a single parent structure, and we could use more superfamily-characteristic features for the refinement of our alignments. Our prediction for each attempted distant homology recognition target ranked among the top few predictions for that target, with the predictions for the hypothetical protein HI0065 (T0104) and the C-terminal domain of the ABC transporter MalK (T0121C) being particularly successful. We also attempted the prediction of protein folds of some of the targets tentatively assigned to new superfamilies. The average quality of our fold predictions was far lower than the quality of our distant homology recognition models, but for two targets, chorismate lyase (T0086) and Appr>p cyclic phosphodiesterase (T0094), our predictions achieved the top ranking.

    Proteins 2001;Suppl 5;76-85

  • Breast cancer genetics: what we know and what we need.

    Nathanson KL, Wooster R, Weber BL and Nathanson KN

    Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    Breast cancer results from genetic and environmental factors leading to the accumulation of mutations in essential genes. Genetic predisposition may have a strong, almost singular effect, as with BRCA1 and BRCA2, or may represent the cumulative effects of multiple low-penetrance susceptibility alleles. Here we review high- and low-penetrance breast-cancer-susceptibility alleles and discuss ongoing efforts to identify additional susceptibility genes. Ultimately these discoveries will lead to individualized breast cancer risk assessment and a reduction in breast cancer incidence.

    Nature medicine 2001;7;5;552-6

  • Localisation of a novel region of recurrent amplification in follicular lymphoma to an approximately 6.8 Mb region of 13q32-33.

    Neat MJ, Foot N, Jenner M, Goff L, Ashcroft K, Burford D, Dunham A, Norton A, Lister TA and Fitzgibbon J

    ICRF Medical Oncology Unit, St. Bartholomew's Hospital, Charterhouse Square, London, UK.

    Follicular lymphoma (FL) is characterised by the presence of the t(14;18)(q32;q21) and represents approximately 25% of new cases of non-Hodgkin's lymphoma. While the t(14;18) is a well-documented rearrangement, the role of secondary cytogenetic abnormalities in the development and progression of these tumours remains unclear. Comparative genomic hybridisation was used to characterise changes in DNA copy number in tumour DNA from patients with this malignancy. The mean numbers of deletion and amplification events found in each of the 45 samples studied were 1.8 and 2.3, respectively. Regions of recurrent (>10% of tumour samples) gain involved chromosomes 2p13-16 (16%), 7 (20%), 12 (16%), 13q21-33 (18%), 18 (27%), and X (36%), and frequent losses localised to 6q (29%) and 17p (20%). Amplification of chromosome 13 represents a novel finding in FL. The minimal amplified region was refined to a 6.8-Mb interval of 13q32-33 between the BAC clones 88K16 and 44H20 by fluorescence in situ hybridisation studies using metaphase chromosomes derived from tumour material. There are a number of reports in the literature suggesting that amplification of chromosome 13 also occurs in other human cancers. The location of the putative oncogene on 13q described here in follicular and transformed lymphoma may also be important in the evolution of many other malignancies.

    Genes, chromosomes & cancer 2001;32;3;236-43

  • SSAHA: a fast search method for large DNA databases.

    Ning Z, Cox AJ and Mullikin JC

    Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the "hits" for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects.

    Genome research 2001;11;10;1725-9
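    The hashing scheme summarised above can be illustrated with a short Python sketch. This is a minimal toy under stated assumptions, not the published SSAHA code: the function names build_hash_table and search, the min_hits threshold and the example sequences are hypothetical, and it assumes that "consecutive k-tuples" in the database means non-overlapping tuples (step size k), while query k-tuples are taken at every offset. Hits are then sorted and grouped by sequence and diagonal (shift) to flag candidate matches.

```python
from collections import defaultdict
from itertools import groupby


def build_hash_table(database, k):
    """Map each k-tuple to the (sequence index, offset) pairs where it occurs.

    Assumption: database sequences are split into consecutive, non-overlapping
    k-tuples (step size k), so each base contributes to at most one stored tuple.
    """
    table = defaultdict(list)
    for seq_index, seq in enumerate(database):
        for offset in range(0, len(seq) - k + 1, k):
            table[seq[offset:offset + k]].append((seq_index, offset))
    return table


def search(table, query, k, min_hits=2):
    """Return (seq_index, shift, hit_count) for diagonals with >= min_hits hits.

    shift = database_offset - query_offset; hits lying on the same alignment
    diagonal of the same database sequence share a (seq_index, shift) key.
    """
    hits = []
    # Query k-tuples are taken at every offset so that any alignment of the
    # query can line up with the non-overlapping database tuples.
    for q_offset in range(len(query) - k + 1):
        for seq_index, db_offset in table.get(query[q_offset:q_offset + k], []):
            hits.append((seq_index, db_offset - q_offset, db_offset))
    # Sorting brings hits on the same sequence and diagonal together.
    hits.sort()
    results = []
    for (seq_index, shift), group in groupby(hits, key=lambda hit: hit[:2]):
        count = sum(1 for _ in group)
        if count >= min_hits:
            results.append((seq_index, shift, count))
    return results


# Hypothetical toy example: the query occurs in sequence 0 starting at offset 4.
database = ["ACGTACGTTTGCA", "GGGGCCCCAAAA"]
table = build_hash_table(database, k=4)
print(search(table, "ACGTTTGC", k=4))  # [(0, 4, 2)]
```

    In this sketch a larger k yields fewer spurious hits per query tuple but can miss matches interrupted by mismatches, which is one way to picture the speed, memory and sensitivity trade-off the abstract says the paper discusses.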

  • An annotator's view.

    Parkhill J

    The Sanger Centre, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

    Microbiology (Reading, England) 2001;147;Pt 1;2

  • Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18.

    Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J, Churcher C, Mungall KL, Bentley SD, Holden MT, Sebaihia M, Baker S, Basham D, Brooks K, Chillingworth T, Connerton P, Cronin A, Davis P, Davies RM, Dowd L, White N, Farrar J, Feltwell T, Hamlin N, Haque A, Hien TT, Holroyd S, Jagels K, Krogh A, Larsen TS, Leather S, Moule S, O'Gaora P, Parry C, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S and Barrell BG

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Salmonella enterica serovar Typhi (S. typhi) is the aetiological agent of typhoid fever, a serious invasive bacterial disease of humans with an annual global burden of approximately 16 million cases, leading to 600,000 fatalities. Many S. enterica serovars actively invade the mucosal surface of the intestine but are normally contained in healthy individuals by the local immune defence mechanisms. However, S. typhi has evolved the ability to spread to the deeper tissues of humans, including liver, spleen and bone marrow. Here we have sequenced the 4,809,037-base pair (bp) genome of a S. typhi (CT18) that is resistant to multiple drugs, revealing the presence of hundreds of insertions and deletions compared with the Escherichia coli genome, ranging in size from single genes to large islands. Notably, the genome sequence identifies over two hundred pseudogenes, several corresponding to genes that are known to contribute to virulence in Salmonella typhimurium. This genetic degradation may contribute to the human-restricted host range for S. typhi. CT18 harbours a 218,150-bp multiple-drug-resistance incH1 plasmid (pHCM1), and a 106,516-bp cryptic plasmid (pHCM2), which shows recent common ancestry with a virulence plasmid of Yersinia pestis.

    Nature 2001;413;6858;848-52

  • Genome sequence of Yersinia pestis, the causative agent of plague.

    Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D, Bentley SD, Brooks K, Cerdeño-Tárraga AM, Chillingworth T, Cronin A, Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K, Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S and Barrell BG

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The Gram-negative bacterium Yersinia pestis is the causative agent of the systemic invasive infectious disease classically referred to as plague, and has been responsible for three human pandemics: the Justinian plague (sixth to eighth centuries), the Black Death (fourteenth to nineteenth centuries) and modern plague (nineteenth century to the present day). The recent identification of strains resistant to multiple drugs and the potential use of Y. pestis as an agent of biological warfare mean that plague still poses a threat to human health. Here we report the complete genome sequence of Y. pestis strain CO92, consisting of a 4.65-megabase (Mb) chromosome and three plasmids of 96.2 kilobases (kb), 70.3 kb and 9.6 kb. The genome is unusually rich in insertion sequences and displays anomalies in GC base-composition bias, indicating frequent intragenomic recombination. Many genes seem to have been acquired from other bacteria and viruses (including adhesins, secretion systems and insecticidal toxins). The genome contains around 150 pseudogenes, many of which are remnants of a redundant enteropathogenic lifestyle. The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.

    Nature 2001;413;6855;523-7

  • Insertional events as well as translocations may arise during aberrant immunoglobulin switch recombination in a patient with multiple myeloma.

    Pratt G, Fenton JA, Davies FE, Rawstron AC, Richards SJ, Collins JE, Owen RG, Jack AS, Smith GM and Morgan GJ

    Academic Unit of Haematology and Oncology, Department of Haematology, The General Infirmary at Leeds, UK.

    The majority of patients with multiple myeloma have translocations involving the immunoglobulin heavy chain switch regions on chromosome 14q32 and a promiscuous range of partner chromosomes. We describe a patient with an insertion of 132 bp of chromosome 22q12 sequence into the 5' region flanking S(mu) on chromosome 14q32. The 132 bp region from chromosome 22q12 contains the whole of exon 3 from a novel gene of unknown function in man. The significance of such insertional events remains unclear. The description of insertional events occurring as a result of abnormal switch recombination suggests that, in myeloma, dysregulation of oncogenes may occur by a mechanism other than chromosomal translocation.

    British journal of haematology 2001;112;2;388-91

  • The cluster of BTN genes in the extended major histocompatibility complex.

    Rhodes DA, Stammers M, Malcherek G, Beck S and Trowsdale J

    Department of Immunology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, United Kingdom.

    We sequenced the 170-kb cluster of BTN genes in the extended major histocompatibility complex region, 4 Mb telomeric of human leukocyte antigen class I genes, at 6p22.1. The cluster consists of seven genes belonging to the expanding B7/butyrophilin-like group, a subset of the immunoglobulin gene superfamily. The main complex is composed of six genes, from two subfamilies, BTN2 and BTN3, arranged in pairs. This alternating pattern must have evolved by duplications of an original block of two genes, one from each subfamily. The sequences from the two subfamilies share approximately 50% amino acid identity. Analysis of repeat elements within each block dates these duplications to approximately 100 million years ago, at about the time of the branching of the Rodentia and Primate lineages. The single BTN1A1 (butyrophilin) gene lies approximately 25 kb centromeric to the cluster. Each gene covers approximately 12 kb and consists of seven (BTN2 subfamily) or nine (BTN3 subfamily) coding exons. The predicted leader sequence, immunoglobulin-like IgV (variable)/IgC (constant) ectodomains, and the predicted transmembrane domain are encoded on separate exons and are separated from a B30.2 domain by a variable number of very short exons, 21 and 27 nucleotides in length. BTN transcripts were detected in all tissues examined. Alternative splicing, involving particularly the carboxyl-terminal B30.2 domain, was a notable feature. Most transcripts of BTN2 subfamily genes contained this domain, whereas BTN3 genes did not. Using immunofluorescence, we showed surface expression of BTN-green fluorescent protein fusions in mammalian cell transfectants.

    Genomics 2001;71;3;351-62

  • The mouse genome sequence: status and prospects.

    Rogers J and Bradley A

    Genomics 2001;77;3;117-8

  • Analysis of 41 kb of the DNA sequence from the right arm of chromosome II of Schizosaccharomyces pombe.

    Sánchez M, Revuelta JL, del Rey F, Gwilliam R, Skelton J, Churcher C, Rajandream MA, Wood V, Barrell B, Lyne R, Reinhardt R, Borzym K, Beck A, Moreno S and Domínguez A

    Departamento de Microbiología y Genética, Instituto de Microbiología Bioquímica/CSIC. Universidad de Salamanca, 37071 Salamanca, Spain.

    We report the complete sequence of cosmid c18A7 (41 046 bp insert), located on the right arm of chromosome II of the Schizosaccharomyces pombe genome. The sequence, which partially overlaps with cosmids SPBC4F6 and SPBC336, contains 16 open reading frames (ORFs) capable of coding for proteins of at least 100 amino acid residues in length (one partial) and one small nucleolar RNA (snoRNA). Four known genes were found: swi10 (encoding a mating-type switching protein also involved in nucleotide excision repair); dim1 (encoding a dimethyladenosine transferase); arf1 (encoding ADP-ribosylation factor 1); and pol3 (cdc6), the partial fragment, encoding the 125 kDa catalytic subunit of the DNA polymerase type B. Six ORFs similar to known proteins were found. They include a transporter of the major facilitator superfamily class, a vacuolar sorting protein, an asparagine synthase, a nuclear protein, a reticulum oxidoreductin and a heat shock protein. Each protein product of the other six ORFs has conserved domains and can be assigned a molecular, but not a biological, function. The sequence has been submitted to the EMBL database under Accession No. AL080287.

    Yeast (Chichester, England) 2001;18;12;1111-6

  • A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms.

    Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, Hunt SE, Cole CG, Coggill PC, Rice CM, Ning Z, Rogers J, Bentley DR, Kwok PY, Mardis ER, Yeh RT, Schultz B, Cook L, Davenport R, Dante M, Fulton L, Hillier L, Waterston RH, McPherson JD, Gilman B, Schaffner S, Van Etten WJ, Reich D, Higgins J, Daly MJ, Blumenstiel B, Baldwin J, Stange-Thomann N, Zody MC, Linton L, Lander ES, Altshuler D and International SNP Map Working Group

    Cold Spring Harbor, New York 11724, USA.

    We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

    Nature 2001;409;6822;928-33
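    As a rough consistency check (our own arithmetic on the SNP count and average density quoted in the abstract, not a figure the abstract states), the two numbers together imply the approximate amount of available sequence the map was computed over:

```latex
\[
1.42 \times 10^{6}\ \text{SNPs} \times 1.9\ \tfrac{\text{kb}}{\text{SNP}}
\approx 2.7 \times 10^{6}\ \text{kb} \approx 2.7\ \text{Gb}
\]
```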

  • Identification and cloning of Lmairk, a member of the Aurora/Ipl1p protein kinase family, from the human protozoan parasite Leishmania.

    Siman-Tov MM, Ivens AC and Jaffe CL

    Department of Parasitology, Hebrew University, Hadassah Medical School, Jerusalem, Israel.

    Lmairk, a gene encoding a member of the Aurora/Ipl1p family of protein kinases (AIRK), was cloned from the protozoan parasite Leishmania major. Aurora kinases are key enzymes involved in the regulation of normal chromosome segregation during mitosis and cytokinesis of eukaryotic cells. This single-copy gene, located on L. major chromosome 28, encodes a 301 amino acid polypeptide. All 11 conserved eukaryotic protein kinase catalytic subdomains are present, and the proposed AIRK signature sequence was identified in the activation loop between subdomains VII and VIII. Lmairk is expressed, as an approximately 2.4 kb message, in at least three different species of Leishmania. This report represents the first identification of an AIRK from the trypanosomatid family of early divergent eukaryotes.

    Biochimica et biophysica acta 2001;1519;3;241-5

  • Cyclin D3 is a target gene of t(6;14)(p21.1;q32.3) of mature B-cell malignancies.

    Sonoki T, Harder L, Horsman DE, Karran L, Taniguchi I, Willis TG, Gesk S, Steinemann D, Zucca E, Schlegelberger B, Solé F, Mungall AJ, Gascoyne RD, Siebert R and Dyer MJ

    Academic Department of Haematology and Cytogenetics, Institute of Cancer Research, Sutton, United Kingdom.

    Chromosomal translocation t(6;14)(p21.1;q32.3) has been reported as a rare but recurrent event not only in myeloma and plasma cell leukemia but also in diffuse large B-cell non-Hodgkin lymphoma (B-NHL) (diffuse large B-cell lymphoma [DLBCL]) and splenic lymphoma with villous lymphocytes (SLVL); however, the nature of the target gene(s) has not been determined. This study identified t(6;14)(p21.1;q32.3) in 3 cases of transformed extranodal marginal zone B-NHL, in 1 case of SLVL, and in 1 case of a low-grade B-cell lymphoproliferative disorder. In a sixth case, a CD5(+) DLBCL, the translocation was identified by molecular cloning in the absence of cytogenetically detectable change. Two chromosomal translocation breakpoints were cloned by using long-distance inverse polymerase chain reaction methods. Comparison with the genomic sequence for chromosome 6p21.1 showed breakpoints approximately 59 and 73.5 kilobases 5' of the cyclin D3 (CCND3) gene with no other identifiable transcribed sequences in the intervening region. Although Southern blotting with derived genomic 6p21.1 probes failed to detect other rearrangements, fluorescent in situ hybridization assays, using BAC (bacterial artificial chromosome) clones spanning and flanking the CCND3 locus, along with probes for IGH confirmed localization of 6p21.1 breakpoints within the same region, as well as fusion of the CCND3 and IGH loci. Furthermore, in all cases, high-level expression of CCND3 was demonstrated at RNA and/or protein levels by Northern and Western blotting and by immunohistochemistry. These data implicate CCND3 as a dominant oncogene in the pathogenesis and transformation in several histologic subtypes of mature B-cell malignancies with t(6;14)(p21.1;q32.3) and suggest that CCND3 overexpression seen in about 10% of DLBCL cases may have a genetic basis.

    Funded by: NCI NIH HHS: 5 U01 CA84967-02

    Blood 2001;98;9;2837-44

  • WormBase: network access to the genome and biology of Caenorhabditis elegans.

    Stein L, Sternberg P, Durbin R, Thierry-Mieg J and Spieth J

    Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.

    WormBase is a web-based resource for the Caenorhabditis elegans genome and its biology. It builds upon the existing ACeDB database of the C. elegans genome by providing data curation services, a significantly expanded range of subject areas and a user-friendly front end.

    Funded by: NHGRI NIH HHS: P41HG02223

    Nucleic acids research 2001;29;1;82-6

  • The human gene for mannan-binding lectin-associated serine protease-2 (MASP-2), the effector component of the lectin route of complement activation, is part of a tightly linked gene cluster on chromosome 1p36.2-3.

    Stover C, Endo Y, Takahashi M, Lynch NJ, Constantinescu C, Vorup-Jensen T, Thiel S, Friedl H, Hankeln T, Hall R, Gregory S, Fujita T and Schwaeble W

    Department of Microbiology and Immunology, University of Leicester, Leicester, UK.

    The proteases of the lectin pathway of complement activation, MASP-1 and MASP-2, are encoded by two separate genes. The MASP1 gene is located on chromosome 3q27, the MASP2 gene on chromosome 1p36.23-31. The genes for the classical complement activation pathway proteases, C1r and C1s, are linked on chromosome 12p13. We have shown that the MASP2 gene encodes two gene products, the 76 kDa MASP-2 serine protease and a plasma protein of 19 kDa, termed MAp19 or sMAP. Both gene products are components of the lectin pathway activation complex. We present the complete primary structure of the human MASP2 gene and the tight cluster that this locus forms with non-complement genes. A comparison of the MASP2 gene with the previously characterised C1s gene revealed identical positions of introns separating orthologous coding sequences, underlining the hypothesis that the C1s and MASP2 genes arose by exon shuffling from one ancestral gene.

    Genes and immunity 2001;2;3;119-27

  • Society and the human genome. Sir Frederick Gowland Hopkins Memorial Lecture.

    Sulston J and International Human Genome Sequencing Consortium

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1RQ, U.K.

    In June 2000, the draft sequence of the human genome was announced. It is, and will be for some years, incomplete, but the vast majority is now available. Currently about a third is finished (including two complete chromosomes); the rest has good coverage, but not long-range continuity. First-pass analysis indicates, among other things, fewer genes than expected: about 40000 now looks a likely number. This uncertainty illustrates the difficulty of interpretation: the sequence is not an end in itself, but a resource to be continually reanalysed as our biological understanding increases. That is the scientific reason for releasing it promptly, fully and freely. The social reasons for doing so are even more compelling.

    Biochemical Society transactions 2001;29;Pt 2;27-31

  • A sequence-based integrated map of chromosome 22.

    Tapper WJ, Morton NE, Dunham I, Ke X and Collins A

    Human Genetics Research Division, Southampton General Hospital, Southampton SO16 6YD, UK.

    The near-completion of the sequence for chromosome 22q revolutionizes map integration. We describe a sequence-based integrated map containing 968 loci including 516 known or predicted gene sequences, 317 STSs not included in these sequences, and 135 nonexpressed multinucleotide polymorphisms. The published sequence spans 34.6 Mb, inclusive of gaps estimated to total 1.1 Mb, compared with a top-down estimate of 43 Mb. This discrepancy is discussed, but will not be resolved until more of the genome is analyzed. The radiation hybrid map has 5% error in order and 34% error in location exceeding 1 Mb. The utility of a composite location based on evidence other than sequence is limited to regions not yet sequenced. A genetic map conditional on sequence order was constructed from pairwise lods. Its length of 74.8 cM in males and 80.2 cM in females is slightly less than the previous estimate not constrained by sequence order. Five recombination hot spots are detected, with differences in location between the sexes. Male recombination correlates with repetitive DNA, whereas female recombination does not. It remains to be seen whether this is true for other human chromosomes. An algorithm to improve the fit of cytogenetic bands to sequence location reduces the discrepancies in cytogenetic assignment from 61 to 38. This sequence-based integrated map is represented in the genetic location database (LDB2000), which is available at

    Genome research 2001;11;7;1290-5

  • An integrated cytogenetic, radiation-hybrid, and comparative map of dog chromosome 5.

    Thomas R, Breen M, Deloukas P, Holmes NG and Binns MM

    Genetics Section, Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk, CB8 7UU, UK.

    The development of a detailed genome map for the domestic dog (Canis familiaris, CFA) is a prerequisite for the continued use of this species as a model system for the study of inherited traits. We present an integrated cytogenetic, radiation-hybrid, and comparative map of dog Chromosome (Chr) 5 (CFA 5). The map comprises 14 gene markers, selected from loci previously mapped within the corresponding evolutionarily conserved chromosome segments (ECCS) of the human genome. Large-insert clones representing each marker were first isolated and mapped by fluorescence in situ hybridization (FISH) analysis to determine their subchromosomal localization on CFA 5. Thirteen gene markers were subsequently mapped by using a commercially available whole genome radiation hybrid (WG-RH) panel for the dog. Nine anonymous markers were also assigned to CFA 5 by both FISH and WG-RH analysis. The 22 markers formed six RH-linkage groups, spanning each of the four ECCS comprising this 99 megabase chromosome. All cytogenetic, WG-RH, and comparative mapping data were in agreement and were combined to determine both the most likely locus order within each linkage group, and also the gross relative orientation of the corresponding ECCS. This study provides a resource for the transfer of information from the human transcript map to that of the dog, and extends existing data regarding the structural relationships between CFA 5 and its evolutionary counterparts within the human genome.

    Mammalian genome : official journal of the International Mammalian Genome Society 2001;12;5;371-5

  • Secondary DNA structure analysis of the coding strand switch regions of five Leishmania major Friedlin chromosomes.

    Tosato V, Ciarloni L, Ivens AC, Rajandream MA, Barrell BG and Bruschi CV

    Department of Biology, University of Trieste, Italy.

    As part of the EULEISH international genome project, a region of 74,674 nucleotides from chromosome 21 of Leishmania major Friedlin was subcloned and sequenced, and 31 new coding sequences were predicted. Of particular interest was a unique coding strand switch region covering 1.6 kb of DNA, which was subjected to further investigation. Bioinformatic analysis of this region revealed an unusually high AT composition, a lack of putative hairpins and a strong curvature of the DNA, in agreement with the structural characteristics of similar regions of other Leishmania chromosomes. These observations, together with a comparison with the secondary DNA structure of four other Leishmania chromosomes and of chromosomes from other organisms, suggest a possible functional role for this region in transcription and mitotic division.

    Current genetics 2001;40;3;186-94

  • Salmonella enterica serovar Typhi possesses a unique repertoire of fimbrial gene sequences.

    Townsend SM, Kramer NE, Edwards R, Baker S, Hamlin N, Simmonds M, Stevens K, Maloy S, Parkhill J, Dougan G and Bäumler AJ

    Department of Medical Microbiology and Immunology, College of Medicine, Texas A&M University, College Station, Texas 77843, USA.

    Salmonella enterica serotype Typhi differs from nontyphoidal Salmonella serotypes by its strict host adaptation to humans and higher primates. Since fimbriae have been implicated in host adaptation, we investigated whether the serotype Typhi genome contains fimbrial operons which are unique to this pathogen or restricted to typhoidal Salmonella serotypes. This study established for the first time the total number of fimbrial operons present in an individual Salmonella serotype. The serotype Typhi CT18 genome, which has been sequenced by the Typhi Sequencing Group at the Sanger Centre, contained a type IV fimbrial operon, an orthologue of the agf operon, and 12 putative fimbrial operons of the chaperone-usher assembly class. In addition to sef, fim, saf, and tcf, which had been described previously in serotype Typhi, we identified eight new putative chaperone-usher-dependent fimbrial operons, which were termed bcf, sta, stb, ste, std, stc, stg, and sth. Hybridization analysis performed with 16 strains of Salmonella reference collection C and 22 strains of Salmonella reference collection B showed that all eight putative fimbrial operons of serotype Typhi were also present in a number of nontyphoidal Salmonella serotypes. Thus, a simple correlation between host range and the presence of a single fimbrial operon seems at present unlikely. However, the serotype Typhi genome differed from that of all other Salmonella serotypes investigated in that it contained a unique combination of putative fimbrial operons.

    Funded by: NIAID NIH HHS: AI40124, AI44170

    Infection and immunity 2001;69;5;2894-901

  • Analysis and assessment of comparative modeling predictions in CASP4.

    Tramontano A, Leplae R and Morea V

    Department of Biochemical Sciences "A. Rossi Fanelli," University of Rome "La Sapienza," Rome, Italy.

    We describe here the results of our analysis of the comparative modeling predictions submitted to the fourth round of Critical Assessment of Structure Prediction (CASP4). On the basis of a numerical evaluation of the models, we assessed their ability to predict the overall fold correctly, the relative orientation of domains in multidomain proteins, the conformation of the side chains, the loop regions, and the biologically important residues of the targets. We also discuss the performance of automatic prediction servers and compare the results of CASP4 with those obtained in CASP3.

    Proteins 2001;Suppl 5;22-38

  • The genomic context of natural killer receptor extended gene families.

    Trowsdale J, Barten R, Haude A, Stewart CA, Beck S and Wilson MJ

    Department of Pathology, University of Cambridge, UK.

    The two sets of inhibitory and activating natural killer (NK) receptor genes belong either to the Ig or to the C-type lectin superfamilies. Both are extensive and diverse, comprising genes of varying degrees of relatedness, indicative of a process of iterative duplication. We have constructed gene maps to help understand how and when NK receptor genes developed and the nature of their polymorphism. The natural killer complex, a cluster of over 15 C-type lectin genes, is located on human chromosome 12p13.1, syntenic with a region in mouse that borders multiple Ly49 loci. The equivalent locus in man is occupied by a single pseudogene, LY49L. The immunoglobulin superfamily of loci, the leukocyte receptor complex (LRC), on chromosome 19q13.4, contains many polymorphic killer cell immunoglobulin-like receptor (KIR) genes as well as multiple related sequences. These include immunoglobulin-like transcript (ILT) (or leukocyte immunoglobulin-like receptor) genes, leukocyte-associated inhibitory receptor (LAIR) genes, NKp46, Fc alphaR and the platelet glycoprotein receptor VI locus, which encodes a collagen-binding molecule. KIRs are expressed mostly on NK cells and some T cells. The other LRC loci are more widely expressed. Further centromeric of the LRC are sets of additional loci with weak sequence similarity to the KIRs, including the extensive CD66 (CEA) and Siglec families. The LRC-syntenic region in mice contains no orthologues of KIRs. Some of the KIR genes are highly polymorphic in terms of sequence as well as for presence/absence of genes on different haplotypes. Some anchor loci, such as KIR2DL4, are present on most haplotypes. A few ILT loci, such as ILT5 and ILT8, are polymorphic, but only ILT6 exhibits presence/absence variation. This knowledge of the genomic organisation of the extensive NK superfamilies underpins efforts to understand the functions of the encoded NK receptor molecules. It leads to the conclusion that the functional homology of human KIR and mouse Ly49 genes arose by convergent evolution. NK receptor immunogenetics has interesting parallels with the major histocompatibility complex (MHC), in which some of the polymorphic genes are ligands for NK molecules. There are hints of an ancient genetic relationship between NK receptor genes and MHC-paralogous regions on chromosomes 1, 9 and 19. The picture that emerges from both complexes is of eternal evolutionary restlessness, presumably in response to resistance to disease.

    Immunological reviews 2001;181;20-38

  • Matroshka and ectopic polymorphisms: Two new classes of DNA sequence variation identified at the Van der Woude syndrome locus on 1q32-q41.

    Watanabe Y, Murray JC, Bjork BC, Bird CP, Chiang PW, Gregory SG, Kurnit DM and Schutte BC

    Department of Pediatrics, University of Iowa, Iowa City, Iowa, USA.

    Van der Woude syndrome (VWS) is an orofacial clefting disorder with an autosomal dominant pattern of inheritance. In our efforts to clone the VWS gene, 900 kb of genomic sequence from the VWS candidate region at chromosome 1q32-q41 was analyzed for new DNA sequence variants. We observed that in clone CTA-321i20 a 7922 bp sequence is absent relative to the sequence present in PAC clone RP4-782d21 at positions 1669-9590, suggesting the presence of a deletion/insertion (del/ins) polymorphism. Embedded in this 7922 bp region was a TTCC short tandem repeat (STR). Genotype analysis showed that both the internal STR and the (del/ins) mutation were true polymorphisms. This is a novel example of intraallelic variation, a polymorphism within a polymorphism, and we suggest that it be termed a "Matroshka" polymorphism. Further genetic and DNA sequence analysis indicated that the ancestral state of the 1669-9590 del/ins polymorphism was the insertion allele and that the original deletion mutation probably occurred only once. A second class of novel DNA sequence variation was discovered on chromosome 5 that shared a 328 bp identical sequence with this region on chromosome 1. A single nucleotide polymorphism (SNP) was detected by SSCP using a pair of primers derived from the chromosome 1 sequence. Surprisingly, these primers also amplified the identical locus on chromosome 5, and the SNP was only located on chromosome 5. Since the probe unexpectedly detected alleles from another locus, we suggest that this type of sequence variant be termed an "ectopic" polymorphism. These two novel classes of DNA sequence polymorphisms have the potential to confound genetic and DNA sequence analysis and may also contribute to variation in disease phenotypes.

    Funded by: NICHD NIH HHS: HD27748; NIDCR NIH HHS: DE08559, DE13076

    Human mutation 2001;18;5;422-34

  • Detailed molecular analysis of 1p36 in neuroblastoma.

    White PS, Thompson PM, Seifried BA, Sulman EP, Jensen SJ, Guo C, Maris JM, Hogarty MD, Allen C, Biegel JA, Matise TC, Gregory SG, Reynolds CP and Brodeur GM

    Division of Oncology, Children's Hospital of Philadelphia, Pennsylvania 19104-4318, USA.

    Background: Several lines of evidence establish that chromosome band 1p36 is frequently deleted in neuroblastoma primary tumors and cell lines, suggesting that a tumor suppressor gene within this region is involved in the development of this tumor.

    Procedure: We analyzed the status of 1p36 in primary neuroblastomas and cell lines to define the region of consistent rearrangement.

    Results: Loss of heterozygosity (LOH) studies of primary neuroblastomas identified allelic loss in 135 of 503 tumors (27%), with the smallest region of overlap (SRO) defined distal to D1S214 (1p36.3). No homozygous deletions were detected at 120 loci mapping to 1p36.1-p36.3 in a panel of 46 neuroblastoma cell lines. A recently identified patient with neuroblastoma was found to have a constitutional deletion within 1p36.2-p36.3, and this deletion, when combined with the LOH results, defined a smaller SRO of one megabase within 1p36.3. We constructed a comprehensive integrated map of chromosome 1 containing 11,000 markers and large-insert clones, a high-resolution radiation hybrid (RH) map of 1p36, and a P1-artificial chromosome (PAC) contig spanning the SRO, to further characterize the region of interest. Over 768 kb (75%) of the SRO has been sequenced to completion. Further analysis of distal 1p identified 113 transcripts localizing to 1p36, 21 of which were mapped within the SRO.

    Conclusion: This analysis will identify suitable positional candidate transcripts for mutational screening and subsequent identification of the 1p36.3 neuroblastoma suppressor gene.

    Funded by: NCI NIH HHS: CA-39771

    Medical and pediatric oncology 2001;36;1;37-41

  • Characterization of clustered MHC-linked olfactory receptor genes in human and mouse.

    Younger RM, Amadou C, Bethel G, Ehlers A, Lindahl KF, Forbes S, Horton R, Milne S, Mungall AJ, Trowsdale J, Volz A, Ziegler A and Beck S

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Olfactory receptor (OR) loci frequently cluster and are present on most human chromosomes. They are members of the seven transmembrane receptor (7-TM) superfamily and, as such, are part of one of the largest mammalian multigene families, with an estimated copy number of up to 1000 ORs per haploid genome. As their name implies, ORs are known to be involved in the perception of odors and possibly also in other, nonolfaction-related, functions. Here, we report the characterization of ORs that are part of the MHC-linked OR clusters in human and mouse (partial sequence only). These clusters are of particular interest because of their possible involvement in olfaction-driven mate selection. In total, we describe 50 novel OR loci (36 human, 14 murine), making the human MHC-linked cluster the largest sequenced OR cluster in any organism so far. Comparative and phylogenetic analyses confirm the cluster to be MHC-linked but divergent in both species and allow the identification of at least one ortholog that will be useful for future regulatory and functional studies. Quantitative feature analysis shows clear evidence of duplications of blocks of OR genes and reveals the entire cluster to have a genomic environment that is very different from its neighboring regions. Based on in silico transcript analysis, we also present evidence of extensive long-distance splicing in the 5'-untranslated regions and, for the first time, of alternative splicing within the single coding exon of ORs. Taken together with our previous finding that ORs are also polymorphic, the presented data indicate that the expression, function, and evolution of these interesting genes might be more complex than previously thought.

    Genome research 2001;11;4;519-30

  • Comparison of human genetic and sequence-based physical maps.

    Yu A, Zhao C, Fan Y, Jang W, Mungall AJ, Deloukas P, Olsen A, Doggett NA, Ghebranious N, Broman KW and Weber JL

    Center for Medical Genetics, Marshfield Medical Research Foundation, Wisconsin 54449, USA.

    Recombination is the exchange of information between two homologous chromosomes during meiosis. The rate of recombination per nucleotide, which profoundly affects the evolution of chromosomal segments, is calculated by comparing genetic and physical maps. Human physical maps have been constructed using cytogenetics, overlapping DNA clones and radiation hybrids; but the ultimate and by far the most accurate physical map is the actual nucleotide sequence. The completion of the draft human genomic sequence provides us with the best opportunity yet to compare the genetic and physical maps. Here we describe our estimates of female, male and sex-average recombination rates for about 60% of the genome. Recombination rates varied greatly along each chromosome, from 0 to at least 9 centiMorgans per megabase (cM Mb(-1)). Among several sequence and marker parameters tested, only relative marker position along the metacentric chromosomes in males correlated strongly with recombination rate. We identified several chromosomal regions up to 6 Mb in length with particularly low (deserts) or high (jungles) recombination rates. Linkage disequilibrium was much more common and extended for greater distances in the deserts than in the jungles.

    Nature 2001;409;6822;951-3

  • Engineering chromosomal rearrangements in mice.

    Yu Y and Bradley A

    Program in Developmental Biology, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

    The combination of gene-targeting techniques in mouse embryonic stem cells and the Cre/loxP site-specific recombination system has resulted in the emergence of chromosomal-engineering technology in mice. This advance has opened up new opportunities for modelling human diseases that are associated with chromosomal rearrangements. It has also led to the generation of visibly marked deletions and balancer chromosomes in mice, which provide essential reagents for maximizing the efficiency of large-scale mutagenesis efforts and which will accelerate the functional annotation of mammalian genomes, including the human genome.

    Nature reviews. Genetics 2001;2;10;780-90

  • Nonredundant roles of the mPer1 and mPer2 genes in the mammalian circadian clock.

    Zheng B, Albrecht U, Kaasik K, Sage M, Lu W, Vaishnav S, Li Q, Sun ZS, Eichele G, Bradley A and Lee CC

    Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.

    Mice carrying a null mutation in the Period 1 (mPer1) gene were generated using embryonic stem cell technology. Homozygous mPer1 mutants display a shorter circadian period with reduced precision and stability. Mice deficient in both mPer1 and mPer2 do not express circadian rhythms. While mPER2 regulates clock gene expression at the transcriptional level, mPER1 is dispensable for the rhythmic RNA expression of mPer1 and mPer2 and may instead regulate mPER2 at a posttranscriptional level. Studies of clock-controlled genes (CCGs) reveal a complex pattern of regulation by mPER1 and mPER2, suggesting independent controls by the two proteins over some output pathways. Genes encoding key enzymes in heme biosynthesis are under circadian control and are regulated by mPER1 and mPER2. Together, our studies show that mPER1 and mPER2 have distinct and complementary roles in the mouse clock mechanism.

    Cell 2001;105;5;683-94
