Sanger Institute - Publications 1999

Number of papers published in 1999: 69

  • The SIS domain: a phosphosugar-binding domain.

    Bateman A

    The Sanger Centre, Wellcome Trust Genome Campus, Cambridge, UK CB10

    Funded by: Wellcome Trust

    Trends in biochemical sciences 1999;24;3;94-5

  • The PLAT domain: a new piece in the PKD1 puzzle.

    Bateman A and Sandford R

    Current biology : CB 1999;9;16;R588-90

  • Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins.

    Bateman A, Birney E, Durbin R, Eddy SR, Finn RD and Sonnhammer EL

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families. Release 3.1 is a major update of the Pfam database and contains 1313 families, which are available on the World Wide Web in Europe and in the US. Over 54% of proteins in SWISS-PROT-35 and SP-TrEMBL-5 match a Pfam family. The primary changes of Pfam since release 2.1 are that we now use the more advanced version 2 of the HMMER software, which is more sensitive and provides expectation values for matches, and that it now includes proteins from both SP-TrEMBL and SWISS-PROT.

    Funded by: Wellcome Trust

    Nucleic acids research 1999;27;1;260-2

  • Sequence organisation of the class II region of the human MHC.

    Beck S and Trowsdale J

    Sanger Centre, Cambridge, UK.

    We present the genomic organisation of the extended class II region of the human MHC. This initial sequence, which is nearing completion, spans about 1.2 Mbp and is at present a composite of more than one haplotype. The sequencing of single haplotypes is planned for the future. The current sequence encompasses all of the known class II genes at the DP, DO, DM, DQ and DR loci as well as the transporter associated with antigen processing (TAP)/low molecular weight protein (LMP) antigen processing genes and the Tapasin locus, at the extended centromeric end.

    Funded by: Wellcome Trust

    Immunological reviews 1999;167;201-10

  • From genomics to epigenomics: a loftier view of life.

    Beck S, Olek A and Walter J

    The Sanger Centre, Cambridge, UK.

    Nature biotechnology 1999;17;12;1144

  • The Caenorhabditis elegans genome: a guide in the post genomics age.

    Bird DM, Opperman CH, Jones SJ and Baillie DL

    Plant Nematode Genetics Group, Department of Plant Pathology, North Carolina State University, Box 7616, Raleigh, North Carolina 27695.

    The completion of the entire genome sequence of the free-living nematode, Caenorhabditis elegans is a tremendous milestone in modern biology. Not only will scientists be poring over data mined from this resource, but techniques and methodologies developed along the way have changed the way we can approach biological questions. The completion of the C. elegans genomic sequence will be of particular importance to scientists working on parasitic nematodes. In many cases, these nematode species present intractable challenges to those interested in their biology and genetics. The data already compared from parasites to the C. elegans database reveals a wealth of opportunities for parasite biologists. It is likely that many of the same genes will be present in parasites and that these genes will have similar functions. Additional information regarding differences between free-living and parasitic species will provide insight into the evolution and nature of parasitism. Finally, genetic and genomic approaches to the study of parasitic nematodes now have a clearly marked path to follow.

    Funded by: Wellcome Trust

    Annual review of phytopathology 1999;37;247-65

  • The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum.

    Bowman S, Lawson D, Basham D, Brown D, Chillingworth T, Churcher CM, Craig A, Davies RM, Devlin K, Feltwell T, Gentles S, Gwilliam R, Hamlin N, Harris D, Holroyd S, Hornsby T, Horrocks P, Jagels K, Jassal B, Kyes S, McLean J, Moule S, Mungall K, Murphy L, Oliver K, Quail MA, Rajandream MA, Rutter S, Skelton J, Squares R, Squares S, Sulston JE, Whitehead S, Woodward JR, Newbold C and Barrell BG

    Pathogen Sequencing Unit, Sanger Centre, Wellcome Trust Genome Campus, Hinxton, UK.

    Analysis of Plasmodium falciparum chromosome 3, and comparison with chromosome 2, highlights novel features of chromosome organization and gene structure. The sub-telomeric regions of chromosome 3 show a conserved order of features, including repetitive DNA sequences, members of multigene families involved in pathogenesis and antigenic variation, a number of conserved pseudogenes, and several genes of unknown function. A putative centromere has been identified that has a core region of about 2 kilobases with an extremely high (adenine + thymidine) composition and arrays of tandem repeats. We have predicted 215 protein-coding genes and two transfer RNA genes in the 1,060,106-base-pair chromosome sequence. The predicted protein-coding genes can be divided into three main classes: 52.6% are not spliced, 45.1% have a large exon with short additional 5' or 3' exons, and 2.3% have a multiple exon structure more typical of higher eukaryotes.

    Funded by: Wellcome Trust

    Nature 1999;400;6744;532-8

  • The DAPI banded karyotype of the domestic dog (Canis familiaris) generated using chromosome-specific paint probes.

    Breen M, Bullerdiek J and Langford CF

    Animal Health Trust, Kentford, Suffolk, UK.

    The domestic dog (Canis familiaris) is widely used as a model in the study of human disease. However, many of the 78 chromosomes comprising the canine karyotype are extremely difficult to identify reliably by classical cytogenetics. This has been a major hindrance to molecular cytogenetic studies of this species. The Animal Health Trust and the Sanger Centre have developed a set of canine whole chromosome-specific fluorescence in situ hybridisation (FISH) probes (chromosome paints). We have used these chromosome paints to identify unequivocally each chromosome in a metaphase spread. An increasing number of laboratories are making use of cooled CCD cameras and sophisticated software for FISH mapping. Consequently, there is a major trend towards the use of DAPI banding for concurrent chromosome identification during FISH analyses in a range of species. Here we present, for the first time, a complete DAPI banded karyotype of the dog in which each chromosome has been accurately placed, together with a 460-band DAPI ideogram. These data will facilitate the accurate assignment of FISH-mapped loci to all chromosomes comprising the karyotype and form the basis for an agreed standard of the dog karyotype.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 1999;7;5;401-6

  • FISH mapping and identification of canine chromosomes.

    Breen M, Langford CF, Carter NP, Holmes NG, Dickens HF, Thomas R, Suter N, Ryder EJ, Pope M and Binns MM

    Preventive Medicine, Animal Health Trust, Newmarket, Suffolk, U.K.

    The karyotype of the domestic dog (Canis familiaris) is widely accepted as one of the most difficult mammalian karyotypes to work with. The dog has a total of 78 chromosomes; all 76 autosomes are acrocentric in morphology and show only a gradual decrease in length. Standardization of the canine karyotype has been performed in two stages. The first stage dealt only with chromosomes 1-21 which can be readily identified by conventional G-banding techniques. The remaining 17 autosomal pairs have proven to be very difficult to reliably identify by banding alone. To facilitate the identification of all canine chromosomes, chromosome-specific paint probes have been produced by DOP-PCR from flow-sorted dog chromosomes. Each paint probe has been used for FISH to identify the corresponding chromosome(s), allowing precise identification of all 78 canine chromosomes. The identification of the undesignated 17 autosomal pairs has been agreed upon by the standardization committee during the second stage of their role. Cosmid clones containing microsatellite markers may now be conclusively assigned to their chromosomal origin by simultaneous dual-color FISH with the corresponding paint probe. In this way a collection of chromosome-specific cosmid clones is being constructed, comprising at least one marker per chromosome, which will allow anchoring of existing and future linkage groups to the physical map.

    Funded by: Wellcome Trust

    The Journal of heredity 1999;90;1;27-30

  • Reciprocal chromosome painting reveals detailed regions of conserved synteny between the karyotypes of the domestic dog (Canis familiaris) and human.

    Breen M, Thomas R, Binns MM, Carter NP and Langford CF

    Centre for Preventive Medicine, Animal Health Trust, Lanwades Park, Kentford, CB8 7UU, United Kingdom.

    The domestic dog is increasingly being recognized as a useful model for human disease. The aim of this study was to conduct the first detailed whole-genome comparison of human and dog using bidirectional heterologous chromosome painting (reciprocal Zoo-FISH) analysis. We used whole-chromosome paint probes produced from degenerate oligonucleotide-primed PCR amplification of high-resolution bivariate flow-sorted human and dog chromosomes. No fewer than 68 evolutionarily conserved segments were identified between the dog and the human karyotypes. The use of elongated metaphase chromosomes for both species allowed the boundaries of each evolutionarily conserved segment to be determined to subband resolution. The distribution of conserved segments is discussed, as are the applications of these data in refining the current status of the dog genome map.

    Funded by: Wellcome Trust

    Genomics 1999;61;2;145-55

  • Charting the genome.

    Brooksbank RA

    Molecular medicine today 1999;5;1;7

  • The structure of a PKD domain from polycystin-1: implications for polycystic kidney disease.

    Bycroft M, Bateman A, Clarke J, Hamill SJ, Sandford R, Thomas RL and Chothia C

    MRC Centre for Protein Engineering, Lensfield Road, Cambridge CB2 1EW.

    Most cases of autosomal dominant polycystic kidney disease (ADPKD) are the result of mutations in the PKD1 gene. The PKD1 gene codes for a large cell-surface glycoprotein, polycystin-1, of unknown function, which, based on its predicted domain structure, may be involved in protein-protein and protein-carbohydrate interactions. Approximately 30% of polycystin-1 consists of 16 copies of a novel protein module called the PKD domain. Here we show that this domain has a beta-sandwich fold. Although this fold is common to a number of cell-surface modules, the PKD domain represents a distinct protein family. The tenth PKD domain of human and Fugu polycystin-1 show extensive conservation of surface residues suggesting that this region could be a ligand-binding site. This structure will allow the likely effects of missense mutations in a large part of the PKD1 gene to be determined.

    Funded by: Wellcome Trust

    The EMBO journal 1999;18;2;297-305

  • The hairless gene of the mouse: relationship of phenotypic effects with expression profile and genotype.

    Cachón-González MB, San-José I, Cano A, Vega JA, García N, Freeman T, Schimmang T and Stoye JP

    National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom.

    Various mutations of the hairless (hr) gene of mice result in hair loss and other integument defects. To examine the role of the hr gene in mouse development, the expression profile of hr has been determined by in situ hybridisation and correlated to the nature of genetic changes and morphological abnormalities in different mutant animals. Four variant alleles have been characterised at the molecular level. hr/hr mice produce reduced, but significant, levels of hr mRNA whereas other alleles contain mutations which would be expected to preclude the synthesis of functional product, demonstrating a correlation between allelic variation at the hr locus and phenotypic severity. hr expression was shown to be widespread and temporally regulated. It was identified in novel tissues such as cartilage, developing tooth, inner ear, retina, and colon as well as in skin and brain. Analysis of mice homozygous for the rhino allele of hairless revealed that, although no morphological defects were detectable in many tissues normally expressing hr, previously undescribed abnormalities were present in several tissues including inner ear, retina, and colon. These findings indicate that the hairless gene product plays a wider role in development than previously suspected.

    Funded by: Wellcome Trust

    Developmental dynamics : an official publication of the American Association of Anatomists 1999;216;2;113-26

  • A new member of the IL-1 receptor family highly expressed in hippocampus and involved in X-linked mental retardation.

    Carrié A, Jun L, Bienvenu T, Vinet MC, McDonell N, Couvert P, Zemni R, Cardona A, Van Buggenhout G, Frints S, Hamel B, Moraine C, Ropers HH, Strom T, Howell GR, Whittaker A, Ross MT, Kahn A, Fryns JP, Beldjord C, Marynen P and Chelly J

    INSERM Unité 129-ICGM, CHU Cochin, 24 Rue du Faubourg Saint Jacques, 75014 Paris, France.

    We demonstrate here the importance of interleukin signalling pathways in cognitive function and the normal physiology of the CNS. Thorough investigation of an MRX critical region in Xp22.1-21.3 enabled us to identify a new gene expressed in brain that is responsible for a non-specific form of X-linked mental retardation. This gene encodes a 696 amino acid protein that has homology to IL-1 receptor accessory proteins. Non-overlapping deletions and a nonsense mutation in this gene were identified in patients with cognitive impairment only. Its high level of expression in post-natal brain structures involved in the hippocampal memory system suggests a specialized role for this new gene in the physiological processes underlying memory and learning abilities.

    Funded by: Wellcome Trust

    Nature genetics 1999;23;1;25-31

  • Genetic definition and sequence analysis of Arabidopsis centromeres.

    Copenhaver GP, Nickel K, Kuromori T, Benito MI, Kaul S, Lin X, Bevan M, Murphy G, Harris B, Parnell LD, McCombie WR, Martienssen RA, Marra M and Preuss D

    University of Chicago, Department of Molecular Genetics and Cell Biology, 1103 East 57 Street, Chicago, IL 60637, USA.

    High-precision genetic mapping was used to define the regions that contain centromere functions on each natural chromosome in Arabidopsis thaliana. These regions exhibited dramatic recombinational repression and contained complex DNA surrounding large arrays of 180-base pair repeats. Unexpectedly, the DNA within the centromeres was not merely structural but also encoded several expressed genes. The regions flanking the centromeres were densely populated by repetitive elements yet experienced normal levels of recombination. The genetically defined centromeres were well conserved among Arabidopsis ecotypes but displayed limited sequence homology between different chromosomes, excluding repetitive DNA. This investigation provides a platform for dissecting the role of individual sequences in centromeres in higher eukaryotes.

    Science (New York, N.Y.) 1999;286;5449;2468-74

  • New collaborations make pharmacogenomics a SNP.

    Dawson E

    Molecular medicine today 1999;5;7;280

  • SNP maps: more markers needed?

    Dawson E

    Molecular medicine today 1999;5;10;419-20

  • Use of cosmid-derived and chromosome-specific canine microsatellites.

    Dickens HF, Holmes NG, Ryder E, Breen M, Thomas R, Suter N, Sampson J, Langford CF, Ross M, Carter NP and Binns MM

    Centre for Preventive Medicine, Animal Health Trust, Suffolk, England.

    The majority of microsatellite markers being used to generate the emerging genetic linkage maps of the dog are derived from small-insert, random clones. While such markers are easy to generate, they have the disadvantage that they cannot easily be physically mapped by fluorescence in situ hybridization (FISH), making it difficult to assess the extent of genome coverage represented by such maps. In contrast, microsatellite markers from large-insert libraries enable the linkage groups within which they fall to be physically anchored to specific chromosomes. One aim of our work is to identify at least one microsatellite-containing cosmid clone for each canine chromosome, to ensure that linkage groups exist for all chromosomes. This is particularly important for a species with as complex a karyotype as the dog. Locating two cosmids on each chromosome would allow the orientation of the linkage groups to be established. Chromosomal locations of cosmid clones containing microsatellites have been determined by FISH and confirmed using canine chromosome-specific paints. Microsatellite sequences have been genotyped on the DogMap reference family. Microsatellites derived from flow-sorted, chromosome-specific libraries represent another source of useful markers. Initial studies have been carried out on the canine X chromosome, on which markers were underrepresented in our initial studies.

    Funded by: Wellcome Trust

    The Journal of heredity 1999;90;1;52-4

  • The DNA sequence of human chromosome 22.

    Dunham I, Shimizu N, Roe BA, Chissoe S, Hunt AR, Collins JE, Bruskiewich R, Beare DM, Clamp M, Smink LJ, Ainscough R, Almeida JP, Babbage A, Bagguley C, Bailey J, Barlow K, Bates KN, Beasley O, Bird CP, Blakey S, Bridgeman AM, Buck D, Burgess J, Burrill WD, O'Brien KP et al.

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Knowledge of the complete genomic DNA sequence of an organism allows a systematic approach to defining its genetic components. The genomic sequence provides access to the complete structures of all genes, including those without known function, their control elements, and, by inference, the proteins they encode, as well as all other biologically important sequences. Furthermore, the sequence is a rich and permanent source of information for the design of further biological studies of the organism and for the study of evolution through cross-species sequence comparison. The power of this approach has been amply demonstrated by the determination of the sequences of a number of microbial and model organisms. The next step is to obtain the complete sequence of the entire human genome. Here we report the sequence of the euchromatic part of human chromosome 22. The sequence obtained consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.

    Nature 1999;402;6761;489-95

  • Analysis of gene expression in single cells.

    Freeman TC, Lee K and Richardson PJ

    Gene Expression Group, Wellcome Trust Genome Campus, The Sanger Centre, Hinxton, CB10 1SA, UK.

    A cell's structural and functional characteristics are dependent on the specific complement of genes it expresses. The ability to study and compare gene usage at the cellular level will therefore provide valuable insights into cell physiology. Such analyses are complicated by problems associated with sample collection, sample size and the limited sensitivity of expression assays. Advances have been made in approaches to the collection of cellular material and the performance of single-cell gene expression analysis. Recent development in global amplification of mRNA may soon permit expression analyses of single cells to be performed on DNA microarrays.

    Current opinion in biotechnology 1999;10;6;579-82

  • [WWWMGS: an integrated server for molecular-genetic studies].

    Frolov AS, Lavriushev SV, Grigorovich DA, Kel AE, Ptitsyn AA, Kolchanov NA, Podkolodnyĭ NL, Solov'ev VV, Milanesi L, Bourne P et al.

    Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, Russia.

    We report an integrated technology for molecular biology studies of transcription regulation over the Internet. A set of databases, programs, and systems is included in the WWWMGS Web server. As an example, the use of TRRD database information for site prediction is described. Using this method, the computer system SeqAnn was developed. The system performs "real-time" searches to predict the position of transcription initiation sites from database information. WWWMGS is available on the World Wide Web.

    Biofizika 1999;44;5;832-6

  • Tdr2, a new zebrafish transposon of the Tc1 family.

    Göttgens B, Barton LM, Grafham D, Vaudin M and Green AR

    Department of Haematology, Cambridge University, MRC Centre, Hills Road, Cambridge, UK.

    We describe here Tdr2, a new class of Tc1-like transposons in zebrafish. Tdr2 was identified from the genomic sequence of a zebrafish PAC (P1 artificial chromosome) clone, and fragments of Tdr2 were found in several zebrafish EST (expressed sequence tag) sequences. Predicted translation of the Tdr2 transposase gene showed that it was most closely related to Caenorhabditis elegans Tc3A, suggesting an ancient origin of the Tdr2 transposon. Tdr2 spans 1.1 kb and is flanked by inverted repeats of approx. 100 bp. The 5' repeat is itself composed of an inverted repeat, raising the possibility of the formation of a cruciform DNA structure. Tdr2 transposons may facilitate the development of novel transposon-based tools for the genetic analysis of zebrafish.

    Funded by: Wellcome Trust

    Gene 1999;239;2;373-9

  • Localisation of a gene for transient neonatal diabetes mellitus to an 18.72 cR3000 (approximately 5.4 Mb) interval on chromosome 6q.

    Gardner RJ, Mungall AJ, Dunham I, Barber JC, Shield JP, Temple IK and Robinson DO

    Wessex Regional Genetics Laboratory, Salisbury District Hospital, Wiltshire, UK.

    Transient neonatal diabetes mellitus (TNDM) is a rare condition which presents with intrauterine growth retardation, dehydration, and failure to thrive. The condition spontaneously resolves before 1 year of age but predisposes patients to type 2 diabetes later in life. We have previously shown that, in some cases, TNDM is associated with paternal uniparental disomy (UPD) of chromosome 6 and suggested that an imprinted gene responsible for TNDM lies within a region of chromosome 6q. By analysing three families, two with duplications (family A and patient C) and one with several affected subjects with normal karyotypes (family B), we have further defined the TNDM critical region. In patient A, polymorphic microsatellite repeat analysis identified a duplicated region of chromosome 6, flanked by markers D6S472 and D6S311. This region was identified on the Sanger Centre's chromosome 6 radiation hybrid map and spanned approximately 60 cR3000. Using markers within the region, 418 unique P1 derived artificial chromosomes (PACs) have been isolated and used to localise the distal breakpoints of the two duplications. Linkage analysis of the familial case with a normal karyotype identified a recombination within the critical region. This recombination has been identified on the radiation hybrid map and defines the proximal end of the region of interest. We therefore propose that an imprinted gene for TNDM lies within an 18.72 cR3000 (approximately 5.4 Mb) interval on chromosome 6q24.1-q24.3 between markers D6S1699 and D6S1010.

    Funded by: Wellcome Trust

    Journal of medical genetics 1999;36;3;192-6

  • New insertion sequences and a novel repeated sequence in the genome of Mycobacterium tuberculosis H37Rv.

    Gordon SV, Heym B, Parkhill J, Barrell B and Cole ST

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, Paris, France.

    The genome sequence of Mycobacterium tuberculosis H37Rv was found to contain 56 loci with homology to insertion sequences (ISs). As well as the previously described IS6110, IS1081, IS1547 and IS-like elements, new ISs belonging to the IS3, IS5, IS21, IS30, IS110, IS256 and ISL3 families were identified. In addition, six ISs created a grouping of their own to form a new family (the IS1535 family). Elements with similarity to ISs in other actinomycetes were identified, suggesting the movement of ISs between related genera. The location of ISs on the chromosome revealed that an approximately 600 kb region close to the origin of replication lacks ISs, pointing to the possible detrimental effect of insertions in this area. Analysis of the distribution of ISs through the tubercle strains Mycobacterium africanum, M. microti, M. bovis, M. bovis BCG Pasteur, M. tuberculosis H37Ra, M. tuberculosis CSU#93 and 29 clinical isolates revealed that only IS1532, IS1533, IS1534, and IS1561' were absent from some of the strains tested. A novel repeated sequence, the REP13E12 family, is described that is present in seven copies on the M. tuberculosis H37Rv chromosome and which contains a probable phage attachment site. This study therefore offers an insight into the possible role of ISs and repetitive elements in the evolution of the M. tuberculosis genome, as well as identifying genetic markers that may be useful for phylogenetic and epidemiological analysis of the tubercle complex.

    Funded by: Wellcome Trust

    Microbiology (Reading, England) 1999;145 ( Pt 4);881-92

  • A combined approach of conventional and molecular cytogenetics for detailed karyotypic analysis of the small cell lung carcinoma cell line U2020.

    Heppell-Parton AC, Nacheva E, Carter NP and Rabbitts PH

    Medical Research Council Centre, Addenbrooke's Hospital, Cambridge, United Kingdom.

    Until recently the ability to analyze complex karyotypic rearrangements was totally dependent upon light microscopy of G-banded chromosomes. Developments in the area of molecular cytogenetics have revolutionized such analysis, making it possible to determine the nature of complex rearrangements. An extensive analysis has been made of the small cell lung carcinoma (SCLC) cell line U2020, using a combined approach of conventional and molecular cytogenetics, enabling a highly detailed karyotype to be constructed revealing rearrangements previously undetected by G-banding alone. This approach offers the opportunity to reassess other tumor karyotypes, particularly those of high complexity found in solid tumors, for tumor-specific consistent rearrangements indecipherable by conventional karyotyping.

    Funded by: Wellcome Trust

    Cancer genetics and cytogenetics 1999;108;2;110-9

  • Elucidation of the mechanism of homozygous deletion of 3p12-13 in the U2020 cell line reveals the unexpected involvement of other chromosomes.

    Heppell-Parton AC, Nacheva E, Carter NP, Bergh J, Ogilvie D and Rabbitts PH

    Medical Research Council Centre, Cambridge, England, United Kingdom.

    Homozygous deletions in tumor cells have been useful in the localization and validation of tumor suppressor genes. We have described a homozygous deletion in a lung cancer cell line (U2020) which is located within the most proximal of the three regions on the short arm of chromosome 3 believed to be lost in lung cancer development. Construction of a YAC contig map indicates that the deletion spans around 8 Mb, but no large deletion was apparent on conventional cytogenetic analysis of the cell line. To investigate this paradox, whole chromosome, arm-specific, and regional paints have been used. This analysis has revealed that genetic loss has occurred by complex rearrangements of chromosomes 3, rather than simple interstitial deletion. These studies emphasize the power of molecular cytogenetics to disclose unsuspected tumor-specific translocations within the extremely complex karyotypes characteristic of solid tumors.

    Funded by: Wellcome Trust

    Cancer genetics and cytogenetics 1999;111;2;105-10

  • Regulation of hmp gene transcription in Mycobacterium tuberculosis: effects of oxygen limitation and nitrosative and oxidative stress.

    Hu Y, Butcher PD, Mangan JA, Rajandream MA and Coates AR

    Department of Medical Microbiology, St. George's Hospital Medical School, London SW17 0RE, United Kingdom.

    The Mycobacterium tuberculosis hmp gene encodes a protein which is homologous to flavohemoglobin in Escherichia coli. Northern blotting analysis demonstrated that hmp transcription increased when a microaerophilic culture became oxygen limited as it entered stationary phase at 20 days. There was a fivefold increase of the hmp transcripts during early stationary phase compared with the value which was observed in the exponential growth phase. This induction of hmp transcription was not due to changes in the mRNA stability since the half-life of hmp mRNA was very short in a 20-day microaerophilic culture. No induction of hmp mRNA was observed during entry into stationary phase when the culture was continuously aerated. hmp transcription was induced after a short exposure of a late-exponential-phase culture to anaerobic conditions. These data indicate that oxygen limitation is the trigger for hmp gene transcription. In addition, when a microaerophilic culture entered into the stationary phase at 20 days, transcription of hmp increased to a small extent after exposure to S-nitrosoglutathione (a nitric oxide [NO] releaser) and sodium nitroprusside (an NO+ donor) and decreased after exposure to paraquat (a superoxide generator) and H2O2. In log phase (4 days) and late stationary phase (40 days), the transcription of hmp was unaffected by nitrosative and oxidative stress. Three primer extension products were observed. The -10 region is 100% identical to that of promoter T3 in mycobacteria and shows a strong similarity to the -10 sequence of hmp and rpoS promoters in E. coli. These observations of hmp mRNA induction in response to O2 limitation and nitrosative stress suggest that the hmp gene of M. tuberculosis may have a role in protection of the organism from NO killing under microaerophilic conditions.

    Journal of bacteriology 1999;181;11;3486-93

  • RMS/coverage graphs: a qualitative method for comparing three-dimensional protein structure predictions.

    Hubbard TJ

    Sanger Centre, Hinxton, Cambridgeshire, United Kingdom.

    Evaluating a set of protein structure predictions is difficult as each prediction may omit different residues and different parts of the structure may have different accuracies. A method is described that captures the best results from a large number of alternative sequence-dependent structural superpositions between a prediction and the experimental structure and represents them as a single line on a graph. Applied to CASP2 and CASP3 data the best predictions stand out visually in most cases, as judged by manual inspection. The results from this method applied to CASP data are available from the URLs http://PredictionCenter. and http://~th/casp/.

    Funded by: Wellcome Trust

    Proteins 1999;Suppl 3;15-21

  • SCOP: a Structural Classification of Proteins database.

    Hubbard TJ, Ailey B, Brenner SE, Murzin AG and Chothia C

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships of all known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database, so far. The database can be used as a source of data to calibrate sequence search algorithms and for the generation of population statistics on protein structures. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL http://scop.

    Funded by: Wellcome Trust

    Nucleic acids research 1999;27;1;254-6

  • The Leishmania genome comes of Age.

    Ivens AC and Blackwell JM

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The Leishmania Genome Network (LGN) was born in Rio de Janeiro, Brazil in 1994. In the short period that has elapsed since then, the LGN has focused solely on the acquisition of the resources, and hence data, that have enabled a rational approach to genomic sequencing of the reference strain, Leishmania major Friedlin. This has now been achieved. In this review, Alasdair Ivens and Jennie Blackwell, secretary and chairman of the LGN, respectively, re-examine the approaches that were adopted, comment on some of the interesting data that have been obtained and introduce some genome-wide approaches that will facilitate functional studies of the parasite.

    Funded by: Wellcome Trust

    Parasitology today (Personal ed.) 1999;15;6;225-31

  • Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs.

    Jareborg N, Birney E and Durbin R

    The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    A data set of 77 genomic mouse/human gene pairs has been compiled from the EMBL nucleotide database, and their corresponding features determined. This set was used to analyze the degree of conservation of noncoding sequences between mouse and human. A new alignment algorithm was developed to cope with the fact that large parts of noncoding sequences are not alignable in a meaningful way because of genetic drift. This new algorithm, DNA Block Aligner (DBA), finds colinear-conserved blocks that are flanked by nonconserved sequences of varying lengths. The noncoding regions of the data set were aligned with DBA. The proportion of the noncoding regions covered by blocks >60% identical was 36% for upstream regions, 50% for 5' UTRs, 23% for introns, and 56% for 3' UTRs. These blocks of high identity were more or less evenly distributed across the length of the features, except for upstream regions in which the first 100 bp upstream of the transcription start site was covered in up to 70% of the gene pairs. This data set complements earlier sets on the basis of cDNA sequences and will be useful for further comparative studies. [This paper contains supplementary data that can be found at [corrected]].

    Funded by: Wellcome Trust

    Genome research 1999;9;9;815-24

  • The human family of Deafness/Dystonia peptide (DDP) related mitochondrial import proteins.

    Jin H, Kendall E, Freeman TC, Roberts RG and Vetrie DL

    GKT Medical School, Guy's Hospital, London, SE1 9RT, United Kingdom.

    The gene responsible for the human genetic neurodegenerative disorder DFN-1/MTS encodes a small protein known as deafness/dystonia peptide (DDP). It bears a strong resemblance to a recently characterized set of zinc-binding yeast proteins (Tim8p, Tim9p, Tim10p, Tim12p, and Tim13p) that are implicated in the import of a class of transmembrane carrier proteins from the cytoplasm to the mitochondrial inner membrane. We describe here the human complement of DDP/Tim-like proteins and establish the likely orthologous relationships between sequences from human, yeast, and other organisms. We also describe the expression patterns and chromosomal locations of their genes, which are candidate loci for autosomal recessive neurodegenerative disorders.

    Funded by: Wellcome Trust

    Genomics 1999;61;3;259-67

  • The chicken B locus is a minimal essential major histocompatibility complex.

    Kaufman J, Milne S, Göbel TW, Walker BA, Jacob JP, Auffray C, Zoorob R and Beck S

    Institute for Animal Health, Compton, UK.

    Here we report the sequence of the region that determines rapid allograft rejection in chickens, the chicken major histocompatibility complex (MHC). This 92-kilobase region of the B locus contains only 19 genes, making the chicken MHC roughly 20-fold smaller than the human MHC. Virtually all the genes have counterparts in the human MHC, defining a minimal essential set of MHC genes conserved over 200 million years of divergence between birds and mammals. They are organized differently, with the class III region genes located outside the class II and class I region genes. The absence of proteasome genes is unexpected and might explain unusual peptide-binding specificities of chicken class I molecules. The presence of putative natural killer receptor gene(s) is unprecedented and might explain the importance of the B locus in the response to the herpes virus responsible for Marek's disease. The small size and simplicity of the chicken MHC allows co-evolution of genes as haplotypes over considerable periods of time, and makes it possible to study the striking MHC-determined pathogen-specific disease resistance at the molecular level.

    Funded by: Wellcome Trust

    Nature 1999;401;6756;923-5

  • Integrated databases and computer systems for studying eukaryotic gene expression.

    Kolchanov NA, Ponomarenko MP, Frolov AS, Ananko EA, Kolpakov FA, Ignatieva EV, Podkolodnaya OA, Goryachkovskaya TN, Stepanenko IL, Merkulova TI, Babenko VV, Ponomarenko YV, Kochetov AV, Podkolodny NL, Vorobiev DV, Lavryushev SV, Grigorovich DA, Kondrakhin YV, Milanesi L, Wingender E, Solovyev V and Overton GC

    Institute of Cytology & Genetics, Siberian Branch of the Russian Academy of Sciences, Prosp. Lavrentieva 10, Novosibirsk 630090, Russia.

    Motivation: The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences.

    Results: The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins.

    Availability: GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at ++.

    Funded by: NCRR NIH HHS: 5-R01-RR04026-09; Wellcome Trust

    Bioinformatics (Oxford, England) 1999;15;7-8;669-86

  • [GeneExpress: an integrator for databases and computer systems accessible by the Internet and intended for studying eukaryotic gene expression].

    Kolchanov NA, Ponomarenko MP, Kel AE, Kondrakhin IuV, Frolov AS, Kolpakov FA, Goriachkovskaia TN, Kel OV, Anan'ko EA, Ignat'eva EV et al.

    Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, Russia.

    We have developed GeneExpress, a WWW-oriented integrator for databases and systems supporting the investigation of gene expression. A total of 30 Web-based resources are integrated. The GeneNet database on molecular events forming gene networks was assigned as its integrative core. To navigate all these WWW-available resources, SRS, HTML, and Java viewers were developed.

    Biofizika 1999;44;5;837-41

  • Mutations in SLC19A2 cause thiamine-responsive megaloblastic anaemia associated with diabetes mellitus and deafness.

    Labay V, Raz T, Baron D, Mandel H, Williams H, Barrett T, Szargel R, McDonald L, Shalata A, Nosaka K, Gregory S and Cohen N

    Department of Genetics, Tamkin Human Molecular Genetics Research Facility, Technion-Israel Institute of Technology, Bruce Rappaport Faculty of Medicine, Haifa.

    Thiamine-responsive megaloblastic anaemia (TRMA), also known as Rogers syndrome, is an early onset, autosomal recessive disorder defined by the occurrence of megaloblastic anaemia, diabetes mellitus and sensorineural deafness, responding in varying degrees to thiamine treatment (MIM 249270). We have previously narrowed the TRMA locus from a 16-cM to a 4-cM interval on chromosomal region 1q23.3 (refs 3,4) and this region has been further refined to a 1.4-cM interval. Previous studies have suggested that deficiency in a high-affinity thiamine transporter may cause this disorder. Here we identify the TRMA gene by positional cloning. We assembled a P1-derived artificial chromosome (PAC) contig spanning the TRMA candidate region. This clarified the order of genetic markers across the TRMA locus, provided 9 new polymorphic markers and narrowed the locus to an approximately 400-kb region. Mutations in a new gene, SLC19A2, encoding a putative transmembrane protein homologous to the reduced folate carrier proteins, were found in all affected individuals in six TRMA families, suggesting that a defective thiamine transporter protein (THTR-1) may underlie the TRMA syndrome.

    Funded by: Wellcome Trust

    Nature genetics 1999;22;3;300-4

  • Data mining parasite genomes: haystack searching with a computer.

    Lawson D

    Pathogen Sequencing Unit, Sanger Centre, Hinxton, Cambridge, UK.

    A number of genomes of parasitic organisms are presently being sequenced in the public domain, including Plasmodium falciparum, Leishmania major and Trypanosoma brucei, with the likelihood of at least expressed sequence tag (EST) projects for several filarial and apicomplexan species. The early and timely release of sequence data to the community via the World Wide Web (www) and the public databases (EMBL and GENBANK) forms an invaluable resource. Data mining, or 'haystack searching', of this resource is becoming more fruitful to all members of the scientific community as the volume of data, diversity of genomes sampled, and accessibility increase.

    Funded by: Wellcome Trust

    Parasitology 1999;118 Suppl;S15-8

  • A molecular cytogenetic clone resource for chromosome 22.

    Leversha MA, Dunham I and Carter NP

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Funded by: Wellcome Trust

    Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 1999;7;7;571-3

  • Novel mode of ligand binding by the SH2 domain of the human XLP disease gene product SAP/SH2D1A.

    Li SC, Gish G, Yang D, Coffey AJ, Forman-Kay JD, Ernberg I, Kay LE and Pawson T

    Program in Molecular Biology and Cancer, Department of Molecular and Medical Genetics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, University of Toronto, Canada.

    Background: The Src homology 2 (SH2) domains of cytoplasmic signaling proteins generally bind phosphotyrosine (pTyr) sites in the context of carboxy-terminal residues. SAP (also known as SH2D1A or DSHP), the product of the gene that is mutated in human X-linked lymphoproliferative (XLP) disease, comprises almost exclusively a single SH2 domain, which may modulate T-cell signaling by engaging T-cell co-activators such as SLAM, thereby blocking binding of other signaling proteins that contain SH2 domains. The SAP-SLAM interaction can occur in a phosphorylation-independent manner.

    Results: To characterize the interaction between SAP and SLAM, we synthesized peptides corresponding to the SAP-binding site at residue Y281 in SLAM. Both phosphorylated and non-phosphorylated versions of an 11-residue SLAM peptide bound SAP, with dissociation constants of 150 nM and 330 nM, respectively. SLAM phosphopeptides that were truncated either at the amino or carboxyl terminus bound with high affinity to SAP, suggesting that the SAP SH2 domain recognizes both amino-terminal and carboxy-terminal sequences relative to the pTyr residue. These results were confirmed by nuclear magnetic resonance (NMR) studies on (15)N- and (13)C-labeled SAP complexed with three SLAM peptides: an amino-terminally truncated phosphopeptide, a carboxy-terminally truncated phosphopeptide and a non-phosphorylated Tyr-containing full-length peptide.

    Conclusions: The SAP SH2 domain has a unique specificity. Not only does it bind peptides in a phosphorylation-independent manner, it also recognizes a pTyr residue either preceded by amino-terminal residues or followed by carboxy-terminal residues. We propose that the three 'prongs' of a peptide ligand (the amino and carboxyl termini and the pTyr) can engage the SAP SH2 domain, accounting for its unusual properties. These data point to the flexibility of modular protein-interaction domains.

    Current biology : CB 1999;9;23;1355-62

  • Isolation and characterization of human and mouse ZIRTL, a member of the IRT1 family of transporters, mapping within the epidermal differentiation complex.

    Lioumi M, Ferguson CA, Sharpe PT, Freeman T, Marenholz I, Mischke D, Heizmann C and Ragoussis J

    Division of Medical and Molecular Genetics, Department of Craniofacial Development, The Guy's, King's College and St. Thomas' Hospitals' Medical and Dental School, London, United Kingdom.

    We report the precise mapping and characterization of the ZIRTL (zinc-iron regulated transporter-like) gene, the first mammalian member of an extensive family of divalent metal ion transporters, comprising IRT1 and ZIP1, ZIP2, ZIP3, and ZIP4 in plants and ZRT1 and ZRT2 in yeast. The human gene maps at the telomeric end of the epidermal differentiation complex (EDC), within chromosomal band 1q21, while the mouse gene maps within the mouse EDC, on mouse chromosome 3, between S100A9 and S100A13. The structure of the human gene has been determined, and message was detected in most adult and fetal tissues including the epidermis. The mouse gene is developmentally regulated and is expressed in fetal and adult suprabasal epidermis, osteoblasts, small intestine, and salivary gland.

    Funded by: Wellcome Trust

    Genomics 1999;62;2;272-80

  • Psoriasis upregulated phorbolin-1 shares structural but not functional similarity to the mRNA-editing protein apobec-1.

    Madsen P, Anant S, Rasmussen HH, Gromov P, Vorum H, Dumanski JP, Tommerup N, Collins JE, Wright CL, Dunham I, MacGinnitie AJ, Davidson NO and Celis JE

    Department of Medical Biochemistry, University of Aarhus, Denmark.

    Earlier studies of psoriatic and normal primary keratinocytes treated with phorbol 12-myristate 13-acetate identified two low-molecular-weight proteins, termed phorbolin-1 (20 kDa; pI 6.6) and phorbolin-2 (17.6 kDa; pI 6.5). As a first step towards elucidating the role of these proteins in psoriasis, we report here the molecular cloning and chromosomal mapping of phorbolin-1 and a related cDNA that codes for a protein exhibiting a similar amino acid sequence. The phorbolins were mapped to position 22q13 immediately centromeric to the c-sis proto-oncogene. Transient expression of the phorbolin-1 cDNA in COS cells and by in vitro transcription/translation, yielded polypeptides that comigrated with phorbolins-1 and -2. Comparative sequence analysis revealed 22% overall identity and a similarity of 44% of the phorbolins to apobec-1, the catalytic subunit of the mammalian apolipoprotein B mRNA editing enzyme; however, recombinant-expressed phorbolin-1 exhibited no cytidine deaminase activity, using either a monomeric nucleoside or apolipoprotein B cRNA as substrate, and failed to bind an AU-rich RNA template. Whereas the precise function of the phorbolins remains to be elucidated, the current data suggest that it is unlikely to include a role in the post-transcriptional modification of RNA in a manner analogous to that described for apobec-1.

    Funded by: NHLBI NIH HHS: HL-38180; NIDDK NIH HHS: DK-42030; Wellcome Trust

    The Journal of investigative dermatology 1999;113;2;162-9

  • The isolation and high-resolution chromosomal mapping of human SOX14 and SOX21; two members of the SOX gene family related to SOX1, SOX2, and SOX3.

    Malas S, Duthie S, Deloukas P and Episkopou V

    MRC Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital, London W12 0NN, UK.

    Funded by: Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 1999;10;9;934-7

  • NME6: a new member of the nm23/nucleoside diphosphate kinase gene family located on human chromosome 3p21.3.

    Mehus JG, Deloukas P and Lambeth DO

    Department of Biochemistry and Molecular Biology, University of North Dakota School of Medicine and Health Sciences, Grand Forks 58202, USA.

    The NME (nm23/nucleoside diphosphate kinase) gene family in human is involved in the phosphorylation of nucleoside diphosphates and a variety of regulatory phenomena associated with development, oncogenic transformation, and metastasis. Here we report the cDNA sequence for a sixth member of this family, NME6. The cDNA sequence predicts a 186-residue protein that includes the characteristic active site motif of a nucleoside diphosphate (NDP) kinase, as well as the other residues previously identified as crucial for nucleotide binding and catalysis. The NME6 protein sequence is only 34-41% identical to the five previously reported human NME proteins, and is similarly related to prokaryotic and primitive eukaryotic NDP kinases. Compared to typical proteins of this family such as NME1 and NME2, NME6 has three additional residues located in the Kpn loop, and a 22-residue extension at the COOH-terminal. Using radiation hybrid mapping, the NME6 gene was localized to chromosome 3p21.3. The 1.3-kb transcript of NME6 is expressed at a moderately low level in many human tissues, and is most abundant in kidney, prostate, ovary, intestine, and spleen. Homologous cDNAs were also cloned and sequenced for rat and mouse. The sequence of the first 171 residues of the mouse homologue (Nm23-M6) is 94% identical to the deduced human NME6 protein.

    Funded by: NIGMS NIH HHS: GM54260

    Human genetics 1999;104;6;454-9

  • Techview: DNA sequencing. Sequencing the genome, fast.

    Mullikin JC and McMurragy AA

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs, UK.

    Science (New York, N.Y.) 1999;283;5409;1867-9

  • A YAC-based physical map of the mouse genome.

    Nusbaum C, Slonim DK, Harris KL, Birren BW, Steen RG, Stein LD, Miller J, Dietrich WF, Nahf R, Wang V, Merport O, Castle AB, Husain Z, Farino G, Gray D, Anderson MO, Devine R, Horton LT, Ye W, Wu X, Kouyoumjian V, Zemsteva IS, Wu Y, Collymore AJ, Courtney DF, Tam J, Cadman M, Haynes AR, Heuston C, Marsland T, Southwell A, Trickett P, Strivens MA, Ross MT, Makalowski W, Xu Y, Boguski MS, Carter NP, Denny P, Brown SD, Hudson TJ and Lander ES

    Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA.

    A physical map of the mouse genome is an essential tool for both positional cloning and genomic sequencing in this key model system for biomedical research. Indeed, the construction of a mouse physical map with markers spaced at an average interval of 300 kb is one of the stated goals of the Human Genome Project. Here we report the results of a project at the Whitehead Institute/MIT Center for Genome Research to construct such a physical map of the mouse. We built the map by screening sequence-tagged sites (STSs) against a large-insert yeast artificial chromosome (YAC) library and then integrating the STS-content information with a dense genetic map. The integrated map shows the location of 9,787 loci, providing landmarks with an average spacing of approximately 300 kb and affording YAC coverage of approximately 92% of the mouse genome. We also report the results of a project at the MRC UK Mouse Genome Centre targeted at chromosome X. The project produced a YAC-based map containing 619 loci (with 121 loci in common with the Whitehead map and 498 additional loci), providing especially dense coverage of this sex chromosome. The YAC-based physical map directly facilitates positional cloning of mouse mutations by providing ready access to most of the genome. More generally, use of this map in addition to a newly constructed radiation hybrid (RH) map provides a comprehensive framework for mouse genomic studies.

    Funded by: Wellcome Trust

    Nature genetics 1999;22;4;388-93

  • Analysis and assessment of ab initio three-dimensional prediction, secondary structure, and contacts prediction.

    Orengo CA, Bray JE, Hubbard T, LoConte L and Sillitoe I

    Department of Biochemistry and Molecular Biology, University College, London, United Kingdom.

    CASP3 saw a substantial increase in the volume of ab initio 3D prediction data, with 507 datasets for fifteen selected targets and sixty-one groups participating. As with CASP2, methods ranged from computationally intensive strategies that attempt to recreate the physical and chemical forces involved in protein folding to the more recent knowledge-based approaches. These exploit information from the structure databases, extracting potentially similar fragments and/or distance constraints derived from multiple sequence alignments. The knowledge-based approaches generally gave more consistently successful predictions across the range of targets, particularly that of the Baker group (Bystroff and Baker, J Mol Biol 1998;281:565-577; Simons et al. Proteins Suppl 1999;3:171-176), which used a fragment library. In the secondary structure prediction category, the most successful approaches built on the concepts used in PHD (Rost et al. Comput Appl Biosci 1994;10:53-60), an accepted standard in this field. Like PHD, they exploit neural networks but have different strategies for incorporating multiple sequence data or position-dependent weight matrices for training the networks. Analysis of the contact data, for which only six groups participated, suggested that as yet this data provides a rather weak signal. However, in combination with other types of prediction data it can sometimes be a useful constraint for identifying the correct structure.

    Funded by: Wellcome Trust

    Proteins 1999;Suppl 3;149-70

  • 'Going wrong with confidence': misleading sequence analyses of CiaB and clpX.

    Pallen M, Wren B and Parkhill J

    Molecular microbiology 1999;34;1;195

  • The human LARGE gene from 22q12.3-q13.1 is a new, distinct member of the glycosyltransferase gene family.

    Peyrard M, Seroussi E, Sandberg-Nordqvist AC, Xie YG, Han FY, Fransson I, Collins J, Dunham I, Kost-Alimova M, Imreh S and Dumanski JP

    Department of Molecular Medicine, Karolinska Hospital, S-171 76 Stockholm, Sweden.

    Meningioma, a tumor of the meninges covering the central nervous system, shows frequent loss of material from human chromosome 22. Homozygous and heterozygous deletions in meningiomas defined a candidate region of >1 Mbp in 22q12.3-q13.1 and directed us to gene cloning in this segment. We characterized a new member of the N-acetylglucosaminyltransferase gene family, the LARGE gene. It occupies >664 kilobases and is one of the largest human genes. The predicted 756-aa N-acetylglucosaminyltransferase encoded by LARGE displays features that are absent in other glycosyltransferases. The human like-acetylglucosaminyltransferase polypeptide is much longer and contains putative coiled-coil domains. We characterized the mouse LARGE ortholog, which encodes a protein 97.75% identical with the human counterpart. Both genes reveal ubiquitous expression as assessed by Northern blot analysis and in situ histochemistry. Chromosomal mapping of the mouse gene reveals that mouse chromosome 8C1 corresponds to human 22q12.3-q13.1. Abnormal glycosylation of proteins and glycosphingolipids has been shown as a mechanism behind an increased potential of tumor formation and/or progression. Human tumors overexpress ganglioside GD3 (NeuAcalpha2,8NeuAcalpha2,3Galbeta1,4Glc-Cer), which in meningiomas correlates with deletions on chromosome 22. It is the first time that a glycosyltransferase gene is involved in tumor-specific genomic rearrangements. An abnormal function of the human like-acetylglucosaminyltransferase protein may be linked to the development/progression of meningioma by altering the composition of gangliosides and/or by effect(s) on other glycosylated molecules in tumor cells.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 1999;96;2;598-603

  • Genetic basis for lipopolysaccharide O-antigen biosynthesis in bordetellae.

    Preston A, Allen AG, Cadisch J, Thomas R, Stevens K, Churcher CM, Badcock KL, Parkhill J, Barrell B and Maskell DJ

    Centre for Veterinary Science, Department of Clinical Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, Cambridge CB10 1SA, United Kingdom.

    Bordetella bronchiseptica and Bordetella parapertussis express a surface polysaccharide, attached to a lipopolysaccharide, which has been called O antigen. This structure is absent from Bordetella pertussis. We report the identification of a large genetic locus in B. bronchiseptica and B. parapertussis that is required for O-antigen biosynthesis. The locus is replaced by an insertion sequence in B. pertussis, explaining the lack of O-antigen biosynthesis in this species. The DNA sequence of the B. bronchiseptica locus has been determined and the presence of 21 open reading frames has been revealed. We have ascribed putative functions to many of these open reading frames based on database searches. Mutations in the locus in B. bronchiseptica and B. parapertussis prevent O-antigen biosynthesis and provide tools for the study of the role of O antigen in infections caused by these bacteria.

    Funded by: Wellcome Trust

    Infection and immunity 1999;67;8;3763-7

  • Sequencing. Gels and genomes.

    Rogers J

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Science (New York, N.Y.) 1999;286;5439;429

  • Molecular analysis of a spontaneous dystrophin 'knockout' dog.

    Schatzberg SJ, Olby NJ, Breen M, Anderson LV, Langford CF, Dickens HF, Wilton SD, Zeiss CJ, Binns MM, Kornegay JN, Morris GE and Sharp NJ

    College of Veterinary Medicine, North Carolina State University, Raleigh 27606, USA.

    We have determined the molecular basis for skeletal myopathy and dilated cardiomyopathy in two male German short-haired pointer (GSHP) littermates. Analysis of skeletal muscle demonstrated a complete absence of dystrophin on Western blot analysis. PCR analysis of genomic DNA revealed a deletion encompassing the entire dystrophin gene. Molecular cytogenetic analysis of lymphocytes from the dam and both dystrophic pups confirmed a visible deletion in the p21 region of the affected canine X chromosome. Utrophin is up-regulated in the skeletal muscle, but does not appear to ameliorate the dystrophic canine phenotype. This new canine model should further our understanding of the physiological and biochemical processes in Duchenne muscular dystrophy.

    Funded by: Wellcome Trust

    Neuromuscular disorders : NMD 1999;9;5;289-95

  • [Molecular cloning of some components of the translation apparatus of fission yeast Schizosaccharomyces pombe and a list of its cytoplasmic proteins genes].

    Shpakovskiĭ GV, Baranova GM, Wood V, Gwilliam RG, Shematorova EK, Korol'chuk OL and Lebedenko EN

    Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia.

    Full-length cDNAs of four new genes encoding cytoplasmic ribosomal proteins L14 and L20 (large ribosomal subunit) and S1 and S27 (small ribosomal subunit) were isolated and sequenced during the analysis of the fission yeast Schizosaccharomyces pombe genome. One of the Sz. pombe genes encoding translation elongation factor EF-2 was also cloned and its precise position on chromosome I established. A unified nomenclature was proposed, and the list of all known genetic determinants encoding cytoplasmic ribosomal proteins of Sz. pombe was compiled. By now, 76 genes/cDNAs encoding different ribosomal proteins have been identified in the fission yeast genome. Among them, 35 genes are duplicated and three homologous genes are identified for each of the ribosomal proteins L2, L16, P1, and P2.

    Bioorganicheskaia khimiia 1999;25;6;450-63

  • Nonmethylated transposable elements and methylated genes in a chordate genome.

    Simmen MW, Leitgeb S, Charlton J, Jones SJ, Harris BR, Clark VH and Bird A

    Institute of Cell and Molecular Biology, University of Edinburgh, The King's Buildings, Edinburgh EH9 3JR, UK.

    The genome of the invertebrate chordate Ciona intestinalis was found to be a stable mosaic of methylated and nonmethylated domains. Multiple copies of an apparently active long terminal repeat retrotransposon and a long interspersed element are nonmethylated and a large fraction of abundant short interspersed elements are also methylation free. Genes, by contrast, are predominantly methylated. These data are incompatible with the genome defense model, which proposes that DNA methylation in animals is primarily targeted to endogenous transposable elements. Cytosine methylation in this urochordate may be preferentially directed to genes.

    Funded by: Wellcome Trust

    Science (New York, N.Y.) 1999;283;5405;1164-7

  • The Caenorhabditis elegans genes egl-27 and egr-1 are similar to MTA1, a member of a chromatin regulatory complex, and are redundantly required for embryonic patterning.

    Solari F, Bateman A and Ahringer J

    University of Cambridge, Department of Genetics, Downing Street, Cambridge CB2 3EH, UK.

    We show here that two functionally redundant Caenorhabditis elegans genes, egl-27 and egr-1, have a fundamental role in embryonic patterning. When both are inactivated, cells in essentially all regions of the embryo fail to be properly organised. Tissue determination and differentiation are unaffected and many zygotic patterning genes are expressed normally, including HOX genes. However, hlh-8, a target of the HOX gene mab-5, is not expressed. egl-27 and egr-1 are members of a gene family that includes MTA1, a human gene with elevated expression in metastatic carcinomas. MTA1 is a component of a protein complex with histone deacetylase and nucleosome remodelling activities. We propose that EGL-27 and EGR-1 function as part of a chromatin regulatory complex required for the function of regional patterning genes.

    Funded by: Wellcome Trust

    Development (Cambridge, England) 1999;126;11;2483-94

  • INFOGENE: a database of known gene structures and predicted genes and proteins in sequences of genome sequencing projects.

    Solovyev VV and Salamov AA

    The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK.

    INFOGENE is a database of known and predicted gene structures with descriptions of basic functional signals and gene components. It makes it possible to create compilations of sequences with a given gene feature as well as to accumulate and analyze predicted genes in finished and unfinished sequences from genome sequencing projects. Protein sequence similarity searches in the database of predicted proteins are offered through the BLASTP program. INFOGENE is implemented under the Sequence Retrieval System, which provides useful links with other informational databases. The database is available through the WWW server of the Computational Genomics Group at

    Funded by: Wellcome Trust

    Nucleic acids research 1999;27;1;248-50

  • Comparative analyses of the Dominant megacolon-SOX10 genomic interval in mouse and human.

    Southard-Smith EM, Collins JE, Ellison JS, Smith KJ, Baxevanis AD, Touchman JW, Green ED, Dunham I and Pavan WJ

    Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, Maryland 20892-4470, USA.

    Funded by: Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 1999;10;7;744-9

  • High-resolution landmark framework for the sequence-ready mapping of Xq23-q26.1.

    Steingruber HE, Dunham A, Coffey AJ, Clegg SM, Howell GR, Maslen GL, Scott CE, Gwilliam R, Hunt PJ, Sotheran EC, Huckle EJ, Hunt SE, Dhami P, Soderlund C, Leversha MA, Bentley DR and Ross MT

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, UK.

    We have established a landmark framework map over 20-25 Mb of the long arm of the human X chromosome using yeast artificial chromosome (YAC) clones. The map has approximately one landmark per 45 kb of DNA and stretches from DXS7531 in proximal Xq23 to DXS895 in proximal Xq26, connecting to published framework maps on its proximal and distal sides. There are three gaps in the framework map resulting from the failure to obtain clone coverage from the YAC resources available. Estimates of the maximum sizes of these gaps have been obtained. The four YAC contigs have been positioned and oriented using somatic-cell hybrids and fluorescence in situ hybridization, and the largest is estimated to cover approximately 15 Mb of DNA. The framework map is being used to assemble a sequence-ready map in large-insert bacterial clones, as part of an international effort to complete the sequence of the X chromosome. PAC and BAC contigs currently cover 18 Mb of the region, and from these, 12 Mb of finished sequence is available.

    Genome research 1999;9;8;751-62

  • Gene organisation, sequence variation and isochore structure at the centromeric boundary of the human MHC.

    Stephens R, Horton R, Humphray S, Rowen L, Trowsdale J and Beck S

    Immunology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK.

    We have mapped and sequenced the region immediately centromeric of the human major histocompatibility complex (MHC). A cluster of 13 genes/pseudogenes was identified in a 175 kb PAC linking the TAPASIN locus with the class II region. It includes two novel human genes (BING4 and SACM2L) and a thus far unnoticed human leucocyte antigen (HLA) class II pseudogene, termed HLA-DPA3. Analysis of the G+C content revealed an isochore boundary which, together with the previously reported telomeric boundary, defines the MHC class II region as one of the first completely sequenced isochores in the human genome. Comparison of the sequence with limited sequence from other cell lines shows that the high sequence variation found within the classical class II region extends beyond the identified isochore boundary leading us to propose the concept of an "extended MHC". By comparative analysis, we have precisely identified the mouse/human synteny breakpoint at the centromeric end of the extended MHC class II region between the genes HSET and PHF1.

    Funded by: Wellcome Trust

    Journal of molecular biology 1999;291;4;789-99

  • Analysis of the proteome of Mycobacterium tuberculosis in silico.

    Tekaia F, Gordon SV, Garnier T, Brosch R, Barrell BG and Cole ST

    Unité de Génétique Moléculaire des Levures, Institut Pasteur, Paris, France.

    Novel bioinformatics routines have been used to provide a more detailed definition of the proteome of Mycobacterium tuberculosis H37Rv. Over half of the current proteins result from gene duplication or domain shuffling events while one-sixth show no similarity to polypeptides described in other organisms. Prominent among the genes that appear to have been duplicated on numerous occasions are those involved in fatty acid metabolism, regulation of gene expression, and the unusually glycine-rich PE and PPE proteins. Protein similarity analysis, coupled with inspection of the genetic neighbourhood, was used to explore possible functional relatedness. This uncovered four large mce operons whose proteins may mediate initial interactions between the tubercle bacillus and host cells, together with a cluster of genes that might encode components of a structure required for secretion of ESAT-6 like proteins. Close linkage of the mmpL genes, encoding large membrane proteins, with those required for fatty acid metabolism suggests involvement in lipid transport. Compared to free-living bacteria, M. tuberculosis has a significantly smaller transport protein repertoire and this may reflect its intracellular lifestyle.

    Tubercle and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease 1999;79;6;329-42

  • Zoo-FISH analysis of dog chromosome 5: identification of conserved synteny with human and cat chromosomes.

    Thomas R, Breen M, Langford CF and Binns MM

    Centre for Preventive Medicine, Animal Health Trust, Newmarket, Suffolk, UK.

    Conserved segments of synteny between the human genome and chromosome 5 (CFA 5) of the domestic dog (Canis familiaris) have been identified by reciprocal chromosome painting analysis. A CFA 5 paint probe was applied to human metaphase spreads, revealing distinct hybridisation sites on human (HSA) chromosomes 1, 11, 16, and 17. Paint probes for these human chromosomes were then hybridised to dog metaphase spreads, identifying the regions of CFA 5 with which homology is shared with the corresponding human chromosome. Application of the CFA 5 paint probe to metaphase spreads of the domestic cat (Felis catus, FCA) demonstrated hybridisation to cat chromosomes C1, D1, E1, and E2. Dog PCR primers for type 1 markers known to lie in the corresponding regions of HSA 11, 16, and 17 were used to isolate dog BAC clones representing four genes. Fluorescence in situ hybridisation analysis confirmed their localisation to CFA 5 and suggested that two of the conserved segments lie in opposing orientations on CFA 5, compared to the human chromosome concerned. A third segment appears to lie in the same orientation on both human and dog chromosomes. No suitable gene markers were available for analysis of the fourth segment. The significance of these findings is discussed with reference to current and future dog genome mapping efforts.

    Funded by: Wellcome Trust

    Cytogenetics and cell genetics 1999;87;1-2;4-10

  • Detection of polymorphism in the RING3 gene by high-throughput fluorescent SSCP analysis.

    Thorpe KL, Schafer AJ, Génin E, Trowsdale J and Beck S

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    We describe the use of a high-throughput, fluorescent, polymorphism-detection system, based on single-strand conformation polymorphism, to screen for polymorphism in the RING3 gene. This is the first extensive mutation screen of this major histocompatibility complex-linked gene, and the entire coding region and intron-exon junctions were examined by multiplexing over 3000 polymerase chain reaction products. These techniques should be applicable for analysis of variation in other human genes. Investigation of DNA from acute lymphoblastic leukemia (ALL) and chronic myeloid leukemia (CML) patients, as well as healthy individuals, revealed low levels of polymorphism across the RING3 gene. Comparison of the distribution of genotypes at each polymorphic site between patients and healthy individuals revealed a single site which significantly deviates from Hardy-Weinberg proportions.

    Funded by: Wellcome Trust

    Immunogenetics 1999;49;4;256-65

  • Mapping and complex expression pattern of the human NPAP60L nucleoporin gene.

    Trichet V, Shkolny D, Dunham I, Beare D and McDermid HE

    Department of Biological Sciences, University of Alberta, Edmonton, Canada.

    From a clone mapping to human chromosome 22q13.3, we have identified NPAP60L, the human homolog of the rat nuclear pore-associated protein gene, Npap60. The expression pattern of the human copy is much more complex than that of the rat, although conservation of the potential specific function of NPAP60L in male germ cells can be seen for one of the five transcripts. The exon-intron organization of the NPAP60L gene shows the presence of at least three alternate 3' ends, and Northern analysis indicates the possible presence of alternate 5' ends. Somatic cell hybrid mapping revealed additional related copies of NPAP60L on human chromosomes 5, 6, and 14, although it is not known if these are functional genes.

    Funded by: Wellcome Trust

    Cytogenetics and cell genetics 1999;85;3-4;221-6

  • Characterization of the Lmo4 gene encoding a LIM-only protein: genomic organization and comparative chromosomal mapping.

    Tse E, Grutz G, Garner AA, Ramsey Y, Carter NP, Copeland N, Gilbert DJ, Jenkins NA, Agulnick A, Forster A and Rabbitts TH

    MRC Laboratory of Molecular Biology, Division of Protein and Nucleic Acid Chemistry, Hills Road, Cambridge CB2 2QH, UK.

    LIM-only (LMO) proteins are transcription regulators that function by mediating protein-protein interaction and include the T cell oncogenes encoding LMO1 and LMO2. The oncogenic functions of LMO1 and LMO2 are thought to be mediated by interaction with LDB1 since they form a multimeric protein complex(es). A new member of the Lmo family, Lmo4, has also recently been identified via its interaction with Ldb1. Sequence analysis of the mouse Lmo4 gene shows that it spans about 18 kb and consists of at least six exons, including two alternatively spliced 5' exons. Unlike Lmo1, the two 5' exons of Lmo4 do not encode protein. Comparison of the Lmo4 gene structure with the other LMO family members shows the exon structure of Lmo4 differs in the position of exon junctions encoding the second LIM domain and in a novel exon-intron junction at the penultimate codon of the gene. Lmo4 is thus the least conserved known member of the LIM-only family in both nucleotide sequence and exon structure. Physical mapping of the Lmo4/LMO4 genes has shown mouse Lmo4 is located on Chromosome (Chr) 3 and human LMO4 on Chr 1p22.3. This chromosome location is of interest as it occurs in a region that is deleted in a number of human cancers, indicating a possible role of LMO4 in tumorigenesis, like its relatives LMO1 and LMO2.

    Funded by: Wellcome Trust

    Mammalian genome : official journal of the International Mammalian Genome Society 1999;10;11;1089-94

  • Identification and characterization of the human homologue (RAI2) of a mouse retinoic acid-induced gene in Xp22.

    Walpole SM, Hiriyana KT, Nicolaou A, Bingham EL, Durham J, Vaudin M, Ross MT, Yates JR, Sieving PA and Trump D

    Department of Medical Genetics, University of Cambridge, Cambridge Institute for Medical Research, Addenbrooke's Hospital, United Kingdom.

    We have identified a novel human gene during studies of a 1.3-Mb region of Xp22 between DXS418 and DXS999. A PAC contig spanning the region was constructed, sequenced, and analyzed by gene and exon prediction programs and by homology searches. Further investigation of predicted exons from PAC clone 389A20 led to the identification of a single-exon gene, designated RAI2 (retinoic acid-induced 2). RAI2 mapped 28 kb centromeric to marker DXS7996, between DXS7996 and DXS7997, and was transcribed from centromere to telomere. Northern blot analysis and reverse transcription-polymerase chain reaction analysis revealed expression of a 2.5-kb transcript in four fetal tissues (brain, lung, kidney, and heart) and eight adult tissues (heart, brain, placenta, lung, skeletal muscle, kidney, pancreas, and retina) but not in fetal or adult liver. The 530-amino-acid protein (57 kDa predicted mass) displays 94% homology with a mouse retinoic acid-induced gene product and contains a novel proline-rich (39%) domain of 68 amino acids. Retinoic acid is involved in vertebrate anteroposterior axis formation and cellular differentiation and has been shown to modulate gene expression controlling early embryonal development, suggesting a developmental role for RAI2. RAI2 remains a candidate gene for diseases mapping to the Xp22 region.

    Funded by: NEI NIH HHS: R01-EY10259; Wellcome Trust

    Genomics 1999;55;3;275-83

  • Report of the fifth international workshop on human chromosome 1 mapping 1999.

    White PS, Forus A, Matise TC, Schutte BC, Spieker N, Stanier P, Vance JM and Gregory SG

    Division of Oncology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104-4318, USA.

    Cytogenetics and cell genetics 1999;87;3-4;143-71

  • How the worm was won. The C. elegans genome sequencing project.

    Wilson RK

    Washington University Genome Sequencing Center, St Louis, MO 63108, USA.

    The genome sequence of the free-living nematode Caenorhabditis elegans is nearly complete, with resolution of the final difficult regions expected over the next few months. This will represent the first genome of a multicellular organism to be sequenced to completion. The genome is approximately 97 Mb in total, and encodes more than 19,099 proteins, considerably more than expected before sequencing began. The sequencing project--a collaboration between the Genome Sequencing Center in St Louis and the Sanger Centre in Hinxton--has lasted eight years, with the majority of the sequence generated in the past four years. Analysis of the genome sequence is just beginning and represents an effort that will undoubtedly last more than another decade. However, some interesting findings are already apparent, indicating that the scope of the project, the approach taken, and the usefulness of having the genetic blueprint for this small organism have been well worth the effort.

    Trends in genetics : TIG 1999;15;2;51-8

  • DNA sequencing and analysis of a 67.4 kb region from the right arm of Schizosaccharomyces pombe chromosome II reveals 28 open reading frames including the genes his5, pol5, ppa2, rip1, rpb8 and skb1.

    Xiang Z, Lyne MH, Wood V, Rajandream MA, Barrell BG and Aves SJ

    School of Biological Sciences, University of Exeter, Washington Singer Laboratories, Perry Road, Exeter EX4 4QG, U.K.

    67 393 bp of contiguous DNA located between markers cdc18 and cdc14 on the right arm of fission yeast chromosome II has been sequenced as part of the European Union Schizosaccharomyces pombe genome sequencing project. The complete sequence, contained in cosmid clones c15C4 and c21H7, has been determined on both strands. Sequence analysis shows that it contains 28 open reading frames capable of coding for proteins, 16 split by one or more introns, but no tRNA, rRNA or transposon sequences. The gene density is one per 2.4 kb. Six genes have been previously described (his5, pol5, ppa2, rip1, rpb8 and skb1) and 22 are novel. Of the novel genes, 14 have significant similarity with proteins of known function, three have similarities with proteins of unknown function and five show no extensive similarities with known proteins. Sequence similarities suggest that three of the novel genes encode ATP-dependent RNA helicases, two encode transcription factor components and others encode a G-protein, a dehydrogenase, a Rab escort protein, an Abc1-like protein, a lipase, an ATP-binding transport protein, an amino acid permease, an acid phosphatase and a mannosyltransferase.

    Funded by: Wellcome Trust

    Yeast (Chichester, England) 1999;15;10A;893-901

  • Complete sequence and gene map of a human major histocompatibility complex. The MHC sequencing consortium.

    No authors listed

    Here we report the first complete sequence and gene map of a human major histocompatibility complex (MHC), a region on chromosome 6 which is essential to the immune system. When it was discovered over 50 years ago the region was thought to specify histocompatibility genes, but their nature has been resolved only in the last two decades. Although many of the 224 identified gene loci (128 predicted to be expressed) are still of unknown function, we estimate that about 40% of the expressed genes have immune system function. Over 50% of the MHC has been sequenced twice, in different haplotypes, giving insight into the extraordinary polymorphism and evolution of this region. Several genes, particularly of the MHC class II and III regions, can be traced by sequence similarity and synteny to over 700 million years ago, clearly predating the emergence of the adaptive immune system some 400 million years ago. The sequence is expected to be invaluable for the identification of many common disease loci. In the past, the search for these loci has been hampered by the complexity of high gene density and linkage disequilibrium.

    Funded by: Wellcome Trust

    Nature 1999;401;6756;921-3

  • Special issue containing a selection of papers presented at the 1st International Conference on Bioinformatics of Genome Regulation and Structure. BGRS '98. Novosibirsk, Altai Mountains, Russia. 24-31 August 1998.

    No authors listed

    Bioinformatics (Oxford, England) 1999;15;7-8;527-714
