Sanger Institute - Publications 2002

Number of papers published in 2002: 102

  • Induced mitotic recombination: a switch in time.

    Adams DJ and Bradley A

    Nature genetics 2002;30;1;6-7

  • The MHC haplotype project: a resource for HLA-linked association studies.

    Allcock RJ, Atrazhev AM, Beck S, de Jong PJ, Elliott JF, Forbes S, Halls K, Horton R, Osoegawa K, Rogers J, Sawcer S, Todd JA, Trowsdale J, Wang Y and Williams S

    University of Cambridge, UK.

    Tissue antigens 2002;59;6;520-1

  • From genomes to vaccines: Leishmania as a model.

    Almeida R, Norrish A, Levick M, Vetrie D, Freeman T, Vilo J, Ivens A, Lange U, Stober C, McCann S and Blackwell JM

    Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2XY.

    The 35 Mb genome of Leishmania should be sequenced by late 2002. It contains approximately 8500 genes that will probably translate into more than 10 000 proteins. In the laboratory we have been piloting strategies to try to harness the power of the genome-proteome for rapid screening of new vaccine candidates. To this end, microarray analysis of 1094 unique genes identified using an EST analysis of 2091 cDNA clones from spliced leader libraries prepared from different developmental stages of Leishmania has been employed. The plan was to identify amastigote-expressed genes that could be used in high-throughput DNA-vaccine screens to identify potential new vaccine candidates. Despite the lack of transcriptional regulation that polycistronic transcription in Leishmania dictates, the data provide evidence for a high level of post-transcriptional regulation of RNA abundance during the developmental cycle of promastigotes in culture and in lesion-derived amastigotes of Leishmania major. This has provided 147 candidates from the 1094 unique genes that are specifically upregulated in amastigotes and are being used in vaccine studies. Using DNA vaccination, it was demonstrated that pooling strategies can work to identify protective vaccines, but it was found that some potentially protective antigens are masked by other disease-exacerbatory antigens in the pool. A total of 100 new vaccine candidates are currently being tested separately and in pools to extend this analysis, and to facilitate retrospective bioinformatic analysis to develop predictive algorithms for sequences that constitute potentially protective antigens. We are also working with other members of the Leishmania Genome Network to determine whether RNA expression determined by microarray analyses parallels expression at the protein level. We believe we are making good progress in developing strategies that will allow rapid translation of the sequence of Leishmania into potential interventions for disease control in humans.

    Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2002;357;1417;5-11

  • Evolutionary analyses of ABC transporters of Dictyostelium discoideum.

    Anjard C, Loomis WF and Dictyostelium Sequencing Consortium

    Center for Molecular Genetics, Division of Biology, University of California--San Diego, La Jolla, California 92093-0368, USA.

    The ABC superfamily of genes is one of the largest in the genomes of both bacteria and eukaryotes. The proteins encoded by these genes all carry a characteristic 200- to 250-amino-acid ATP-binding cassette that gives them their family name. In bacteria they are mostly involved in nutrient import, while in eukaryotes many are involved in export. Seven different families have been defined in eukaryotes based on sequence homology, domain topology, and function. While only 6 ABC genes in Dictyostelium discoideum have been studied in detail previously, sequences from the well-advanced Dictyostelium genome project have allowed us to recognize 68 members of this superfamily. They have been classified and compared to animal, plant, and fungal orthologs in order to gain some insight into the evolution of this superfamily. It appears that many of the genes inferred to have been present in the ancestor of the crown organisms duplicated extensively in some but not all phyla, while others were lost in one lineage or the other.

    Funded by: NIGMS NIH HHS: GM60447

    Eukaryotic cell 2002;1;4;643-52

  • A comprehensive model for familial breast cancer incorporating BRCA1, BRCA2 and other genes.

    Antoniou AC, Pharoah PD, McMullan G, Day NE, Stratton MR, Peto J, Ponder BJ and Easton DF

    CRC Genetic Epidemiology Unit, Institute of Public Health, Strangeways Research Laboratory, Worts Causeway, University of Cambridge, Cambridge CB1 8RN, UK.

    In computing the probability that a woman is a BRCA1 or BRCA2 carrier for genetic counselling purposes, it is important to allow for the fact that other breast cancer susceptibility genes may exist. We used data from both a population based series of breast cancer cases and high risk families in the UK, with information on BRCA1 and BRCA2 mutation status, to investigate the genetic models that can best explain familial breast cancer outside BRCA1 and BRCA2 families. We also evaluated the evidence for risk modifiers in BRCA1 and BRCA2 carriers. We estimated the simultaneous effects of BRCA1, BRCA2, a third hypothetical gene 'BRCA3', and a polygenic effect using segregation analysis. The hypergeometric polygenic model was used to approximate polygenic inheritance and the effect of risk modifiers. BRCA1 and BRCA2 could not explain all the observed familial clustering. The best fitting model for the residual familial breast cancer was the polygenic, although a model with a single recessive allele produced a similar fit. There was also significant evidence for a modifying effect of other genes on the risks of breast cancer in BRCA1 and BRCA2 mutation carriers. Under this model, the frequency of BRCA1 was estimated to be 0.051% (95% CI: 0.021-0.125%) and of BRCA2 0.068% (95% CI: 0.033-0.141%). The breast cancer risk by age 70 years, based on the average incidence over all modifiers was estimated to be 35.3% for BRCA1 and 50.3% for BRCA2. The corresponding ovarian cancer risks were 25.9% for BRCA1 and 9.1% for BRCA2. The findings suggest that several common, low penetrance genes with multiplicative effects on risk may account for the residual non-BRCA1/2 familial aggregation of breast cancer. The modifying effect may explain the previously reported differences between population based estimates for BRCA1/2 penetrance and estimates based on high-risk families.

    Funded by: NCI NIH HHS: 1R01 CA81203-01A1

    British journal of cancer 2002;86;1;76-83

  • Physical and genetic characterization reveals a pseudogene, an evolutionary junction, and unstable loci in distal Xq28.

    Aradhya S, Woffendin H, Bonnen P, Heiss NS, Yamagata T, Esposito T, Bardaro T, Poustka A, D'Urso M, Kenwrick S and Nelson DL

    Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza 902E, Houston, TX 77030, USA.

    A large portion of human Xq28 has been completely characterized but the interval between G6PD and Xqter has remained poorly understood. Because of a lack of stable, high-density clone coverage in this region, we constructed a 1.6-Mb bacterial and P1 artificial chromosome (BAC and PAC, respectively) contig to expedite mapping, structural and evolutionary analysis, and sequencing. The contig helped to reposition previously mismapped genes and to characterize the XAP135 pseudogene near the int22h-2 repeat. BAC clones containing the distal int22h repeats also demonstrated spontaneous rearrangements and sparse coverage, which suggested that they were unstable. Because the int22h repeats are involved in genetic diseases, we examined them in great apes to see if they have always been unstable. Differences in copy number among the apes, due to duplications and deletions, indicated that they have been unstable throughout their evolution. Taking another approach toward understanding the genomic nature of distal Xq28, we examined the homologous mouse region and found an evolutionary junction near the distal int22h loci that separated the human distal Xq28 region into two segments on the mouse X chromosome. Finally, haplotype analysis showed that a segment within Xq28 has resisted excessive interchromosomal exchange through great ape evolution, potentially accounting for the linkage disequilibrium recently reported in this region. Collectively, these data highlight some interesting features of the genomic sequence in Xq28 and will be useful for positional cloning efforts, mouse mutagenesis studies, and further evolutionary analyses.

    Funded by: NICHD NIH HHS: 2 P30 HD24064, 5 R01 HD35617

    Genomics 2002;79;1;31-40

  • The mei3 region of the Schizosaccharomyces pombe genome.

    Aves SJ, Hunt C, Xiang Z, Lyne MH, Wood V, Rajandream MA, Skelton J, Churcher CM, Warren T, Harris D, Gwilliam R and Barrell BG

    School of Biological Sciences, University of Exeter, Washington Singer Laboratories, Perry Road, Exeter EX4 4QG, UK.

    Expression of the mei3 gene is sufficient to induce meiosis in the fission yeast Schizosaccharomyces pombe. The mei3 gene is located 0.64 Mb from the telomere of the left arm of Sz. pombe chromosome II. We have sequenced and analysed 107 kb of DNA from the mei3 genomic region. The sequence includes 14 known genes (bag1-B, csh3, dps1, gpt1, mei3, mfm3, pac1, prp31, rpl38-1, rpn3, rti1, spa1, spm1 and ubc4) and 26 other open reading frames (ORFs) longer than 100 codons: a density of one protein-coding gene per 2.7 kb. Twenty-one of the 40 ORFs (53%) have introns. In addition, there are one lone Tf1 transposon long terminal repeat (LTR), tRNA(Trp) and tRNA(Ser) genes, and a 5S rRNA gene. Fourteen of the novel ORFs show sequence similarities which suggest functions of their products, including a coatomer alpha-subunit, a catechol O-methyltransferase, protein kinase, asparagine synthetase, zinc metalloprotease, acetyltransferase, phosphatidylinositol 4-kinase, inositol polyphosphate phosphatase, GTPase-activating protein, permease, pre-mRNA splicing factor, 20S proteasome component and a thioredoxin-like protein. One predicted protein has similarity to the human Cockayne syndrome protein CSA and another to the human GTPase XPA binding protein XAB1. Three ORFs are likely to code for proteins because they have sequence similarity with hypothetical proteins, three encode predicted coiled-coil proteins and four are sequence orphans.

    Yeast (Chichester, England) 2002;19;6;521-7
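
    The gene-density and intron figures quoted in the mei3 abstract above follow from the counts it reports. A minimal sketch of that arithmetic, assuming the abstract rounds half up (the helper name `round_half_up` is hypothetical and not part of the paper):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: Decimal, step: str) -> Decimal:
    """Round a Decimal to the given step (e.g. '0.1'), with ties going up."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)

# Counts taken from the abstract.
REGION_KB = 107          # sequenced region, in kb
KNOWN_GENES = 14
OTHER_ORFS = 26          # ORFs longer than 100 codons
ORFS_WITH_INTRONS = 21

total_orfs = KNOWN_GENES + OTHER_ORFS
kb_per_gene = round_half_up(Decimal(REGION_KB) / Decimal(total_orfs), "0.1")
intron_percent = round_half_up(
    Decimal(100 * ORFS_WITH_INTRONS) / Decimal(total_orfs), "1"
)

print(f"{total_orfs} ORFs, one protein-coding gene per {kb_per_gene} kb")
print(f"{intron_percent}% of ORFs have introns")
```

    Decimal arithmetic is used here only so that the tie at 52.5% rounds up to the 53% given in the abstract; binary floating-point formatting would round it to the nearest even integer instead.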

  • The SGS3 protein involved in PTGS finds a family.

    Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Background: Post-transcriptional gene silencing (PTGS) is a recently discovered phenomenon that is an area of intense research interest. Components of the PTGS machinery are being discovered by genetic and bioinformatics approaches, but the picture is not yet complete.

    Results: The gene for the PTGS-impaired Arabidopsis mutant sgs3 was recently cloned and was not found to have similarity to any other known protein. By a detailed analysis of the sequence of SGS3 we have defined three new protein domains: the XH domain, the XS domain and the zf-XS domain, that are shared with a large family of uncharacterised plant proteins. This work implicates these plant proteins in PTGS.

    Conclusion: The enigmatic SGS3 protein has been found to contain two predicted domains in common with a family of plant proteins. The other members of this family have been predicted to be transcription factors; however, this function seems unlikely based on this analysis. A bioinformatics approach has implicated a new family of plant proteins related to SGS3 as potential candidates for PTGS-related functions.

    BMC bioinformatics 2002;3;21

  • HMM-based databases in InterPro.

    Bateman A and Haft DH

    Pfam Group, The Wellcome Trust Sanger Institute, Hinxton, UK.

    Protein family databases are an important resource for protein annotation and understanding protein evolution and function. In recent years hidden Markov models (HMMs) have become one of the key technologies used for detection of members of these families. This paper reviews the Pfam, TIGRFAMs and SMART databases that use the profile-HMMs provided by the HMMER package.

    Briefings in bioinformatics 2002;3;3;236-45

  • The Pfam protein families database.

    Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M and Sonnhammer EL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web from sites in the UK, Sweden, France and the US. The latest version (6.6) of Pfam contains 3071 families, which match 69% of proteins in SWISS-PROT 39 and TrEMBL 14. Structural data, where available, have been utilised to ensure that Pfam families correspond with structural domains, and to improve domain-based annotation. Predictions of non-domain regions are now also included. In addition to secondary structure, Pfam multiple sequence alignments now contain active site residue mark-up. New search tools, including taxonomy search and domain query, greatly add to the functionality and usability of the Pfam resource.

    Nucleic acids research 2002;30;1;276-80

  • Armed to the teeth.

    Bentley S, Holden M, Thomson N and Parkhill J

    Trends in microbiology 2002;10;4;163-4

  • Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2).

    Bentley SD, Chater KF, Cerdeño-Tárraga AM, Challis GL, Thomson NR, James KD, Harris DE, Quail MA, Kieser H, Harper D, Bateman A, Brown S, Chandra G, Chen CW, Collins M, Cronin A, Fraser A, Goble A, Hidalgo J, Hornsby T, Howarth S, Huang CH, Kieser T, Larke L, Murphy L, Oliver K, O'Neil S, Rabbinowitsch E, Rajandream MA, Rutherford K, Rutter S, Seeger K, Saunders D, Sharp S, Squares R, Squares S, Taylor K, Warren T, Wietzorrek A, Woodward J, Barrell BG, Parkhill J and Hopwood DA

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

    Nature 2002;417;6885;141-7

  • The architecture of variant surface glycoprotein gene expression sites in Trypanosoma brucei.

    Berriman M, Hall N, Sheader K, Bringaud F, Tiwari B, Isobe T, Bowman S, Corton C, Clark L, Cross GA, Hoek M, Zanders T, Berberof M, Borst P and Rudenko G

    The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Trypanosoma brucei evades the immune system by switching between Variant Surface Glycoprotein (VSG) genes. The active VSG gene is transcribed in one of approximately 20 telomeric expression sites (ESs). It has been postulated that ES polymorphism plays a role in host adaptation. To gain more insight into ES architecture, we have determined the complete sequence of Bacterial Artificial Chromosomes (BACs) containing DNA from three ESs and their flanking regions. There was variation in the order and number of ES-associated genes (ESAGs). ESAGs 6 and 7, encoding transferrin receptor subunits, are the only ESAGs with functional copies in every ES that has been sequenced until now. A BAC clone containing the VO2 ES sequences comprised approximately half of a 330 kb 'intermediate' chromosome. The extensive similarity between this intermediate chromosome and the left telomere of T. brucei 927 chromosome I suggests that this previously uncharacterised intermediate size class of chromosomes could have arisen from breakage of megabase chromosomes. Unexpected conservation of sequences, including pseudogenes, indicates that the multiple ESs could have arisen through a relatively recent amplification of a single ES.

    Funded by: NIAID NIH HHS: AI21729; Wellcome Trust: 095161

    Molecular and biochemical parasitology 2002;122;2;131-40

  • Complete sequence and organization of pBtoxis, the toxin-coding plasmid of Bacillus thuringiensis subsp. israelensis.

    Berry C, O'Neil S, Ben-Dov E, Jones AF, Murphy L, Quail MA, Holden MT, Harris D, Zaritsky A and Parkhill J

    Cardiff School of Biosciences, Cardiff University, Cardiff, United Kingdom.

    The entire 127,923-bp sequence of the toxin-encoding plasmid pBtoxis from Bacillus thuringiensis subsp. israelensis is presented and analyzed. In addition to the four known Cry and two known Cyt toxins, a third Cyt-type sequence was found with an additional C-terminal domain previously unseen in such proteins. Many plasmid-encoded genes could be involved in several functions other than toxin production. The most striking of these are several genes potentially affecting host sporulation and germination and a set of genes for the production and export of a peptide antibiotic.

    Applied and environmental microbiology 2002;68;10;5082-95

  • Databases and tools for browsing genomes.

    Birney E, Clamp M and Hubbard T

    European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.

    To maximize the value of genome sequences they need to be integrated with other types of biological data and with each other. The entire collection of data then needs to be made available in a way that is easy to view and mine for complex relationships. The recently determined vertebrate genome sequences of human and mouse are so large that building the infrastructure to manage these datasets is a major challenge. This article reviews the database systems and tools for analysis that have so far been developed to address this.

    Annual review of genomics and human genetics 2002;3;293-310

  • A global analysis of Caenorhabditis elegans operons.

    Blumenthal T, Evans D, Link CD, Guffanti A, Lawson D, Thierry-Mieg J, Thierry-Mieg D, Chiu WL, Duke K, Kiraly M and Kim SK

    Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Box B121, 4200 E. 9th Avenue, Denver, Colorado 80262, USA.

    The nematode worm Caenorhabditis elegans and its relatives are unique among animals in having operons. Operons are regulated multigene transcription units, in which polycistronic pre-messenger RNA (pre-mRNA; coding for multiple peptides) is processed to monocistronic mRNAs. This occurs by 3' end formation and trans-splicing using the specialized SL2 small nuclear ribonucleoprotein particle for downstream mRNAs. Previously, the correlation between downstream location in an operon and SL2 trans-splicing has been strong, but anecdotal. Although only 28 operons have been reported, the complete sequence of the C. elegans genome reveals numerous gene clusters. To determine how many of these clusters represent operons, we probed full-genome microarrays for SL2-containing mRNAs. We found significant enrichment for about 1,200 genes, including most of a group of several hundred genes represented by complementary DNAs that contain SL2 sequence. Analysis of their genomic arrangements indicates that >90% are downstream genes, falling in 790 distinct operons. Our evidence indicates that the genome contains at least 1,000 operons, 2-8 genes long, that contain about 15% of all C. elegans genes. Numerous examples of co-transcription of genes encoding functionally related proteins are evident. Inspection of the operon list should reveal previously unknown functional relationships.

    Nature 2002;417;6891;851-4

  • Mining the mouse genome.

    Bradley A

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Nature 2002;420;6915;512-4

  • A new, expressed multigene family containing a hot spot for insertion of retroelements is associated with polymorphic subtelomeric regions of Trypanosoma brucei.

    Bringaud F, Biteau N, Melville SE, Hez S, El-Sayed NM, Leech V, Berriman M, Hall N, Donelson JE and Baltz T

    Laboratoire de Parasitologie Moléculaire, Université Victor Segalen Bordeaux II, UMR-5016 CNRS, 33076 Bordeaux, France.

    We describe a novel gene family that forms clusters in subtelomeric regions of Trypanosoma brucei chromosomes and partially accounts for the observed clustering of retrotransposons. The ingi and ribosomal inserted mobile element (RIME) non-LTR retrotransposons share 250 bp at both extremities and are the most abundant putatively mobile elements, with about 500 copies per haploid genome. From cDNA clones and subsequently in the T. brucei genomic DNA databases, we identified 52 homologous gene and pseudogene sequences, 16 of which contain a RIME and/or ingi retrotransposon inserted at exactly the same relative position. Here these genes are called the RHS family, for retrotransposon hot spot. Comparison of the protein sequences encoded by RHS genes (21 copies) and pseudogenes (24 copies) revealed a conserved central region containing an ATP/GTP-binding motif and the RIME/ingi insertion site. The RHS proteins share between 13 and 96% identity, and six subfamilies, RHS1 to RHS6, can be defined on the basis of their divergent C-terminal domains. Immunofluorescence and Western blot analyses using RHS subfamily-specific immune sera show that RHS proteins are constitutively expressed and occur mainly in the nucleus. Analysis of Genome Survey Sequence databases indicated that the Trypanosoma brucei diploid genome contains about 280 RHS (pseudo)genes. Among the 52 identified RHS (pseudo)genes, 48 copies are in three RHS clusters located in subtelomeric regions of chromosomes Ia and II and adjacent to the active bloodstream form expression site in T. brucei strain TREU927/4 GUTat10.1. RHS genes comprise the remaining sequence of the size-polymorphic "repetitive region" described for T. brucei chromosome I, and a homologous gene family is present in the Trypanosoma cruzi genome.

    Eukaryotic cell 2002;1;1;137-51

  • BRAF and RAS mutations in human lung cancer and melanoma.

    Brose MS, Volpe P, Feldman M, Kumar M, Rishi I, Gerrero R, Einhorn E, Herlyn M, Minna J, Nicholson A, Roth JA, Albelda SM, Davies H, Cox C, Brignell G, Stephens P, Futreal PA, Wooster R, Stratton MR and Weber BL

    Department of Medicine, Abramson Family Cancer Research Institute, University of Pennsylvania Cancer Center, Philadelphia 19104, USA.

    BRAF encodes a RAS-regulated kinase that mediates cell growth and malignant transformation through MAP kinase pathway activation. Recently, we have identified activating BRAF mutations in 66% of melanomas and a smaller percentage of many other human cancers. To determine whether BRAF mutations account for the MAP kinase pathway activation common in non-small cell lung carcinomas (NSCLCs) and to extend the initial findings in melanoma, we screened DNA from 179 NSCLCs and 35 melanomas for BRAF mutations (exons 11 and 15). We identified BRAF mutations in 5 NSCLCs (3%; one V599 and four non-V599) and 22 melanomas (63%; 21 V599 and 1 non-V599). Three BRAF mutations identified in this study are novel, altering residues important in AKT-mediated BRAF phosphorylation and suggesting that disruption of AKT-induced BRAF inhibition can play a role in malignant transformation. To our knowledge, this is the first report of mutations documenting this interaction in human cancers. Although >90% of BRAF mutations in melanoma involve codon 599 (57 of 60), 8 of 9 BRAF mutations reported to date in NSCLC are non-V599 (89%; P < 10^-7), strongly suggesting that BRAF mutations in NSCLC are qualitatively different from those in melanoma; thus, there may be therapeutic differences between lung cancer and melanoma in response to RAF inhibitors. Although uncommon, BRAF mutations in human lung cancers may identify a subset of tumors sensitive to targeted therapy.

    Funded by: NCI NIH HHS: P50 CA70907

    Cancer research 2002;62;23;6997-7000
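
    The mutation frequencies quoted in the BRAF abstract above can be recomputed from the counts it gives. This is a minimal sketch of the percentage arithmetic only; it does not attempt to reproduce the reported P value, since the abstract does not name the statistical test. The helper name `percent` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def percent(hits: int, total: int) -> int:
    """Whole-number percentage of hits out of total, with ties rounded up."""
    value = Decimal(100 * hits) / Decimal(total)
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# (label, mutated or matching cases, total cases), all taken from the abstract.
counts = [
    ("NSCLCs with a BRAF mutation", 5, 179),
    ("melanomas with a BRAF mutation", 22, 35),
    ("melanoma BRAF mutations at codon 599", 57, 60),
    ("NSCLC BRAF mutations that are non-V599", 8, 9),
]

frequencies = {label: percent(hits, total) for label, hits, total in counts}

for label, value in frequencies.items():
    print(f"{label}: {value}%")
```

    The third entry evaluates to 95%, which is consistent with the abstract's statement that more than 90% of melanoma mutations involve codon 599.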

  • Mutation analysis of EP300 in colon, breast and ovarian carcinomas.

    Bryan EJ, Jokubaitis VJ, Chamberlain NL, Baxter SW, Dawson E, Choong DY and Campbell IG

    VBCRC Cancer Genetics Laboratory, Peter MacCallum Cancer Institute, Locked Bag No. 1 A'Beckett Street, East Melbourne, Victoria, Australia.

    The putative tumour suppressor gene EP300 is located on chromosome 22q13, which is a region showing frequent loss of heterozygosity (LOH) in colon, breast and ovarian cancers. We analysed 203 human breast, colon and ovarian primary tumours and cell lines for somatic mutations in EP300. LOH across the EP300 locus was detected in 38% of colon, 36% of breast, and 49% of ovarian primary tumours but no somatic mutations in EP300 were identified in any primary tumour. Analysis of 17 colon, 11 breast, and 11 ovarian cancer cell lines identified truncating mutations in 4 colon cancer cell lines (HCT116, HT29, LIM2405 and LIM2412). We confirmed the presence of a previously reported frameshift mutation in HCT116 at codon 1699 and identified a second frameshift mutation at codon 1468. Bi-allelic inactivation of EP300 was also detected in LIM2405, which harbours an insC mutation at codon 927 as well as an insA mutation at codon 1468. An insA mutation at codon 1468 was identified in HT29 and a CGA>TGA mutation at codon 86 was identified in LIM2412. Both these lines were heterozygous across the EP300 locus and western blot analysis confirmed the presence of an apparently wild-type protein. Our study has established that genetic inactivation of EP300 is rare in primary colorectal, breast and ovarian cancers. In contrast, mutations are common among colorectal cancer cell lines with 4/17 harbouring homozygous or heterozygous mutations. The rarity of EP300 mutations among these tumour types that show a high frequency of LOH across 22q13 may indicate that another gene is the target of the loss. It is possible that bi-allelic inactivation of EP300 is not necessary and that haploinsufficiency is sufficient to promote tumorigenesis. Alternatively, silencing of EP300 may be achieved by epigenetic mechanisms such as promoter methylation.

    International journal of cancer. Journal international du cancer 2002;102;2;137-41

  • A full-coverage, high-resolution human chromosome 22 genomic microarray for clinical and research applications.

    Buckley PG, Mantripragada KK, Benetkiewicz M, Tapia-Páez I, Diaz De Ståhl T, Rosenquist M, Ali H, Jarbo C, De Bustos C, Hirvelä C, Sinder Wilén B, Fransson I, Thyr C, Johnsson BI, Bruder CE, Menzel U, Hergersberg M, Mandahl N, Blennow E, Wedell A, Beare DM, Collins JE, Dunham I, Albertson D, Pinkel D, Bastian BC, Faruqi AF, Lasken RS, Ichimura K, Collins VP and Dumanski JP

    Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, 751 85 Uppsala, Sweden.

    We have constructed the first comprehensive microarray representing a human chromosome for analysis of DNA copy number variation. This chromosome 22 array covers 34.7 Mb, representing 1.1% of the genome, with an average resolution of 75 kb. To demonstrate the utility of the array, we have applied it to profile acral melanoma, dermatofibrosarcoma, DiGeorge syndrome and neurofibromatosis 2. We accurately diagnosed homozygous/heterozygous deletions, amplifications/gains, IGLV/IGLC locus instability, and breakpoints of an imbalanced translocation. We further identified the 14-3-3 eta isoform as a candidate tumor suppressor in glioblastoma. Two significant methodological advances in array construction were also developed and validated. These include a strictly sequence defined, repeat-free, and non-redundant strategy for array preparation. This approach allows an increase in array resolution and analysis of any locus, disregarding common repeats, genomic clone availability and sequence redundancy. In addition, we report that the application of phi29 DNA polymerase is advantageous in microarray preparation. A broad spectrum of issues in medical research and diagnostics can be approached using the array. This well annotated and gene-rich autosome contains numerous uncharacterized disease genes. It is therefore crucial to associate these genes with specific 22q-related conditions and this array will be instrumental towards this goal. Furthermore, comprehensive epigenetic profiling of 22q-located genes and high-resolution analysis of replication timing across the entire chromosome can be studied using our array.

    Human molecular genetics 2002;11;25;3221-9

  • Genome analysis of an inducible prophage and prophage remnants integrated in the Streptococcus pyogenes strain SF370.

    Canchaya C, Desiere F, McShan WM, Ferretti JJ, Parkhill J and Brüssow H

    Nestlé Research Center, Nestec Ltd. Vers-chez-les-Blanc, CH Lausanne 26, Switzerland.

    The mitomycin C-inducible prophage SF370.1 from the highly pathogenic M1 serotype Streptococcus pyogenes isolate SF370 showed a 41-kb-long genome whose genetic organization resembled that of SF11-like pac-site Siphoviridae. Its closest relative was prophage NIH1.1 from an M3 serotype S. pyogenes strain, followed by S. pneumoniae phage MM1 and Lactobacillus phage phig1e, Listeria phage A118, and Bacillus phage SPP1 in a gradient of relatedness. Sequence similarity with the previously described prophages SF370.2 and SF370.3 from the same polylysogenic SF370 strain was mainly limited to the tail fiber genes. As in these two other prophages, SF370.1 encoded likely lysogenic conversion genes between the phage lysin and the right attachment site. The genes encoded the pyrogenic exotoxin C of S. pyogenes and a protein sharing sequence similarity with both DNases and mitogenic factors. The screening of the SF370 genome revealed further prophage-like elements. A 13-kb-long phage remnant SF370.4 encoded lysogeny and DNA replication genes. A closely related prophage remnant was identified in S. pyogenes strain Manfredo at a corresponding genome position. The two prophages differed by internal indels and gene replacements. Four phage-like integrases were detected; three were still accompanied by likely repressor genes. All prophage elements were integrated into coding sequences. The phage sequences complemented the coding sequences in all cases. The DNA repair genes mutL and mutS were separated by the prophage remnant SF370.4; prophage SF370.1 and S. pneumoniae phage MM1 integrated into homologous chromosomal locations. The prophage sequences were interpreted with a hypothesis that predicts elements of cooperation and an arms race between phage and host genomes.

    Funded by: NIAID NIH HHS: AI19034

    Virology 2002;302;2;245-58

  • Mice deficient for the wild-type p53-induced phosphatase gene (Wip1) exhibit defects in reproductive organs, immune function, and cell cycle control.

    Choi J, Nannenga B, Demidov ON, Bulavin DV, Cooney A, Brayton C, Zhang Y, Mbawuike IN, Bradley A, Appella E and Donehower LA

    Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas 77030, USA.

    The Wip1 gene is a serine/threonine phosphatase that is induced in a p53-dependent manner by DNA-damaging agents. We show here that Wip1 message is expressed in moderate levels in all organs, but is present at very high levels in the testes, particularly in the postmeiotic round spermatid compartment of the seminiferous tubules. We have confirmed that Wip1 mRNA is induced by ionizing radiation in mouse tissues in a p53-dependent manner. To further determine the normal biological function of Wip1 in mammalian organisms, we have generated Wip1-deficient mice. Wip1 null mice are viable but show a variety of postnatal abnormalities, including variable male runting, male reproductive organ atrophy, reduced male fertility, and reduced male longevity. Mice lacking Wip1 show increased susceptibility to pathogens and diminished T- and B-cell function. Fibroblasts derived from Wip1 null embryos have decreased proliferation rates and appear to be compromised in entering mitosis. The data are consistent with an important role for Wip1 in spermatogenesis, lymphoid cell function, and cell cycle regulation.

    Funded by: NCI NIH HHS: CA54897

    Molecular and cellular biology 2002;22;4;1094-105

  • Identification of amplified and expressed genes in breast cancer by comparative hybridization onto microarrays of randomly selected cDNA clones.

    Clark J, Edwards S, John M, Flohr P, Gordon T, Maillard K, Giddings I, Brown C, Bagherzadeh A, Campbell C, Shipley J, Wooster R and Cooper CS

    Section of Molecular Carcinogenesis, The Haddow Laboratories, Institute of Cancer Research, Sutton, Surrey, United Kingdom.

    Microarray analysis using sets of known human genes provides a powerful platform for identifying candidate oncogenes involved in DNA amplification events but suffers from the disadvantage that information can be gained only on genes that have been preselected for inclusion on the array. To address this issue, we have performed comparative genome hybridization (CGH) and expression analyses on microarrays of clones, randomly selected from a cDNA library, prepared from a cancer containing the DNA amplicon under investigation. Application of this approach to the BT474 breast carcinoma cell line, which contains amplicons at 20q13, 17q11-21, and 17q22-23, identified 50 amplified and expressed genes, including genes from these regions previously proposed as candidate oncogenes. When considered together with data from microarray expression profiles and Northern analyses, we were able to propose five genes as new candidate oncogenes where amplification in breast cancer cell lines was consistently associated with higher levels of RNA expression. These included the HBO1 histone acetyl transferase gene at 17q22-23 and the TRAP100 gene, which encodes a thyroid hormone receptor-associated protein coactivator, at 17q11-21. The results demonstrate the utility of this microarray-based CGH approach in hunting for candidate oncogenes within DNA amplicons.

    Genes, chromosomes & cancer 2002;34;1;104-14

  • Human genome. HapMap launched with pledges of $100 million.

    Couzin J

    Science (New York, N.Y.) 2002;298;5595;941-2

  • Mutations of the BRAF gene in human cancer.

    Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, Teague J, Woffendin H, Garnett MJ, Bottomley W, Davis N, Dicks E, Ewing R, Floyd Y, Gray K, Hall S, Hawes R, Hughes J, Kosmidou V, Menzies A, Mould C, Parker A, Stevens C, Watt S, Hooper S, Wilson R, Jayatilake H, Gusterson BA, Cooper C, Shipley J, Hargrave D, Pritchard-Jones K, Maitland N, Chenevix-Trench G, Riggins GJ, Bigner DD, Palmieri G, Cossu A, Flanagan A, Nicholson A, Ho JW, Leung SY, Yuen ST, Weber BL, Seigler HF, Darrow TL, Paterson H, Marais R, Marshall CJ, Wooster R, Stratton MR and Futreal PA

    Cancer Genome Project, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    Cancers arise owing to the accumulation of mutations in critical genes that alter normal programmes of cell proliferation, differentiation and death. As the first stage of a systematic genome-wide screen for these genes, we have prioritized for analysis signalling pathways in which at least one gene is mutated in human cancer. The RAS-RAF-MEK-ERK-MAP kinase pathway mediates cellular responses to growth signals. RAS is mutated to an oncogenic form in about 15% of human cancer. The three RAF genes code for cytoplasmic serine/threonine kinases that are regulated by binding RAS. Here we report BRAF somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers. All mutations are within the kinase domain, with a single substitution (V599E) accounting for 80%. Mutated BRAF proteins have elevated kinase activity and are transforming in NIH3T3 cells. Furthermore, RAS function is not required for the growth of cancer cell lines with the V599E mutation. As BRAF is a serine/threonine kinase that is commonly activated by somatic point mutation in human cancer, it may provide new therapeutic opportunities in malignant melanoma.

    Nature 2002;417;6892;949-54

  • A first-generation linkage disequilibrium map of human chromosome 22.

    Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, Dibling T, Tinsley E, Kirby S, Carter D, Papaspyridonos M, Livingstone S, Ganske R, Lõhmussaar E, Zernant J, Tõnisson N, Remm M, Mägi R, Puurand T, Vilo J, Kurg A, Rice K, Deloukas P, Mott R, Metspalu A, Bentley DR, Cardon LR and Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    DNA sequence variants in specific genes or regions of the human genome are responsible for a variety of phenotypes such as disease risk or variable drug response. These variants can be investigated directly, or through their non-random associations with neighbouring markers (called linkage disequilibrium (LD)). Here we report measurement of LD along the complete sequence of human chromosome 22. Duplicate genotyping and analysis of 1,504 markers in Centre d'Etude du Polymorphisme Humain (CEPH) reference families at a median spacing of 15 kilobases (kb) reveals a highly variable pattern of LD along the chromosome, in which extensive regions of nearly complete LD up to 804 kb in length are interspersed with regions of little or no detectable LD. The LD patterns are replicated in a panel of unrelated UK Caucasians. There is a strong correlation between high LD and low recombination frequency in the extant genetic map, suggesting that historical and contemporary recombination rates are similar. This study demonstrates the feasibility of developing genome-wide maps of LD.

    Nature 2002;418;6897;544-8
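
    The abstract above describes LD as the non-random association of neighbouring markers. As a minimal sketch of how pairwise LD is commonly quantified, the following computes the textbook statistics D, D' and r^2 for two biallelic markers from haplotype and allele frequencies. The function name and inputs are illustrative assumptions; the abstract does not say which statistics or software the authors used.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab : frequency of the haplotype carrying allele A at marker 1
           and allele B at marker 2
    p_a  : frequency of allele A at marker 1
    p_b  : frequency of allele B at marker 2
    (Hypothetical helper for illustration only.)
    """
    # D is the deviation of the observed haplotype frequency from the
    # value expected if the two markers were independent.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

    Both D' and r^2 lie between 0 (no association) and 1; D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r^2 reaches 1 only when the two markers carry identical information.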

  • Computational detection and location of transcription start sites in mammalian genomic DNA.

    Down TA and Hubbard TJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Transcription, the process whereby RNA copies are made from sections of the DNA genome, is directed by promoter regions. These define the transcription start site, and also the set of cellular conditions under which the promoter is active. At least in more complex species, it appears to be common for genes to have several different transcription start sites, which may be active under different conditions. Eukaryotic promoters are complex and fairly diffuse structures, which have proven hard to detect in silico. We show that a novel hybrid machine-learning method is able to build useful models of promoters for >50% of human transcription start sites. We estimate specificity to be >70%, and demonstrate good positional accuracy. Based on the structure of our learned models, we conclude that a signal resembling the well known TATA box, together with flanking regions of C-G enrichment, are the most important sequence-based signals marking sites of transcriptional initiation at a large class of typical promoters.

    Genome research 2002;12;3;458-61

  • Human genome sequences: enigmatic variations.

    Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The sequence of the human genome should be completed in 2003. The next steps are to obtain accurate annotation of the genes within the sequence and to begin to define the sequences of multiple human genomes.

    Mutagenesis 2002;17;6;457-61

  • Linkage and association with type 1 diabetes on chromosome 1q42.

    Ewens KG, Johnson LN, Wapelhorst B, O'Brien K, Gutin S, Morrison VA, Street C, Gregory SG, Spielman RS and Concannon P

    Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.

    Type 1 diabetes is a complex disorder with multiple genetic loci and environmental factors contributing to disease etiology. In the current study, a human type 1 diabetes candidate region on chromosome 1q42 was mapped at high marker density in a panel of 616 multiplex type 1 diabetic families. To facilitate the identification and evaluation of candidate genes, a physical map of the 7-cM region surrounding the maximum logarithm of odds (LOD) score (2.46, P = 0.0004) was constructed. Genes were identified in the 500-kb region surrounding the marker yielding the peak LOD score and evaluated for polymorphism by resequencing. Single-nucleotide polymorphisms (SNPs) identified in these genes as well as other anonymous markers were tested for allelic association with type 1 diabetes by both family-based and case-control methods. A haplotype formed by common alleles at three adjacent markers (D1S225, D1S2383, and D1S251) was preferentially transmitted to affected offspring in type 1 diabetic families (nominal P = 0.006). These findings extend the evidence supporting the existence of a type 1 diabetes susceptibility locus on chromosome 1q42 and identify a candidate region amenable to positional cloning efforts.

    Funded by: NIDDK NIH HHS: DK46618, DK46635

    Diabetes 2002;51;11;3318-25
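
    The abstract above quotes a peak LOD score of 2.46 with P = 0.0004. A minimal sketch of one conventional conversion follows, assuming the LOD score is turned into a chi-square statistic with one degree of freedom and a one-sided P value is reported. The abstract does not state how its P value was derived, so this is an assumption, and the function name is hypothetical.

```python
import math


def lod_to_pvalue(lod):
    """Approximate one-sided P value for a LOD score (assumed convention).

    The likelihood-ratio statistic 2 * ln(10) * LOD is treated as
    chi-square with 1 degree of freedom, and the tail probability is
    halved because linkage is a one-sided alternative.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))
```

    Under these assumptions, `lod_to_pvalue(2.46)` comes out close to the P = 0.0004 quoted in the abstract.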

  • Transcriptional regulation of the stem cell leukemia gene (SCL)--comparative analysis of five vertebrate SCL loci.

    Göttgens B, Barton LM, Chapman MA, Sinclair AM, Knudsen B, Grafham D, Gilbert JG, Rogers J, Bentley DR and Green AR

    Cambridge Institute for Medical Research, Cambridge University, Cambridge, CB2 2XY, United Kingdom.

    The stem cell leukemia (SCL) gene encodes a bHLH transcription factor with a pivotal role in hematopoiesis and vasculogenesis and a pattern of expression that is highly conserved between mammals and zebrafish. Here we report the isolation and characterization of the zebrafish SCL locus together with the identification of three neighboring genes, IER5, MAP17, and MUPP1. This region spans 68 kb and comprises the longest zebrafish genomic sequence currently available for comparison with mammalian, chicken, and pufferfish sequences. Our data show conserved synteny between zebrafish and mammalian SCL and MAP17 loci, thus suggesting the likely genomic domain necessary for the conserved pattern of SCL expression. Long-range comparative sequence analysis/phylogenetic footprinting was used to identify noncoding conserved sequences representing candidate transcriptional regulatory elements. The SCL promoter/enhancer, exon 1, and the poly(A) region were highly conserved, but no homology to other known mouse SCL enhancers was detected in the zebrafish sequence. A combined homology/structure analysis of the poly(A) region predicted consistent structural features, suggesting a conserved functional role in mRNA regulation. Analysis of the SCL promoter/enhancer revealed five motifs, which were conserved from zebrafish to mammals, and each of which is essential for the appropriate pattern or level of SCL transcription.

    Genome research 2002;12;5;749-59

  • Genome sequence of the human malaria parasite Plasmodium falciparum.

    Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM and Barrell B

    The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA.

    The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

    Funded by: NIAID NIH HHS: R01 AI028398; Wellcome Trust: 061524

    Nature 2002;419;6906;498-511

  • Sequence and analysis of chromosome 2 of Dictyostelium discoideum.

    Glöckner G, Eichinger L, Szafranski K, Pachebat JA, Bankier AT, Dear PH, Lehmann R, Baumgart C, Parra G, Abril JF, Guigó R, Kumpf K, Tunggal B, Cox E, Quail MA, Platzer M, Rosenthal A, Noegel AA and Dictyostelium Genome Sequencing Consortium

    IMB Jena, Department of Genome Analysis, Beutenbergstr. 11, 07745 Jena, Germany.

    The genome of the lower eukaryote Dictyostelium discoideum comprises six chromosomes. Here we report the sequence of the largest, chromosome 2, which at 8 megabases (Mb) represents about 25% of the genome. Despite an A + T content of nearly 80%, the chromosome codes for 2,799 predicted protein coding genes and 73 transfer RNA genes. This gene density, about 1 gene per 2.6 kilobases (kb), is surpassed only by Saccharomyces cerevisiae (one per 2 kb) and is similar to that of Schizosaccharomyces pombe (one per 2.5 kb). If we assume that the other chromosomes have a similar gene density, we can expect around 11,000 genes in the D. discoideum genome. A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes. This analysis strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom, placing it close to the base of metazoan evolution.

    Nature 2002;418;6893;79-85

  • A physical map of the mouse genome.

    Gregory SG, Sekhon M, Schein J, Zhao S, Osoegawa K, Scott CE, Evans RS, Burridge PW, Cox TV, Fox CA, Hutton RD, Mullenger IR, Phillips KJ, Smith J, Stalker J, Threadgold GJ, Birney E, Wylie K, Chinwalla A, Wallis J, Hillier L, Carter J, Gaige T, Jaeger S, Kremitzki C, Layman D, Maas J, McGrane R, Mead K, Walker R, Jones S, Smith M, Asano J, Bosdet I, Chan S, Chittaranjan S, Chiu R, Fjell C, Fuhrmann D, Girn N, Gray C, Guin R, Hsiao L, Krzywinski M, Kutsche R, Lee SS, Mathewson C, McLeavy C, Messervier S, Ness S, Pandoh P, Prabhu AL, Saeedi P, Smailus D, Spence L, Stott J, Taylor S, Terpstra W, Tsai M, Vardy J, Wye N, Yang G, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Shvartsbeyn A, Gebregeorgis E, Krol M, Russell D, Overton L, Malek JA, Holmes M, Heaney M, Shetty J, Feldblyum T, Nierman WC, Catanese JJ, Hubbard T, Waterston RH, Rogers J, de Jong PJ, Fraser CM, Marra M, McPherson JD and Bentley DR

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    A physical map of a genome is an essential guide for navigation, allowing the location of any gene or other landmark in the chromosomal DNA. We have constructed a physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers. The mouse contigs were aligned to the human genome sequence on the basis of 51,486 homology matches, thus enabling use of the conserved synteny (correspondence between chromosome blocks) of the two genomes to accelerate construction of the mouse map. The map provides a framework for assembly of whole-genome shotgun sequence data, and a tile path of clones for generation of the reference sequence. Definition of the human-mouse alignment at this level of resolution enables identification of a mouse clone that corresponds to almost any position in the human genome. The human sequence may be used to facilitate construction of other mammalian genome maps using the same strategy.

    Funded by: NHGRI NIH HHS: U01 HG002137-03

    Nature 2002;418;6899;743-50

  • The use of structure information to increase alignment accuracy does not aid homologue detection with profile HMMs.

    Griffiths-Jones S and Bateman A

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: The best quality multiple sequence alignments are generally considered to derive from structural superposition. However, no previous work has studied the relative performance of profile hidden Markov models (HMMs) derived from such alignments. Therefore several alignment methods have been used to generate multiple sequence alignments from 348 structurally aligned families in the HOMSTRAD database. The performance of profile HMMs derived from the structural and sequence-based alignments has been assessed for homologue detection.

    Results: The best alignment methods studied here correctly align nearly 80% of residues with respect to structure alignments. Alignment quality and model sensitivity are found to be dependent on average number, length, and identity of sequences in the alignment. The striking conclusion is that, although structural data may improve the quality of multiple sequence alignments, this does not add to the ability of the derived profile HMMs to find sequence homologues.

    A list of HOMSTRAD families used in this study and the corresponding Pfam families is available at Contact:

    Bioinformatics (Oxford, England) 2002;18;9;1243-9

  • Conservation of long-range synteny and microsynteny between the genomes of two distantly related nematodes.

    Guiliano DB, Hall N, Jones SJ, Clark LN, Corton CH, Barrell BG and Blaxter ML

    Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, UK.

    Background: Comparisons between the genomes of the closely related nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal high rates of rearrangement, with a bias towards within-chromosome events. To assess whether this pattern is true of nematodes in general, we have used genome sequence to compare two nematode species that last shared a common ancestor approximately 300 million years ago: the model C. elegans and the filarial parasite Brugia malayi.

    Results: An 83 kb region flanking the gene for Bm-mif-1 (macrophage migration inhibitory factor, a B. malayi homolog of a human cytokine) was sequenced. When compared to the complete genome of C. elegans, evidence for conservation of long-range synteny and microsynteny was found. Potential C. elegans orthologs for 11 of the 12 protein-coding genes predicted in the B. malayi sequence were identified. Ten of these orthologs were located on chromosome I, with eight clustered in a 2.3 Mb region. While several, relatively local, intrachromosomal rearrangements have occurred, the order, composition, and configuration of two gene clusters, each containing three genes, was conserved. Comparison of B. malayi BAC-end genome survey sequence to C. elegans also revealed a bias towards intrachromosome rearrangements.

    Conclusions: We suggest that intrachromosomal rearrangement is a major force driving chromosomal organization in nematodes, but is constrained by the interdigitation of functional elements of neighboring genes.

    Genome biology 2002;3;10;RESEARCH0057

  • Genomic organization of human CDS2 and evaluation as a candidate gene for corneal hereditary endothelial dystrophy 2 on chromosome 20p13.

    Halford S, Inglis S, Gwilliam R, Spencer P, Mohamed M, Ebenezer ND and Hunt DM

    Experimental eye research 2002;75;5;619-23

  • Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13.

    Hall N, Pain A, Berriman M, Churcher C, Harris B, Harris D, Mungall K, Bowman S, Atkin R, Baker S, Barron A, Brooks K, Buckee CO, Burrows C, Cherevach I, Chillingworth C, Chillingworth T, Christodoulou Z, Clark L, Clark R, Corton C, Cronin A, Davies R, Davis P, Dear P, Dearden F, Doggett J, Feltwell T, Goble A, Goodhead I, Gwilliam R, Hamlin N, Hance Z, Harper D, Hauser H, Hornsby T, Holroyd S, Horrocks P, Humphray S, Jagels K, James KD, Johnson D, Kerhornou A, Knights A, Konfortov B, Kyes S, Larke N, Lawson D, Lennard N, Line A, Maddison M, McLean J, Mooney P, Moule S, Murphy L, Oliver K, Ormond D, Price C, Quail MA, Rabbinowitsch E, Rajandream MA, Rutter S, Rutherford KM, Sanders M, Simmonds M, Seeger K, Sharp S, Smith R, Squares R, Squares S, Stevens K, Taylor K, Tivey A, Unwin L, Whitehead S, Woodward J, Sulston JE, Craig A, Newbold C and Barrell BG

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7--these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.

    Nature 2002;419;6906;527-31

  • Searching for clues.

    Holden M, Sebaihia M, Bentley S and Parkhill J

    Trends in microbiology 2002;10;8;354-5

  • Split personalities.

    Holden M, Sebaihia M, Cerdeño-Tárraga A and Parkhill J

    Trends in microbiology 2002;10;3;115

  • The genome sequence of the malaria mosquito Anopheles gambiae.

    Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH and Hoffman SL

    Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.

    Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

    Funded by: NIAID NIH HHS: R01AI44273, U01AI48846, U01AI50687

    Science (New York, N.Y.) 2002;298;5591;129-49

  • QuickTree: building huge Neighbour-Joining trees of protein sequences.

    Howe K, Bateman A and Durbin R

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    We have written a fast implementation of the popular Neighbor-Joining tree building algorithm. QuickTree allows the reconstruction of phylogenies for very large protein families (including the largest Pfam alignment containing 27000 HIV GP120 glycoprotein sequences) that would be infeasible using other popular methods.

    Bioinformatics (Oxford, England) 2002;18;11;1546-7
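
    For readers unfamiliar with the Neighbor-Joining algorithm named in the abstract above, here is a minimal sketch of the textbook procedure on a distance matrix. It is the plain cubic-time version, not QuickTree's implementation; the function name, the edge-list output and the internal node labels ("node1", "node2", ...) are illustrative assumptions.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor-Joining on a symmetric distance matrix.

    names : list of at least two leaf labels
    dist  : list of lists, dist[i][j] = distance between names[i] and names[j]
    Returns a list of (node, other_node, branch_length) edges describing
    the unrooted tree. Internal node labels are made up for illustration.
    """
    labels = list(names)
    d = [list(row) for row in dist]
    edges = []
    next_id = 1
    while len(labels) > 2:
        n = len(labels)
        totals = [sum(row) for row in d]
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to their new parent node.
        len_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2.0 * (n - 2))
        len_j = d[i][j] - len_i
        parent = "node%d" % next_id
        next_id += 1
        edges.append((labels[i], parent, len_i))
        edges.append((labels[j], parent, len_j))
        # Distances from the new parent to every remaining node.
        keep = [k for k in range(n) if k != i and k != j]
        to_parent = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [to_parent[x]]
             for x, a in enumerate(keep)]
        d.append(to_parent + [0.0])
        labels = [labels[k] for k in keep] + [parent]
    # Connect the last two remaining nodes.
    edges.append((labels[0], labels[1], d[0][1]))
    return edges
```

    Each pass scans all pairs and there are about n passes, so this sketch scales cubically with the number of sequences; alignments of the size mentioned in the abstract call for a faster implementation such as the one the authors describe.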

  • GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.

    Howe KL, Chothia T and Durbin R

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    We describe a method (implemented in a program, GAZE) for assembling arbitrary evidence for individual gene components (features) into predictions of complete gene structures. Our system is generic in that both the features themselves, and the model of gene structure against which potential assemblies are validated and scored, are external to the system and supplied by the user. GAZE uses a dynamic programming algorithm to obtain the highest scoring gene structure according to the model and posterior probabilities that each input feature is part of a gene. A novel pruning strategy ensures that the algorithm has a run-time effectively linear in sequence length. To demonstrate the flexibility of our system in the incorporation of additional evidence into the gene prediction process, we show how it can be used to both represent nonstandard gene structures (in the form of trans-spliced genes in Caenorhabditis elegans), and make use of similarity information (in the form of Expressed Sequence Tag alignments), while requiring no change to the underlying software. GAZE is available at

    Genome research 2002;12;9;1418-27
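
    The abstract above describes using dynamic programming to find the highest-scoring assembly of features. As a heavily simplified sketch of that general idea, the following picks the highest-scoring chain of non-overlapping scored intervals. It does not reproduce GAZE's user-supplied gene-structure model, its posterior probabilities or its pruning strategy; the function name and the (start, end, score) representation are assumptions made for illustration.

```python
def best_chain(features):
    """Highest-scoring chain of non-overlapping features.

    features : list of (start, end, score) tuples, coordinates inclusive
    Returns (total_score, chain), where chain is the selected features in
    left-to-right order. Hypothetical stand-in for a real gene model.
    """
    feats = sorted(features, key=lambda f: f[1])
    # best[i] = (score of best chain ending at feats[i], predecessor index)
    best = []
    for i, (start, end, score) in enumerate(feats):
        top, prev = score, None
        for j in range(i):
            # feats[j] may precede feats[i] only if it ends before i starts.
            if feats[j][1] < start and best[j][0] + score > top:
                top, prev = best[j][0] + score, j
        best.append((top, prev))
    if not best:
        return 0.0, []
    # Trace back from the best-scoring end point.
    i = max(range(len(best)), key=lambda k: best[k][0])
    total = best[i][0]
    chain = []
    while i is not None:
        chain.append(feats[i])
        i = best[i][1]
    chain.reverse()
    return total, chain
```

    This sketch compares every feature with every earlier one, so its run-time is quadratic in the number of features; the abstract reports that GAZE's pruning brings its own run-time to effectively linear in sequence length.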

  • Biological information: making it accessible and integrated (and trying to make sense of it).

    Hubbard T

    Sanger Institute, Cambridgeshire, UK.

    The availability of the genome sequences of human and mouse, human sequence variation data and other large genetic data sets will lead to a revolution in understanding of the human machine and the treatment of its diseases. The success of the international genome sequencing consortiums shows what can be achieved by well coordinated large scale public domain projects and the benefits of data access to all. It is already clear that the availability of this sequence is having a huge impact on research worldwide. Complete genome sequences provide a framework to pull all biological data together such that each piece has the potential to say something about biology as a whole. Biology is too complex for any organisation to have a monopoly of ideas or data, so the collection, analysis and access to this data can be contributed to by research institutes around the world. However, although it is possible for all this data to be accessible to all through the internet, the more organisations provide data or analysis separately, the harder it becomes for anyone to collect and integrate the results. To address these problems of integration of data, open standards for biological data exchange, such as the 'Distributed Annotation System' (DAS) are being developed and bioinformatics (Dowell et al., 2001) as a whole is now being strongly driven by the open source software (OSS) model for collaborative software development (Hubbard and Birney, 1999). The leading provider of human genome annotation, the Ensembl project, is entirely an OSS project and has been widely adopted by academic and commercial organisations alike (Hubbard et al., 2002). Accurate automatic annotation of features such as genes in vertebrate genomes currently relies on supporting evidence in the form of homologies to mRNAs, ESTs or protein. However, it appears that sufficient high quality experimentally curated annotation now exists to be used as a substrate for machine learning algorithms to create effective models of biological signal sequences (Down and Hubbard, 2002). Is there hope for ab initio prediction methods after all?

    Bioinformatics (Oxford, England) 2002;18 Suppl 2;S140

  • The Ensembl genome database project.

    Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, Stalker J, Stupka E, Ureta-Vidal A, Vastrik I and Clamp M

    The Wellcome Trust Sanger Institute and European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Ensembl database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

    Nucleic acids research 2002;30;1;38-41

  • An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22.

    Jarmuz A, Chester A, Bayliss J, Gisbourne J, Dunham I, Scott J and Navaratnam N

    RNA Editing Group, MRC Clinical Sciences Centre, Imperial College, Hammersmith Hospital Campus, Du Cane Road, London, W12 0NN, UK.

    The cytidine (C) to uridine (U) editing of apolipoprotein (apo) B mRNA is mediated by tissue-specific, RNA-binding cytidine deaminase APOBEC1. APOBEC1 is structurally homologous to Escherichia coli cytidine deaminase (ECCDA), but has evolved specific features required for RNA substrate binding and editing. A signature sequence for APOBEC1 has been used to identify other members of this family. One of these genes, designated APOBEC2, is found on chromosome 6. Another gene corresponds to the activation-induced deaminase (AID) gene, which is located adjacent to APOBEC1 on chromosome 12. Seven additional genes, or pseudogenes (designated APOBEC3A to 3G), are arrayed in tandem on chromosome 22. Not present in rodents, this locus is apparently an anthropoid-specific expansion of the APOBEC family. The conclusion that these new genes encode orphan C to U RNA-editing enzymes of the APOBEC family comes from similarity in amino acid sequence with APOBEC1, conserved intron/exon organization, tissue-specific expression, homodimerization, and zinc and RNA binding similar to APOBEC1. Tissue-specific expression of these genes in a variety of cell lines, along with other evidence, suggests a role for these enzymes in growth or cell cycle control.

    Genomics 2002;79;3;285-96

  • Characterisation and distribution of a cryptic Salmonella typhi plasmid pHCM2.

    Kidgell C, Pickard D, Wain J, James K, Diem Nga LT, Diep TS, Levine MM, O'Gaora P, Prentice MB, Parkhill J, Day N, Farrar J and Dougan G

    Centre for Molecular Microbiology and Infection, Department of Biological Sciences, Imperial College of Science, Technology and Medicine, London SW7 2AY, UK.

    pHCM2 is a 106 kbp cryptic plasmid harboured by Salmonella typhi CT18, originally isolated from a typhoid patient in Vietnam. The genome of S. typhi CT18, including pHCM2, has recently been completely sequenced and annotated. Bioinformatic analysis revealed that 57% of the coding sequences (CDSs) encoded on pHCM2 display over 97% DNA sequence identity to the virulence-associated plasmid of Yersinia pestis, pFra. pHCM2 encodes no obvious virulence-associated determinants or antibiotic resistance genes but does encode a wide array of putative genes directly related to DNA metabolism and replication. PCR analysis of a series of S. typhi isolates from Vietnam detected pHCM2-related DNA sequences in some S. typhi isolated before, but not after, 1994. Similar pHCM2-related sequences were also detected in S. typhi isolated from other regions of South East Asia and Pakistan but not elsewhere in the world.

    Plasmid 2002;47;3;159-71

  • The sterol-sensing domain: multiple families, a unique role?

    Kuwabara PE and Labouesse M

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The "sterol-sensing domain" (SSD) is conserved across phyla and is present in several membrane proteins, such as Patched (a Hedgehog receptor) and NPC-1 (the protein defective in Niemann-Pick type C1 disease). The role of the SSD is perhaps best understood from the standpoint of its involvement in cholesterol homeostasis. This article discusses how the SSD appears to function as a regulatory domain involved in linking vesicle trafficking and protein localization with such varied processes as cholesterol homeostasis, cell signalling and cytokinesis.

    Trends in genetics : TIG 2002;18;4;193-201

  • MaxBench: evaluation of sequence and structure comparison methods.

    Leplae R and Hubbard TJ

    Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    Summary: MaxBench is a web-based system available for evaluating the results of sequence and structure comparison methods, based on the SCOP protein domain classification. The system makes it easy for developers to both compare the overall performance of their methods to standard algorithms and investigate the results of individual comparisons.


    Bioinformatics (Oxford, England) 2002;18;3;494-5

  • The human homologue of unc-93 maps to chromosome 6q27 - characterisation and analysis in sporadic epithelial ovarian cancer.

    Liu Y, Dodds P, Emilion G, Mungall AJ, Dunham I, Beck S, Wells RS, Charnock FM and Ganesan TS

    Cancer Research UK Molecular Oncology Laboratories, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK.

    Background: In sporadic ovarian cancer, we have previously reported allele loss at D6S193 (62%) on chromosome 6q27, which suggested the presence of a putative tumour suppressor gene. Based on our data and that from another group, the minimal region of allele loss was between D6S264 and D6S149 (7.4 cM). To identify the putative tumour suppressor gene, we established a physical map initially with YACs and subsequently with PACs/BACs from D6S264 to D6S149. To accelerate the identification of genes, we sequenced the entire contig of approximately 1.1 Mb. Seven genes were identified within the region of allele loss between D6S264 and D6S149.

    Results: The human homologue (UNC93A) of the C. elegans gene unc-93 was identified within the interval of allele loss centromeric to D6S149. This gene is 24.5 kb and comprises 8 exons. There are two transcripts, the shorter one resulting from splicing out of exon 4. It is expressed in testis, small intestine, spleen, prostate, and ovary. In a panel of 8 ovarian cancer cell lines, UNC93A expression was detected by RT-PCR, which identified the two transcripts in 2/8 cell lines. The entire coding sequence was examined for mutations in a panel of ovarian tumours and ovarian cancer cell lines. Mutations were identified in exons 1, 3, 4, 5, 6 and 8. Only 3 mutations were identified specifically in the tumour. These included a c.452G>A (W151X) mutation in exon 3, a c.676C>T (R226X) mutation in exon 5 and a c.1225G>A (V409I) mutation in exon 8. However, the mutations in exons 3 and 5 were also present in 6% and 2% of the normal population respectively. The UNC93A cDNA was shown to express at the cell membrane and encodes a protein of 60 kDa.

    Conclusions: These results provide no evidence that UNC93A is a tumour suppressor gene in sporadic ovarian cancer. Further research is required to evaluate its normal function and its role in the pathogenesis of ovarian cancer.

    BMC genetics 2002;3;20

  • Physical and transcript map of the region between D6S264 and D6S149 on chromosome 6q27, the minimal region of allele loss in sporadic epithelial ovarian cancer.

    Liu Y, Emilion G, Mungall AJ, Dunham I, Beck S, Le Meuth-Metzinger VG, Shelling AN, Charnock FM and Ganesan TS

    ICRF Molecular Oncology Laboratories, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK.

    We have previously shown a high frequency of allele loss at D6S193 (62%) on chromosomal arm 6q27 in ovarian tumours and mapped the minimal region of allele loss between D6S297 and D6S264 (3 cM). We isolated and mapped a single non-chimaeric YAC (17IA12, 260-280 kb) containing D6S193 and D6S297. A further extended bacterial contig (between D6S264 and D6S149) has been established using PACs and BACs and a transcript map has been established. We have mapped six new markers to the YAC; three of them are ESTs (WI-15078, WI-8751, and TCP10). We have isolated three cDNA clones of EST WI-15078 and one clone contains a complete open reading frame. The sequence shows homology to a new member of the ribonuclease family. The other two clones are splice variants of this new gene. The gene is expressed ubiquitously in normal tissues. It is expressed in 4/8 ovarian cancer cell lines by Northern analysis. The gene encodes a 40 kDa protein. Direct sequencing of the gene in all eight ovarian cancer cell lines did not identify any mutations. Clonogenic assays were performed by transfecting the full-length gene into ovarian cancer cell lines and no suppression of growth was observed.

    Oncogene 2002;21;3;387-99

  • SCOP database in 2002: refinements accommodate structural genomics.

    Lo Conte L, Brenner SE, Hubbard TJ, Chothia C and Murzin AG

    MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK.

    The SCOP (Structural Classification of Proteins) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. Protein domains in SCOP are grouped into species and hierarchically classified into families, superfamilies, folds and classes. Recently, we introduced a new set of features with the aim of standardizing access to the database, and providing a solid basis to manage the increasing number of experimental structures expected from structural genomics projects. These features include: a new set of identifiers, which uniquely identify each entry in the hierarchy; a compact representation of protein domain classification; a new set of parseable files, which fully describe all domains in SCOP and the hierarchy itself. These new features are reflected in the ASTRAL compendium. The SCOP search engine has also been updated, and a set of links to external resources added at the level of domain entries. SCOP can be accessed at

    Nucleic acids research 2002;30;1;264-7

  • The transcriptional program of meiosis and sporulation in fission yeast.

    Mata J, Lyne R, Burns G and Bähler J

    The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.

    Sexual reproduction requires meiosis to produce haploid gametes, which in turn can fuse to regenerate a diploid organism. We have studied the transcriptional program that drives this developmental process in Schizosaccharomyces pombe using DNA microarrays. Here we show that hundreds of genes are regulated in successive waves of transcription that correlate with major biological events of meiosis and sporulation. Each wave is associated with specific promoter motifs. Clusters of neighboring genes (mostly close to telomeres) are co-expressed early in the process, which reflects a more global control of these genes. We find that two Atf-like transcription factors are essential for the expression of late genes and formation of spores, and identify dozens of potential Atf target genes. Comparison with the meiotic program of the distantly related Saccharomyces cerevisiae reveals an unexpectedly small shared meiotic transcriptome, suggesting that the transcriptional regulation of meiosis evolved independently in both species.

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    Nature genetics 2002;32;1;143-7

  • A candidate gene for congenital bilateral isolated ptosis identified by molecular analysis of a de novo balanced translocation.

    McMullan TW, Crolla JA, Gregory SG, Carter NP, Cooper RA, Howell GR and Robinson DO

    Southampton University School of Medicine, Human Genetics Division, Southampton General Hospital, Southampton SO16 6YD, UK.

    Ptosis is defined as drooping of the upper eyelid and can impair full visual acuity. It occurs in a number of forms including congenital bilateral isolated ptosis, which may be familial and for which two linkage groups are known on chromosomes 1p32-34.1 and Xq24-27.1. We describe the analysis of the chromosome breakpoints in a patient with congenital bilateral isolated ptosis and a de novo balanced translocation 46,XY,t(1;8)(p34.3;q21.12). Both breakpoints were localized by fluorescence in situ hybridisation with yeast artificial chromosomes, bacterial artificial chromosomes and P1 artificial chromosomes. The derived chromosomes were isolated by flow-sorting, amplified by degenerate oligonucleotide-primed polymerase chain reaction and analyzed by sequence tagged sites amplification to map the breakpoints at a resolution that enabled molecular characterization by DNA sequencing. The 1p breakpoint lies ~13 Mb distal to the previously reported linkage locus at 1p32-1p34.1 and does not disrupt a coding sequence, whereas the chromosome 8 breakpoint disrupts a gene homologous to the mouse zfh-4 gene. Murine zfh-4 codes for a zinc finger homeodomain protein and is a transcription factor expressed in both muscle and nerve tissue. Human ZFH-4 is therefore a candidate gene for congenital bilateral isolated ptosis.

    Human genetics 2002;110;3;244-50

  • Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations.

    Meijers-Heijboer H, van den Ouweland A, Klijn J, Wasielewski M, de Snoo A, Oldenburg R, Hollestelle A, Houben M, Crepin E, van Veghel-Plandsoen M, Elstrodt F, van Duijn C, Bartels C, Meijers C, Schutte M, McGuffog L, Thompson D, Easton D, Sodha N, Seal S, Barfoot R, Mangion J, Chang-Claude J, Eccles D, Eeles R, Evans DG, Houlston R, Murday V, Narod S, Peretz T, Peto J, Phelan C, Zhang HX, Szabo C, Devilee P, Goldgar D, Futreal PA, Nathanson KL, Weber B, Rahman N, Stratton MR and CHEK2-Breast Cancer Consortium

    Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands.

    Mutations in BRCA1 and BRCA2 confer a high risk of breast and ovarian cancer, but account for only a small fraction of breast cancer susceptibility. To find additional genes conferring susceptibility to breast cancer, we analyzed CHEK2 (also known as CHK2), which encodes a cell-cycle checkpoint kinase that is implicated in DNA repair processes involving BRCA1 and p53 (refs 3,4,5). We show that CHEK2(*)1100delC, a truncating variant that abrogates the kinase activity, has a frequency of 1.1% in healthy individuals. However, this variant is present in 5.1% of individuals with breast cancer from 718 families that do not carry mutations in BRCA1 or BRCA2 (P = 0.00000003), including 13.5% of individuals from families with male breast cancer (P = 0.00015). We estimate that the CHEK2(*)1100delC variant results in an approximately twofold increase of breast cancer risk in women and a tenfold increase of risk in men. By contrast, the variant confers no increased cancer risk in carriers of BRCA1 or BRCA2 mutations. This suggests that the biological mechanisms underlying the elevated risk of breast cancer in CHEK2 mutation carriers are already subverted in carriers of BRCA1 or BRCA2 mutations, which is consistent with participation of the encoded proteins in the same pathway.

    Nature genetics 2002;31;1;55-9

  • Comparative ab initio prediction of gene structures using pair HMMs.

    Meyer IM and Durbin R

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    We present a novel comparative method for the ab initio prediction of protein coding genes in eukaryotic genomes. The method simultaneously predicts the gene structures of two un-annotated input DNA sequences which are homologous to each other and retrieves the subsequences which are conserved between the two DNA sequences. It is capable of predicting partial, complete and multiple genes and can align pairs of genes which differ by events of exon-fusion or exon-splitting. The method employs a probabilistic pair hidden Markov model. We generate annotations using our model with two different algorithms: the Viterbi algorithm in its linear memory implementation and a new heuristic algorithm, called the stepping stone, for which both memory and time requirements scale linearly with the sequence length. We have implemented the model in a computer program called DOUBLESCAN. In this article, we introduce the method and confirm the validity of the approach on a test set of 80 pairs of orthologous DNA sequences from mouse and human. More information can be found at:

    Bioinformatics (Oxford, England) 2002;18;10;1309-18

  • Membrane-bound progesterone receptors contain a cytochrome b5-like ligand-binding domain.

    Mifsud W and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    Background: Membrane-associated progesterone receptors (MAPRs) are thought to mediate a number of rapid cellular effects not involving changes in gene expression. They do not show sequence similarity to any of the classical steroid receptors. We were interested in identifying distant homologs of MAPR to better understand their biological roles.

    Results: We have identified MAPRs as distant homologs of cytochrome b5. We have also found regions homologous to cytochrome b5 in the mammalian HERC2 ubiquitin transferase proteins and a number of fungal chitin synthases.

    Conclusions: In view of these findings, we propose that the heme-binding cytochrome b5 domain served as a template for the evolution of membrane-associated binding pockets for non-heme ligands.

    Genome biology 2002;3;12;RESEARCH0068

  • Conditional inactivation of p63 by Cre-mediated excision.

    Mills AA, Qi Y and Bradley A

    Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA.

    Genesis (New York, N.Y. : 2000) 2002;32;2;138-41

  • Mutations in the LGI1/Epitempin gene on 10q24 cause autosomal dominant lateral temporal epilepsy.

    Morante-Redolat JM, Gorostidi-Pagola A, Piquer-Sirerol S, Sáenz A, Poza JJ, Galán J, Gesk S, Sarafidou T, Mautner VF, Binelli S, Staub E, Hinzmann B, French L, Prud'homme JF, Passarelli D, Scannapieco P, Tassinari CA, Avanzini G, Martí-Massó JF, Kluwe L, Deloukas P, Moschonas NK, Michelucci R, Siebert R, Nobile C, Pérez-Tur J and López de Munain A

    Unitat de Genètica Molecular, Institut de Biomedicina de València-CSIC, Jaume Roig 11, E-46010 València, Spain.

    Autosomal dominant lateral temporal epilepsy (EPT; OMIM 600512) is a form of epilepsy characterized by partial seizures, usually preceded by auditory signs. The gene for this disorder has been mapped by linkage studies to chromosomal region 10q24. Here we show that mutations in the LGI1 gene segregate with EPT in two families affected by this disorder. Both mutations introduce premature stop codons and thus prevent the production of the full-length protein from the affected allele. By immunohistochemical studies, we demonstrate that the LGI1 protein, which contains several leucine-rich repeats, is expressed ubiquitously in the neuronal cell compartment of the brain. Moreover, we provide evidence for genetic heterogeneity within this disorder, since several other families with a phenotype consistent with this type of epilepsy lack mutations in the LGI1 gene.

    Human molecular genetics 2002;11;9;1119-28

  • Cloning and characterization of the common fragile site FRA6F harboring a replicative senescence gene and frequently deleted in human tumors.

    Morelli C, Karayianni E, Magnanini C, Mungall AJ, Thorland E, Negrini M, Smith DI and Barbanti-Brodano G

    Department of Experimental and Diagnostic Medicine, Section of Microbiology and Center for Biotechnology, University of Ferrara, I-44100 Ferrara, Italy.

    The common fragile site FRA6F, located at 6q21, is an extended region of about 1200 kb, with two hot spots of breakage each spanning about 200 kb. Transcription mapping of the FRA6F region identified 19 known genes, 10 within the FRA6F interval and nine in a proximal or distal position. The nucleotide sequence of FRA6F is rich in repetitive elements (LINE1 and LINE2, Alu, MIR, MER and endogenous retroviral sequences) as well as in matrix attachment regions (MARs), and shows several DNA segments with increased helix flexibility. We found that tight clusters of stem-loop structures were localized exclusively in the two regions with greater frequency of breakage. Chromosomal instability at FRA6F probably depends on a complex interaction of different factors, involving regions of greater DNA flexibility and MARs. We propose an additional mechanism of fragility at FRA6F, based on stem-loop structures which may cause delay or arrest in DNA replication. A senescence gene likely maps within FRA6F, as suggested by detection of deletion and translocation breakpoints involving this fragile site in immortal human-mouse cell hybrids and in SV40-immortalized human fibroblasts containing a human chromosome 6 deleted at q21. Deletion breakpoints within FRA6F are common in several types of human leukemias and solid tumors, suggesting the presence of a tumor suppressor gene in the region. Moreover, a gene associated with hereditary schizophrenia maps within FRA6F. Therefore, FRA6F may represent a landmark for the identification and cloning of genes involved in senescence, leukemia, cancer and schizophrenia.

    Oncogene 2002;21;47;7266-76

  • The significance of performance ranking in CASP--response to Marti-Renom et al.

    Moult J, Fidelis K, Zemla A, Hubbard T and Tramontano A

    Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville 20850, USA.

    Structure (London, England : 1993) 2002;10;3;291-2; discussion 292-3

  • Initial sequencing and comparative analysis of the mouse genome.

    Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC and Lander ES

    Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

    Nature 2002;420;6915;520-62

  • InterPro: an integrated documentation resource for protein families, domains and functional sites.

    Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley R, Courcelle E, Durbin R, Falquet L, Fleischmann W, Gouzy J, Griffith-Jones S, Haft D, Hermjakob H, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Orchard S, Pagni M, Peyruc D, Ponting CP, Servant F, Sigrist CJ and InterPro Consortium

    EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge, UK.

    The exponential increase in the submission of nucleotide sequences to the nucleotide sequence database by genome sequencing centres has resulted in a need for rapid, automatic methods for classification of the resulting protein sequences. There are several signature and sequence cluster-based methods for protein classification, each resource having distinct areas of optimum application owing to the differences in the underlying analysis methods. In recognition of this, InterPro was developed as an integrated documentation resource for protein families, domains and functional sites, to rationalise the complementary efforts of the individual protein signature database projects. The member databases - PRINTS, PROSITE, Pfam, ProDom, SMART and TIGRFAMs - form the InterPro core. Related signatures from each member database are unified into single InterPro entries. Each InterPro entry includes a unique accession number, functional descriptions and literature references, and links are made back to the relevant member database(s). Release 4.0 of InterPro (November 2001) contains 4,691 entries, representing 3,532 families, 1,068 domains, 74 repeats and 15 sites of post-translational modification (PTMs) encoded by different regular expressions, profiles, fingerprints and hidden Markov models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (2,141,621 InterPro hits from 586,124 SWISS-PROT and TrEMBL protein sequences). The database is freely accessible for text- and sequence-based searches.

    Briefings in bioinformatics 2002;3;3;225-35

  • Identification and characterization of a novel human brain-specific gene, homologous to S. scrofa tmp83.5, in the chromosome 10q24 critical region for temporal lobe epilepsy and spastic paraplegia.

    Nobile C, Hinzmann B, Scannapieco P, Siebert R, Zimbello R, Perez-Tur J, Sarafidou T, Moschonas NK, French L, Deloukas P, Ciccodicola A, Gesk S, Poza JJ, Lo Nigro C, Seri M, Schlegelberger B, Rosenthal A, Valle G, Lopez de Munain A, Tassinari CA and Michelucci R

    CNR-Centro di Studio per la Biologia e Fisiopatologia Muscolare, Dipartimento di Scienze Biomediche Sperimentali, Viale G. Colombo 3, 35121 Padova, Italy.

    We describe the structure, genomic organization, and some transcription features of a human brain-specific gene previously localized to the genomic region involved in temporal lobe epilepsy and spastic paraplegia on chromosome 10q24. The gene, which consists of six exons disseminated over 16 kb of genomic DNA, is highly homologous to the porcine tmp83.5 gene and encodes a putative transmembrane protein of 141 amino acids. Unlike its porcine homolog, from which two mRNAs with different 5'-sequences are transcribed, the human gene apparently encodes three mRNA species with 3'-untranslated regions of different sizes. Mutation analysis of its coding sequence in families affected with temporal lobe epilepsy or spastic paraplegia linked to 10q24 does not support the involvement of this gene in either disease.

    Gene 2002;282;1-2;87-94

  • Epigenomics: genome-wide study of methylation phenomena.

    Novik KL, Nimmrich I, Genc B, Maier S, Piepenbrock C, Olek A and Beck S

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Epigenetics is one of the key areas of future research that can elucidate how genomes work. It combines genetics and the environment to address complex biological systems such as the plasticity of our genome. While all nucleated human cells carry the same genome, they express different genes at different times. Much of this is governed by epigenetic changes resulting in differential methylation of our genome--or different epigenomes. Individual studies over the past decades have already established the involvement of DNA methylation in imprinting, gene regulation, chromatin structure, genome stability and disease, especially cancer. Now, in the wake of the Human Genome Project (HGP), epigenetic phenomena can be studied genome-wide and are giving rise to a new field, epigenomics. Here, we review the current and future potential of this field and introduce the pilot study towards the Human Epigenome Project (HEP).

    Current issues in molecular biology 2002;4;4;111-28

  • Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs.

    Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, Yamanaka I, Kiyosawa H, Yagi K, Tomaru Y, Hasegawa Y, Nogami A, Schönbach C, Gojobori T, Baldarelli R, Hill DP, Bult C, Hume DA, Quackenbush J, Schriml LM, Kanapin A, Matsuda H, Batalov S, Beisel KW, Blake JA, Bradt D, Brusic V, Chothia C, Corbani LE, Cousins S, Dalla E, Dragani TA, Fletcher CF, Forrest A, Frazer KS, Gaasterland T, Gariboldi M, Gissi C, Godzik A, Gough J, Grimmond S, Gustincich S, Hirokawa N, Jackson IJ, Jarvis ED, Kanai A, Kawaji H, Kawasawa Y, Kedzierski RM, King BL, Konagaya A, Kurochkin IV, Lee Y, Lenhard B, Lyons PA, Maglott DR, Maltais L, Marchionni L, McKenzie L, Miki H, Nagashima T, Numata K, Okido T, Pavan WJ, Pertea G, Pesole G, Petrovsky N, Pillai R, Pontius JU, Qi D, Ramachandran S, Ravasi T, Reed JC, Reed DJ, Reid J, Ring BZ, Ringwald M, Sandelin A, Schneider C, Semple CA, Setou M, Shimada K, Sultana R, Takenaka Y, Taylor MS, Teasdale RD, Tomita M, Verardo R, Wagner L, Wahlestedt C, Wang Y, Watanabe Y, Wells C, Wilming LG, Wynshaw-Boris A, Yanagisawa M, Yang I, Yang L, Yuan Z, Zavolan M, Zhu Y, Zimmer A, Carninci P, Hayatsu N, Hirozane-Kishikawa T, Konno H, Nakamura M, Sakazume N, Sato K, Shiraki T, Waki K, Kawai J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Imotani K, Ishii Y, Itoh M, Kagawa I, Miyazaki A, Sakai K, Sasaki D, Shibata K, Shinagawa A, Yasunishi A, Yoshino M, Waterston R, Lander ES, Rogers J, Birney E, Hayashizaki Y, FANTOM Consortium and RIKEN Genome Exploration Research Group Phase I & II Team

    Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.

    Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

    Nature 2002;420;6915;563-73

  • The Human Genome Project: a player's perspective.

    Olson MV

    Department of Medicine, UW Genome Center, University of Washington, Box 352145, Seattle, WA 98195, USA.

    Journal of molecular biology 2002;319;4;931-42

  • BRCA1 and BRCA2 germline mutations in Sardinian breast cancer families and their implications for genetic counseling.

    Palmieri G, Palomba G, Cossu A, Pisano M, Dedola MF, Sarobba MG, Farris A, Olmeo N, Contu A, Pasca A, Satta MP, Persico I, Carboni AA, Cossu-Rocca P, Contini M, Mangion J, Stratton MR and Tanda F

    Institute of Molecular Genetics, Consiglio Nazionale Ricerche, Alghero, Italy.

    Background: The Sardinian population is genetically homogeneous and could be useful in understanding better the genetics of a complex disease like breast cancer (BC).

    Using a screening assay based on a combination of single-strand conformation polymorphism, denaturing high-performance liquid chromatography and sequence analysis, 47 Sardinian families with three or more BC cases were screened for germline mutations in BRCA1 and BRCA2 genes.

    Results: Three BRCA1/2 germline sequence variants were identified. While BRCA2-Ile3412Val is a missense variant with unknown functional significance, BRCA2-8765delAG and BRCA1-Lys505ter are two deleterious mutations (due to their predicted effects on protein truncation), which were found in seven families (15%). BRCA2-8765delAG was found in six of eight (75%) BRCA1/2-positive families and seven of 501 (1.4%) unselected and consecutively collected BC patients. Prevalence of BRCA1/2 mutations in BC families was significantly correlated with the total number of female BCs (P <0.01) and increased by the presence of (i) at least one case of ovarian or male BC, or (ii) three generations affected, or (iii) bilateral BC.

    Conclusions: Identification of such features should prompt referral of BC patients and their families for genetic counseling and BRCA1/2 mutational analysis. In addition, this is the first report of a detailed BRCA1/2 mutation screening in Sardinia, having immediate implications for the clinical management of BC families.

    Annals of oncology : official journal of the European Society for Medical Oncology / ESMO 2002;13;12;1899-907

  • The importance of complete genome sequences.

    Parkhill J

    Trends in microbiology 2002;10;5;219-20; author reply 220

  • Mutation of TBCE causes hypoparathyroidism-retardation-dysmorphism and autosomal recessive Kenny-Caffey syndrome.

    Parvari R, Hershkovitz E, Grossman N, Gorodischer R, Loeys B, Zecic A, Mortier G, Gregory S, Sharony R, Kambouris M, Sakati N, Meyer BF, Al Aqeel AI, Al Humaidan AK, Al Zanhrani F, Al Swaid A, Al Othman J, Diaz GA, Weiner R, Khan KT, Gordon R, Gelb BD and HRD/Autosomal Recessive Kenny-Caffey Syndrome Consortium

    Department of Developmental Molecular Genetics, Soroka Medical Center and Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva 84105, Israel.

    The syndrome of congenital hypoparathyroidism, mental retardation, facial dysmorphism and extreme growth failure (HRD or Sanjad-Sakati syndrome; OMIM 241410) is an autosomal recessive disorder reported almost exclusively in Middle Eastern populations. A similar syndrome with the additional features of osteosclerosis and recurrent bacterial infections has been classified as autosomal recessive Kenny-Caffey syndrome (AR-KCS; OMIM 244460). Both traits have previously been mapped to chromosome 1q43-44 (refs 5,6) and, despite the observed clinical variability, share an ancestral haplotype, suggesting a common founder mutation. We describe refinement of the critical region to an interval of roughly 230 kb and identification of deletion and truncation mutations of TBCE in affected individuals. The gene TBCE encodes one of several chaperone proteins required for the proper folding of alpha-tubulin subunits and the formation of alpha-beta-tubulin heterodimers. Analysis of diseased fibroblasts and lymphoblastoid cells showed lower microtubule density at the microtubule-organizing center (MTOC) and perturbed microtubule polarity in diseased cells. Immunofluorescence and ultrastructural studies showed disturbances in subcellular organelles that require microtubules for membrane trafficking, such as the Golgi and late endosomal compartments. These findings demonstrate that HRD and AR-KCS are chaperone diseases caused by a genetic defect in the tubulin assembly pathway, and establish a potential connection between tubulin physiology and the development of the parathyroid.

    Nature genetics 2002;32;3;448-52

  • Mapping and identification of essential gene functions on the X chromosome of Drosophila.

    Peter A, Schöttler P, Werner M, Beinert N, Dowe G, Burkert P, Mourkioti F, Dentzer L, He Y, Deak P, Benos PV, Gatt MK, Murphy L, Harris D, Barrell B, Ferraz C, Vidal S, Brun C, Demaille J, Cadieu E, Dreano S, Gloux S, Lelaure V, Mottier S, Galibert F, Borkova D, Miñana B, Kafatos FC, Bolshakov S, Sidén-Kiamos I, Papagiannakis G, Spanos L, Louis C, Madueño E, de Pablos B, Modolell J, Bucheton A, Callister D, Campbell L, Henderson NS, McMillan PJ, Salles C, Tait E, Valenti P, Saunders RD, Billaud A, Pachter L, Klapper R, Janning W, Glover DM, Ashburner M, Bellen HJ, Jäckle H and Schäfer U

    Max-Planck-Institut für Biophysikalische Chemie, Abt. Molekulare Entwicklungsbiologie, Am Fassberg, 37077 Göttingen, Germany.

    The Drosophila melanogaster genome consists of four chromosomes that contain 165 Mb of DNA, 120 Mb of which are euchromatic. The two Drosophila Genome Projects, in collaboration with Celera Genomics Systems, have sequenced the genome, complementing the previously established physical and genetic maps. In addition, the Berkeley Drosophila Genome Project has undertaken large-scale functional analysis based on mutagenesis by transposable P element insertions into autosomes. Here, we present a large-scale P element insertion screen for vital gene functions and a BAC tiling map for the X chromosome. A collection of 501 X-chromosomal P element insertion lines was used to map essential genes cytogenetically and to establish short sequence tags (STSs) linking the insertion sites to the genome. The distribution of the P element integration sites, the identified genes and transcription units as well as the expression patterns of the P-element-tagged enhancers is described and discussed.

    EMBO reports 2002;3;1;34-8

  • Restricting genome data won't stop bioterrorism.

    Read TD and Parkhill J

    Nature 2002;417;6887;379

  • Nucleotide sequence, genome organisation and phylogenetic analysis of Indian citrus ringspot virus. Brief report.

    Rustici G, Milne RG and Accotto GP

    Istituto di Virologia Vegetale, Consiglio Nazionale delle Ricerche, Torino, Italy.

    The sequence of the single-stranded RNA genome of Indian citrus ringspot virus (ICRSV) consists of 7560 nucleotides. It contains six open reading frames (ORFs) which encode putative proteins of 187.3, 25, 12, 6.4, 34 and 23 kDa respectively. ORF1 encodes a polypeptide that contains all the elements of a replicase; ORFs 2, 3 and 4 compose a triple-gene block; ORF5 encodes the capsid protein; the function of ORF6 is unknown. Phylogenetic analysis of the complete genome and each ORF separately, and database searches indicate that ICRSV, though showing some similarities to potexviruses, is significantly different, as in the presence of ORF6, the genome and CP sizes, and particle morphology. These differences favour its inclusion in a new virus genus.

    Archives of virology 2002;147;11;2215-24

  • Characterisation of transcripts from the human cytomegalovirus genes TRL7, UL20a, UL36, UL65, UL94, US3 and US34.

    Scott GM, Barrell BG, Oram J and Rawlinson WD

    Virology Division, Department of Microbiology SEALS, Prince of Wales Hospital, Randwick NSW, Australia.

    The genome of human cytomegalovirus (HCMV) has been studied extensively in some regions, but not others. In this study, transcripts of the genome were further characterised for open reading frames (ORFs) TRL7, UL36, UL65, UL94, US3 and US34, and for the previously unrecognised ORF, UL20a. Reverse transcription-PCR demonstrated the presence of spliced transcripts from the putative glycoprotein gene, UL20a, at early and late times post-infection. US3 full-length and spliced transcripts, including a previously unidentified transcript (US3ii), were described at immediate early times. Sequencing of the complete ORFs of UL20a and US3 from 21 clinical isolates showed that US3 is well conserved in all isolates (97-100% identity), whereas UL20a shows more variation at the nucleotide level, with 90-100% identity. The limits of transcription, and splice donor and acceptor sequences for UL20a and US3 were conserved in all isolates, indicating likely conservation of mRNA splicing patterns. Sequencing a late cDNA library identified the limits of transcription for ORFs TRL7, UL94 and US34 and transcription from the TRL7 ORF was confirmed by northern blotting. Transcripts were found that were congruent with UL36 and UL65, but these differed in the limits previously predicted for these ORFs. These findings show the variation between predicted and actual transcription and indicate the complex nature of transcription from HCMV ORFs.

    Virus genes 2002;24;1;39-48

  • Genome giants.

    Sebaihia M, Bentley S, Thomson N, Holden M and Parkhill J

    Trends in microbiology 2002;10;7;309-10

  • Tales of the unexpected.

    Sebaihia M, Bentley S, Thomson N, Holden M and Parkhill J

    Trends in microbiology 2002;10;6;261-2

  • Plant protein families and their relationships to food allergy.

    Shewry PR, Beaudoin F, Jenkins J, Griffiths-Jones S and Mills EN

    Long Ashton Research Station, Department of Agricultural Sciences, University of Bristol, UK.

    The analysis of plant proteins has a long and distinguished history, with work dating back over 250 years. Much of the work has focused on seed proteins, which are important in animal nutrition and food processing. Early studies classified plant proteins into groups based on solubility ('Osborne fractions') or protein function. More recently, families have been defined based on structural and evolutionary relationships. One of the most widespread groups of plant proteins is the prolamin superfamily, which comprises cereal seed storage proteins, a range of low-molecular-mass sulphur-rich proteins (many of which are located in seeds) and some cell wall glycoproteins. This superfamily includes several major types of plant allergen: non-specific lipid transfer proteins, cereal seed inhibitors of alpha-amylase and/or trypsin, and 2S albumin storage proteins of dicotyledonous seeds.

    Biochemical Society transactions 2002;30;Pt 6;906-10

  • BRCA1 and BRCA2 mutation frequency in women evaluated in a breast cancer risk evaluation clinic.

    Shih HA, Couch FJ, Nathanson KL, Blackwood MA, Rebbeck TR, Armstrong KA, Calzone K, Stopfer J, Seal S, Stratton MR and Weber BL

    Abramson Family Cancer Research Institute and Department of Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA.

    Purpose: To determine the prevalence of BRCA1 and BRCA2 mutations in families identified in a breast cancer risk evaluation clinic.

    Patients and methods: One hundred sixty-four families seeking breast cancer risk evaluation were screened for coding region mutations in BRCA1 and BRCA2 by conformation-sensitive gel electrophoresis and DNA sequencing.

    Results: Mutations were identified in 37 families (22.6%); 28 (17.1%) had BRCA1 mutations and nine (5.5%) had BRCA2 mutations. The Ashkenazi Jewish founder mutations 185delAG and 5382insC (BRCA1) were found in 10 families (6.1%). However, 6174delT (BRCA2) was found in only one family (0.6%) despite estimates of equal frequency in the Ashkenazi population. In contrast to other series, the average age of breast cancer diagnosis was earlier in BRCA2 mutation carriers (32.1 years) than in women with BRCA1 mutations (37.6 years, P =.028). BRCA1 mutations were detected in 20 (45.5%) of 44 families with ovarian cancer and 12 (75%) of 16 families with both breast and ovarian cancer in a single individual. Significantly fewer BRCA2 mutations (two [4.5%] of 44) were detected in families with ovarian cancer (P =.01). Eight families had male breast cancer; one had a BRCA1 mutation and three had BRCA2 mutations.

    Conclusion: BRCA1 mutations were three times more prevalent than BRCA2 mutations. Breast cancer diagnosis before 50 years of age, ovarian cancer, breast and ovarian cancer in a single individual, and male breast cancer were all significantly more common in families with BRCA1 and BRCA2 mutations, but none of these factors distinguished between BRCA1 and BRCA2 mutations. Evidence for reduced breast cancer penetrance associated with the BRCA2 mutation 6174delT was noted.

    Funded by: NCI NIH HHS: CA57601, CA84030

    Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2002;20;4;994-9

  • Pre-counseling education materials for BRCA testing: does tailoring make a difference?

    Skinner CS, Schildkraut JM, Berry D, Calingaert B, Marcom PK, Sugarman J, Winer EP, Iglehart JD, Futreal PA and Rimer BK

    Duke University Medical Center, Department of Community and Family Medicine and the Duke Comprehensive Cancer Center, Durham, NC 27710, USA.

    Although tailored print materials (TPMs) have been assessed for a variety of behavioral targets, their effectiveness as decision aids for genetic testing had not been evaluated at the time this study began. We compared TPMs and non-tailored print materials (NPMs) that included similar content about genetic testing for breast and ovarian cancer susceptibility. TPMs were prepared especially for an individual based on information from and about her. We mailed baseline surveys to 461 women referred by physicians or identified through a tumor registry. All had personal and family histories of breast and/or ovarian cancer and, on the basis of these histories, an estimated ≥10% probability of carrying a mutation in the breast/ovarian cancer genes BRCA1 or BRCA2. The 325 (70%) who responded were randomly assigned to receive TPM or NPM. Follow-up surveys, mailed 2 weeks following receipt of print materials, were returned by 262 women (81% of baseline responders). Participants were predominately white (94%) and well-educated (50% college graduates). The mean age was 49 years. At follow-up, TPM recipients exhibited significantly greater improvement in percent of correct responses for the 13-item true/false measure of knowledge (24% increase for TPM vs. 16% for NPM; p < 0.0001) and significantly less over-estimation of risk of being a mutation carrier (40% TPM group overestimated vs. 70% NPM; p < 0.0001). Anxiety did not differ significantly between groups. Reactions to materials differed on two items: "seemed to be prepared just for me" (76% TPM vs. 52% NPM; p < 0.001) and "told me what I wanted to know about BRCA1 and 2 testing" (98% TPM vs. 91% NPM; p < 0.05). TPMs showed an advantage in increasing knowledge and enhancing accuracy of perceived risk. Both are critical components of informed decision making.

    Funded by: NCI NIH HHS: 2P50CA68438

    Genetic testing 2002;6;2;93-105

  • The Srk1 protein kinase is a target for the Sty1 stress-activated MAPK in fission yeast.

    Smith DA, Toone WM, Chen D, Bahler J, Jones N, Morgan BA and Quinn J

    School of Biochemistry and Genetics, Medical School, University of Newcastle upon Tyne, Newcastle upon Tyne NE2 4HH, United Kingdom.

    The fission yeast stress-activated Sty1/Spc1 MAPK pathway responds to a similar range of stresses as do the mammalian p38 and SAPK/JNK MAPK pathways. In addition, sty1(-) cells are sterile and exhibit a G(2) cell cycle delay, indicating additional roles of Sty1 in meiosis and cell cycle progression. To identify novel proteins involved in stress responses, a microarray analysis of the Schizosaccharomyces pombe genome was performed to find genes that are up-regulated following exposure to stress in a Sty1-dependent manner. One such gene identified, srk1(+) (Sty1-regulated kinase 1), encodes a putative serine/threonine kinase homologous to mammalian calmodulin kinases. At the C terminus of Srk1 is a putative MAPK binding motif similar to that in the p38 substrates, MAPK-activated protein kinases 2 and 3. Indeed, we find that Srk1 is present in a complex with the Sty1 MAPK and is directly phosphorylated by Sty1. Furthermore, upon stress, Srk1 translocates from the cytoplasm to the nucleus in a process that is dependent on the Sty1 MAPK. Finally, we show that Srk1 has a role in regulating meiosis in fission yeast; following nitrogen limitation, srk1(-) cells enter meiosis significantly faster than wild-type cells and overexpression of srk1(+) inhibits the nitrogen starvation-induced arrest in G(1).

    Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118

    The Journal of biological chemistry 2002;277;36;33411-21

  • The Bioperl toolkit: Perl modules for the life sciences.

    Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD and Birney E

    University Program in Genetics, Duke University, Durham, North Carolina 27710, USA.

    The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

    Funded by: NHGRI NIH HHS: 1 K32 HG00056, HG00739, K22 HG-00064-01, P41HG02223; NIGMS NIH HHS: T32 GM07754-22

    Genome research 2002;12;10;1611-8

  • The novel EPTP repeat defines a superfamily of proteins implicated in epileptic disorders.

    Staub E, Pérez-Tur J, Siebert R, Nobile C, Moschonas NK, Deloukas P and Hinzmann B

    metaGen Pharmaceuticals GmbH, Oudenarder Strasse 16, D-13347 Berlin, Germany.

    Recent studies suggest that mutations in the LGI1/Epitempin gene cause autosomal dominant lateral temporal epilepsy. This gene encodes a protein of unknown function, which we postulate is secreted. The LGI1 protein has leucine-rich repeats in the N-terminal sequence and a tandem repeat (which we named EPTP) in its C-terminal region. A redefinition of the C-terminal repeat and the application of sensitive sequence analysis methods enabled us to define a new superfamily of proteins carrying varying numbers of the novel EPTP repeats in combination with various extracellular domains. Genes encoding proteins of this family are located in genomic regions associated with epilepsy and other neurological disorders.

    Trends in biochemical sciences 2002;27;9;441-4

  • Physical and transcript map of the hereditary prostate cancer region at Xq27.

    Stephan DA, Howell GR, Teslovich TM, Coffey AJ, Smith L, Bailey-Wilson JE, Malechek L, Gildea D, Smith JR, Gillanders EM, Schleutker J, Hu P, Steingruber HE, Dhami P, Robbins CM, Makalowska I, Carpten JD, Sood R, Mumm S, Reinbold R, Bonner TI, Baffoe-Bonnie A, Bubendorf L, Heiskanen M, Kallioneimi OP, Baxevanis AD, Joseph SS, Zucchi I, Burk RD, Isaacs W, Ross MT and Trent JM

    Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

    We have recently mapped a locus for hereditary prostate cancer (termed HPCX) to the long arm of the X chromosome (Xq25-q27) through a genome-wide linkage study. Here we report the construction of an approximately 9-Mb sequence-ready bacterial clone contig map of Xq26.3-q27.3. The contig was constructed by screening BAC/PAC libraries with markers spaced at approximately 85-kb intervals. We identified overlapping clones by end-sequencing framework clones to generate 407 new sequence-tagged sites, followed by PCR verification of overlaps. Contig assembly was based on clone restriction fingerprinting and the landmark information. We identified a minimal overlap contig for genomic sequencing, which has yielded 7.7 Mb of finished sequence and 1.5 Mb of draft sequence. The transcriptional mapping effort localized 57 known and predicted genes by database searching, STS content mapping, and sequencing, followed by sequence annotation. These transcriptional units represent candidate genes for HPCX and multiple other hereditary diseases at Xq26.3-q27.3.

    Genomics 2002;79;1;41-50

  • A targeted X-linked CMV-Cre line.

    Su H, Mills AA, Wang X and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.

    Genesis (New York, N.Y. : 2000) 2002;32;2;187-8

  • Escherichia coli aconitases and oxidative stress: post-transcriptional regulation of sodA expression.

    Tang Y, Quail MA, Artymiuk PJ, Guest JR and Green J

    The Krebs Institute for Biomolecular Research, Department of Molecular Biology and Biotechnology, University of Sheffield, Western Bank, Sheffield S10 2TN, UK.

    Escherichia coli possesses two aconitases, a stationary-phase enzyme (AcnA), which is induced by iron and oxidative stress, and a major but less stable enzyme (AcnB), synthesized during exponential growth. In addition to the catalytic activities of the holo-proteins, the apo-proteins function as post-transcriptional regulators by site-specific binding to acn mRNAs. Thus, it has been suggested that inactivation of the enzymes could mediate a rapidly reacting post-transcriptional component of the bacterial oxidative stress response. Here it is shown that E. coli acn mutants are hypersensitive to the redox-stress reagents H(2)O(2) and methyl viologen. Proteomic analyses further revealed that the level of superoxide dismutase (SodA) is enhanced in acnB and acnAB mutants, and by exposure to methyl viologen. The amounts of other proteins, including thioredoxin reductase, 2-oxoglutarate dehydrogenase, succinyl-CoA synthetase and chaperone proteins, were also affected in the acn mutants. The altered patterns of sodA expression were confirmed in studies with sodA-lacZ reporter strains. Quantitative Northern blotting indicated that AcnA enhances the stability of the sodA transcript, whereas AcnB lowers its stability. Direct evidence that the apo-proteins have positive (AcnA) and negative (AcnB) effects on SodA synthesis was obtained from in vitro transcription-translation experiments. It is suggested that the aconitase proteins of E. coli serve as a protective buffer against the basal level of oxidative stress that accompanies aerobic growth by acting as a sink for reactive oxygen species and by modulating translation of the sodA transcript.

    Microbiology (Reading, England) 2002;148;Pt 4;1027-37

  • A human TAPBP (TAPASIN)-related gene, TAPBP-R.

    Teng MS, Stephens R, Du Pasquier L, Freeman T, Lindquist JA and Trowsdale J

    Immunology Division, Department of Pathology, Cambridge, GB.

    TAPASIN, a V-C1 (variable-constant) immunoglobulin superfamily (IgSF) molecule that links MHC class I molecules to the transporter associated with antigen processing (TAP) in the endoplasmic reticulum (ER), is encoded by the TAPBP gene, located near the MHC at 6p21.3. A related gene was identified at chromosome position 12p13.3 between the CD27 and VAMP1 genes near a group of MHC-paralogous loci. The gene, which we have called TAPBP-R (R for related), also encodes a member of the IgSF, TAPASIN-R. Its putative product contains similar structural motifs to TAPASIN, with some marked differences, especially in the V domain, transmembrane and cytoplasmic regions. By using the mouse ortholog to screen tissue, we revealed that the TAPBP-R gene was broadly expressed. Sub-cellular localization showed that the bulk of TAPASIN-R is located within the ER but biotinylation experiments were consistent with some expression at the cell surface. TAPASIN-R lacks an obvious ER retention signal. The function of TAPASIN-R will be of interest with regard to the evolution of the immune system as well as antigen processing.

    European journal of immunology 2002;32;4;1059-68

  • Requirement for downregulation of kreisler during late patterning of the hindbrain.

    Theil T, Ariza-McNaughton L, Manzanares M, Brodie J, Krumlauf R and Wilkinson DG

    Division of Developmental Neurobiology, National Institute for Medical Research, The Ridgeway, London NW7 1AA, UK.

    Pattern formation in the hindbrain is governed by a segmentation process that provides the basis for the organisation of cranial motor nerves. A cascade of transcriptional activators, including the bZIP transcription factor encoded by the kreisler gene, controls this segmentation process. In kreisler mutants, r5 fails to form and this correlates with abnormalities in the neuroanatomical organisation of the hindbrain. Studies of Hox gene regulation suggest that kreisler may regulate the identity as well as the formation of r5, but such a role cannot be detected in kreisler mutants since r5 is absent. To gain further insights into the function of kreisler we have generated transgenic mice in which kreisler is ectopically expressed in r3 and for an extended period in r5. In these transgenic mice, the Fgf3, Krox20, Hoxa3 and Hoxb3 genes have ectopic or prolonged expression domains in r3, indicating that it acquires molecular characteristics of r5. Prolonged kreisler expression subsequently causes morphological alterations of r3/r5 that are due to an inhibition of neuronal differentiation and migration from the ventricular zone to form the mantle layer. We find that these alterations in r5 correlate with an arrest of facial branchiomotor neurone migration from r4 into the caudal hindbrain, which is possibly due to the deficiency in the mantle layer through which they normally migrate. We propose that the requirement for the downregulation of segmental kreisler expression prior to neuronal differentiation reflects the stage-specific roles of this gene and its targets.

    Development (Cambridge, England) 2002;129;6;1477-85

  • An integrated, functionally annotated gene map of the DXS8026-ELK1 interval on human Xp11.3-Xp11.23: potential hotspot for neurogenetic disorders.

    Thiselton DL, McDowall J, Brandau O, Ramser J, d'Esposito F, Bhattacharya SS, Ross MT, Hardcastle AJ and Meindl A

    Department of Molecular Genetics, Institute of Ophthalmology, University College, London, EC1V 9EL, UK.

    Human chromosome Xp11.3-Xp11.23 encompasses the map location for a growing number of diseases with a genetic basis or genetic component. These include several eye disorders, syndromic and nonsyndromic forms of X-linked mental retardation (XLMR), X-linked neuromuscular diseases and susceptibility loci for schizophrenia, type 1 diabetes, and Graves' disease. We have constructed an approximately 2.7-Mb high-resolution physical map extending from DXS8026 to ELK1, corresponding to a genetic distance of approximately 5.5 cM. A combination of chromosome walking and sequence-tagged site (STS)-content mapping resulted in an integrated framework and transcript map, precisely positioning 10 polymorphic microsatellites (one of which is novel), 16 ESTs, and 12 known genes (RP2, PCTK1, UHX1, UBE1, RBM10, ZNF157, SYN1, ARAF1, TIMP1, PFC, ELK1, UXT). The composite map is currently anchored with 89 STSs to give an average resolution of approximately 1 STS every 30 kb. By a combination of EST database searches and in silico detection of UniGene clusters within genomic sequence generated from this template map, we have mapped several novel genes within this interval: a Na+/H+ exchanger (SLC9A7), at least two zinc-finger transcription factors (KIAA0215 and Hs.68318), carbohydrate sulfotransferase-7 (CHST7), regucalcin (RGN), inactivation-escape-1 (INE1), the human ortholog of mouse neuronal protein 15.6, and four putative novel genes. Further genomic analysis enabled annotation of the sequence interval with 20 predicted pseudogenes and 21 UniGene clusters of unknown function. The combined PAC/BAC transcript map and YAC scaffold presented here clarify previously conflicting data for markers and genes within the Xp11.3-Xp11.23 interval and provide a powerful integrated resource for functional characterization of this clonally unstable, yet gene-rich and clinically significant region of proximal Xp.

    Genomics 2002;79;4;560-72

  • Evaluation of linkage of breast cancer to the putative BRCA3 locus on chromosome 13q21 in 128 multiple case families from the Breast Cancer Linkage Consortium.

    Thompson D, Szabo CI, Mangion J, Oldenburg RA, Odefrey F, Seal S, Barfoot R, Kroeze-Jansema K, Teare D, Rahman N, Renard H, Mann G, Hopper JL, Buys SS, Andrulis IL, Senie R, Daly MB, West D, Ostrander EA, Offit K, Peretz T, Osorio A, Benitez J, Nathanson KL, Sinilnikova OM, Olàh E, Bignon YJ, Ruiz P, Badzioch MD, Vasen HF, Futreal AP, Phelan CM, Narod SA, Lynch HT, Ponder BA, Eeles RA, Meijers-Heijboer H, Stoppa-Lyonnet D, Couch FJ, Eccles DM, Evans DG, Chang-Claude J, Lenoir G, Weber BL, Devilee P, Easton DF, Goldgar DE, Stratton MR and KConFab Consortium

    CRC Genetic Epidemiology Unit, Strangeways Research Laboratories, University of Cambridge, Cambridge CB1 4RN, United Kingdom.

    The known susceptibility genes for breast cancer, including BRCA1 and BRCA2, only account for a minority of the familial aggregation of the disease. A recent study of 77 multiple case breast cancer families from Scandinavia found evidence of linkage between the disease and polymorphic markers on chromosome 13q21. We have evaluated the contribution of this candidate "BRCA3" locus to breast cancer susceptibility in 128 high-risk breast cancer families of Western European ancestry with no identified BRCA1 or BRCA2 mutations. No evidence of linkage was found. The estimated proportion (alpha) of families linked to a susceptibility locus at D13S1308, the location estimated by Kainu et al. [(2000) Proc. Natl. Acad. Sci. USA 97, 9603-9608], was 0 (upper 95% confidence limit 0.13). Adjustment for possible bias due to selection of families on the basis of linkage evidence at BRCA2 did not materially alter this result (alpha = 0, upper 95% confidence limit 0.18). The proportion of linked families reported by Kainu et al. (0.65) is excluded with a high degree of confidence in our dataset [heterogeneity logarithm of odds (HLOD) at alpha = 0.65 was -11.0]. We conclude that, if a susceptibility gene does exist at this locus, it can only account for a small proportion of non-BRCA1/2 families with multiple cases of early-onset breast cancer.

    Funded by: NCI NIH HHS: CA69398, CA69417, CA69446, CA69467, CA69631, CA69638

    Proceedings of the National Academy of Sciences of the United States of America 2002;99;2;827-31

  • Sibling rivalry.

    Thomson N, Sebaihia M, Cerdeño-Tárraga A and Parkhill J

    Trends in microbiology 2002;10;9;396-7

  • Microbial genomics: spot the difference...

    Thomson NR, Sebaihia M and Parkhill J

    Trends in microbiology 2002;10;11;489-90

  • p53 mutant mice that display early ageing-associated phenotypes.

    Tyner SD, Venkatachalam S, Choi J, Jones S, Ghebranious N, Igelmann H, Lu X, Soron G, Cooper B, Brayton C, Park SH, Thompson T, Karsenty G, Bradley A and Donehower LA

    Cell and Molecular Biology Program, Baylor College of Medicine, Houston, TX 77030, USA.

    The p53 tumour suppressor is activated by numerous stressors to induce apoptosis, cell cycle arrest, or senescence. To study the biological effects of altered p53 function, we generated mice with a deletion mutation in the first six exons of the p53 gene that express a truncated RNA capable of encoding a carboxy-terminal p53 fragment. This mutation confers phenotypes consistent with activated p53 rather than inactivated p53. Mutant (p53+/m) mice exhibit enhanced resistance to spontaneous tumours compared with wild-type (p53+/+) littermates. As p53+/m mice age, they display an early onset of phenotypes associated with ageing. These include reduced longevity, osteoporosis, generalized organ atrophy and a diminished stress tolerance. A second line of transgenic mice containing a temperature-sensitive mutant allele of p53 also exhibits early ageing phenotypes. These data suggest that p53 has a role in regulating organismal ageing.

    Nature 2002;415;6867;45-53

  • Tools for targeted manipulation of the mouse genome.

    van der Weyden L, Adams DJ and Bradley A

    The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, United Kingdom.

    In the postgenomic era the mouse will be central to the challenge of ascribing a function to the 40,000 or so genes that constitute our genome. In this review, we summarize some of the classic and modern approaches that have fueled the recent dramatic explosion in mouse genetics. Together with the sequencing of the mouse genome, these tools will have a profound effect on our ability to generate new and more accurate mouse models and thus provide a powerful insight into the function of human genes during the processes of both normal development and disease.

    Physiological genomics 2002;11;3;133-64

  • Cancer: stuck at first base.

    van der Weyden L, Jonkers J and Bradley A

    Nature 2002;419;6903;127-8

  • The mosaic structure of variation in the laboratory mouse genome.

    Wade CM, Kulbokas EJ, Kirby AW, Zody MC, Mullikin JC, Lander ES, Lindblad-Toh K and Daly MJ

    Whitehead Institute for Biomedical Research and Whitehead/MIT Center for Genome Research, 9 Cambridge Center, Cambridge, Massachusetts 02139, USA.

    Most inbred laboratory mouse strains are known to have originated from a mixed but limited founder population in a few laboratories. However, the effect of this breeding history on patterns of genetic variation among these strains and the implications for their use are not well understood. Here we present an analysis of the fine structure of variation in the mouse genome, using single nucleotide polymorphisms (SNPs). When the recently assembled genome sequence from the C57BL/6J strain is aligned with sample sequence from other strains, we observe long segments of either extremely high (approximately 40 SNPs per 10 kb) or extremely low (approximately 0.5 SNPs per 10 kb) polymorphism rates. In all strain-to-strain comparisons examined, only one-third of the genome falls into long regions (averaging >1 Mb) of a high SNP rate, consistent with estimated divergence rates between Mus musculus domesticus and either M. m. musculus or M. m. castaneus. These data suggest that the genomes of these inbred strains are mosaics with the vast majority of segments derived from domesticus and musculus sources. These observations have important implications for the design and interpretation of positional cloning experiments.

    Nature 2002;420;6915;574-8

  • Molecular mechanisms governing Pcdh-gamma gene expression: evidence for a multiple promoter and cis-alternative splicing model.

    Wang X, Su H and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

    The genomic architecture of protocadherin (Pcdh) gene clusters is remarkably similar to that of the immunoglobulin and T cell receptor gene clusters, and can potentially provide significant molecular diversity. Pcdh genes are abundantly expressed in the central nervous system. These molecules are primary candidates for establishing specific neuronal connectivity. Despite the extensive analyses of the genomic structure of both human and mouse Pcdh gene clusters, the definitive molecular mechanisms that control Pcdh gene expression are still unknown. Four theories have been proposed, including (1) DNA recombination followed by cis-splicing, (2) single promoter and cis-alternative splicing, (3) multiple promoters and cis-alternative splicing, and (4) multiple promoters and trans-splicing. Using a combination of molecular and genetic analyses, we evaluated the four models at the Pcdh-gamma locus. Our analysis provides evidence that the transcription of individual Pcdh-gamma genes is under the control of a distinct but related promoter upstream of each Pcdh-gamma variable exon, and posttranscriptional processing of each Pcdh-gamma transcript is predominantly mediated through cis-alternative splicing.

    Genes & development 2002;16;15;1890-905

  • Gamma protocadherins are required for survival of spinal interneurons.

    Wang X, Weiner JA, Levi S, Craig AM, Bradley A and Sanes JR

    Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.

    The murine genome contains approximately 70 protocadherin (Pcdh) genes. Many are expressed in the nervous system, suggesting that Pcdhs may specify neuronal connectivity. Here, we analyze the 22 contiguous genes of the Pcdh-gamma cluster. Individual neurons express subsets of Pcdh-gamma genes. Pcdh-gamma proteins are present in most neurons and associated with, but not confined to, synapses. Early steps in neuronal migration, axon outgrowth, and synapse formation proceed in mutant mice lacking all 22 Pcdh-gamma genes. At late embryonic stages, however, dramatic neurodegeneration leads to neonatal death. In mutant spinal cord, many interneurons are lost, but sensory and motor neurons are relatively spared. In cultures from mutant spinal cord, neurons differentiate and form synapses but then die. Thus, Pcdh-gamma genes are dispensable for at least some aspects of connectivity but required for survival of specific neuronal types.

    Neuron 2002;36;5;843-54

  • On the sequencing of the human genome.

    Waterston RH, Lander ES and Sulston JE

    Genome Sequencing Center, Washington University, Saint Louis, MO 63108, USA.

    Two recent papers using different approaches reported draft sequences of the human genome. The international Human Genome Project (HGP) used the hierarchical shotgun approach, whereas Celera Genomics adopted the whole-genome shotgun (WGS) approach. Here, we analyze whether the latter paper provides a meaningful test of the WGS approach on a mammalian genome. In the Celera paper, the authors did not analyze their own WGS data. Instead, they decomposed the HGP's assembled sequence into a "perfect tiling path", combined it with their WGS data, and assembled the merged data set. To study the implications of this approach, we perform computational analysis and find that a perfect tiling path with 2-fold coverage is sufficient to recover virtually the entirety of a genome assembly. We also examine the manner in which the assembly was anchored to the human genome and conclude that the process primarily depended on the HGP's sequence-tagged site maps, BAC maps, and clone-based sequences. Our analysis indicates that the Celera paper provides neither a meaningful test of the WGS approach nor an independent sequence of the human genome. Our analysis does not imply that a WGS approach could not be successfully applied to assemble a draft sequence of a large mammalian genome, but merely that the Celera paper does not provide such evidence.

    Proceedings of the National Academy of Sciences of the United States of America 2002;99;6;3712-6

  • The genome sequence of Schizosaccharomyces pombe.

    Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S, Basham D, Bowman S, Brooks K, Brown D, Brown S, Chillingworth T, Churcher C, Collins M, Connor R, Cronin A, Davis P, Feltwell T, Fraser A, Gentles S, Goble A, Hamlin N, Harris D, Hidalgo J, Hodgson G, Holroyd S, Hornsby T, Howarth S, Huckle EJ, Hunt S, Jagels K, James K, Jones L, Jones M, Leather S, McDonald S, McLean J, Mooney P, Moule S, Mungall K, Murphy L, Niblett D, Odell C, Oliver K, O'Neil S, Pearson D, Quail MA, Rabbinowitsch E, Rutherford K, Rutter S, Saunders D, Seeger K, Sharp S, Skelton J, Simmonds M, Squares R, Squares S, Stevens K, Taylor K, Taylor RG, Tivey A, Walsh S, Warren T, Whitehead S, Woodward J, Volckaert G, Aert R, Robben J, Grymonprez B, Weltjens I, Vanstreels E, Rieger M, Schäfer M, Müller-Auer S, Gabel C, Fuchs M, Düsterhöft A, Fritzc C, Holzer E, Moestl D, Hilbert H, Borzym K, Langer I, Beck A, Lehrach H, Reinhardt R, Pohl TM, Eger P, Zimmermann W, Wedler H, Wambutt R, Purnelle B, Goffeau A, Cadieu E, Dréano S, Gloux S, Lelaure V, Mottier S, Galibert F, Aves SJ, Xiang Z, Hunt C, Moore K, Hurst SM, Lucas M, Rochet M, Gaillardin C, Tallada VA, Garzon A, Thode G, Daga RR, Cruzado L, Jimenez J, Sánchez M, del Rey F, Benito J, Domínguez A, Revuelta JL, Moreno S, Armstrong J, Forsburg SL, Cerutti L, Lowe T, McCombie WR, Paulsen I, Potashkin J, Shpakovski GV, Ussery D, Barrell BG, Nurse P and Cerrutti L

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

    Nature 2002;415;6874;871-80

  • The PASTA domain: a beta-lactam-binding domain.

    Yeats C, Finn RD and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK CB10 1SA.

    The PASTA domain (for penicillin-binding protein and serine/threonine kinase associated domain) is found in the high molecular weight penicillin-binding proteins and eukaryotic-like serine/threonine kinases of a range of pathogens. We describe this previously uncharacterized domain and infer that it binds beta-lactam antibiotics and their peptidoglycan analogues. We postulate that PknB-like kinases are key regulators of cell-wall biosynthesis. The essential function of these enzymes suggests an additional pathway for the action of beta-lactam antibiotics.

    Trends in biochemical sciences 2002;27;9;438

  • Similarity of the phenotypic patterns associated with BRAF and KRAS mutations in colorectal neoplasia.

    Yuen ST, Davies H, Chan TL, Ho JW, Bignell GR, Cox C, Stephens P, Edkins S, Tsui WW, Chan AS, Futreal PA, Stratton MR, Wooster R and Leung SY

    Department of Pathology, University of Hong Kong, Queen Mary Hospital, Hong Kong.

    Activation of the RAS/RAF/extracellular signal-regulated kinase-mitogen-activated protein kinase/extracellular signal-regulated kinase/mitogen-activated protein kinase pathway by RAS mutations is commonly found in human cancers. Recently, we reported that mutation of BRAF provides an alternative route for activation of this signaling pathway and can be found in melanomas, colorectal cancers, and ovarian tumors. Here we perform an extensive characterization of BRAF mutations in a large series of colorectal tumors in various stages of neoplastic transformation. BRAF mutations were found in 11 of 215 (5.1%) colorectal adenocarcinomas, 3 of 108 (2.8%) sporadic adenomas, 1 of 63 (1.6%) adenomas from familial adenomatous polyposis (FAP) patients, and 1 of 3 (33%) hyperplastic polyps. KRAS mutations were detected in 34% of carcinomas, 31% of sporadic adenomas, 9% of FAP adenomas, and no hyperplastic polyps. Eight of 16 BRAF mutations were V599E, the previously described hotspot, and none of these was associated with a KRAS mutation in the same lesion. The remaining eight mutations involve other conserved amino acids in the kinase domain, and 62.5% have a KRAS mutation in the same tumor. Our data suggest that BRAF mutations are, to some extent, biologically similar to RAS mutations in colorectal cancer because both occur at approximately the same stage of the adenoma-carcinoma sequence, both are associated with villous morphology, and both are less common in adenomas from FAP cases. By contrast, colorectal adenocarcinomas with BRAF mutations are associated with early Dukes' tumor stages (P = 0.006) and no such relationship was observed for KRAS mutations. The presence in some colorectal neoplasms of mutations in both BRAF and KRAS suggests that modulation of the RAS-RAF-extracellular signal-regulated kinase-mitogen-activated protein kinase/extracellular signal-regulated kinase/mitogen-activated protein kinase signaling pathway may occur by mutation of multiple components.

    Cancer research 2002;62;22;6451-5

  • Visual genotyping of a coat color tagged p53 mutant mouse line.

    Zheng B, Vogel H, Donehower LA and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

    The p53 knockout mouse has been widely used as a model in cancer research and other applications. Because neither homozygous nor heterozygous mutant p53 mice exhibit an overt phenotype, each animal requires laborious molecular genotyping. Here we describe a new p53 mutant mouse that is tagged with a tyrosinase coat color minigene. On an albino background, heterozygous tyrosinase-tagged p53 mutant mice exhibit a light tan coat color, while homozygous mutants display a darker brown coat color. Thus, by 8-10 days of age, mice with two, one, or no mutant p53 alleles are immediately distinguishable by their coat color, eliminating the time, costs, and errors associated with molecular genotyping. Moreover, the homozygous mutant p53-tyrosinase mice display a tumor incidence and spectrum virtually identical to previous p53 null mouse lines. Thus, tagging targeted mutations with such coat color markers provides a generally applicable genotyping method for embryonic stem cell-derived mice.

    Cancer biology & therapy 2002;1;4;433-5
