Sanger Institute - Publications 2001

Number of papers published in 2001: 24

  • Word Level Confidence Measures Using N-Best Sub-Hypotheses Likelihood Ratio

    Beng T. Tan, Yong Gu, and Trevor Thomas

    7th European Conference on Speech Communication and Technology, Aalborg, Denmark, September 3-7, 2001

  • The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X.

    Bentley DR, Deloukas P, Dunham A, French L, Gregory SG, Humphray SJ, Mungall AJ, Ross MT, Carter NP, Dunham I, Scott CE, Ashcroft KJ, Atkinson AL, Aubin K, Beare DM, Bethel G, Brady N, Brook JC, Burford DC, Burrill WD, Burrows C, Butler AP, Carder C, Catanese JJ, Clee CM, Clegg SM, Cobley V, Coffey AJ, Cole CG, Collins JE, Conquer JS, Cooper RA, Culley KM, Dawson E, Dearden FL, Durbin RM, de Jong PJ, Dhami PD, Earthrowl ME, Edwards CA, Evans RS, Gillson CJ, Ghori J, Green L, Gwilliam R, Halls KS, Hammond S, Harper GL, Heathcott RW, Holden JL, Holloway E, Hopkins BL, Howard PJ, Howell GR, Huckle EJ, Hughes J, Hunt PJ, Hunt SE, Izmajlowicz M, Jones CA, Joseph SS, Laird G, Langford CF, Lehvaslaiho MH, Leversha MA, McCann OT, McDonald LM, McDowall J, Maslen GL, Mistry D, Moschonas NK, Neocleous V, Pearson DM, Phillips KJ, Porter KM, Prathalingam SR, Ramsey YH, Ranby SA, Rice CM, Rogers J, Rogers LJ, Sarafidou T, Scott DJ, Sharp GJ, Shaw-Smith CJ, Smink LJ, Soderlund C, Sotheran EC, Steingruber HE, Sulston JE, Taylor A, Taylor RG, Thorpe AA, Tinsley E, Warry GL, Whittaker A, Whittaker P, Williams SH, Wilmer TE, Wooster R and Wright CL

    The Sanger Centre, Hinxton, Cambridge, UK.

    We constructed maps for eight chromosomes (1, 6, 9, 10, 13, 20, X and (previously) 22), representing one-third of the genome, by building landmark maps, isolating bacterial clones and assembling contigs. By this approach, we could establish the long-range organization of the maps early in the project, and all contig extension, gap closure and problem-solving was simplified by containment within local regions. The maps currently represent more than 94% of the euchromatic (gene-containing) regions of these chromosomes in 176 contigs, and contain 96% of the chromosome-specific markers in the human gene map. By measuring the remaining gaps, we can assess chromosome length and coverage in sequenced clones.

    Nature 2001;409;6822;942-3

  • Mining the draft human genome.

    Birney E, Bateman A, Clamp ME and Hubbard TJ

    The European Bioinformatics Institute, Hinxton, Cambridge, UK.

    Now that the draft human genome sequence is available, everyone wants to be able to use it. However, we have perhaps become complacent about our ability to turn new genomes into lists of genes. The higher volume of data associated with a larger genome is accompanied by a much greater increase in complexity. We need to appreciate both the scale of the challenge of vertebrate genome analysis and the limitations of current gene prediction methods and understanding.

    Nature 2001;409;6822;827-8

  • An SSLP marker-anchored BAC framework map of the mouse genome.

    Cai WW, Chow CW, Damani S, Gregory SG, Marra M and Bradley A

    Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

    We have constructed a BAC framework map of the mouse genome consisting of 2,808 PCR-confirmed BAC clusters, using a previously described method. Fingerprints of BACs from selected clusters confirm the accuracy of the map. Combined with BAC fingerprint data, the framework map covers 37% of the mouse genome.

    Nature genetics 2001;29;2;133-4

  • Integration of cytogenetic landmarks into the draft sequence of the human genome.

    Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, Furey TS, Kim UJ, Kuo WL, Olivier M, Conroy J, Kasprzyk A, Massa H, Yonescu R, Sait S, Thoreen C, Snijders A, Lemyre E, Bailey JA, Bruzel A, Burrill WD, Clegg SM, Collins S, Dhami P, Friedman C, Han CS, Herrick S, Lee J, Ligon AH, Lowry S, Morley M, Narasimhan S, Osoegawa K, Peng Z, Plajzer-Frick I, Quade BJ, Scott D, Sirotkin K, Thorpe AA, Gray JW, Hudson J, Pinkel D, Ried T, Rowen L, Shen-Ong GL, Strausberg RL, Birney E, Callen DF, Cheng JF, Cox DR, Doggett NA, Carter NP, Eichler EE, Haussler D, Korenberg JR, Morton CC, Albertson D, Schuler G, de Jong PJ, Trask BJ and BAC Resource Consortium

    Department of Pediatrics, University of Pennsylvania, The Children's Hospital of Philadelphia, 19104, USA.

    We have placed 7,600 cytogenetically defined landmarks on the draft sequence of the human genome to help with the characterization of genes altered by gross chromosomal aberrations that cause human disease. The landmarks are large-insert clones mapped to chromosome bands by fluorescence in situ hybridization. Each clone contains a sequence tag that is positioned on the genomic sequence. This genome-wide set of sequence-anchored clones allows structural and functional analyses of the genome. This resource represents the first comprehensive integration of cytogenetic, radiation hybrid, linkage and sequence maps of the human genome; provides an independent validation of the sequence map and framework for contig order and orientation; surveys the genome for large-scale duplications, which are likely to require special attention during sequence assembly; and allows a stringent assessment of sequence differences between the dark and light bands of chromosomes. It also provides insight into large-scale chromatin structure and the evolution of chromosomes and gene families and will accelerate our understanding of the molecular bases of human disease and cancer.

    Nature 2001;409;6822;953-8

  • Disruption of an imprinted gene cluster by a targeted chromosomal translocation in mice.

    Cleary MA, van Raamsdonk CD, Levorse J, Zheng B, Bradley A and Tilghman SM

    Howard Hughes Medical Institute and Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA.

    Genomic imprinting is an epigenetic process in which the activity of a gene is determined by its parent of origin. Mechanisms governing genomic imprinting are just beginning to be understood. However, the tendency of imprinted genes to exist in chromosomal clusters suggests a sharing of regulatory elements. To better understand imprinted gene clustering, we disrupted a cluster of imprinted genes on mouse distal chromosome 7 using the Cre/loxP recombination system. In mice carrying a site-specific translocation separating Cdkn1c and Kcnq1, imprinting of the genes retained on chromosome 7, including Kcnq1, Kcnq1ot1, Ascl2, H19 and Igf2, is unaffected, demonstrating that these genes are not regulated by elements near or telomeric to Cdkn1c. In contrast, expression and imprinting of the translocated Cdkn1c, Slc22a1l and Tssc3 on chromosome 11 are affected, consistent with the hypothesis that elements regulating both expression and imprinting of these genes lie within or proximal to Kcnq1. These data support the proposal that chromosomal abnormalities, including translocations, within KCNQ1 that are associated with the human disease Beckwith-Wiedemann syndrome (BWS) may disrupt CDKN1C expression. These results underscore the importance of gene clustering for the proper regulation of imprinted genes.

    Nature genetics 2001;29;1;78-82

  • Massive gene decay in the leprosy bacillus.

    Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, Honoré N, Garnier T, Churcher C, Harris D, Mungall K, Basham D, Brown D, Chillingworth T, Connor R, Davies RM, Devlin K, Duthoy S, Feltwell T, Fraser A, Hamlin N, Holroyd S, Hornsby T, Jagels K, Lacroix C, Maclean J, Moule S, Murphy L, Oliver K, Quail MA, Rajandream MA, Rutherford KM, Rutter S, Seeger K, Simon S, Simmonds M, Skelton J, Squares R, Squares S, Stevens K, Taylor K, Whitehead S, Woodward JR and Barrell BG

    Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, Paris, France.

    Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

    Nature 2001;409;6823;1007-11

  • Contact Mechanics and Coefficients of Restitution

    Colin Thornton, Zemin Ning, Chuan-yu Wu, Mohammed Nasrullah and Long-yuan Li

    Lecture Notes in Physics 2001;564;184-194

  • A SNP resource for human chromosome 22: extracting dense clusters of SNPs from the genomic sequence.

    Dawson E, Chen Y, Hunt S, Smink LJ, Hunt A, Rice K, Livingston S, Bumpstead S, Bruskiewich R, Sham P, Ganske R, Adams M, Kawasaki K, Shimizu N, Minoshima S, Roe B, Bentley D and Dunham I

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The recent publication of the complete sequence of human chromosome 22 provides a platform from which to investigate genomic sequence variation. We report the identification and characterization of 12,267 potential variants (SNPs and other small insertions/deletions) of human chromosome 22, discovered in the overlaps of 460 clones used for the chromosome sequencing. We found, on average, 1 potential variant every 1.07 kb and approximately 18% of the potential variants involve insertions/deletions. The SNPs have been positioned both relative to each other, and to genes, predicted genes, repeat sequences, other genetic markers, and the 2730 SNPs previously identified on the chromosome. A subset of the SNPs were verified experimentally using either PCR-RFLP or genomic Invader assays. These experiments confirmed 92% of the potential variants in a panel of 92 individuals. [Details of the SNPs and RFLP assays can be found at and in dbSNP.]

    Genome research 2001;11;1;170-8

  • A superfamily of variant genes encoded in the subtelomeric region of Plasmodium vivax.

    del Portillo HA, Fernandez-Becerra C, Bowman S, Oliver K, Preuss M, Sanchez CP, Schneider NK, Villalobos JM, Rajandream MA, Harris D, Pereira da Silva LH, Barrell B and Lanzer M

    Departamento de Parasitologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, Av. Lineu Prestes 1374, São Paulo, SP 05508-900, Brazil.

    The malarial parasite Plasmodium vivax causes disease in humans, including chronic infections and recurrent relapses, but the course of infection is rarely fatal, unlike that caused by Plasmodium falciparum. To investigate differences in pathogenicity between P. vivax and P. falciparum, we have compared the subtelomeric domains in the DNA of these parasites. In P. falciparum, subtelomeric domains are conserved and contain ordered arrays of members of multigene families, such as var, rif and stevor, encoding virulence determinants of cytoadhesion and antigenic variation. Here we identify, through the analysis of a continuous 155,711-base-pair sequence of a P. vivax chromosome end, a multigene family called vir, which is specific to P. vivax. The vir genes are present at about 600-1,000 copies per haploid genome and encode proteins that are immunovariant in natural infections, indicating that they may have a functional role in establishing chronic infection through antigenic variation.

    Nature 2001;410;6830;839-42

  • The DNA sequence and comparative analysis of human chromosome 20.

    Deloukas P, Matthews LH, Ashurst J, Burton J, Gilbert JG, Jones M, Stavrides G, Almeida JP, Babbage AK, Bagguley CL, Bailey J, Barlow KF, Bates KN, Beard LM, Beare DM, Beasley OP, Bird CP, Blakey SE, Bridgeman AM, Brown AJ, Buck D, Burrill W, Butler AP, Carder C, Carter NP, Chapman JC, Clamp M, Clark G, Clark LN, Clark SY, Clee CM, Clegg S, Cobley VE, Collier RE, Connor R, Corby NR, Coulson A, Coville GJ, Deadman R, Dhami P, Dunn M, Ellington AG, Frankland JA, Fraser A, French L, Garner P, Grafham DV, Griffiths C, Griffiths MN, Gwilliam R, Hall RE, Hammond S, Harley JL, Heath PD, Ho S, Holden JL, Howden PJ, Huckle E, Hunt AR, Hunt SE, Jekosch K, Johnson CM, Johnson D, Kay MP, Kimberley AM, King A, Knights A, Laird GK, Lawlor S, Lehvaslaiho MH, Leversha M, Lloyd C, Lloyd DM, Lovell JD, Marsh VL, Martin SL, McConnachie LJ, McLay K, McMurray AA, Milne S, Mistry D, Moore MJ, Mullikin JC, Nickerson T, Oliver K, Parker A, Patel R, Pearce TA, Peck AI, Phillimore BJ, Prathalingam SR, Plumb RW, Ramsay H, Rice CM, Ross MT, Scott CE, Sehra HK, Shownkeen R, Sims S, Skuce CD, Smith ML, Soderlund C, Steward CA, Sulston JE, Swann M, Sycamore N, Taylor R, Tee L, Thomas DW, Thorpe A, Tracey A, Tromans AC, Vaudin M, Wall M, Wallis JM, Whitehead SL, Whittaker P, Willey DL, Williams L, Williams SA, Wilming L, Wray PW, Hubbard T, Durbin RM, Bentley DR, Beck S and Rogers J

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.

    Nature 2001;414;6866;865-71

  • Cancer and genomics.

    Futreal PA, Kasprzyk A, Birney E, Mullikin JC, Wooster R and Stratton MR

    Cancer Genome Project, Sanger Centre, Cambridge, UK.

    Identification of the genes that cause oncogenesis is a central aim of cancer research. We searched the proteins predicted from the draft human genome sequence for paralogues of known tumour suppressor genes, but no novel genes were identified. We then assessed whether it was possible to search directly for oncogenic sequence changes in cancer cells by comparing cancer genome sequences against the draft genome. Apparently chimaeric transcripts (from oncogenic fusion genes generated by chromosomal translocations, the ends of which mapped to different genomic locations) were detected to the same degree in both normal and neoplastic tissues, indicating a significant level of false positives. Our experiment underscores the limited amount and variable quality of DNA sequence from cancer cells that is currently available.

    Nature 2001;409;6822;850-2

  • Functional annotation of a full-length mouse cDNA collection.

    Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara A, Fukunishi Y, Konno H, Adachi J, Fukuda S, Aizawa K, Izawa M, Nishi K, Kiyosawa H, Kondo S, Yamanaka I, Saito T, Okazaki Y, Gojobori T, Bono H, Kasukawa T, Saito R, Kadota K, Matsuda H, Ashburner M, Batalov S, Casavant T, Fleischmann W, Gaasterland T, Gissi C, King B, Kochiwa H, Kuehl P, Lewis S, Matsuo Y, Nikaido I, Pesole G, Quackenbush J, Schriml LM, Staubli F, Suzuki R, Tomita M, Wagner L, Washio T, Sakai K, Okido T, Furuno M, Aono H, Baldarelli R, Barsh G, Blake J, Boffelli D, Bojunga N, Carninci P, de Bonaldo MF, Brownstein MJ, Bult C, Fletcher C, Fujita M, Gariboldi M, Gustincich S, Hill D, Hofmann M, Hume DA, Kamiya M, Lee NH, Lyons P, Marchionni L, Mashima J, Mazzarelli J, Mombaerts P, Nordone P, Ring B, Ringwald M, Rodriguez I, Sakamoto N, Sasaki H, Sato K, Schönbach C, Seya T, Shibata Y, Storch KF, Suzuki H, Toyo-oka K, Wang KH, Weitz C, Whittaker C, Wilming L, Wynshaw-Boris A, Yoshida K, Hasegawa Y, Kawaji H, Kohtsuki S, Hayashizaki Y and RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium

    Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, Yokohama Institute, Kanagawa, Japan.

    The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

    Nature 2001;409;6821;685-90

  • Initial sequencing and analysis of the human genome.

    Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J and International Human Genome Sequencing Consortium

    Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, MA 02142, USA.

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

    Funded by: NHGRI NIH HHS: U54 HG003273

    Nature 2001;409;6822;860-921

  • Tbx1 haploinsufficiency in the DiGeorge syndrome region causes aortic arch defects in mice.

    Lindsay EA, Vitelli F, Su H, Morishima M, Huynh T, Pramparo T, Jurecic V, Ogunrinu G, Sutherland HF, Scambler PJ, Bradley A and Baldini A

    Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030, USA.

    DiGeorge syndrome is characterized by cardiovascular, thymus and parathyroid defects and craniofacial anomalies, and is usually caused by a heterozygous deletion of chromosomal region 22q11.2 (del22q11) (ref. 1). A targeted, heterozygous deletion, named Df(16)1, encompassing around 1 megabase of the homologous region in mouse causes cardiovascular abnormalities characteristic of the human disease. Here we have used a combination of chromosome engineering and P1 artificial chromosome transgenesis to localize the haploinsufficient gene in the region, Tbx1. We show that Tbx1, a member of the T-box transcription factor family, is required for normal development of the pharyngeal arch arteries in a gene dosage-dependent manner. Deletion of one copy of Tbx1 affects the development of the fourth pharyngeal arch arteries, whereas homozygous mutation severely disrupts the pharyngeal arch artery system. Our data show that haploinsufficiency of Tbx1 is sufficient to generate at least one important component of the DiGeorge syndrome phenotype in mice, and demonstrate the suitability of the mouse for the genetic dissection of microdeletion syndromes.

    Nature 2001;410;6824;97-101

  • Breast cancer genetics: what we know and what we need.

    Nathanson KL, Wooster R, Weber BL and Nathanson KN

    Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    Breast cancer results from genetic and environmental factors leading to the accumulation of mutations in essential genes. Genetic predisposition may have a strong, almost singular effect, as with BRCA1 and BRCA2, or may represent the cumulative effects of multiple low-penetrance susceptibility alleles. Here we review high- and low-penetrance breast-cancer-susceptibility alleles and discuss ongoing efforts to identify additional susceptibility genes. Ultimately these discoveries will lead to individualized breast cancer risk assessment and a reduction in breast cancer incidence.

    Nature medicine 2001;7;5;552-6

  • SSAHA: a fast search method for large DNA databases.

    Ning Z, Cox AJ and Mullikin JC

    Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the "hits" for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects.

    Genome research 2001;11;10;1725-9
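
    The indexing and lookup steps described in the SSAHA abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (build_index, search), the choice to cut database sequences into non-overlapping k-tuples while scanning the query with overlapping ones, and the sort key (sequence, shift, offset) are all assumptions made for the example. The abstract itself only states that k-tuple positions are stored in a hash table and that the hits are sorted.

```python
from collections import defaultdict


def build_index(sequences, k):
    """Build a hash table mapping each k-tuple to its (sequence id, offset) positions.

    Assumption: database sequences are cut into consecutive, non-overlapping
    k-tuples, so offsets advance in steps of k.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(0, len(seq) - k + 1, k):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def search(index, query, k):
    """Look up every overlapping k-tuple of the query and return the sorted hits.

    Each hit is (sequence id, shift, offset), where shift = offset - query position.
    After sorting, hits that share a sequence id and shift sit next to each other;
    such a run suggests an ungapped match between the query and that sequence.
    """
    hits = []
    for q_pos in range(len(query) - k + 1):
        # .get avoids adding empty entries to the defaultdict for unseen k-tuples
        for seq_id, offset in index.get(query[q_pos:q_pos + k], ()):
            hits.append((seq_id, offset - q_pos, offset))
    hits.sort()
    return hits


# Hypothetical toy database and query, for illustration only.
database = ["ACGTACGTGG", "TTACGTAA"]
index = build_index(database, k=4)
hits = search(index, "ACGTACGT", k=4)
```

    In this sketch a larger k shrinks the number of chance hits and speeds up the search, but a match shorter than roughly 2k - 1 bases may be missed entirely, which is the speed versus sensitivity trade-off the abstract mentions.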

  • Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18.

    Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J, Churcher C, Mungall KL, Bentley SD, Holden MT, Sebaihia M, Baker S, Basham D, Brooks K, Chillingworth T, Connerton P, Cronin A, Davis P, Davies RM, Dowd L, White N, Farrar J, Feltwell T, Hamlin N, Haque A, Hien TT, Holroyd S, Jagels K, Krogh A, Larsen TS, Leather S, Moule S, O'Gaora P, Parry C, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S and Barrell BG

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Salmonella enterica serovar Typhi (S. typhi) is the aetiological agent of typhoid fever, a serious invasive bacterial disease of humans with an annual global burden of approximately 16 million cases, leading to 600,000 fatalities. Many S. enterica serovars actively invade the mucosal surface of the intestine but are normally contained in healthy individuals by the local immune defence mechanisms. However, S. typhi has evolved the ability to spread to the deeper tissues of humans, including liver, spleen and bone marrow. Here we have sequenced the 4,809,037-base pair (bp) genome of a S. typhi (CT18) that is resistant to multiple drugs, revealing the presence of hundreds of insertions and deletions compared with the Escherichia coli genome, ranging in size from single genes to large islands. Notably, the genome sequence identifies over two hundred pseudogenes, several corresponding to genes that are known to contribute to virulence in Salmonella typhimurium. This genetic degradation may contribute to the human-restricted host range for S. typhi. CT18 harbours a 218,150-bp multiple-drug-resistance incH1 plasmid (pHCM1), and a 106,516-bp cryptic plasmid (pHCM2), which shows recent common ancestry with a virulence plasmid of Yersinia pestis.

    Nature 2001;413;6858;848-52

  • Genome sequence of Yersinia pestis, the causative agent of plague.

    Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D, Bentley SD, Brooks K, Cerdeño-Tárraga AM, Chillingworth T, Cronin A, Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K, Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S and Barrell BG

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The Gram-negative bacterium Yersinia pestis is the causative agent of the systemic invasive infectious disease classically referred to as plague, and has been responsible for three human pandemics: the Justinian plague (sixth to eighth centuries), the Black Death (fourteenth to nineteenth centuries) and modern plague (nineteenth century to the present day). The recent identification of strains resistant to multiple drugs and the potential use of Y. pestis as an agent of biological warfare mean that plague still poses a threat to human health. Here we report the complete genome sequence of Y. pestis strain CO92, consisting of a 4.65-megabase (Mb) chromosome and three plasmids of 96.2 kilobases (kb), 70.3 kb and 9.6 kb. The genome is unusually rich in insertion sequences and displays anomalies in GC base-composition bias, indicating frequent intragenomic recombination. Many genes seem to have been acquired from other bacteria and viruses (including adhesins, secretion systems and insecticidal toxins). The genome contains around 150 pseudogenes, many of which are remnants of a redundant enteropathogenic lifestyle. The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.

    Nature 2001;413;6855;523-7

  • A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms.

    Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, Hunt SE, Cole CG, Coggill PC, Rice CM, Ning Z, Rogers J, Bentley DR, Kwok PY, Mardis ER, Yeh RT, Schultz B, Cook L, Davenport R, Dante M, Fulton L, Hillier L, Waterston RH, McPherson JD, Gilman B, Schaffner S, Van Etten WJ, Reich D, Higgins J, Daly MJ, Blumenstiel B, Baldwin J, Stange-Thomann N, Zody MC, Linton L, Lander ES, Altshuler D and International SNP Map Working Group

    Cold Spring Harbor, New York 11724, USA.

    We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

    Nature 2001;409;6822;928-33

  • Granular powders and solids: Insights from numerical simulations

    W. Hoyle, M. Ghadiri, S. J. Antony, R. Moreno and Z. Ning

    Powders and Solids : Developments in Handling and Processing Technologies 2001

  • A Text-Independent Speaker Verification System Using Support Vector Machines Classifier

    Yong Gu and Trevor Thomas

    7th European Conference on Speech Communication and Technology, Aalborg, Denmark, September 3-7, 2001

  • Comparison of human genetic and sequence-based physical maps.

    Yu A, Zhao C, Fan Y, Jang W, Mungall AJ, Deloukas P, Olsen A, Doggett NA, Ghebranious N, Broman KW and Weber JL

    Center for Medical Genetics, Marshfield Medical Research Foundation, Wisconsin 54449, USA.

    Recombination is the exchange of information between two homologous chromosomes during meiosis. The rate of recombination per nucleotide, which profoundly affects the evolution of chromosomal segments, is calculated by comparing genetic and physical maps. Human physical maps have been constructed using cytogenetics, overlapping DNA clones and radiation hybrids; but the ultimate and by far the most accurate physical map is the actual nucleotide sequence. The completion of the draft human genomic sequence provides us with the best opportunity yet to compare the genetic and physical maps. Here we describe our estimates of female, male and sex-average recombination rates for about 60% of the genome. Recombination rates varied greatly along each chromosome, from 0 to at least 9 centiMorgans per megabase (cM/Mb). Among several sequence and marker parameters tested, only relative marker position along the metacentric chromosomes in males correlated strongly with recombination rate. We identified several chromosomal regions up to 6 Mb in length with particularly low (deserts) or high (jungles) recombination rates. Linkage disequilibrium was much more common and extended for greater distances in the deserts than in the jungles.

    Nature 2001;409;6822;951-3
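
    The comparison of genetic and physical maps mentioned in the abstract above amounts, in its simplest form, to dividing the genetic distance between two markers by the physical distance between them. The sketch below illustrates only that arithmetic; the function name, its parameters and the marker positions are hypothetical and are not taken from the paper, whose actual estimation procedure is not described here.

```python
def recombination_rate(cm_a, cm_b, bp_a, bp_b):
    """Return the recombination rate in cM per Mb between two markers.

    cm_a, cm_b: genetic map positions in centiMorgans.
    bp_a, bp_b: physical positions of the same markers in base pairs.
    """
    genetic_cm = abs(cm_b - cm_a)
    physical_mb = abs(bp_b - bp_a) / 1_000_000
    if physical_mb == 0:
        raise ValueError("markers share a physical position; rate is undefined")
    return genetic_cm / physical_mb


# Hypothetical markers: 3 cM apart on the genetic map, 2 Mb apart in the sequence.
rate = recombination_rate(10.0, 13.0, 1_000_000, 3_000_000)
```

    Under this definition, the "deserts" and "jungles" in the abstract would correspond to intervals where the ratio is unusually low or high.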

  • Engineering chromosomal rearrangements in mice.

    Yu Y and Bradley A

    Program in Developmental Biology, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

    The combination of gene-targeting techniques in mouse embryonic stem cells and the Cre/loxP site-specific recombination system has resulted in the emergence of chromosomal-engineering technology in mice. This advance has opened up new opportunities for modelling human diseases that are associated with chromosomal rearrangements. It has also led to the generation of visibly marked deletions and balancer chromosomes in mice, which provide essential reagents for maximizing the efficiency of large-scale mutagenesis efforts and which will accelerate the functional annotation of mammalian genomes, including the human genome.

    Nature reviews. Genetics 2001;2;10;780-90