15th February 2001

Mouse genome data available in public databases

Coverage Now Exceeds Two-Thirds of Total Sequence

Freely Available Mouse Data Aid Study of Human Genes

A public-private effort to accelerate the sequencing of the mouse genome has exceeded its own goal of achieving 66 percent coverage of the genome just three months into the six-month project. At its current pace, the Mouse Sequencing Consortium (MSC) expects to reach its target of three-fold coverage by April of this year.

At the same time, collaborators in the MSC have extended the practice of making sequence data available for the free and unrestricted use of researchers worldwide. A new repository that contains not only the letters of the DNA sequence (as has been customary for previous large-scale sequencing projects), but also raw data, including actual "traces" from sequencing machines, has been established to make the information rapidly and freely available to the scientific community.

The Mouse Sequencing Consortium - comprising three private companies, six institutes of the National Institutes of Health and the Wellcome Trust - was formed in October 2000 to work collaboratively to produce a draft sequence of the mouse genome in six months. The availability of these data is considered essential to the further understanding of the human genome.

"Unrestricted access to the mouse sequence should enhance efforts to identify causative genes in mouse models of diseases as well as identify human genes responsible for various disorders," says Arthur Holden, Chairman of the Mouse Sequencing Consortium. "The rapid progress toward making these data widely available will in turn speed the search for new ways to treat or even prevent disease."

The MSC approach to sequencing the mouse genome takes advantage of the best features of the map-based shotgun and the whole genome shotgun strategies. Sequence data generated by the MSC are in short fragments (500-700 base pairs), and these so-called "raw reads" are now deposited weekly in new data repositories. The quality of the deposited data has been checked and found to be very good.

Sequences, quality scores, and traces from sequencing machines are accessible in databases maintained by the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) in a joint project with the Sanger Centre called "Ensembl". At present, approximately 6.4 million traces from the MSC's whole genome shotgun sequencing effort have been deposited into the archives.

Researchers now have the opportunity, for example, to compare a sequence of interest against the available mouse traces in the archives using software programs such as "megaBlast" and "SSAHA". Matching mouse traces can then be downloaded for further analysis.
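As an illustration of how a sequence of interest can be matched against a collection of traces, the sketch below uses a toy k-mer lookup table in Python. It shows only the general idea behind hash-based search tools; it is not the actual megaBlast or SSAHA implementation, and the function names, trace names and sequences are all hypothetical.

```python
from collections import defaultdict

def build_kmer_index(traces, k):
    """Index every non-overlapping k-mer of each trace as (trace_id, offset)."""
    index = defaultdict(list)
    for trace_id, seq in traces.items():
        for offset in range(0, len(seq) - k + 1, k):
            index[seq[offset:offset + k]].append((trace_id, offset))
    return index

def count_hits(query, index, k):
    """Count, per trace, the index entries matched by overlapping k-mers of the query."""
    hits = defaultdict(int)
    for start in range(len(query) - k + 1):
        for trace_id, _offset in index.get(query[start:start + k], ()):
            hits[trace_id] += 1
    return dict(hits)

# Hypothetical toy data; real traces are 500-700 base pairs long.
traces = {
    "trace_A": "ACGTACGTTTGACCA",
    "trace_B": "GGGCCCAAATTTGGG",
}
index = build_kmer_index(traces, k=4)
hits = count_hits("TTACGTTTGA", index, k=4)
print(hits)
```

Traces with the most hits are the candidates worth downloading and aligning in full; a real search tool would also check that the hits fall in a consistent order and position.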

Additionally, the EBI-Sanger Ensembl database provides direct views of homologies between the mouse traces and the human genome, which should facilitate interpretation of the human code (for example, mouse sequence matches to the human cystic fibrosis gene).

The draft sequence, when completed in March, will bring the amount of mouse sequence available to about 93 to 95 percent - albeit in small, unordered fragments. The National Human Genome Research Institute (NHGRI) will go on to complete the highly accurate, "finished" sequence of the mouse genome.
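As a rough check on the coverage figures quoted in this release, the Python sketch below applies the standard Poisson (Lander-Waterman) approximation, in which random reads at c-fold coverage are expected to cover a fraction 1 - e^(-c) of the genome. The 600 base pair mean read length is an assumed midpoint of the 500-700 range, and the model ignores read quality, repeats and cloning bias, so the results are indicative only and are not the MSC's own calculation.

```python
import math

GENOME_SIZE_BP = 3.1e9       # approximate genome size quoted in this release
TRACES_DEPOSITED = 6.4e6     # traces deposited so far, per this release
MEAN_READ_LENGTH_BP = 600    # assumption: midpoint of the 500-700 bp range

def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_length_bp / genome_size_bp

def expected_fraction_covered(fold):
    """Poisson estimate of the fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-fold)

current_fold = fold_coverage(TRACES_DEPOSITED, MEAN_READ_LENGTH_BP, GENOME_SIZE_BP)
current_fraction = expected_fraction_covered(current_fold)
target_fraction = expected_fraction_covered(3.0)

print(f"current fold coverage: {current_fold:.2f}")
print(f"expected fraction covered now: {current_fraction:.1%}")
print(f"expected fraction covered at three-fold: {target_fraction:.1%}")
```

Under these assumptions the deposited traces amount to roughly 1.2-fold coverage, or about 71 percent of the genome, and three-fold coverage corresponds to about 95 percent, in line with the two-thirds and 93 to 95 percent figures given in this release.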

Why Sequence the Mouse Genome?

With the working draft of the human genome sequence in hand, scientists in both industry and academia now seek to interpret its meaning.

Not only is the genome of the mouse about the same size as that of the human (approximately 3.1 billion base pairs), but mice and humans also share virtually the same set of genes. Thus, the DNA sequence of the mouse genome is an essential tool to identify and study the function of human genes.

For example, the gene sequences in mice and humans that encode proteins to carry out important biological functions - such as regulation of cell division and development of major organ systems - are shared to a high degree (85 percent sequence identity). Thus, when human and mouse genome sequences are compared, the regions of high similarity are readily apparent and immediately point to protein-coding regions and regulatory sequences.

In addition to aiding the interpretation of the human genome, the mouse genome sequence will increase the ability of scientists to use the mouse as a model system to study and understand human disease, and to develop and test new treatments in ways that cannot easily be done with humans.

As recommended by scientists studying the mouse, the MSC effort is using a strain of mouse known as C57BL/6J, commonly called "Black 6."

About the Mouse Sequencing Consortium

The MSC is another example of an emerging model for supporting large-scale genomics research in which public and private sector entities join forces to produce publicly available data sets that are crucial for basic biomedical research.

The National Institutes of Health, the Wellcome Trust and three private companies formed the consortium to speed up the determination of the DNA sequence of the mouse genome. The MSC is co-chaired by Arthur Holden [Chairman and CEO, The SNP Consortium Ltd.] and Francis Collins, MD, PhD [Director, NHGRI]. The members of the Mouse Sequencing Consortium are GlaxoSmithKline, the Merck Genome Research Institute, Affymetrix, Inc., the Wellcome Trust, and six of the National Institutes of Health: the National Cancer Institute, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit, charitable organization founded to support the NIH in its mission.

MSC funds are supporting mouse genome sequencing at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass., Washington University School of Medicine in St. Louis, and the Sanger Centre in the U.K.

For more information:

Mouse Sequencing Consortium Members


Consortium Member                                       Media Contacts

GlaxoSmithKline                                         Graeme P. Holland
                                                      44-12-7964-4269
                                                      Graeme_P_Holland@sbphrd.com

                                                      Rick Koenig
                                                      1-610-270-5546
                                                      Rick_M_Koenig@sbphrd.com

Merck Genome Research Institute                         Andrea F. Kollath, DVM
                                                      1-908-423-6492
                                                      Andrea_Kollath@merck.com

Affymetrix, Inc.                                        Anne Bowdidge
                                                      1-408-731-5925  
                                                      anne_bowdidge@affymetrix.com
                                                      
National Cancer Institute                               NCI Press Office
                                                      301-496-6641

National Human Genome Research Institute                Kathy Hudson, PhD
                                                      1-301-402-0955
                                                      hudsonk@exchange.nih.gov

National Institute on Deafness and Other                Marin Allen
Communication Disorders                                 1-301-496-7243
                                                      marin_allen@nih.gov

National Institute of Diabetes and Digestive            Joan Chamberlain
and Kidney Diseases                                     1-301-496-3583
                                                      joan_chamberlain@nih.gov
                                                                              
National Institute of Mental Health                     Marilyn Weeks
                                                      1-301-443-4536
                                                      mweeks@nih.gov

National Institute of Neurological Disorders            Margo Warren
and Stroke                                              1-301-496-5751
                                                      mw76v@nih.gov
      
Wellcome Trust                                          Noorece Ahmed
                                                      44-20-7611-8540
                                                      n.ahmed@wellcome.ac.uk

Genome Sequencing Centres


Sequencing Centre                                       Media Contacts

Whitehead Institute for Biomedical Research             Seema Kumar
                                                      1-617-258-6153
                                                      kumar@wi.mit.edu
                                                                      
Washington University School of Medicine                Joni Westerhouse
                                                      1-314-286-0120
                                                      joniw@medicine.wustl.edu

Sanger Centre                                           Don Powell
                                                      44-12-2349-4956
                                                      don@sanger.ac.uk

Foundation for the National Institutes of Health, Inc.  Constance U. Battle, MD
                                                      1-301-402-5311
                                                      cubattle@fnih.org

Other Contacts                                          Arthur Holden
                                                      1-847-317-9230
                                                      aholden@firstgenetic.net

Contact the Press Office

Don Powell Media and Public Relations Manager
Wellcome Trust Sanger Institute, Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 496 928
Mobile +44 (0)7753 775 397
Fax +44 (0)1223 494 919
Email press.office@sanger.ac.uk
