Mouse genome data available in public databases
Freely Available Mouse Data Aid Study of Human Genes
A public-private effort to accelerate the sequencing of the mouse genome has exceeded its own goal of achieving 66 percent coverage of the genome just three months into the six-month project. At its current pace, the Mouse Sequencing Consortium (MSC) expects to reach its target of three-fold coverage by April of this year.
At the same time, collaborators in the MSC have extended the practice of making sequence data available for the free and unrestricted use of researchers worldwide. A new repository that contains not only the letters of the DNA sequence (as has been customary for previous large-scale sequencing projects), but also raw data, including actual “traces” from sequencing machines, has been established to make the information rapidly and freely available to the scientific community.
The Mouse Sequencing Consortium – comprising three private companies, six institutes of the National Institutes of Health and the Wellcome Trust – was formed in October 2000 to work collaboratively to produce a draft sequence of the mouse genome in six months. The availability of these data is considered essential to the further understanding of the human genome.
“Unrestricted access to the mouse sequence should enhance efforts to identify causative genes in mouse models of diseases as well as identify human genes responsible for various disorders. The rapid progress toward making these data widely available will in turn speed the search for new ways to treat or even prevent disease.”
Arthur Holden, Chairman of the Mouse Sequencing Consortium
The MSC approach to sequencing the mouse genome takes advantage of the best features of the map-based shotgun and the whole genome shotgun strategies. Sequence data generated by the MSC are in short fragments (500-700 base pairs), and these so-called “raw reads” are now deposited weekly in new data repositories. The quality of the deposited data has been checked and found to be very good.
Sequences, quality scores, and traces from sequencing machines are accessible in databases maintained by the National Center for Biotechnology Information (NCBI) and by the European Bioinformatics Institute (EBI), the latter in a joint project with the Sanger Centre called “Ensembl.” At present, approximately 6.4 million traces from the MSC’s whole genome shotgun sequencing effort have been deposited into the archives.
Researchers now have the opportunity, for example, to compare a sequence of interest against the available mouse traces in the archives using software programs such as megaBlast and SSAHA (http://trace.ensembl.org/perl/ssahaview). Matching mouse traces can then be downloaded for further analysis.
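To illustrate the general idea of comparing a sequence of interest against a collection of traces, here is a minimal Python sketch of a word-lookup search: every short word (k-mer) in the traces is stored in a hash table, and traces are ranked by how many words they share with the query. This is only loosely in the spirit of hash-based search tools; it is not the algorithm used by megaBlast or SSAHA, and the function names, the word length and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(traces, k=12):
    """Map each k-letter word to the (trace_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for trace_id, seq in traces.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((trace_id, i))
    return index


def rank_traces(query, index, k=12):
    """Count k-mer hits between the query and each trace, best first."""
    hits = defaultdict(int)
    query = query.upper()
    for i in range(len(query) - k + 1):
        # .get() avoids adding empty entries to the index for unseen words.
        for trace_id, _offset in index.get(query[i:i + k], ()):
            hits[trace_id] += 1
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical sequences, not real mouse traces).
toy_traces = {
    "trace_a": "ACGTACGTTAGC",
    "trace_b": "TTTTGGGGCCCC",
}
toy_index = build_kmer_index(toy_traces, k=4)
print(rank_traces("CGTTAG", toy_index, k=4))
```

In practice a shared-word count like this would only be a first filter; candidate traces would still need a proper alignment before any conclusion is drawn.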
Additionally, the EBI-Sanger Ensembl database provides direct views of homologies between the mouse traces and the human genome, which should facilitate interpretation of the human code (for example, mouse sequence matches to the human cystic fibrosis gene).
The draft sequence, when completed in March, will bring the amount of mouse sequence available to about 93 to 95 percent – albeit in small, unordered fragments. The National Human Genome Research Institute (NHGRI) will go on to complete the highly accurate, “finished” sequence of the mouse genome.
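The relationship between "fold coverage" and the percentage of the genome represented can be sketched with a simple idealized model. The Python snippet below assumes that reads land uniformly at random (a Poisson approximation), so the expected fraction of bases covered at depth c is 1 - e^(-c). The MSC does not state how its percentages were computed, so this is an illustration only; the 600 base pair average read length (the midpoint of the 500-700 range) and the 3.1 billion base pair genome size are assumptions, and real data deviate from the model because of repeats, cloning bias and low-quality reads.

```python
import math


def sequencing_depth(num_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced (fold coverage)."""
    return num_reads * read_length_bp / genome_size_bp


def expected_fraction_covered(depth):
    """Expected fraction of the genome hit by at least one read,
    assuming reads are placed uniformly at random (idealized model)."""
    return 1.0 - math.exp(-depth)


GENOME_SIZE_BP = 3.1e9   # assumption: mouse genome roughly the size of the human genome
NUM_TRACES = 6.4e6       # traces deposited so far, per the text
MEAN_READ_BP = 600       # assumption: midpoint of the 500-700 bp range

depth_now = sequencing_depth(NUM_TRACES, MEAN_READ_BP, GENOME_SIZE_BP)
print(f"Depth from deposited traces: {depth_now:.2f}-fold")
print(f"Expected fraction covered now: {expected_fraction_covered(depth_now):.0%}")
print(f"Expected fraction at three-fold: {expected_fraction_covered(3.0):.0%}")
```

Under these assumptions, three-fold coverage corresponds to roughly 95 percent of the genome, in line with the 93 to 95 percent figure quoted for the draft.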
Why Sequence the Mouse Genome?
With the working draft of the human genome sequence in hand, scientists in both industry and academia now seek to interpret its meaning.
Not only is the genome of the mouse about the same size as that of the human (approximately 3.1 billion base pairs), but mice and humans also share virtually the same set of genes. Thus, the DNA sequence of the mouse genome is an essential tool to identify and study the function of human genes.
For example, the gene sequences in mice and humans that encode proteins to carry out important biological functions – such as regulation of cell division and development of major organ systems – are shared to a high degree (85 percent sequence identity). Thus, when human and mouse genome sequences are compared, the regions of high similarity are readily apparent and immediately point to protein-coding regions and regulatory sequences.
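As a concrete illustration of what a figure such as "85 percent sequence identity" means, the following Python sketch computes percent identity for two sequences that have already been aligned. The function name and the two short sequences are invented for the example and do not correspond to real human or mouse genes; note also that tools differ in how they treat gaps, and this sketch simply skips gapped columns.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned, ungapped positions where the two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical 20-base aligned fragments differing at three positions.
human_fragment = "ATGGCCAAGCTGACCGTGAA"
mouse_fragment = "ATGACCAAGCCGACCGTAAA"
print(percent_identity(human_fragment, mouse_fragment))
```

With 17 of 20 positions matching, the example fragments are 85 percent identical.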
In addition to aiding the interpretation of the human genome, the mouse genome sequence will increase the ability of scientists to use the mouse as a model system to study and understand human disease, and to develop and test new treatments in ways that cannot easily be done with humans.
As recommended by scientists studying the mouse, the MSC effort is using a strain of mouse known as C57BL/6J, commonly called “Black 6.”
About the Mouse Sequencing Consortium
The MSC is another example of an emerging model for supporting large-scale genomics research in which public and private sector entities join forces to produce publicly available data sets that are crucial for basic biomedical research.
The National Institutes of Health, the Wellcome Trust and three private companies formed the consortium to speed up the determination of the DNA sequence of the mouse genome. The MSC is co-chaired by Arthur Holden [Chairman and CEO, The SNP Consortium Ltd.] and Francis Collins, MD, PhD [Director, NHGRI]. The members of the Mouse Sequencing Consortium are GlaxoSmithKline, the Merck Genome Research Institute, Affymetrix, Inc., the Wellcome Trust, and six of the National Institutes of Health: the National Cancer Institute, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit, charitable organization founded to support the NIH in its mission.
MSC funds are supporting mouse genome sequencing at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass., Washington University School of Medicine in St. Louis, and the Sanger Centre in the U.K.