International Team of Researchers Assembles Draft Sequence of Mouse Genome

Ninety-six percent of mouse genome sequenced; Results freely available in public databases on Internet

In a landmark advance in genomics, the international Mouse Genome Sequencing Consortium today announced that it has assembled and deposited into public databases an advanced draft sequence of the mouse genome – the genetic blueprint for the most important animal model in biomedical research. The data are posted on the Internet at the three sites listed below, where they are freely available.

This achievement represents a major milestone for the Human Genome Project because it provides a key tool needed to interpret the human sequence, a draft version of which was published last year. Researchers will be better able to understand the function of many human genes because the mouse carries virtually the same set of genes as the human but can be used in laboratory research.

For most human illnesses, from cancer to autoimmune disease, important insights have come from the study of mouse models. The advanced draft of the mouse sequence will greatly accelerate precise identification of the genetic contributors to those illnesses, leading to better understanding of human disease and improved tests and treatments. The mouse sequence will also allow researchers to recognize regions of the human genome that control gene activity, because those regions have been conserved through the 100 million years of evolution separating humans and mice.

“The mouse genome project – carried out alongside finishing the human genome – has generated crucial publicly available information for biomedical research. Throughout the project we have refined both the technologies and the software to improve this resource and to bring it to researchers as swiftly as possible.”

Jane Rogers Ph.D., Head of Genome Sequencing at the Wellcome Trust Sanger Institute

The draft sequence was assembled by the Mouse Genome Sequencing Consortium, an international team of researchers from the Wellcome Trust Sanger Institute and the European Bioinformatics Institute in Hinxton, England; the Whitehead Institute in Cambridge, MA; and Washington University School of Medicine in St. Louis, MO, with funding from the Wellcome Trust in the UK and the National Institutes of Health in the USA.

The mouse genome is contained in 20 chromosome pairs and the current results suggest that it is about 2.7 billion base pairs in size, or about 15 percent smaller than the human genome. The human genome is 3.1 billion base pairs spread out over 23 pairs of chromosomes (22 autosomes and the X and the Y sex chromosomes).

Analysis of the genome assembly indicates roughly the same number of genes for the mouse as the human. So far researchers have found more than 22,500 high-quality gene predictions, with additional predictions expected to take the total to about 30,000.

“This is a most exciting development for biomedical research. My group and research groups around the world have used the public mouse sequence as it has developed. The new assembly and gene analysis is a phenomenal achievement by the international consortium, which will speed our investigations into human illness.”

Allan Bradley Ph.D., Director of the Wellcome Trust Sanger Institute

The draft sequence shows the order of the DNA chemical bases A, T, C, and G along the mouse chromosomes. The current assembly includes more than 96 percent of the mouse genome with long, continuous stretches of DNA and represents a seven-fold coverage of the genome. This means that the location of every base, or DNA letter, in the mouse genome was determined an average of seven times, a frequency that ensures a high degree of accuracy.
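
The coverage figure can be related to the other numbers in this release with a little arithmetic. The sketch below is illustrative only: it assumes that each of the roughly 33 million sequencing experiments yields one usable read and that average coverage is simply total bases read divided by genome size. The mean read length it derives (roughly 570 bases) is an inference from those rounded figures, not a number reported by the consortium, and the function names are hypothetical.

```python
# Figures quoted in this release (rounded, so the result is approximate).
GENOME_SIZE_BP = 2.7e9   # estimated size of the mouse genome in base pairs
NUM_READS = 33e6         # "more than 33 million" individual sequencing experiments
TARGET_COVERAGE = 7.0    # seven-fold coverage


def average_coverage(num_reads, mean_read_length_bp, genome_size_bp):
    """Average number of times each base is read: total bases read / genome size."""
    return num_reads * mean_read_length_bp / genome_size_bp


def implied_read_length(coverage, num_reads, genome_size_bp):
    """Mean read length that would give the stated coverage (same formula, rearranged)."""
    return coverage * genome_size_bp / num_reads


# Assumption: every read counts toward coverage; real projects discard some reads.
read_len = implied_read_length(TARGET_COVERAGE, NUM_READS, GENOME_SIZE_BP)
print(f"Implied mean read length: about {read_len:.0f} bases")
print(f"Coverage check: {average_coverage(NUM_READS, read_len, GENOME_SIZE_BP):.1f}x")
```

Average coverage is only a summary: individual bases are read more or fewer than seven times, which is one reason a small fraction of the genome remains undetermined at the draft stage.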

“The mouse sequence is much further along in the process than the human sequence was at the draft stage. Methods for efficient sequencing of large genomes continue to advance dramatically, and the sophistication of the team that accomplished this goal is truly impressive. This sets a new standard for speed, accuracy, and public accessibility.”

Francis S. Collins M.D., Ph.D., director of the National Human Genome Research Institute, Bethesda MD

The quality of the working draft sequence far exceeds the consortium’s original expectations for this stage, and the draft was completed much sooner than initially expected, reflecting the tremendous efficiencies gained in sequencing and computational technologies in the past few years.

“It is remarkable that we were able to complete the mouse genome in such a short time and with such great accuracy. We are now working hard with an international group of experts to explore the content of the sequence and to use it to improve our understanding of the human sequence.”

Robert Waterston M.D., Ph.D., director of the Genome Center at Washington University, St Louis MO

The sequence information is immediately and freely available to the world. The information will be utilized thousands of times daily by scientists in academia and industry, as well as by commercial database companies providing information services to biotechnologists.

The results from this analysis can be found at several websites, including those of the European Bioinformatics Institute, the National Center for Biotechnology Information at the National Library of Medicine, and the University of California, Santa Cruz. A comparison between the mouse sequence and the human sequence can be found at all three sites.

“The mouse sequence provides a very important chapter from evolution’s lab notebook. Being able to read evolution’s notebook and compare genomic information across species will allow us to glean important information about ourselves. That’s because evolution preserves the most important genetic information across species; if specific DNA sequences have been preserved by evolution over hundreds of millions of years, then they must be functionally important.”

Eric Lander Ph.D., director of the Whitehead/MIT Center for Genome Research, Cambridge MA

This milestone concludes the second phase of the consortium’s mouse-sequencing effort. In Phase III, the consortium will produce a “finished” version with the remaining gaps (the 4 percent where the sequence has yet to be determined) filled in and errors resolved. This phase will proceed by clone-based, or hierarchical, sequencing using the publicly available mouse genome clone map. A mapped set of BAC clones that covers the entire mouse genome is being sequenced. The BAC data will be combined with the draft genome sequence to finish the mouse sequence to the same high quality to which the human sequence is being completed. Clone-based sequencing remains the only method proven to produce a complete, fully accurate version of a complex genome. The complete genome sequence of the mouse will be available within 3 years.

More information

  • The mouse sequencing strategy combines the best features of the clone-based, hierarchical-shotgun and whole-genome-shotgun strategies. The scientists used data from more than 33 million individual sequencing experiments. Using two different computer systems, called genome assemblers, the team reconstructed the 33 million individual fragments into a draft sequence. These whole-genome assemblers, called ARACHNE and PHUSION, were developed at the Whitehead Institute and at the Wellcome Trust Sanger Institute, respectively.

    The assemblers first merged the fragments into long stretches of continuous sequence, called contigs. The contigs were then linked into larger fragments called supercontigs, with a typical length of 16.9 million base pairs. These supercontigs were then anchored to the mouse genetic and BAC clone maps. Finally, adjacent supercontigs were joined into even larger ultracontigs on the basis of other linking information. In the end, nearly the entire chromosomal sequence is contained in a mere 89 ultracontigs with a typical size of 50 megabases each.

  • The BAC resources used by the group were developed by the Children’s Hospital Oakland Research Institute in Oakland, California, the Genome Sciences Centre in Vancouver, Canada, and The Institute for Genomic Research in Rockville, Maryland. The BAC map was assembled by Washington University and the Wellcome Trust Sanger Institute.

  • The results reported today build on work originally performed by the Mouse Sequencing Consortium (MSC), a public-private consortium that included 16 NIH institutes, GlaxoSmithKline of Research Triangle Park, NC, the Wellcome Trust, Merck & Co. of Whitehouse Station, NJ, and Affymetrix of Santa Clara, CA. The MSC achieved the first three-fold coverage of the mouse genome using the whole-genome-shotgun technique, which represented the first phase of the project.

  • The National Institutes of Health funding for this effort included support from the National Human Genome Research Institute, National Cancer Institute, National Institute of Dental and Craniofacial Research, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of General Medical Sciences, National Eye Institute, National Institute of Environmental Health Sciences, National Institute on Aging, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institute on Deafness and Other Communication Disorders, National Institute of Mental Health, National Institute on Drug Abuse, National Center for Research Resources, and the Fogarty International Center.
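
The first assembly step described above, merging overlapping sequence fragments into longer contigs, can be illustrated with a toy example. The sketch below is a simple greedy overlap merge written for illustration. It is not how ARACHNE or PHUSION work, it ignores sequencing errors, repeats, and reverse strands, and all names and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Stops when no remaining pair overlaps by at least `min_len` bases and
    returns the resulting contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Made-up reads: three overlap end to end, one shares nothing with the others.
example_reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "CCCAAAT"]
print(greedy_assemble(example_reads))
```

In this toy input the three overlapping reads collapse into a single contig while the unrelated read stays separate, which mirrors why real assemblies need additional linking information, such as the genetic and BAC clone maps, to order and join contigs.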