Newly sequenced mouse genomes unearth unknown genes

Sixteen newly sequenced mouse strains reveal unexpected diversity that could impact disease research

Scientists at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the Wellcome Sanger Institute have discovered significant diversity in the genomes of 16 laboratory strains of mouse, potentially impacting future research in genetics, drug development and beyond.

The research, published in the journal Nature Genetics, produced draft genome sequences for 16 of the most widely used mouse strains, revealing, for the first time, notable genetic diversity. Areas of the genome with notable variation include regions affecting immunity, pathogen defence and sensory function. These regions also differ widely from the genome of the current reference strain, suggesting the discovery could substantially affect future human disease research.

A research staple

The lab mouse is a staple of research in understanding health and disease, drug development, vaccines and genetics, and is the most widely used mammalian model organism. Because 98 per cent of mouse genes are comparable to those in humans, the mouse genome has been instrumental in helping researchers understand disease and develop drug treatments.

Researchers use a variety of mouse strains to study human disease. For example, the Non-obese Diabetic (NOD) mouse is used to study type 1 diabetes. Prior to the current study, researchers had the complete genome for only one of these strains. By sequencing 16 of the most commonly used mouse strains, this study discovered hundreds of new forms of genes associated with disease, as well as a previously unknown gene. This gene is one of the largest mouse genes known to date and has been associated with brain development.

Striking differences

“We examined the regions of the genome that were most different compared to the single genome that the whole community is using. One of the most striking things we found is that genes important for disease research were the most highly variable genes. We looked in detail at a few of these regions and found completely different gene structures compared to the reference strain.

“If you’re using mice for your experiments you need to be aware of the diversity that’s present in those types of genes. What we’ve generated is a resource for the community.”

Thomas Keane, Faculty member at EMBL-EBI

“Mice have played a critical role in defining the genetics of mammalian development and for modelling human disease. We have known for some time that there are differences between mouse strains in phenotypes such as response to viruses and pathogens. These genomes allow us to understand these differences, which could have profound implications for human disease research.”

David Adams, Senior Group Leader at the Wellcome Sanger Institute

What next?

Unlike previous research in this area, this study constructed whole genome sequences rather than just cataloguing differences between strains. Being able to see across whole loci or regions means researchers can study these variations in a wider context, rather than one difference at a time.

These findings have the potential to impact the future of genetics research, drug development and the way in which research is carried out. The resource has now been made available to the wider scientific community. The 16 genomes have been incorporated into Ensembl, where they can be freely accessed and analysed.

More information


Lilue, J. et al. (2018). Multiple laboratory mouse reference genomes define strain specific haplotypes and novel functional loci. Nature Genetics. Published online. DOI: 10.1038/s41588-018-0223-8

Training video available online

Ensembl webinar on how to effectively browse and compare data between strains.


This work was supported by UK Research and Innovation, the Wellcome Trust, the National Human Genome Research Institute, the Medical Research Council, the Biotechnology and Biological Sciences Research Council and many other funding bodies. Please see the paper for the full list of funders.

Selected websites

  • Wellcome Sanger Institute

    The Wellcome Sanger Institute is one of the world’s leading genome centres. Through its ability to conduct research at scale, it is able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally. Institute research findings, generated through its own research programmes and through its leading role in international consortia, are being used to develop new diagnostics and treatments for human disease. To celebrate its 25th year in 2018, the Institute is sequencing 25 new genomes of species in the UK. Follow @sangerinstitute on Twitter, Facebook and LinkedIn.

  • EMBL - European Bioinformatics Institute

    The European Bioinformatics Institute (EMBL-EBI) is a global leader in the storage, analysis and dissemination of large biological datasets. EMBL-EBI helps scientists realise the potential of ‘big data’ by enhancing their ability to exploit complex information to make discoveries that benefit humankind. EMBL-EBI is at the forefront of computational biology research, with work spanning sequence analysis methods, multi-dimensional statistical analysis and data-driven biological discovery, from plant biology to mammalian development and disease. EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL) and is located on the Wellcome Genome Campus, one of the world’s largest concentrations of scientific and technical expertise in genomics.