Mouse Genomes Project

The ability to manipulate the mouse genome, together with the wealth of disease models, inbred strains and genomic resources, makes the mouse the premier model organism for genetic approaches to mammalian biology. Over a century of mouse genetics has provided scores of inbred strains and of spontaneous and engineered mutations, making the mouse unsurpassed as a model system for investigating human disease. Access to the complete sequences of multiple inbred strains will add to these resources and will become a permanent foundation for a systems biology approach to phenotypic variation in the mouse.

The Mouse Genomes Project aims to use new sequencing technologies to sequence the genomes of 17 key mouse strains. We will be releasing raw sequence data, collections of SNPs and assemblies of each strain, under our data release policy.

The strains we are sequencing are:

  • 129P2
  • 129S1/SvImJ
  • 129S5
  • A/J
  • AKR/J
  • BALB/cJ
  • C3H/HeJ
  • C57BL/6NJ
  • CAST/EiJ
  • CBA/J
  • DBA/2J
  • LP/J
  • NOD/ShiLtJ
  • NZO/HiLtJ
  • PWK/PhJ
  • Spretus/EiJ
  • WSB/EiJ

Accessing the Data

We have set up a suite of querying and visualisation tools to give the community access to the full range of variation data being produced as part of the project. Currently, it is possible to query SNPs and short indels and view sequence data across regions of the genome. It is also possible to click through from any SNP/indel to visualise the region of interest. If you notice any problems or have technical issues accessing the website, please email us at mousegenomes@sanger.ac.uk.

Coding SNPs can also be viewed in the Ensembl genome browser by adding a DAS track. There is one source per strain and they are all listed in the DAS Registry, named 'MGP_[strain_name]_SNPs'. To add a DAS track, view a region of the mouse genome, e.g. http://www.ensembl.org/Mus_musculus/Location/View?r=1:9485681-9585680, and click 'Configure this page'. Click the 'Custom Data' tab. Click 'Attach DAS'. Make sure 'DAS Registry' is selected in the drop-down and that the other boxes are blank, then click 'Next' (and wait, even if nothing seems to happen). Scroll down, tick the MGP_* sources you want and click 'Next' (and wait). Now click on the 'Main Panel' tab; the new sources you added should be listed in the 'User attached data' area. You may find that these all have labels forced on, in which case click the left-most icon next to the source name and change it to 'No Labels' or 'Normal'. Now click 'Save and close' and the tracks will load.
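If you need to open many regions or look up several sources, the region URL and the source-name pattern described above can be generated with a few lines of code. The following is only a minimal sketch: the function names are hypothetical, and the text does not say how a strain name is written inside the DAS source name (for example, whether slashes are kept), so the label is passed through unchanged and should be checked against the DAS Registry listing.

```python
# Base of the Ensembl region view URL quoted in the text above.
ENSEMBL_VIEW = "http://www.ensembl.org/Mus_musculus/Location/View"


def region_url(chrom, start, end):
    """Build an Ensembl Location/View URL for the region chrom:start-end."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{ENSEMBL_VIEW}?r={chrom}:{start}-{end}"


def das_source_name(strain_label):
    """Fill in the 'MGP_[strain_name]_SNPs' pattern.

    Assumption: strain_label is already in the form used by the DAS
    Registry; no normalisation is attempted here.
    """
    return f"MGP_{strain_label}_SNPs"


if __name__ == "__main__":
    print(region_url("1", 9485681, 9585680))
```

The DAS track itself still has to be attached through the browser steps described above; this sketch only saves typing the URL and source names by hand.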

The sequence data were aligned with Maq to the NCBI37 mouse reference, and we use the LookSeq short-read visualisation tool to display the data. The variation consequences were called against Ensembl v55. These SNPs and indels were filtered using filters similar to those determined in the A/J and CAST/EiJ Chr17 pilot study. We expect that these filters will not be appropriate for all chromosomes and strains, and we are planning validation experiments to determine the optimal SNP/indel filters per strain. We strongly recommend that you carry out independent experimental validation.

We are continuing to sequence the strains to much higher depth with the overall goal of producing full de novo assemblies of each strain. We will also be making new releases of the variation query website in the coming months as we increase the depth per strain and carry out our own validation of the data.

Data Release (REL-0912)

The initial sequencing for the project is now mostly complete. 14 of the 17 strains are over 20x coverage (three over 30x) with the others (129S5, NZO, and WSB) all over 15x. These intermediate data are available from our FTP site, as BAM format alignments, FASTA consensus sequences, and pileup format for variants.

Please note that these are preliminary unaccessioned data, and are covered by our data release policy.
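As a quick sanity check on a downloaded pileup file, the read depth at the listed positions can be averaged per chromosome. The sketch below is hedged: it assumes a tab-separated pileup with the depth in the fourth column (the basic samtools layout of chromosome, position, reference base, depth, read bases, base qualities). The files on the FTP site may use a different layout, such as the consensus pileup with additional columns, in which case the hypothetical `depth_col` parameter must be changed. The function name is also illustrative.

```python
from collections import defaultdict


def mean_depth_per_chrom(lines, depth_col=3):
    """Return {chromosome: mean depth} over the positions present in lines.

    Assumption: tab-separated pileup rows with the chromosome in column 0
    and an integer read depth in column depth_col (0-based).
    """
    totals = defaultdict(int)  # summed depth per chromosome
    counts = defaultdict(int)  # number of positions seen per chromosome
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) <= depth_col:
            raise ValueError(f"too few columns in pileup row: {line!r}")
        chrom = fields[0]
        totals[chrom] += int(fields[depth_col])
        counts[chrom] += 1
    return {chrom: totals[chrom] / counts[chrom] for chrom in totals}


if __name__ == "__main__":
    example = [
        "1\t100\tA\t20\t....\tIIII",
        "1\t101\tC\t30\t....\tIIII",
        "2\t5\tG\t10\t..\tII",
    ]
    print(mean_depth_per_chrom(example))
```

Because only positions that appear in the file are counted, the result is the mean depth over listed positions, not a genome-wide coverage figure; a variants-only pileup will not reproduce the per-strain coverage quoted above.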

Announcements Mailing List

We have set up a mailing list where we'll be making announcements related to the project. To join this mailing list, please visit this link and sign up.

Enquiries

For further information, please contact mousegenomes@sanger.ac.uk

Latest News

  • 2010-01-13 New release
    The new release data have been loaded and are visible in the SNP query and LookSeq pages. 14 of the 17 strains are over 20x coverage (three over 30x) with the others (129S5, NZO, and WSB) all over 15x.
  • 2009-12-11 New Sequencing Data
    We have made a new freeze of the sequencing data with almost all of the strains sequenced to over 20x depth. These new data have now been loaded into the LookSeq viewer. We are preparing new SNP/indel calls and hope to have these searchable very soon.
  • 2009-11-03 IMGC
    Here is the poster we presented at IMGC
  • 2009-11-03 Sequencing Progress
    We now have 10 strains over 20x sequencing depth and have begun to prepare a new data release (SNP, indel, SV calls) for these strains. We'll post an update when the new data have been loaded.