Mouse Genomes Project

The ability to manipulate the mouse genome, together with the wealth of disease models, inbred strains and genomic resources, makes the mouse the premier model organism for genetic approaches to mammalian biology. Over a century of mouse genetics has provided scores of inbred strains, as well as spontaneous and engineered mutations, making the mouse unsurpassed as a model system for investigating human disease. Access to the complete sequence of multiple inbred strains will add to these resources and will become a permanent foundation for a systems biology approach to phenotypic variation in the mouse.

The Mouse Genomes Project aims to use new sequencing technologies to sequence the genomes of 17 key mouse strains. We are releasing the raw sequence data, SNPs, short indels, structural variants and assemblies of each strain, under our data release policy.

The whole-genome sequencing is available from the European Nucleotide Archive (ENA) under the following accessions:

We have also completed RNA-Seq of whole-brain RNA from 15 of the strains, and the data has been accessioned under ERP000614. In addition, RNA-Seq from a cross of C57BL/6J and DBA/2J across 6 different tissues is available from the ENA under accession ERP000591.

Data Release

The sequencing for the project is now complete. The strains have been sequenced to an average of 25x coverage on the Illumina GAII platform with a mixture of 54bp, 76bp, and 108bp paired reads. The raw data has been accessioned at the Short Read Archive (SRA) and European Nucleotide Archive (ENA) under the study accession numbers given above. The data is also available from our FTP site in the form of:

Please see the README files in each folder for more information. The SNPs and indels have also been submitted to dbSNP recently and will appear soon. The SVs have been submitted to DGVa under accession number estd118. Please note that these are covered by our data release policy.
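
As a rough illustration of what the quoted average coverage means, the sketch below relates read-pair counts to mean depth using the standard relationship (total sequenced bases divided by genome size). The function names and the genome size constant are assumptions made for this example and do not come from the project; the read lengths are the ones quoted above.

```python
import math

# Assumption for illustration only: approximate mouse genome size in bases.
ASSUMED_GENOME_SIZE = 2_700_000_000


def mean_coverage(read_pairs: int, read_length: int,
                  genome_size: int = ASSUMED_GENOME_SIZE) -> float:
    """Mean depth = total sequenced bases / genome size (two reads per pair)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return 2 * read_pairs * read_length / genome_size


def pairs_needed(target_coverage: float, read_length: int,
                 genome_size: int = ASSUMED_GENOME_SIZE) -> int:
    """Smallest number of read pairs that reaches at least target_coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    # Read lengths quoted on this page; 25x is the stated average coverage.
    for length in (54, 76, 108):
        print(f"{length}bp paired reads: {pairs_needed(25, length):,} pairs for 25x")
```

This ignores unmapped reads, duplicates and uneven coverage, so it gives only an approximate guide to the amount of raw sequence behind a given depth.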

Data querying and visualization

We have set up a suite of querying and visualization tools to allow the community to gain access to the full range of variation data that has been produced as part of the project. The data is also available for download from our FTP site.

SNP and Indel Query

Use our query page to search for SNPs and indels by genomic region or gene. Select the strains, SNP quality, and consequences to display. The variation consequences were called against Ensembl v64. SNPs and indels can also be visualized on our LookSeq page. Please note that these are preliminary SNPs and indels, and the lists are periodically updated. We strongly recommend that you carry out independent experimental validation.
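
For readers working from downloaded variant files rather than the query page, a region-and-quality query of the kind described above can be sketched as follows. This is a minimal example, assuming the calls are stored as VCF-style tab-delimited text (CHROM, POS, ID, REF, ALT, QUAL as the first six columns). The file format, the `Variant` type and both function names are assumptions for illustration, and the sample records are invented.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def parse_variants(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield variants from VCF-style lines, skipping headers and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        # A missing quality ('.') is treated as 0.0 in this sketch.
        yield Variant(chrom, int(pos), ref, alt, 0.0 if qual == "." else float(qual))


def query_region(variants: Iterable[Variant], chrom: str, start: int, end: int,
                 min_qual: float = 0.0) -> List[Variant]:
    """Return variants on chrom with start <= pos <= end and qual >= min_qual."""
    return [v for v in variants
            if v.chrom == chrom and start <= v.pos <= end and v.qual >= min_qual]


if __name__ == "__main__":
    # Invented records, for demonstration only.
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL",
        "1\t100\t.\tA\tG\t50",
        "1\t250\t.\tC\tT\t10",
        "2\t120\t.\tG\tA\t99",
    ]
    for hit in query_region(parse_variants(example), "1", 50, 300, min_qual=20):
        print(hit)
```

A linear scan like this is adequate for small files; for whole-genome call sets an indexed lookup would be the more practical choice.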

LookSeq Visualizer

We have implemented LookSeq to visualize read alignments in a region of interest. The Mouse Genomes Project LookSeq page displays data in 'pileup' view to visualize SNPs and indels, or 'read pair' view to visualize larger structural variants, and allows filtering of data by mapping quality.

DAS Tracks

Coding SNPs can also be viewed in the Ensembl Genome Browser by adding a DAS track. Click here to find out how.

References

Keane TM, Goodstadt L, Danecek P, et al. (2011) Mouse genomic variation and its effect on phenotypes and gene regulation, Nature, 477(7364):289-294.

Yalcin B, Wong K, Agam A, et al. (2011) Sequence-based characterization of structural variation in the mouse genome, Nature, 477(7364):326-329.

Announcements Mailing List

We have set up a mailing list for announcements related to the project. To join this mailing list, please visit this link and sign up.

Enquiries

For further information, please contact mousegenomes@sanger.ac.uk

Latest News

  • 2011-11-17 FVB Strain Sequenced
    We have completed sequencing of the FVB/NJ strain to 53x sequence coverage. SNP, indel, and structural variation calls will be posted soon. The raw data is available for download from our FTP site and can be browsed in LookSeq.
  • 2011-09-15 Nature Publications
    We are delighted to announce that two papers have been published in Nature describing the variation found and its impact on mouse phenotypes and traits. Further details.
  • 2011-08-05 Structural Variation Calls
    The entire set of structural variation calls across the 17 strains has been posted on our FTP site in the current_svs directory. These have also been submitted to DGVa under accession estd118.
  • 2011-06-30 Whole-brain RNA-seq BAMs
    We have just posted the BAM files and TopHat expression values for the whole-brain RNA-Seq data on our FTP site in the current_rna_bams directory. Note that this data is also available from the ENA under accession ERP000614.
  • 2011-06-27 Multi-tissue RNA-Seq
    We have recently completed RNA-Seq from a cross of C57BL/6J and DBA/2J across 6 different tissues, and this data is available from the ENA under accession ERP000591.
  • 2011-06-24 Whole-brain RNA-Seq
    We have recently completed sequencing whole-brain RNA from 15 of the strains using the Illumina RNA-Seq protocol. The data has been accessioned at the ENA under ERP000614.
  • 2011-06-24 Long Fragment end Sequencing
    We have recently completed sequencing the ends of 3kb fragments on the Illumina platform across the mouse strains. The data is available from the ENA under study accession ERP000255.
  • 2011-06-23 Variant Release & dbSNP
    We have made the final release of the SNP and short indel variants. The only change from the previous release (REL-1101) is that we have filtered out chrY calls. These SNPs and indels have been submitted to dbSNP as the final published set. The SNPs and short indels are available from our FTP site. The BAM alignment files have not been changed.
  • 2011-06-22 Accession Numbers
    The raw data has all been submitted to the ENA and the accession numbers and links are listed with the strain names.
  • 2010-07-27 Variant Releases
    The variant calls have been steadily updated over the past few months. The Data Release section below has links to all of the latest sets of SNP and short indel calls and the online browser has also been updated to reflect the new callsets. The BAM alignment files have not been changed.
  • 2010-01-13 New release
    The new release data have been loaded and are visible in the SNP query and LookSeq pages. 14 of the 17 strains are over 20x coverage (three over 30x) with the others (129S5, NZO, and WSB) all over 15x.
  • 2009-12-11 New Sequencing Data
    We have made a new freeze of the sequencing data with almost all of the strains sequenced to over 20x depth. This new data has now been loaded into the LookSeq viewer. We are preparing new SNP/indel calls and hope to have these searchable very soon.
  • 2009-11-03 IMGC
    Here is the poster we presented at IMGC
  • 2009-11-03 Sequencing Progress
    We now have 10 strains over 20x sequencing depth and have begun to prepare a new data release (SNP, indel, SV calls) for these strains. We'll post an update when the new data has been loaded.