Mouse Genomes Project

The ability to manipulate the mouse genome, together with the wealth of disease models, inbred strains and genomic resources, makes the mouse the premier model organism for genetic approaches to mammalian biology. Over a century of mouse genetics has provided scores of inbred strains, spontaneous and engineered mutations, making the mouse unsurpassed as a model system for investigating human disease. Access to complete sequence of multiple inbred strains will add to these resources and will become a permanent foundation for a systems biology approach to phenotypic variation in the mouse.

The Mouse Genomes Project uses next generation sequencing technologies to sequence the genomes of key laboratory mouse strains. We are releasing the raw sequence data, SNPs, short indels, structural variants and assemblies of each strain, under our data release policy.

The whole-genome sequencing data is available from the European Nucleotide Archive (ENA) under the following accessions:

Sequencing data for BUB/BnJ, C57BL/10J, C57BR/cdJ, C58/J, DBA/1J, I/LnJ, MOLF/EiJ, NZB/BlNJ, NZW/LacJ and SEA/GnJ was provided by Kent Hunter.

RNA-Seq

We have also completed RNA-Seq from whole-brain RNA across 15 of the strains, and the data has been accessioned under ERP000614. In addition, we have completed RNA-Seq from a cross of C57BL/6J and DBA/2J across 6 different tissues; this data is available from the ENA under accession ERP000591.

ChIP-Seq

We sequenced DNA from liver bound to chromatin precipitated by a marker for active gene promoters (histone 3, lysine 4 trimethylation; H3K4me3). The two samples that we used in our analysis are ERS001976 and ERS001977.

Data Release

The strains have been sequenced to an average of 40x coverage on the Illumina HiSeq platform with 100 bp paired-end reads. The raw data has been accessioned at the Short Read Archive (SRA) and European Nucleotide Archive (ENA) under the study accession numbers given above. The data is also available from our FTP site in the form of:

Please see the README files in each folder for more information. The SNPs and indels have also been submitted to dbSNP recently and will appear soon. The SVs have been submitted to DGVa under accession number estd118. Please note that these are covered by our data release policy.
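As a rough illustration of how the study accessions above could be turned into a list of downloadable files, the sketch below builds a file-report query and parses the tab-separated reply. The endpoint (`https://www.ebi.ac.uk/ena/portal/api/filereport`) and the field names are assumptions based on the public ENA Portal API rather than anything specified by this project, so please check them against the current ENA documentation. The helper names `filereport_url` and `parse_filereport` are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; not taken from this page.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "sample_accession", "fastq_ftp")):
    """Build a file-report URL for a study accession such as ERP000614."""
    query = urlencode(
        {
            "accession": accession,
            "result": "read_run",          # one row per sequencing run
            "fields": ",".join(fields),    # assumed field names
            "format": "tsv",
        },
        safe=",",                          # keep the field list readable
    )
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Turn the tab-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


if __name__ == "__main__":
    # Whole-brain RNA-Seq study accession quoted on this page.
    print(filereport_url("ERP000614"))
```

To fetch the report, the URL can be passed to `urllib.request.urlopen` and the decoded response handed to `parse_filereport`; the network call is left out here so the sketch runs offline.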

Data querying and visualization

We have set up a suite of querying and visualization tools to allow the community to gain access to the full range of variation data that has been produced as part of the project. The data is also available for download from our FTP site.

SNP and Indel Query

Use our query page to search for SNPs and indels by genomic region or gene. Select the strains, SNP quality, and consequences to display. The variation consequences were called against Ensembl v75. Please note that these are preliminary SNPs and indels, and the lists are periodically updated. We strongly recommend that you carry out independent experimental validation.
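For users working from the downloaded VCF files rather than the query page, a region-and-strain lookup of the same kind can be sketched in a few lines of plain Python. This is only a minimal sketch: it assumes an uncompressed, multi-sample VCF whose sample columns are named after the strains and whose FORMAT field contains `GT`. The function name `query_vcf` is hypothetical, and the sketch does not filter on quality or consequence.

```python
def query_vcf(lines, chrom, start, end, strains):
    """Yield (pos, ref, alt, {strain: genotype}) for records in chrom:start-end.

    `lines` is any iterable of VCF text lines (for example an open file).
    Coordinates are 1-based and inclusive, as in the VCF POS column.
    """
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            raise ValueError("VCF header line (#CHROM ...) not found before records")
        pos = int(fields[1])
        if fields[0] != chrom or not (start <= pos <= end):
            continue
        # Locate the genotype sub-field within the FORMAT column.
        gt_index = fields[8].split(":").index("GT")
        genotypes = {s: fields[columns[s]].split(":")[gt_index] for s in strains}
        yield pos, fields[3], fields[4], genotypes
```

A call might look like `query_vcf(open("calls.vcf"), "1", 3000000, 3100000, ["StrainA"])`, where the file name and the sample name are placeholders; check the header of the actual VCF for the real sample names. For large compressed files an indexed reader would be a better choice than a linear scan.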

References

Primary citation for the resource:

  • Mouse genomic variation and its effect on phenotypes and gene regulation.

    Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellåker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assunção JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J and Adams DJ

    Nature 2011;477;7364;289-94

Other references:

  • Next-generation sequencing of experimental mouse strains.

    Yalcin B, Adams DJ, Flint J and Keane TM

    Mammalian genome : official journal of the International Mammalian Genome Society 2012;23;9-10;490-8

  • The fine-scale architecture of structural variants in 17 mouse genomes.

    Yalcin B, Wong K, Bhomra A, Goodson M, Keane TM, Adams DJ and Flint J

    Genome biology 2012;13;3;R18

  • The genomic landscape shaped by selection on transposable elements across 18 mouse strains.

    Nellåker C, Keane TM, Yalcin B, Wong K, Agam A, Belgard TG, Flint J, Adams DJ, Frankel WN and Ponting CP

    Genome biology 2012;13;6;R45

  • High levels of RNA-editing site conservation amongst 15 laboratory mouse strains.

    Danecek P, Nellåker C, McIntyre RE, Buendia-Buendia JE, Bumpstead S, Ponting CP, Flint J, Durbin R, Keane TM and Adams DJ

    Genome biology 2012;13;4;R26

  • Sequencing and characterization of the FVB/NJ mouse genome.

    Wong K, Bumpstead S, van der Weyden L, Reinholdt LG, Wilming LG, Adams DJ and Keane TM

    Genome biology 2012;13;8;R72

  • Sequence-based characterization of structural variation in the mouse genome.

    Yalcin B, Wong K, Agam A, Goodson M, Keane TM, Gan X, Nellåker C, Goodstadt L, Nicod J, Bhomra A, Hernandez-Pliego P, Whitley H, Cleak J, Dutton R, Janowitz D, Mott R, Adams DJ and Flint J

    Nature 2011;477;7364;326-9

Enquiries

For further information, please contact mousegenomes@sanger.ac.uk

Latest News

  • 2014-10-27 New Variants Release
    New sequencing data and variants release for 28 laboratory strains. Query variants here. Raw data files available from our ftp site.
  • 2014-09-23 Upcoming new strains
    Two additional strains: BUB/BnJ and SEA/GnJ. Data processing and variant calling in progress - 28 strain website release scheduled for IMGC 2014.
  • 2014-07-08 Upcoming new strains
    C57BL/10J, C57BR/cdJ, C58/J, DBA/1J, I/LnJ, MOLF/EiJ, NZB/BlNJ, and NZW/LacJ. Data processing and variant calling in progress. A collaboration with Kent Hunter.
  • 2013-07-24 Accession Numbers
    We have made a minor update to this page to include the ENA accession numbers beside each strain name that correspond to the latest HiSeq 100bp sequencing data.
  • 2013-04-16 New Variants Pages
    We have launched a new interface for querying all of the sequence variants. It has been updated to include the latest variant release on GRCm38.
  • 2013-03-20 GRCm38 Release
    A new release of the BAM files, SNPs/indels, and structural variants has been made on the latest mouse reference genome (GRCm38). Please see the README files in the folders for more details. Our variant querying web page will be updated to the new release in the coming weeks.
  • 2013-02-27 De novo Assemblies
    Preliminary scaffolds from SGA de novo assembly of 16 strains. FASTA files are in REL-1302-Assembly. These are scaffolds only and have not yet been organised into chromosomes.
  • 2013-02-12 New SV Release
    A new release of the structural variant calls has been made on our ftp site. This release incorporates the structural variation genotypes for the FVB/NJ strain. Also, we have updated the last SNP/indel release with a minor change to the VCF to fix the annotations for multi-allelic sites (the sites/genotypes are unchanged).
  • 2012-12-04 New SNP/indel Release
    A new release of the SNPs and indels for 16 of the mouse strains has been produced using the new higher quality sequencing data. The VCF files are available from the ftp site and the website query pages will be updated in the coming weeks. A new submission to dbSNP is also being prepared.
  • 2012-10-16 New Sequencing Data
    New higher quality sequencing data (100bp, ~40x, HiSeq platform) is available for 16 of the strains. This has been uploaded to the ftp site. New variant calls will be posted soon.
  • 2012-08-03 New Sequence Data
    New higher quality sequencing data (100bp, ~40x, HiSeq platform) is available on our ftp site for several of the strains - the rest will be posted soon.
  • 2012-06-12 FVB Variant Calls
    We have posted the full set of FVB variant calls (SNPs, short indels, and structural variants) on our ftp site in VCF format. The query pages on this site will be updated in the near future to incorporate these calls.
  • 2011-11-17 FVB Strain Sequenced
    We have completed sequencing of the FVB/NJ strain to 53x sequence coverage. SNP, indel, and structural variation calls will be posted soon. The raw data is available for download from our ftp site and can be browsed in LookSeq.
  • 2011-09-15 Nature Publications
    We are delighted to announce that two papers have been published in Nature describing the variation found and its impact on mouse phenotypes and traits. Further details.
  • 2011-08-05 Structural Variation Calls
    The entire set of structural variation calls across the 17 strains has been posted on our ftp site in the current_svs directory. These have also been submitted to DGVa under accession estd118.
  • 2011-06-30 Whole-brain RNA-seq BAMs
    We have just posted the BAM files and TopHat expression values for the whole-brain RNA-Seq data on our ftp site in the current_rna_bams directory. Note that this data is also available from the ENA under accession ERP000614.
  • 2011-06-27 Multi-tissue RNA-Seq
    We have recently completed RNA-Seq from a cross of C57BL/6J and DBA/2J across 6 different tissues, and this data is available from the ENA under accession ERP000591.
  • 2011-06-24 Whole-brain RNA-Seq
    We have recently completed sequencing whole-brain RNA from 15 of the strains using the Illumina RNA-Seq protocol. The data has been accessioned at the ENA under ERP000614.
  • 2011-06-24 Long Fragment end Sequencing
    We have recently completed sequencing the ends of 3 kb fragments on the Illumina platform across the mouse strains. The data is available from the ENA under study accession ERP000255.
  • 2011-06-23 Variant Release & dbSNP
    We have made the final release of the SNP and short indel variants. The only change from the previous release (REL-1101) is that we have filtered out chrY calls. These SNPs and indels have been submitted to dbSNP as the final published set. The SNPs and short indels are available from our ftp site. The BAM alignment files have not been changed.
  • 2011-06-22 Accession Numbers
    The raw data has all been submitted to the ENA and the accession numbers and links are listed with the strain names.
  • 2010-07-27 Variant Releases
    The variant calls have been steadily updated over the past few months. The Data Release section below has links to all of the latest sets of SNP and short indel calls and the online browser has also been updated to reflect the new callsets. The BAM alignment files have not been changed.
  • 2010-01-13 New release
    The new release data have been loaded and are visible in the SNP query and LookSeq pages. 14 of the 17 strains are over 20x coverage (three over 30x) with the others (129S5, NZO, and WSB) all over 15x.
  • 2009-12-11 New Sequencing Data
    We have made a new freeze of the sequencing data with almost all of the strains sequenced to over 20x depth. This new data has now been loaded into the LookSeq viewer. We are preparing new SNP/indel calls and hope to have these searchable very soon.
  • 2009-11-03 IMGC
    Here is the poster we presented at IMGC
  • 2009-11-03 Sequencing Progress
    We now have 10 strains over 20x sequencing depth and have begun to prepare a new data release (SNP, indel, SV calls) for these strains. We'll post an update when the new data has been loaded.