Accessing the Data
We have set up a suite of querying and visualisation tools to give the community access to the full range of variation data being produced as part of the project. Currently, it is possible to query SNPs and short indels and to view sequence data across regions of the genome. It is also possible to click through from any SNP/indel to visualise the region of interest. If you notice any problems or experience technical issues accessing the website, please email us at mousegenomes@sanger.ac.uk.
Coding SNPs can also be viewed in the Ensembl genome browser by adding a DAS track. There is one source per strain, and they are all listed in the DAS Registry, named 'MGP_[strain_name]_SNPs'. To add a DAS track, view a region of the mouse genome, e.g. http://www.ensembl.org/Mus_musculus/Location/View?r=1:9485681-9585680, and click 'Configure this page'. Click the 'Custom Data' tab, then click 'Attach DAS'. Make sure 'DAS Registry' is selected in the drop-down and that the other boxes are blank, then click 'Next' (and wait, even though nothing seems to happen). Scroll down, tick the MGP_* sources you want and click 'Next' (and wait). Now click on the 'Main Panel' tab; the new sources you added should be listed in the 'User attached data' area. You may find that these all have labels forced on, in which case click the left-most icon next to the source name and change it to 'No Labels' or 'Normal'. Finally, click 'Save and close' and the tracks will load.
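If you need to build these names and links for several strains or regions, the pattern above can be scripted. The following is only a minimal sketch: the function names das_source_name and ensembl_region_url are hypothetical helpers written for illustration, and the exact spelling of each strain name in the DAS Registry is an assumption that should be checked against the registry listing.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in the text above.
ENSEMBL_VIEW = "http://www.ensembl.org/Mus_musculus/Location/View"


def das_source_name(strain_name: str) -> str:
    """Return the DAS source name following the 'MGP_[strain_name]_SNPs' pattern.

    Assumption: the strain name is used verbatim; check the DAS Registry
    for the exact spelling used for each strain.
    """
    return f"MGP_{strain_name}_SNPs"


def ensembl_region_url(chromosome: str, start: int, end: int) -> str:
    """Build an Ensembl Location/View link for a region of the mouse genome."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    region = f"{chromosome}:{start}-{end}"
    # Keep ':' unescaped so the link matches the form shown in the text.
    return f"{ENSEMBL_VIEW}?{urlencode({'r': region}, safe=':')}"


if __name__ == "__main__":
    # Strain names here are examples only.
    for strain in ("129S5", "NZO", "WSB"):
        print(das_source_name(strain))
    print(ensembl_region_url("1", 9485681, 9585680))
```

Calling ensembl_region_url("1", 9485681, 9585680) reproduces the example link given above; the DAS sources themselves still have to be attached through the 'Configure this page' dialogue as described.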
The sequence data were aligned with Maq to the NCBI37 mouse reference, and we use the LookSeq short-read visualisation tool to display the data. The variation consequences were called against Ensembl v55. These SNPs and indels were filtered using filters similar to those determined in the A/J and CASTEi Chr17 pilot study. We expect that these filters will not be appropriate for all chromosomes and strains, and we are planning validation experiments to determine the optimal SNP/indel filters per strain. We strongly recommend that you carry out independent experimental validation.
We are continuing to sequence the strains to much higher depth with the overall goal of producing full de novo assemblies of each strain. We will also be making new releases of the variation query website in the coming months as we increase the depth per strain and carry out our own validation of the data.
Data Release (REL-0912)
The initial sequencing for the project is now mostly complete. Fourteen of the 17 strains are over 20x coverage (three of them over 30x), and the others (129S5, NZO, and WSB) are all over 15x. These intermediate data are available from our FTP site as BAM format alignments, FASTA consensus sequences, and pileup format variants.
Please note that these are preliminary unaccessioned data, and are covered by our data release policy.
Announcements Mailing List
We have set up a mailing list where we will be making announcements related to the project. To join this mailing list, please visit this link and sign up.
Enquiries
For further information, please contact mousegenomes@sanger.ac.uk.
