Yeast Data

SGRP, the Saccharomyces Genome Resequencing Project

This work was a collaboration between the Sanger Institute and Professor Ed Louis' group at the Institute of Genetics, University of Nottingham.

Archive Page

This page is maintained as a historical record and is no longer being updated.

Saccharomyces Genome Resequencing

The goal of the project was to advance understanding of genomic variation and evolution by analysing sequences from multiple strains of the two Saccharomyces species, S. cerevisiae and S. paradoxus.

We have completed ABI sequencing of haploids of 37 cerevisiae strains and 27 paradoxus strains to a depth of between 1x and 3x, yielding a total of 1.42 million reads (1,292 megabases); and Illumina GA (Solexa) sequencing of four of the 37 cerevisiae strains and an additional 10 paradoxus strains.
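As a rough sanity check on the ABI figures above, the short sketch below derives the mean read length and the average depth per strain. The genome size of about 12.1 megabases is an assumption that does not come from this page (it is a commonly quoted nominal size for S. cerevisiae, used here for both species), so the depth figure is only indicative.

```python
# Figures quoted in the text for the ABI sequencing.
TOTAL_READS = 1.42e6          # 1.42 million reads
TOTAL_BASES = 1292e6          # 1,292 megabases
STRAINS = 37 + 27             # cerevisiae + paradoxus strains

# ASSUMPTION: nominal haploid genome size, not stated on this page.
ASSUMED_GENOME_BASES = 12.1e6


def mean_read_length(total_bases: float, total_reads: float) -> float:
    """Average number of bases per read."""
    return total_bases / total_reads


def mean_depth(total_bases: float, strains: int, genome_bases: float) -> float:
    """Average coverage per strain if bases were spread evenly over strains."""
    return total_bases / strains / genome_bases


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_BASES, TOTAL_READS):.0f} bases")
    depth = mean_depth(TOTAL_BASES, STRAINS, ASSUMED_GENOME_BASES)
    print(f"mean depth per strain: {depth:.2f}x")
```

Under that assumption the average depth falls inside the stated range of 1x to 3x, although individual strains will differ from the mean.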

The sequence data has been aligned to the respective reference genome sequences using SsahaSNP (for ABI) and Maq (for Illumina), followed by the application of heuristics to select the most plausible alignments. The SNPs (single-nucleotide polymorphisms) implied by these alignments have been extracted. We have also developed methods, based on ancestral recombination graphs, for imputing nucleotide values at positions in the genome where some strains have no evidence or only poor-quality evidence while other, closely related strains are better represented.

Links

Download the reads, alignments and provisional assemblies of each strain. This is what you need if you are interested in carrying out genome-wide analyses. You will also need:
- The reads from the above download are also available from the NCBI Trace Archive, and can be accessed by following the instructions below.
- Purchase the strains.
- SsahaSNP alignment software.
- Maq alignment software.
- A BLAST server at the University of Toronto.

Instructions

To download the SGRP reads from the NCBI Trace Archive, enter a query such as

CENTER_NAME = "SC" and STRAIN = "W303"

(substituting the strain name of your choice for W303) and click “Submit”.
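If you need queries for several strains, the query text can be generated rather than typed by hand. The following is a minimal sketch that only builds the query string in the form shown above; the function name `build_trace_query` is hypothetical, and the sketch does not contact the Trace Archive itself.

```python
def build_trace_query(strain: str, center: str = "SC") -> str:
    """Return a Trace Archive query string for one strain.

    The format mirrors the example query in the text; "SC" is the
    centre name used there.
    """
    if not strain or '"' in strain:
        raise ValueError("strain name must be non-empty and contain no double quotes")
    return f'CENTER_NAME = "{center}" and STRAIN = "{strain}"'


if __name__ == "__main__":
    # Strain names taken from the table of affected strains on this page.
    for name in ("W303", "SK1", "YPS128"):
        print(build_trace_query(name))
```

Each printed line can be pasted into the query box before clicking “Submit”.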

However, you need to be aware that because of some plate-handling errors, the names of some of the reads there need to be corrected. These corrections have already been applied in the SGRP browser and the FTP download data, which you should use unless you specifically need NCBI format. Also, quality clipping has been applied to the FTP download data, but not to the versions in the trace archive.

The full list of corrections is available on the FTP site. In that file, a line containing a single name means that the read with that name in the Trace Archive should be ignored, while a line containing two names means that the read with the first name should be given the second name, so that the p1k and q1k reads are correctly paired. The strains in question, and the number of reads affected, are as follows.

S. cerevisiae strain	Reads affected	S. paradoxus strain	Reads affected
BC187	619	A4	–
DBVPG1373	85	CBS5829	651
DBVPG6044	1161	DBVPG4650	180
DBVPG6765	1128	DBVPG6304	1981
L_1374	96	N_17	78
SK1	19594	N_43	530
Y55	647	N_44	2273
YGPM	1343	N_45	720
YPS128	16347	Q59_1	871
YPS606	1151	T21_4	389
273614N	194	UFRJ50816	354
NCYC361	188	YPS138	201
UWOPS03_461_4	2822	UWOPS91_917_1	471
W303	–	–	–
YJM975	24	–	–
YJM978	1114	–	–
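The corrections file format described above (one name to ignore a read, two names to rename it) can be applied to a list of Trace Archive read names with a short script. This is a sketch under stated assumptions: it assumes the names on a line are separated by whitespace, and the function names and the read names in the usage example are hypothetical, not taken from the actual file.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_corrections(lines: Iterable[str]) -> Tuple[Set[str], Dict[str, str]]:
    """Split correction lines into reads to ignore and reads to rename.

    ASSUMPTION: names on a line are whitespace-separated.
    """
    ignore: Set[str] = set()
    rename: Dict[str, str] = {}
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            ignore.add(fields[0])          # single name: drop this read
        elif len(fields) == 2:
            rename[fields[0]] = fields[1]  # two names: first becomes second
        else:
            raise ValueError(f"line {number}: expected 1 or 2 names, got {len(fields)}")
    return ignore, rename


def apply_corrections(read_names: Iterable[str],
                      ignore: Set[str],
                      rename: Dict[str, str]) -> List[str]:
    """Return corrected read names, dropping ignored reads.

    Each lookup uses the original Trace Archive name, so a pair of
    reads whose names are swapped is handled correctly.
    """
    return [rename.get(name, name) for name in read_names if name not in ignore]


if __name__ == "__main__":
    # Hypothetical correction lines and read names, for illustration only.
    corrections = ["readA.p1k", "readB.q1k readC.q1k"]
    ignore, rename = parse_corrections(corrections)
    print(apply_corrections(["readA.p1k", "readB.q1k", "readD.p1k"], ignore, rename))
```

None of this is needed for the FTP download data, where the corrections have already been applied.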

Data Release Policy

The release of pre-publication data from large resource-generating scientific projects was the subject of a meeting held in January 2003, the Fort Lauderdale meeting, sponsored by the Wellcome Trust, one of the Project funders. The report from that meeting can be viewed here.

The recommendations of the Fort Lauderdale meeting address the roles and responsibilities of data producers, data users, and funders of “community resource projects”, with the aim of establishing and maintaining an appropriate balance between the interests of data users in rapid access to data and the needs of data producers to receive recognition for their work. The conclusion of the attendees at the meeting was that responsible use of the data is necessary to ensure that first-rate data producers will continue to participate in such projects and to produce and quickly release valuable large-scale data sets. “Responsible use” was defined as allowing the data producers the opportunity to publish the initial global analyses of the data, as articulated at the outset of the project. Doing so will also ensure that the data generated are fully described.

Data use

This sequencing centre plans on publishing the completed and annotated sequences in a peer-reviewed journal as soon as possible. Permission of the principal investigator should be obtained before publishing analyses of the sequence/open reading frames/genes on a chromosome or genome scale. See our data sharing policy.
