Overview of Data for Sequence Similarity Searches
Data types available for searching
- Sequence reads
These are the individual sequence reads, generally 500-600 bp in
length. Each read name contains the shotgun clone ID and either "q1c" or "p1c"
(indicating the forward or reverse primer).
- Contig sequences
Assembled contigs taken at 3x coverage and their automated annotation are now available via GeneDB. In addition, all available contig sequences > 2 kb are regularly posted on the FTP site.
- Contig sequences represent secondary sequence data, in that
they are the condensation of a number of shotgun reads. Contigs
reflect the finished sequence data more reliably because the
depth of coverage of assembled shotgun reads ensures that the
majority of ambiguities are identified and at least partially
resolved. Contigs may nevertheless contain insertions and/or
deletions, usually as a consequence of the algorithm used to
create the consensus. Please note that the Sanger Institute is
currently unable to track contigs through assembly, and
contig IDs will therefore change.
- Individual contig sequences which are highlighted by Blast
analysis can be retrieved by following the 'Sequence' link in the
returned HTML page.
- Only contigs greater than 2 kb are present in the Blast
searchable dataset.
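
The two conventions described above (primer tags in read names, and the 2 kb cutoff for searchable contigs) can be illustrated with a minimal Python sketch. It is not the Sanger Institute's own tooling. It assumes that the downloaded contigs are in FASTA format, that "2 kb" means 2000 bp, and that the tag appears somewhere in the read name; the function names and the example read name are hypothetical. Because the text does not say which of "q1c" and "p1c" is the forward primer, the sketch reports the tag without assigning a direction.

```python
from typing import Dict, Iterator, Optional, Tuple

# Assumption: "2 kb" is taken to mean 2000 bp; the exact cutoff is not stated.
MIN_CONTIG_LENGTH = 2000

# Primer tags named in the text. Which one is forward and which is reverse
# is not stated, so no direction is assigned here.
PRIMER_TAGS = ("q1c", "p1c")


def primer_tag(read_name: str) -> Optional[str]:
    """Return the primer tag found in a read name, or None if absent."""
    for tag in PRIMER_TAGS:
        if tag in read_name:
            return tag
    return None


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def searchable_contigs(text: str,
                       min_length: int = MIN_CONTIG_LENGTH) -> Dict[str, str]:
    """Keep only contigs strictly longer than min_length bases."""
    return {name: seq for name, seq in parse_fasta(text)
            if len(seq) > min_length}
```

For example, `primer_tag("clone123.q1c")` returns `"q1c"` for that hypothetical read name, and `searchable_contigs(fasta_text)` drops any contig of 2000 bases or fewer. Since contig IDs change between assemblies, the returned identifiers should only be matched against data from the same release.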