Second zebrafish genome assembly (Zv2) released
Please note that this is a *preliminary* assembly and there are a number of points to remember:
There is a high level of misassembly. This is because the source DNA came from ~1000 five-day-old embryos, and the polymorphism rate is at least 1 in 200 bp, with additional significant indels. As a result, highly variable regions of the genome do not form clusters for assembly, because the reads that originate from a given region are quite likely to come from different haplotypes. This causes assembly dropouts in some regions and false duplications in others, where phrap splits different haplotypes into separate paths. We are working on the assembly code, Phusion, to address these issues. However, there is an enormous amount of useful sequence in this assembly, and we hope this outweighs its problems.
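To see why a polymorphism rate of at least 1 in 200 bp disrupts clustering, it helps to count the expected differences between two haplotypes over one read. The sketch below assumes each base differs independently with probability 1/200 (a binomial model that ignores the indels mentioned above, so real rates are worse) and uses the 651 bp average read length from the statistics below. The mismatch threshold `k` is a hypothetical parameter for illustration, not a documented Phusion setting.

```python
from math import comb


def expected_differences(read_length: int, snp_rate: float) -> float:
    """Expected number of differing bases between two haplotypes over one read."""
    return read_length * snp_rate


def prob_more_than(k: int, read_length: int, snp_rate: float) -> float:
    """P(X > k) for X ~ Binomial(read_length, snp_rate).

    Interpreted as: the chance that reads from two haplotypes of the same
    region differ at more than k positions (a hypothetical clustering cutoff).
    """
    p_at_most_k = sum(
        comb(read_length, i) * snp_rate**i * (1 - snp_rate) ** (read_length - i)
        for i in range(k + 1)
    )
    return 1.0 - p_at_most_k


if __name__ == "__main__":
    read_length, rate = 651, 1 / 200
    print(f"expected differences per read: {expected_differences(read_length, rate):.2f}")
    print(f"P(more than 2 differences): {prob_more_than(2, read_length, rate):.3f}")
```

With more than three expected differences per read, any clustering rule that demands near-identity will often treat the two haplotypes as different loci, which is the source of both the dropouts and the false duplications described above.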
We tried to use the fingerprint information from our fpc database to merge assembly supercontigs. Where this was possible, the merged supercontigs were named after the fpc contig that led to the merge (e.g. ctg123). However, please note that this assembly is not tied to a map, so mapping information derived from the contig names should be treated with care. We provide a search tool that makes all mapping information for a given supercontig available.
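The naming rule above can be illustrated with a small grouping step. This is a hypothetical sketch, not the pipeline that was actually used: it assumes each supercontig has already been assigned at most one fpc contig from the fingerprint data (the input dictionary and the supercontig names are invented), merges supercontigs that share an fpc contig, and names the result after that contig. It does not model order or orientation within a merge, which is one reason a name such as ctg123 says which fpc contig drove the merge rather than giving a map position.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def merge_by_fpc(assignments: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group supercontigs that share an fpc contig (hypothetical input format).

    `assignments` maps a supercontig name to the fpc contig its fingerprint
    data points to, or None. Groups of two or more supercontigs are merged
    and named after the fpc contig; all others keep their own name.
    """
    by_fpc: Dict[str, List[str]] = defaultdict(list)
    for supercontig, fpc_contig in assignments.items():
        if fpc_contig is not None:
            by_fpc[fpc_contig].append(supercontig)

    result: Dict[str, List[str]] = {}
    merged_members = set()
    for fpc_contig, members in by_fpc.items():
        if len(members) > 1:  # only a real merge earns the fpc contig name
            result[fpc_contig] = sorted(members)
            merged_members.update(members)

    for supercontig in assignments:
        if supercontig not in merged_members:
            result[supercontig] = [supercontig]
    return result
```

For example, two supercontigs both assigned to ctg123 become a single entry named ctg123, while a supercontig that is the only one assigned to its fpc contig keeps its original name.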
An Ensembl database built on the Zv2 assembly, including a gene build, is now available.
The assembly can be searched using BLAST or SSAHA. Individual contigs of interest can be downloaded directly from there using the Export Data option.
Note that Zebrafish SSAHA now supports very rapid queries using protein sequences.
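SSAHA (Sequence Search and Alignment by Hashing Algorithm) gets its speed from a hash table of k-tuples rather than from dynamic programming. The following is a simplified sketch of that idea, not the Ensembl implementation: the subject is indexed by non-overlapping k-tuples, every overlapping k-tuple of the query is looked up, and hits are tallied by diagonal (subject offset minus query offset), so a true match shows up as many hits on one diagonal. The value of `k`, the function names, and the "most hits wins" rule are assumptions; how the service handles protein queries against the genome is not described here, so the sketch simply works on plain strings.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_index(subject: str, k: int) -> Dict[str, List[int]]:
    """Map each non-overlapping k-tuple of the subject to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(subject) - k + 1, k):
        index[subject[pos:pos + k]].append(pos)
    return index


def diagonal_hits(query: str, index: Dict[str, List[int]], k: int) -> Dict[int, int]:
    """Look up every overlapping query k-tuple and count hits per diagonal."""
    counts: Dict[int, int] = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], ()):
            counts[spos - qpos] += 1
    return dict(counts)


def best_offset(query: str, index: Dict[str, List[int]], k: int) -> Optional[int]:
    """Return the subject offset with the most supporting hits, if any."""
    hits = diagonal_hits(query, index, k)
    return max(hits, key=hits.get) if hits else None
```

Indexing only non-overlapping tuples keeps the table small (one entry per k residues), while scanning every query offset still guarantees that any exact match of length 2k - 1 or more hits at least one indexed tuple.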
The whole assembly can be downloaded at ftp://ftp.ensembl.org/pub/assembly/zebrafish/Zv2release
Assembly Statistics
We started with 11,737,560 reads comprising 7.64 Gbp (651 bp average read length). Of these, 9,953,938 unique reads (84.8% of the total) were placed in the assembly.
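The derived figures in the read statistics follow directly from the raw counts. This short check recomputes them; note that 7.64 Gbp is itself a rounded total, so the computed average read length is approximate.

```python
total_reads = 11_737_560
total_bases = 7.64e9      # 7.64 Gbp, as reported (rounded)
placed_reads = 9_953_938  # unique reads placed in the assembly

avg_read_length = total_bases / total_reads
placed_fraction = placed_reads / total_reads

print(f"average read length: {avg_read_length:.0f} bp")  # 651 bp
print(f"reads placed: {100 * placed_fraction:.1f}%")     # 84.8%
```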
Phusion was used to cluster the reads, and phrap was used for cluster assembly and consensus generation.
Supercontigs with fewer than 3 reads or shorter than 1 kb were rejected.
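The rejection rule is a simple two-condition filter. In this sketch the `Supercontig` record is hypothetical, 1 kb is taken as 1,000 bp, and the length test is assumed to use the span including estimated gap sizes, matching how the supercontig bp statistics are reported; the source does not say which length the cutoff used.

```python
from dataclasses import dataclass
from typing import List

MIN_READS = 3
MIN_LENGTH_BP = 1_000  # 1 kb, assumed to mean 1,000 bp


@dataclass
class Supercontig:
    """Hypothetical record for one supercontig."""
    name: str
    n_reads: int
    contig_lengths: List[int]  # bp of sequence in each contig
    gap_estimates: List[int]   # estimated bp of each gap between contigs

    @property
    def length(self) -> int:
        # Assumption: span including estimated gap sizes.
        return sum(self.contig_lengths) + sum(self.gap_estimates)


def keep(sc: Supercontig) -> bool:
    """Keep only supercontigs with at least 3 reads and at least 1 kb."""
    return sc.n_reads >= MIN_READS and sc.length >= MIN_LENGTH_BP
```

If the cutoff was in fact applied to sequence length alone, only the `length` property would need to change.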
Contig stats:
Supercontig stats (bp measures include estimated gap sizes):
Estimated coverage based on 93 Mbp of 656 finished clones gives:
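A coverage estimate of this kind compares the assembly against independently finished sequence. The exact procedure behind the figure (aligner, identity thresholds) is not described here, so the following is only a plausible sketch under that assumption: given, for each finished clone, the regions matched by assembly sequence, it merges overlapping regions so no base is counted twice and reports the fraction of finished bases covered. The input format and function names are hypothetical, and the alignment step itself is not shown.

```python
from typing import Dict, Iterable, List, Tuple


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Union of half-open [start, end) intervals, returned sorted."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current run
        else:
            merged.append([start, end])
    return merged


def covered_fraction(
    clone_lengths: Dict[str, int],
    alignments: Dict[str, List[Tuple[int, int]]],
) -> float:
    """Fraction of finished-clone bases covered by assembly alignments.

    clone_lengths: clone id -> finished length in bp
    alignments: clone id -> (start, end) regions matched by the assembly,
                assumed to lie within the clone
    """
    total = sum(clone_lengths.values())
    covered = sum(
        end - start
        for regions in alignments.values()
        for start, end in merge_intervals(regions)
    )
    return covered / total
```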
