Third assembly Zv3 of the zebrafish genome released
Please note that this is still a *preliminary* assembly, and there are a number of points to remember:
There is a high level of misassembly. This is because the source DNA came from ~1000 5-day-old embryos, and the polymorphism rate is at least 1 per 200 bp, with additional significant indels. Regions of the genome that are highly variable therefore do not form clusters for assembly, since the sequences that originate from a given region are quite likely to come from different haplotypes. This causes assembly dropouts for some regions and false duplications in other regions where phrap splits different haplotypes into multiple paths. We are working on the assembly code, Phusion, to address these issues. However, there is an enormous amount of useful sequence in this assembly, and we hope this outweighs the problems in the assembly.
The assembly comprises a total sequence length of 1,459,115,486 bp in 58,339 supercontigs. This assembly is the first one that has been tied to the FPC map (data freeze 31st of October, 2003). Supercontigs that could be tied to FPC contigs based on BAC end placement were given the FPC contig name; the other supercontigs were named 'NA' followed by a random number. Finished clone sequences were stitched into the supercontig sequence, and the supercontigs were placed onto chromosomes where possible. In accordance with the agreement reached at the European zebrafish meeting in Paris, 2003, we translated linkage group numbers directly into chromosome numbers (e.g. linkage group 1 = chromosome 1).
A pre-Ensembl database built on the Zv3 assembly, featuring the sequence and raw computes, is now available.
The assembly can be searched using BLAST or SSAHA. Individual contigs of interest can be downloaded there using the Export Data option.
The whole assembly can be downloaded at
ftp://ftp.ensembl.org/pub/assembly/zebrafish/Zv3release
Assembly Statistics
We started with 13,122,073 reads comprising 9,107,933,259 bp (average read length 694 bp). The coverage is roughly 5.7x. There are 10,504,790 unique reads, 80% of the total reads, placed in the assembly. (Note: untrimmed reads and placed reads align well with the assembly.)
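The read statistics above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the average read length and the placed-read fraction from the quoted totals. The genome size it derives is only a back-calculation from the quoted 5.7x coverage, not a figure stated in this announcement, and all variable names are illustrative.

```python
# Figures quoted in the assembly statistics.
total_reads = 13_122_073
total_bp = 9_107_933_259
placed_reads = 10_504_790
quoted_coverage = 5.7  # "roughly 5.7x"

# Average read length = total bases / number of reads.
avg_read_length = total_bp / total_reads

# Fraction of the starting reads that were placed in the assembly.
placed_fraction = placed_reads / total_reads

# ASSUMPTION: coverage = total bases / genome size, so the genome size
# implied by the quoted coverage is total bases / coverage. This value
# is inferred here and is not given in the source text.
implied_genome_bp = total_bp / quoted_coverage

print(f"average read length: {avg_read_length:.0f} bp")
print(f"placed reads: {placed_fraction:.0%}")
print(f"genome size implied by 5.7x: {implied_genome_bp / 1e9:.2f} Gbp")
```

The recomputed read length and placed fraction agree with the quoted 694 bp and 80%.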
Phusion was used to cluster the reads, and phrap was used for cluster assembly and consensus generation.
Small supercontigs with fewer than 3 reads or smaller than 0.5 kb were rejected.
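The rejection rule for small supercontigs can be sketched as a simple filter. This is an illustration of the stated thresholds only, not the code used in the assembly pipeline. The `Supercontig` record, its field names, and the example entries are hypothetical, and 0.5 kb is taken to mean 500 bp.

```python
from typing import NamedTuple


class Supercontig(NamedTuple):
    """Hypothetical record; field names are illustrative only."""
    name: str
    n_reads: int
    length_bp: int


MIN_READS = 3        # supercontigs with fewer than 3 reads are rejected
MIN_LENGTH_BP = 500  # ASSUMPTION: 0.5 kb is read as 500 bp


def is_kept(sc: Supercontig) -> bool:
    """Return True unless the supercontig meets either rejection criterion."""
    return sc.n_reads >= MIN_READS and sc.length_bp >= MIN_LENGTH_BP


# Made-up examples, not taken from the Zv3 assembly.
examples = [
    Supercontig("example_a", n_reads=2, length_bp=1200),  # too few reads
    Supercontig("example_b", n_reads=5, length_bp=450),   # too short
    Supercontig("example_c", n_reads=3, length_bp=500),   # meets both limits
]

kept = [sc.name for sc in examples if is_kept(sc)]
print(kept)
```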
1,083,447,588 bp (74 %) could be tied to the FPC map. N50 = 698,104, n = 422
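For readers unfamiliar with the N50 statistic, the sketch below shows the usual way it is computed. Sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the summed length. Treating the quoted n as the number of sequences needed to reach that point is an interpretation, not something this announcement states. The toy lengths are made up, and the function name is illustrative.

```python
from typing import Iterable, Tuple


def n50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, n) for a collection of sequence lengths.

    N50 is the length of the sequence at which the cumulative length,
    counted from the longest sequence down, first reaches half of the
    total. n is the number of sequences counted up to that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy data: total 300, half 150, reached at the second-longest sequence.
toy_result = n50([100, 80, 60, 40, 20])
print(toy_result)

# The 74% figure follows from the totals quoted in this announcement.
tied_bp = 1_083_447_588
assembly_bp = 1_459_115_486
tied_percent = round(100 * tied_bp / assembly_bp)
print(f"{tied_percent}% of the assembly is tied to the FPC map")
```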
Supercontig stats (bp measures include estimated gap sizes):
Estimated coverage based on 236 Mbp from 1502 finished clones gives a supercontig coverage of 98%. Note: the supercontig (AGP) coverage might be slightly overestimated. The actual coverage is about 96-97%.
Stats for the new version of stitched assembly
Stats of stitched contigs:



