Zebrafish genome assemblies
The current integrated genome assembly, Zv9, comprises a total sequence length of 1.4 Gb in 4,560 scaffolds. This assembly is based on a clone path sorted with the high-density meiotic map SATMAP (Clark et al., in preparation). The data freeze was taken on 1 April 2010. The remaining gaps were filled with sequence from WGS31, a combined Illumina and capillary assembly. The assembly integration process involves sequence alignments as well as cDNA, marker and BAC/Fosmid end sequence placements.
WGS31 was created using Illumina sequencing reads from a double-haploid Tuebingen fish (289 million reads providing approximately 30-fold coverage), combined with capillary sequencing reads from a second, related double-haploid Tuebingen fish (12.2 million reads providing approximately 7.5-fold coverage). Using data from double-haploid Tuebingen fish results in less artificial haplotypic duplication than was found in previous WGS assemblies, which were generated from multiple individual diploid fish. A novel de Bruijn graph-based algorithm called Fuzzypath was used to assemble the Illumina reads into short sequence contigs; these contigs were then combined with the capillary reads using the Phusion assembler. This resulted in 119,136 contigs with an N50 size of 25 kb. Contigs are joined into supercontigs based on read pair information, with gap sizes estimated from read pairs of different insert sizes. There are 32,044 supercontigs in the WGS31 assembly, with an N50 size of 614 kb.
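For readers unfamiliar with the N50 statistic quoted for the contigs and supercontigs, the following is a minimal Python sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the combined length. The function name `n50` and the toy lengths are illustrative assumptions, not part of the WGS31 pipeline, and other tools may define edge cases slightly differently.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2             # half of the combined length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (in kb) purely for illustration.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs cover 150 of 300 kb
```

A higher N50 means that more of the assembly sits in long sequences, which is why the value rises from contigs to supercontigs once read pair information joins contigs together.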
Ensembl
Ensembl currently features the Zv9 assembly, complete with gene build and comparative genomics data. DAS sources can be added to the browser to view additional data aligned to the genome.
Previous assemblies

