The Danio rerio Sequencing Project


Second assembly (Zv2) of the zebrafish genome released

Please note that this is a *preliminary* assembly and there are a number of points to remember:

There is a high level of misassembly. The source DNA came from ~1,000 five-day-old embryos, and the polymorphism rate is at least 1 in 200 bp, with additional significant indels. Highly variable regions of the genome therefore do not form clusters for assembly, because the sequences originating from a given region are quite likely to come from different haplotypes. This causes assembly dropouts in some regions and false duplications in others, where phrap splits different haplotypes into multiple paths. We are working on the assembly code, Phusion, to address these issues. Nevertheless, this assembly contains an enormous amount of useful sequence, and we hope this outweighs its problems.
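To give a feel for why haplotype differences break clustering, the sketch below works through the arithmetic under a deliberately simple model (an assumption, not the project's analysis): single-base differences between two haplotypes occur independently at the quoted rate of 1 in 200 bp, and indels are ignored. With the reported average read length of 651 bp, this gives about 3.3 expected differences per read and only about a 4% chance that a read-length window is identical between two haplotypes.

```python
# Toy model (assumption): differences between two haplotypes occur
# independently at rate p per base; indels are ignored.
P_DIFF = 1 / 200   # "at least 1/200 bp" polymorphism rate from the text
READ_LEN = 651     # average read length reported for this assembly

def expected_differences(read_len: int, p: float) -> float:
    """Expected number of differing bases between two haplotypes over one read."""
    return read_len * p

def prob_identical(read_len: int, p: float) -> float:
    """Probability that a read-length window is identical on both haplotypes."""
    return (1 - p) ** read_len

if __name__ == "__main__":
    print(f"expected differences per read: {expected_differences(READ_LEN, P_DIFF):.2f}")
    print(f"P(window identical across haplotypes): {prob_identical(READ_LEN, P_DIFF):.3f}")
```

Because real indels add further divergence, this model if anything understates how often reads from the same locus differ.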

We tried to use the fingerprint information from our FPC database to merge assembly supercontigs. Where this was possible, the resulting contigs were named after the FPC contig that led to the merge (e.g. ctg123). Note, however, that this assembly is not tied to a map, so mapping information derived from the contig names should be treated with care. A search tool is available that returns all mapping information for a given supercontig.

An Ensembl database built on the Zv2 assembly, including a gene build, is now available.

The assembly can be searched using BLAST or SSAHA. Individual contigs of interest can be downloaded from the same interface using the Export Data option.

Note that Zebrafish SSAHA now supports very rapid queries using protein sequences.

The whole assembly can be downloaded at ftp://ftp.ensembl.org/pub/assembly/zebrafish/Zv2release



Assembly Statistics

We started with 11,737,560 reads comprising 7.64 Gbp (average read length 651 bp). Of these, 9,953,938 unique reads (84.8% of the total) were placed in the assembly.
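The two derived figures in the paragraph above follow directly from the raw counts. This short check recomputes them; note that the 7.64 Gbp total is itself rounded, so the average read length is approximate.

```python
# Raw figures as reported for the Zv2 read set.
TOTAL_READS = 11_737_560
TOTAL_BASES = 7.64e9            # 7.64 Gbp (rounded in the source)
PLACED_UNIQUE_READS = 9_953_938

avg_read_len = TOTAL_BASES / TOTAL_READS            # bp per read
placed_fraction = PLACED_UNIQUE_READS / TOTAL_READS  # share of reads in the assembly

print(f"average read length ~ {avg_read_len:.0f} bp")
print(f"fraction of reads placed ~ {placed_fraction:.1%}")
```

Both values round to the 651 bp and 84.8% quoted in the text.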

Phusion was used to cluster the reads, and phrap was used for cluster assembly and consensus generation.
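The general idea of grouping reads before per-cluster assembly can be illustrated with a toy clusterer. This is a minimal sketch, not Phusion's actual algorithm: it assumes a simple rule that two reads join the same cluster whenever they share an exact k-mer, and uses a union-find structure to merge clusters transitively. The function name `cluster_reads` and the choice of `k` are hypothetical. A production clusterer would also need to discard highly repetitive k-mers, which would otherwise chain unrelated reads together; this sketch omits that step.

```python
from collections import defaultdict

def cluster_reads(reads: dict[str, str], k: int) -> list[set[str]]:
    """Group reads that are linked (directly or transitively) by a shared k-mer."""
    parent = {name: name for name in reads}

    def find(x: str) -> str:
        # Path-halving union-find lookup.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Remember the first read seen with each k-mer; later reads with the
    # same k-mer are merged into that read's cluster.
    first_owner: dict[str, str] = {}
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            owner = first_owner.setdefault(seq[i:i + k], name)
            if owner != name:
                union(owner, name)

    clusters: dict[str, set[str]] = defaultdict(set)
    for name in reads:
        clusters[find(name)].add(name)
    return list(clusters.values())
```

For example, with `k=4` the reads `ACGTACGTTT` and `GTTTCCCAAA` share `GTTT` and fall into one cluster, while `TTTTTTTTTT` stays on its own. Under such a rule, reads from divergent haplotypes link only where they happen to share exact k-mers, which is consistent with the dropouts described earlier.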

Supercontigs containing fewer than 3 reads or shorter than 1 kb were rejected.
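The rejection rule above can be expressed as a simple predicate: a supercontig is kept only if it has at least 3 reads and is at least 1 kb long. The `Supercontig` record and its field names are hypothetical, and the sketch assumes 1 kb = 1,000 bp; the text does not say whether the length test counted estimated gaps.

```python
from dataclasses import dataclass

@dataclass
class Supercontig:
    # Hypothetical record; field names are illustrative only.
    name: str
    n_reads: int
    length_bp: int

MIN_READS = 3
MIN_LENGTH_BP = 1_000  # assumption: "1kb" means 1,000 bp

def keep(sc: Supercontig) -> bool:
    """Reject supercontigs with fewer than 3 reads or shorter than 1 kb."""
    return sc.n_reads >= MIN_READS and sc.length_bp >= MIN_LENGTH_BP

if __name__ == "__main__":
    candidates = [Supercontig("a", 3, 1_000), Supercontig("b", 2, 5_000), Supercontig("c", 10, 999)]
    print([sc.name for sc in candidates if keep(sc)])
```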

Contig stats:

  • Total bases = 1,306,256,104 bp
  • Number of contigs = 430,985
  • Average length = 3,030 bp
  • Largest = 44,497 bp
  • N50 = 4,451 bp (n = 87,069)
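The N50 figures in these tables summarise a length distribution. The sketch below computes the same kinds of statistics from a list of sequence lengths, using the common definition: sort lengths in decreasing order and take the length at which the running total first reaches half of all bases. We read the accompanying "n" as the number of sequences needed to reach that point; the page does not define it, so treat that interpretation as an assumption. The function name `assembly_stats` is hypothetical.

```python
def assembly_stats(lengths: list[int]) -> dict:
    """Summary statistics for a set of contig or supercontig lengths (bp)."""
    if not lengths or min(lengths) <= 0:
        raise ValueError("lengths must be a non-empty list of positive integers")
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    n50 = n = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # N50: the length at which the running total first reaches half the total.
        if 2 * running >= total:
            n50, n = length, count
            break
    return {
        "total": total,
        "number": len(ordered),
        "average": total / len(ordered),
        "largest": ordered[0],
        "n50": n50,
        "n": n,
    }

if __name__ == "__main__":
    print(assembly_stats([2, 3, 4, 5, 6]))
```

For lengths 2, 3, 4, 5 and 6 bp (total 20 bp), the two longest sequences already hold 11 bp, so N50 is 5 bp with n = 2.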

Supercontig stats (bp measures include estimated gap sizes):

  • Total bases = 1,452,210,772 bp
  • Number of supercontigs = 83,470
  • Average length = 17,398 bp
  • Largest = 3,581,975 bp
  • N50 = 296,896 bp (n = 1,397)
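Comparing the two tables gives a rough sense of how much of each supercontig is gap. This sketch assumes that every contig lies in one of the listed supercontigs, so that the difference in total bases equals the total estimated gap length; the page does not state this explicitly. Under that assumption, the estimated gaps come to about 146 Mbp (roughly 10% of supercontig length), with about 5.2 contigs per supercontig on average.

```python
# Figures copied from the contig and supercontig tables.
CONTIG_TOTAL_BP = 1_306_256_104       # sequenced contig bases
SUPERCONTIG_TOTAL_BP = 1_452_210_772  # includes estimated gap sizes
N_CONTIGS = 430_985
N_SUPERCONTIGS = 83_470

# Assumption: every contig lies in a listed supercontig, so the difference
# is the total length of the estimated gaps between contigs.
gap_bp = SUPERCONTIG_TOTAL_BP - CONTIG_TOTAL_BP
gap_fraction = gap_bp / SUPERCONTIG_TOTAL_BP
contigs_per_supercontig = N_CONTIGS / N_SUPERCONTIGS

print(f"estimated gap bases: {gap_bp:,} bp ({gap_fraction:.1%} of supercontig length)")
print(f"mean contigs per supercontig: {contigs_per_supercontig:.2f}")
```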

Estimated coverage, based on 93 Mbp of sequence from 656 finished clones:

  • Supercontig coverage: 95%
  • Contig coverage: 77%
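The page does not describe how coverage was measured. The sketch below shows one plausible per-clone calculation, assuming coverage is the fraction of a finished clone's bases overlapped by at least one aligned assembly segment. Alignments are taken as half-open `[start, end)` intervals in clone coordinates; producing them (e.g. with BLAST or SSAHA) is outside the sketch, and `covered_fraction` is a hypothetical name. Under this reading, the higher supercontig figure would come from counting spanned gaps as covered, though that too is an assumption.

```python
def covered_fraction(clone_length: int, alignments: list[tuple[int, int]]) -> float:
    """Fraction of a finished clone covered by the union of aligned intervals.

    Each alignment is a half-open [start, end) interval in clone coordinates;
    intervals are clipped to the clone and overlaps are counted once.
    """
    if clone_length <= 0:
        raise ValueError("clone_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        start, end = max(start, 0), min(end, clone_length)
        if start >= end:
            continue  # empty after clipping
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged interval: close it and start anew.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / clone_length
```

For a 100 bp clone with alignments `(0, 30)`, `(20, 50)` and `(70, 80)`, the union covers 60 bp, giving 0.6. A whole-project figure would sum covered bases and clone lengths over all 656 clones before dividing.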

  • webmaster@sanger.ac.uk

    Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

    Last Modified Tue Oct 14 13:58:59 2003

    Genome Research Limited is a charity registered in England with number 1021457
