The Danio rerio Sequencing Project


First assembly of the zebrafish genome released

Please note that this is a *preliminary* assembly and there are a number of points to remember:

There is a high level of misassembly. The source DNA came from ~1000 5-day-old embryos, and the polymorphism rate is at least 1 in 200 bp, with significant additional indels. As a result, highly variable regions of the genome do not form clusters for assembly, because the reads that originate from a given region quite likely come from different haplotypes. This causes assembly dropouts in some regions and false duplications in others, where phrap splits different haplotypes into multiple paths. We are working on the assembly code, Phusion, to address these issues. Nevertheless, there is an enormous amount of useful sequence in this assembly, and we hope this outweighs its problems.
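As a rough illustration of why haplotypes fail to cluster, the snippet below estimates how many single-base differences separate two haplotype copies of the region covered by one read. It is a sketch under stated assumptions: it uses the lower-bound rate of 1 difference per 200 bp quoted above and the 630 bp average read length from the assembly statistics, it treats SNPs as independently placed (a Poisson approximation), and it ignores indels, so real divergence would be higher.

```python
import math


def expected_differences(read_length_bp: float, snp_rate: float) -> float:
    """Expected SNP differences between two haplotypes over one read (indels ignored)."""
    return read_length_bp * snp_rate


def prob_identical(read_length_bp: float, snp_rate: float) -> float:
    """Poisson approximation: chance that two haplotype copies of a read region are identical."""
    return math.exp(-expected_differences(read_length_bp, snp_rate))


lam = expected_differences(630, 1 / 200)
print(f"expected differences per read: {lam:.2f}")               # 3.15
print(f"P(no differences): {prob_identical(630, 1 / 200):.3f}")  # 0.043
```

Under these assumptions only about 4% of read-length windows would look identical on both haplotypes, which is consistent with reads from the same region landing in separate clusters.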



More information is available at:

ftp://ftp.ensembl.org/pub/traces/zebrafish/assembly/assembly06/README

Although the assembly is being made available to the research community as early as possible, an Ensembl gene build has NOT yet been performed. We are investigating this now, but for the moment Ensembl will continue to present clone-based data.

In a few weeks we plan to release an updated Ensembl that presents all the usual Ensembl features except Ensembl gene predictions.

The assembly may be searched using BLAST at:

http://www.ensembl.org/Danio_rerio/blastview

and by SSAHA at:

http://www.ensembl.org/Danio_rerio/ssahaview

Note that Zebrafish SSAHA now supports very rapid queries using protein sequences. This feature will be extended to all Ensembl species in due course.

Assembly data are available at:

ftp://ftp.ensembl.org/pub/traces/zebrafish/assembly/assembly06



Assembly Statistics

We started with 9643640 reads comprising 6.07 Gbp (630 bp average read length). The assembly contains 7942778 unique reads, 82.4% of the starting reads.
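A quick check shows that these read figures are internally consistent. This sketch only re-derives the quoted average read length and percentage from the quoted totals. Because 6.07 Gbp is itself rounded, the computed mean (about 629 bp) matches the quoted 630 bp only approximately.

```python
total_reads = 9_643_640
total_bp = 6.07e9               # as quoted, rounded to three significant figures
assembled_unique_reads = 7_942_778

mean_read_length = total_bp / total_reads
fraction_assembled = assembled_unique_reads / total_reads

print(f"mean read length: {mean_read_length:.0f} bp")          # 629 bp
print(f"reads in assembly: {100 * fraction_assembled:.1f}%")   # 82.4%
```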

Phusion was used to cluster the reads, and phrap was used for cluster assembly and consensus generation.

Supercontigs with fewer than 3 reads or shorter than 1 kb were rejected. A further 3.5 Mbp of the assembly was rejected as possible contamination, based on read source statistics at the supercontig level.
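The size cut-off can be expressed as a simple predicate. This is a minimal sketch that assumes a hypothetical `Supercontig` record carrying a read count and a length in bp. The contamination screen based on read source statistics is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Supercontig:
    """Hypothetical record; the actual pipeline's data structures are not described."""
    name: str
    n_reads: int
    length_bp: int


MIN_READS = 3          # fewer than 3 reads -> rejected
MIN_LENGTH_BP = 1_000  # shorter than 1 kb -> rejected


def passes_size_filter(sc: Supercontig) -> bool:
    """Keep a supercontig only if it meets both the read-count and length thresholds."""
    return sc.n_reads >= MIN_READS and sc.length_bp >= MIN_LENGTH_BP


def filter_supercontigs(scs: Iterable[Supercontig]) -> List[Supercontig]:
    return [sc for sc in scs if passes_size_filter(sc)]
```

A supercontig failing either condition is dropped, which matches the "or" in the rule above.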

For the supercontigs (bp measures include estimated gap sizes):

  • Total bases = 1169967887 bps
  • Supercontigs = 158689
  • Average length = 7372 bps
  • Largest = 168788 bps
  • N50 = 20521 bps, n = 16515 supercontigs
  • Estimated coverage, based on 12 Mbp of sequence from 143 finished clones:
    • Supercontig coverage: 77%
    • Contig coverage: 61%
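The statistics listed above follow standard definitions. The sketch below shows how they could be computed from a list of supercontig lengths, with estimated gaps included in each length. N50 is the length L such that supercontigs of length at least L contain half of all assembled bases, and n is the number of the longest supercontigs needed to reach that half. The coverage helpers rest on an assumption: that coverage was estimated by aligning the assembly to the finished clones and counting the fraction of finished bases spanned by merged alignment intervals. The exact procedure is not described here, and all function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def summarize(lengths: Iterable[int]) -> Dict[str, float]:
    """Total, count, mean, largest, N50 and n for a set of sequence lengths."""
    ls = sorted(lengths, reverse=True)
    if not ls:
        raise ValueError("no lengths given")
    total = sum(ls)
    running = 0
    n50 = n = 0
    # Walk from the longest piece down until half of all bases are accounted for.
    for i, length in enumerate(ls, start=1):
        running += length
        if 2 * running >= total:
            n50, n = length, i
            break
    return {
        "total_bp": total,
        "count": len(ls),
        "mean_bp": total / len(ls),
        "largest_bp": ls[0],
        "n50_bp": n50,
        "n": n,
    }


def covered_bases(clone_length: int, hits: List[Tuple[int, int]]) -> int:
    """Bases of one finished clone covered by half-open [start, end) alignments, overlaps merged."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(hits):
        start, end = max(0, start), min(clone_length, end)  # clip to the clone
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def estimate_coverage(clones: Dict[str, Tuple[int, List[Tuple[int, int]]]]) -> float:
    """Pooled fraction of finished-clone bases covered, weighting each clone by its length."""
    total = sum(length for length, _ in clones.values())
    hit = sum(covered_bases(length, hits) for length, hits in clones.values())
    return hit / total
```

Under this assumption, supercontig coverage would count alignment spans that include estimated gaps, whereas contig coverage would count only sequenced contig spans. That is one plausible reason the first figure exceeds the second.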
    webmaster@sanger.ac.uk

    Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

    Last Modified Tue Oct 14 13:58:59 2003

    Genome Research Limited is a charity registered in England with number 1021457
