The Danio rerio Sequencing Project

Ninth assembly, Zv9, of the zebrafish genome released

General information

The assembly comprises a total sequence length of 1.4 Gb in 4,560 scaffolds. It is based on a clone path sorted with the high-density meiotic map SATMAP (Clark et al., in preparation). The data freeze was taken on 1 April 2010. The remaining gaps were filled with sequence from WGS31, a combined Illumina and capillary assembly. The assembly integration process involves sequence alignments as well as cDNA, marker and BAC/fosmid end sequence placements.

The sequences that are based on clone contigs or are linked to chromosomes via markers are named 'Zv9_scaffold' followed by a number. The WGS contigs that could not be placed onto chromosomes are named 'Zv9_NA' followed by a number. According to the agreement reached at the European Zebrafish Meeting in Paris, 2003, we translated linkage group numbers directly into chromosome numbers (e.g. linkage group 1 = chromosome 1).
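The naming convention above can be used to separate the two classes of sequence programmatically. The following is a minimal sketch, assuming the names consist of exactly the stated prefix followed by digits; the function name `classify_zv9_name` and the category labels are hypothetical, and the precise number formatting in released files may differ.

```python
import re

# Prefixes are taken from the text; the assumption that each is followed
# directly by digits (with no separator or suffix) is ours.
CLONE_OR_MARKER = re.compile(r"^Zv9_scaffold(\d+)$")
UNPLACED_WGS = re.compile(r"^Zv9_NA(\d+)$")


def classify_zv9_name(name: str) -> str:
    """Return a hypothetical category label for a Zv9 sequence name."""
    if CLONE_OR_MARKER.match(name):
        # Based on clone contigs, or linked to a chromosome via markers.
        return "clone_or_marker_linked"
    if UNPLACED_WGS.match(name):
        # WGS contig that could not be placed onto a chromosome.
        return "unplaced_wgs"
    return "unknown"


if __name__ == "__main__":
    for example in ("Zv9_scaffold12", "Zv9_NA345", "chr1"):
        print(example, classify_zv9_name(example))
```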

Please note:

This is still a preliminary assembly. The regions of the assembly covered by WGS contigs are of lower quality than those covered by clones.

Resources

An Ensembl database built on the Zv9 assembly, featuring the sequence and preliminary annotation, is now available. Zv9 has been submitted to EMBL/Genbank and can be downloaded there.

Assembly Statistics WGS31

The WGS31 assembly used to fill the gaps in the clone path was created using Illumina sequencing reads from a double-haploid Tübingen fish (289 million reads providing approximately 30-fold coverage), combined with capillary sequencing reads from a second, related double-haploid Tübingen fish (12.2 million reads providing approximately 7.5-fold coverage). Using data from double-haploid Tübingen fish results in less artificial haplotypic duplication than was found in previous WGS assemblies, which were generated from multiple individual diploid fish. A novel de Bruijn graph-based algorithm called Fuzzypath was used to assemble the Illumina reads into short sequence contigs; these contigs were then combined with the capillary reads using the Phusion assembler.

This resulted in 119,136 contigs with an N50 size of 25 kb. Contigs are joined into supercontigs based on read-pair information, with gap sizes estimated from the insert sizes of the different read-pair libraries. There are 32,044 supercontigs in the WGS31 assembly with an N50 size of 614 kb.
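For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the commonly used definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. This is an illustration only; the function name `n50` is hypothetical, and the exact convention used to produce the figures on this page (for example, whether gaps are counted) is not stated here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy lengths, not real assembly data.
    print(n50([2, 3, 4, 5, 6]))
```

With the toy input above, the total is 20, and the two longest sequences (6 and 5) already cover 11 of it, so the function returns 5.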

Assembly Statistics Integrated Assembly Zv9

The integration of the WGS31 assembly with the clone sequences results in the Zv9 assembly (bp measures include estimated gap sizes):

  • Total bases = 1,412,464,843 bp
  • Scaffolds = 4,560
  • Largest scaffold = 77,276,063 bp
  • Scaffold N50 = 1,551,602 bp (n = 4,560)
  • 1,357,051,643 bp in scaffolds placed on chromosomes 1-25 (includes 100 bp gaps between scaffolds).
  • 25,245,215 bp in 112 unplaced clone-based scaffolds
  • 30,167,985 bp in 995 NA scaffolds.
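As a consistency check on the figures listed above, the three categories (chromosome-placed, unplaced clone-based, and NA scaffolds) can be summed and compared with the stated total. The short sketch below does this; the dictionary keys and function name are hypothetical labels, while the numbers are copied from the list.

```python
# Figures copied from the Zv9 statistics list; key names are our own labels.
TOTAL_BP = 1_412_464_843

COMPONENTS_BP = {
    "chromosomes_1_to_25": 1_357_051_643,   # includes 100 bp inter-scaffold gaps
    "unplaced_clone_based": 25_245_215,     # 112 scaffolds
    "na_scaffolds": 30_167_985,             # 995 scaffolds
}


def component_fractions(components, total):
    """Return each component's share of the total as a fraction."""
    return {name: bp / total for name, bp in components.items()}


if __name__ == "__main__":
    summed = sum(COMPONENTS_BP.values())
    print("sum of components:", summed)
    print("matches stated total:", summed == TOTAL_BP)
    for name, frac in component_fractions(COMPONENTS_BP, TOTAL_BP).items():
        print(f"{name}: {frac:.1%}")
```

The three components add up exactly to the stated total, and roughly 96% of the assembly length lies in scaffolds placed on chromosomes 1-25.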


Last Modified Tue Nov 9 13:56:24 2010
