The Danio rerio Sequencing Project

Seventh assembly, Zv7, of the zebrafish genome released

General information

The assembly comprises a total sequence length of 1,440,582,308 bp in 5,036 fragments. It has been tied to the FPC map (data freeze 11th April 2007), which provides a tiling path of sequenced clones. 1.02 Gb of sequence from 7,823 sequenced clones (7,139 finished and 684 unfinished) was taken as a scaffold and completed with contigs from a whole genome shotgun (WGS) assembly (see details below). This integration of clone sequences and WGS contigs is based on a mixed strategy that considers sequence alignments and the placement of BAC ends and of features such as zebrafish cDNAs and markers.

The sequences that are based on FPC contigs or are linked to chromosomes via markers are named Zv7_scaffold followed by a number. The WGS contigs that could not be placed onto chromosomes are named Zv7_NA followed by a number. According to the agreement reached at the European Zebrafish Meeting in Paris, 2003, we translated linkage group numbers directly into chromosome numbers (e.g. linkage group 1 = chromosome 1).
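The naming scheme above can be checked mechanically when working with downloaded sequences. The following is a minimal sketch, assuming the number follows the prefix directly with no separator (e.g. `Zv7_scaffold12`); the function name and the category labels are hypothetical and not part of the assembly release.

```python
import re

# Assumed name layout: prefix immediately followed by digits.
_ZV7_NAME = re.compile(r"^Zv7_(scaffold|NA)(\d+)$")


def classify_zv7_name(name):
    """Return (category, number) for a Zv7 sequence name, or None if it does not match."""
    match = _ZV7_NAME.match(name)
    if match is None:
        return None
    # "scaffold": based on FPC contigs or linked to chromosomes via markers.
    # "NA": WGS contigs that could not be placed onto chromosomes.
    category = "fpc_or_marker_linked" if match.group(1) == "scaffold" else "unplaced_wgs"
    return category, int(match.group(2))
```

Any name that does not fit the two patterns returns `None`, so unexpected identifiers are not silently assigned to a category.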

Please note:

This is still a *preliminary* assembly, and there are a number of points to remember. The regions of the assembly covered by WGS contigs are of lower quality. In general, highly variable regions do not form clusters, since they are quite likely to come from different haplotypes. This also affects the generation of the physical map, resulting in assembly dropouts and false duplications. In this assembly, special attention has been paid to these issues, and over 200 Mb of duplicated sequence has been removed compared to Zv6.

Resources

A full Ensembl database built on the Zv7 assembly, featuring the sequence and considerable annotation, is now available.

The assembly can be searched using BLAST or SSAHA2. Single contigs of interest can be downloaded using the Export Data option.

The whole assembly can be downloaded from ftp://ftp.ensembl.org/pub/assembly/zebrafish/Zv7release

Assembly Statistics

The WGS assembly used to fill the gaps in the tiling path uses reads solely from a library generated from a single doubled haploid Tuebingen zebrafish. It is based on 13,756,367 reads comprising 10,891,216,277 bp, with a coverage of 5.5x. Phusion was used to cluster the reads and phrap was used for consensus generation. This resulted in 131,933 contigs with an N50 size of 20,127 bp. Contigs are joined into supercontigs based on read pair information, with gap sizes estimated using inserts of different lengths. Small supercontigs with fewer than 3 reads or smaller than 0.5 kb were rejected. There are 22,961 supercontigs in the WGS assembly, with an N50 size of 1,499,123 bp.
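The rejection rule for small supercontigs can be expressed as a simple filter. This is an illustrative sketch only, not the pipeline actually used: the `Supercontig` record and its field names are hypothetical, and 0.5 kb is taken to mean 500 bp.

```python
from dataclasses import dataclass

MIN_READS = 3        # supercontigs with fewer reads were rejected
MIN_LENGTH_BP = 500  # 0.5 kb, assumed here to mean 500 bp


@dataclass
class Supercontig:
    """Hypothetical record; field names are chosen for illustration."""
    name: str
    n_reads: int
    length_bp: int


def keep_supercontig(sc):
    """Return True if the supercontig passes both thresholds."""
    return sc.n_reads >= MIN_READS and sc.length_bp >= MIN_LENGTH_BP


def filter_supercontigs(supercontigs):
    """Return only the supercontigs that are not rejected."""
    return [sc for sc in supercontigs if keep_supercontig(sc)]
```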

The integration of the WGS assembly with the clone sequences results in the Zv7 assembly (bp measures include estimated gap sizes):

  • Total bases = 1,440,582,308 bp
  • Scaffolds = 5,036
  • Largest = 10,976,257 bp
  • N50 = 1,153,933 bp, n = 277
  • 1,277,075,233 bp in scaffolds placed on chromosomes 1-25 (includes 100 bp gaps between scaffolds).
  • 45,800,611 bp in 166 scaffolds tied to unplaced FPC contigs.
  • 117,689,868 bp in 4,844 NA scaffolds.
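For readers who want to recompute figures such as the N50 and its accompanying n from a list of scaffold lengths, the sketch below uses the common definition: sort lengths from largest to smallest and find the length at which the running total first reaches half of all bases; n is the number of scaffolds needed to get there. The function name is hypothetical, and whether the published values were computed in exactly this way is an assumption.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach half the total)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no lengths given")


# Toy input, not assembly data.
toy_n50, toy_n = n50([10, 8, 6, 4, 2])
```

Because the Zv7 bp measures include estimated gap sizes, the lengths passed in would need to include gaps as well to be comparable with the published figures.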

webmaster@sanger.ac.uk

Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

Last Modified Fri Nov 2 09:22:57 2007

Genome Research Limited is a charity registered in England with number 1021457
