Comparison of draft human sequence versions from the public and private domain

| | Celera - 1 | Celera - 2 | Human Genome Project (HGP) |
|---|---|---|---|
| Assembly | 'Whole Genome' (WGA in Science paper) | Compartmentalised Shotgun (CSA in Science paper) | Clone-based shotgun |
| Sequence coverage used in assembly | 5.1-fold Celera + 7.5-fold HGP = 12.6-fold total | 5.1-fold Celera + 7.5-fold HGP = 12.6-fold total, + HGP localisation | 7.5-fold HGP + HGP localisation |
| Genome scaffold (including unknown bases in gaps) | 2.85 Billion | 2.91 Billion | 2.92 Billion |
| Genome sequence (bases whose sequence was determined) | 2.57 Billion (88%) | 2.65 Billion (90%) | 2.69 Billion (92%) |
| Fraction of genome: 1. Covered by raw sequence | 99.9% | >99% | 94% |
| Fraction of genome: 2. Successfully assembled | 88% | 90% | 92% |
| Fraction of genome: 3. Unassembled | ~12% (26% raw data not localized in genome) | ~10% (22% raw data not localized in genome) | ~2% (< 1% raw data, all localized to individual clones) |
| Number of contigs (and hence gaps) | 221,036 | 170,033 | 149,821 |
| Number of scaffolds (connected sets of contigs) | 118,968, fully ordered internally | 53,591, fully ordered internally | 87,757, partially ordered internally |
| Largest contig | 1.2 Million | 2.0 Million | 28.5 Million |
| Proportion of genome in contigs >100kb | 31% | 49% | 46% |

* Note: all percentages above assume a nominal euchromatic genome size of 2.93 Billion bases (the euchromatic genome is the part that can be sequenced with current technology and that contains almost all the genes). Although the exact euchromatic genome size is not yet known, any difference from this assumption will not change the relative numbers above.
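
As a rough check of how the percentages relate to the nominal genome size, the following Python sketch recomputes the "Genome Sequence" percentages from the base counts in the table and the assumed 2.93 Billion base euchromatic genome. The names `NOMINAL_GENOME_SIZE`, `sequenced_bases` and `percent_of_genome` are illustrative only and do not come from the original comparison.

```python
# Nominal euchromatic genome size assumed in the note above (in bases).
NOMINAL_GENOME_SIZE = 2.93e9

# Bases whose sequence was determined, taken from the "Genome Sequence" row.
sequenced_bases = {
    "Celera - 1": 2.57e9,
    "Celera - 2": 2.65e9,
    "HGP": 2.69e9,
}


def percent_of_genome(bases, genome_size=NOMINAL_GENOME_SIZE):
    """Return `bases` as a whole-number percentage of `genome_size`."""
    return round(100 * bases / genome_size)


# Percentage of the nominal genome covered by each assembly.
percentages = {
    name: percent_of_genome(bases) for name, bases in sequenced_bases.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Because every assembly is divided by the same genome size, choosing a different nominal size would scale all three percentages together and leave their ratios to one another unchanged.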


