Comparison of draft human sequence versions from the public and private domain
| | Celera - 1 | Celera - 2 | Human Genome Project (HGP) |
|---|---|---|---|
| Assembly | 'Whole Genome' (WGA in Science paper) | Compartmentalised Shotgun (CSA in Science paper) | Clone-based shotgun |
| Sequence coverage used in assembly | 5.1-fold Celera + 7.5-fold HGP = 12.6-fold total | 5.1-fold Celera + 7.5-fold HGP = 12.6-fold total + HGP localisation | 7.5-fold HGP + HGP localisation |
| Genome scaffold including unknown bases in gaps | 2.85 Billion | 2.91 Billion | 2.92 Billion |
| Genome sequence bases whose sequence was determined | 2.57 Billion (88%) | 2.65 Billion (90%) | 2.69 Billion (92%) |
| Fraction of genome: | | | |
| 1. Covered by raw sequence | 99.9% | >99% | 94% |
| 2. Successfully assembled | 88% | 90% | 92% |
| 3. Unassembled | ~12%* (* 26% raw data not localized in genome) | ~10%* (* 22% raw data not localized in genome) | ~2%* (* < 1% raw data all localized to individual clones) |
| Number of contigs (and hence gaps) | 221,036 | 170,033 | 149,821 |
| Number of scaffolds (connected sets of contigs) | 118,968 fully ordered internally | 53,591 fully ordered internally | 87,757 partially ordered internally |
| Largest contig | 1.2 Million | 2.0 Million | 28.5 Million |
| Proportion of genome in contigs >100kb | 31% | 49% | 46% |
* Note: all percentages above assume a nominal euchromatic genome size of 2.93 Billion bases (the euchromatic genome is the part that can be sequenced with current technology, and that contains almost all the genes). Although the correct euchromatic genome size is still not known exactly, any difference from this assumption will not change the relative numbers above.
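
The "successfully assembled" percentages can be reproduced from the determined-base counts and the nominal genome size stated in the note. The following is a minimal sketch of that arithmetic; the names `NOMINAL_GENOME_SIZE`, `DETERMINED_BASES`, `assembled_percent`, and `ranking` are illustrative only and do not come from either project. It also illustrates why a different genome size would leave the relative order of the three assemblies unchanged: every value is divided by the same denominator.

```python
# Nominal euchromatic genome size assumed in the note (bases).
NOMINAL_GENOME_SIZE = 2.93e9

# Bases whose sequence was determined, taken from the table above.
DETERMINED_BASES = {
    "Celera - 1": 2.57e9,
    "Celera - 2": 2.65e9,
    "HGP": 2.69e9,
}


def assembled_percent(genome_size=NOMINAL_GENOME_SIZE):
    """Return the assembled fraction of the genome, as a whole percent."""
    return {
        name: round(100 * bases / genome_size)
        for name, bases in DETERMINED_BASES.items()
    }


def ranking(genome_size=NOMINAL_GENOME_SIZE):
    """Return assembly names ordered from least to most complete."""
    fractions = {
        name: bases / genome_size for name, bases in DETERMINED_BASES.items()
    }
    return sorted(fractions, key=fractions.get)


if __name__ == "__main__":
    for name, pct in assembled_percent().items():
        print(f"{name}: {pct}% of the nominal genome assembled")
    # A hypothetical alternative genome size (3.0 billion bases, chosen
    # only for illustration) changes the percentages but not the order.
    print(ranking() == ranking(3.0e9))
```

With the nominal size of 2.93 billion bases the function returns 88, 90, and 92 percent, matching the table.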


