The finished human genome

The "Gold Standard" sequence

The International Human Genome Sequencing Consortium, of which the Wellcome Trust Sanger Institute is a major partner, today published their scientific analysis of the finished human genome, the Gold Standard sequence that is already acting to prime new biomedical research.

The paper is published on 21 October 2004 in Nature and details the rigorous standards set and surpassed during the 13-year Human Genome Project (HGP). The analysis suggests that there are perhaps only 20,000-25,000 protein-coding genes in our human genome.

The Wellcome Trust Sanger Institute made the largest single contribution to the human genome sequence. The ‘genome browser’ Ensembl, run by the Sanger Institute and the EMBL-European Bioinformatics Institute, is a leading resource for researchers around the globe.

Key results of the research are:

  • The number of gaps has been reduced 400-fold to only 341

  • The finished human genome sequence covers 99 per cent of the gene-containing parts of the genome and is 99.999 per cent accurate

  • The new sequence correctly identifies almost all known genes (99.74 per cent)

  • It defines 22,287 ‘gene loci’, consisting of 19,599 protein-coding genes in the human genome and another 2,188 DNA segments that are predicted to be protein-coding genes

  • It identifies the ‘birth’ of 1,183 genes in the last 60-100 million years

  • It identifies the ‘death’ of 30 or so genes in a similar time period

  • The accuracy and completeness allow systematic searches for the causes of disease, for example, to find all key heritable factors predisposing to diabetes or mutations underlying breast cancer – with confidence that little can escape detection

  • At a practical level, it eliminates tedious confirmatory work by researchers, who can now rely on highly accurate information

  • More generally, the HGP demonstrates the tremendous potential value of coordinated projects to create community resources to propel biomedical research

“In our analysis we revised some predictions based on the unfinished, draft sequence of the human genome. The task of identifying genes remains challenging, but the finished human genome sequence, genome sequences from other organisms, better computational models and other improved resources, have combined to give a much clearer and more reliable picture of our genomic landscape.”

Dr Jane Rogers, Head of Sequencing at the Wellcome Trust Sanger Institute

The finished sequence has an estimated error rate of less than one per 100,000 bases, tenfold better than the original goal. This means that gene identification can be more reliable and that studies of our genome and health (for example, of the genetic changes that predispose some individuals to disease) can be carried out with greater confidence.
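The accuracy and error-rate figures quoted in this release are two views of the same quantity, and the arithmetic can be sketched in a few lines. The snippet below is only an illustration: the function names and the one-million-base region are hypothetical, and the original goal is not stated directly in the release but derived here from the "tenfold better" statement. Because the release gives the error rate as an upper bound, the results should be read as upper bounds too.

```python
from fractions import Fraction


def error_rate(accuracy_percent: str) -> Fraction:
    """Convert an accuracy percentage such as "99.999" into errors per base."""
    return 1 - Fraction(accuracy_percent) / 100


def expected_errors(rate: Fraction, bases: int) -> Fraction:
    """Expected number of erroneous bases in a stretch of `bases` bases."""
    return rate * bases


# 99.999 per cent accuracy is the figure quoted in the release.
finished = error_rate("99.999")

# Assumption: derived from "tenfold better than the original goal".
original_goal = finished * 10

# Hypothetical region length, chosen only for illustration.
region = 1_000_000

print(f"Finished sequence: at most 1 error per {int(1 / finished):,} bases")
print(f"Implied original goal: 1 error per {int(1 / original_goal):,} bases")
print(
    f"Expected errors in {region:,} bases: at most "
    f"{int(expected_errors(finished, region))} "
    f"(versus {int(expected_errors(original_goal, region))} at the original goal)"
)
```

Exact fractions are used instead of floating-point numbers so that 99.999 per cent converts to exactly one error in 100,000 bases, with no rounding.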

“Only a decade ago, most scientists thought humans had about 100,000 genes. When we analyzed the working draft of the human genome sequence three years ago, we estimated there were about 30,000 to 35,000 genes, which surprised many. This new analysis reduces that number even further and provides us with the clearest picture yet of our genome. The availability of the highly accurate human genome sequence in free public databases enables researchers around the world to conduct even more precise studies of our genetic instruction book and how it influences health and disease.”

NHGRI Director Francis S. Collins, MD, PhD

Key challenges that lie ahead include: a systematic study of sequence variation among humans and of the association of that variation with disease; systematic identification of non-protein-coding elements in the human genome, especially regulatory controls and structural elements; and systematic identification of all the ‘modules’ in which genes and proteins function together, to place genetic information in a functional context.

“Collectively we have produced a sequence that is as accurate and complete as possible in the present state of the art. It will be open for continuous improvement over the years to come, and of course open for all to use for any purpose, without restraint or fee. Let us continue to work together to ensure that the enormous benefits from this new knowledge flow to all and not just to the few.”

Sir John Sulston, Former Director of the Wellcome Trust Sanger Institute

More information

  1. More than 2,800 researchers who took part in the International Human Genome Sequencing Consortium share authorship on today’s Nature paper, which expands upon the group’s initial analysis published in February 2001. Even more detailed annotations and analyses have already been published for chromosomes 5, 6, 7, 9, 10, 13, 14, 19, 20, 21, 22 and Y. Publications describing the remaining 12 chromosomes are forthcoming.

  2. The finished human genome sequence and its annotations can be accessed through the following public genome browsers: the Ensembl Genome Browser (www.ensembl.org) at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute; GenBank (www.ncbi.nih.gov/Genbank) at NIH’s National Center for Biotechnology Information (NCBI); the UCSC Genome Browser (www.genome.ucsc.edu) at the University of California at Santa Cruz; EMBL-Bank (www.ebi.ac.uk/embl) at the EMBL-European Bioinformatics Institute; and the DNA Data Bank of Japan (www.ddbj.nig.ac.jp).

  3. The International Human Genome Sequencing Consortium includes scientists at 20 institutions located in France, Germany, Japan, China, the United Kingdom and the United States.

Selected websites

  • The Wellcome Trust Sanger Institute

    The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms and more than 90 pathogen genomes. In October 2006, new funding was awarded by the Wellcome Trust to exploit the wealth of genome data now available to answer important questions about health and disease.

  • The Wellcome Trust and Its Founder

    The Wellcome Trust is the most diverse biomedical research charity in the world, spending about £450 million every year both in the UK and internationally to support and promote research that will improve the health of humans and animals. The Trust was established under the will of Sir Henry Wellcome, and is funded from a private endowment, which is managed with long-term stability and growth in mind.