The finished human genome

The "Gold Standard" sequence

The International Human Genome Sequencing Consortium, of which the Wellcome Trust Sanger Institute is a major partner, today published their scientific analysis of the finished human genome, the 'Gold Standard' sequence that is already driving new biomedical research.

The paper, published on 21 October 2004 in Nature, details the rigorous standards set and surpassed during the 13-year Human Genome Project (HGP). The analysis suggests that the human genome may contain only 20,000-25,000 protein-coding genes.

The Wellcome Trust Sanger Institute made the largest single contribution to the human genome sequence, and the Ensembl genome browser, run by the Sanger Institute and the EMBL-European Bioinformatics Institute, is a leading resource for researchers around the globe.

Key results of the research are:
  • The number of gaps has been reduced 400-fold, compared with the draft sequence, to only 341

  • It covers 99 per cent of the gene-containing parts of the genome and is 99.999 per cent accurate

  • The new sequence correctly identifies almost all known genes (99.74 per cent)

  • It defines 22,287 'gene loci', including 19,599 protein-coding genes and another 2,188 DNA segments that are predicted to be protein-coding genes

  • It identifies the 'birth' of 1,183 genes in the last 60 to 100 million years

  • It identifies the 'death' of 30 or so genes in a similar time period

  • The accuracy and completeness of the sequence allow systematic searches for the causes of disease, for example to find all key heritable factors predisposing to diabetes or the mutations underlying breast cancer, with confidence that little will escape detection

  • At a practical level, it eliminates tedious confirmatory work by researchers, who can now rely on highly accurate information

  • More generally, the HGP demonstrates the tremendous potential value of coordinated projects to create community resources to propel biomedical research

"In our analysis we revised some predictions based on the unfinished, draft sequence of the human genome. The task of identifying genes remains challenging, but the finished human genome sequence, genome sequences from other organisms, better computational models and other improved resources, have combined to give a much clearer and more reliable picture of our genomic landscape."

Dr Jane Rogers, Head of Sequencing at the Wellcome Trust Sanger Institute

The finished sequence has an estimated error rate of less than one per 100,000 bases, tenfold better than the original goal. This means that genes can be identified more reliably and that studies of our genome and health (for example, of which genetic changes predispose some individuals to disease) can be carried out with greater confidence.
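The error rate, the 99.999 per cent accuracy figure and the 'tenfold' comparison are linked by simple arithmetic. The worked sketch below assumes, as the tenfold comparison implies, that the original goal was one error per 10,000 bases; since the achieved rate is stated as an upper bound, the improvement is at least tenfold.

```latex
\begin{aligned}
\text{achieved error rate} &< \tfrac{1}{100\,000} = 10^{-5}
  &&\Rightarrow\ \text{accuracy} > 1 - 10^{-5} = 99.999\% \\
\text{original goal (assumed)} &= \tfrac{1}{10\,000} = 10^{-4}
  &&\Rightarrow\ \text{accuracy} = 1 - 10^{-4} = 99.99\% \\
\text{improvement factor} &= \frac{10^{-4}}{10^{-5}} = 10
  &&\text{(at the stated bound)}
\end{aligned}
```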

"Only a decade ago, most scientists thought humans had about 100,000 genes. When we analyzed the working draft of the human genome sequence three years ago, we estimated there were about 30,000 to 35,000 genes, which surprised many. This new analysis reduces that number even further and provides us with the clearest picture yet of our genome. The availability of the highly accurate human genome sequence in free public databases enables researchers around the world to conduct even more precise studies of our genetic instruction book and how it influences health and disease."

NHGRI Director Francis S. Collins, MD, PhD

Key challenges that lie ahead include: a systematic study of sequence variation among humans and of the association of that variation with disease; systematic identification of non-protein-coding elements in the human genome, especially regulatory controls and structural elements; and systematic identification of all the 'modules' in which genes and proteins function together, to place genetic information in a functional context.

"Collectively we have produced a sequence that is as accurate and complete as possible in the present state of the art. It will be open for continuous improvement over the years to come, and of course open for all to use for any purpose, without restraint or fee. Let us continue to work together to ensure that the enormous benefits from this new knowledge flow to all and not just to the few."

Sir John Sulston, former Director of The Wellcome Trust Sanger Institute

Notes to Editors
Publications
  • Finishing the euchromatic sequence of the human genome.

    International Human Genome Sequencing Consortium

    Nature 2004;431(7011):931-45

  1. More than 2,800 researchers who took part in the International Human Genome Sequencing Consortium share authorship on today's Nature paper, which expands upon the group's initial analysis published in February 2001. Even more detailed annotations and analyses have already been published for chromosomes 5, 6, 7, 9, 10, 13, 14, 19, 20, 21, 22 and Y. Publications describing the remaining 12 chromosomes are forthcoming.

  2. The finished human genome sequence and its annotations can be accessed through the following public databases and genome browsers: the Ensembl Genome Browser (www.ensembl.org) at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute; GenBank (www.ncbi.nih.gov/Genbank) at NIH's National Center for Biotechnology Information (NCBI); the UCSC Genome Browser (www.genome.ucsc.edu) at the University of California at Santa Cruz; EMBL-Bank (www.ebi.ac.uk/embl) at the EMBL-European Bioinformatics Institute; and the DNA Data Bank of Japan (www.ddbj.nig.ac.jp).

  3. The International Human Genome Sequencing Consortium includes scientists at 20 institutions located in France, Germany, Japan, China, the United Kingdom and the United States.

Contact the Press Office

Dr Samantha Wynne, Media Officer

Tel +44 (0)1223 492 368

Emily Mobley, Media Officer

Tel +44 (0)1223 496 851

Wellcome Trust Sanger Institute,
Hinxton,
Cambridgeshire,
CB10 1SA,
UK

Mobile +44 (0) 7900 607793