17th May 2006

Genome doesn't start with 'G'

Study of the largest and last chromosome of the human genome published

The Wellcome Trust Sanger Institute and colleagues in the UK and USA today publish the longest and final chapter in what has been called The Book of Life - the text and study of our human genetic material. Published in Nature, the report of the sequence of human chromosome 1 is the final chromosome analysis from the Human Genome Project.

The sequence has been used to identify more than 1000 new genes and is expected to help researchers find novel diagnostics and treatments for many diseases. In the past year alone, genes involved in a dozen diseases, including cancer and neurological disease, have been identified using the freely available chromosome 1 sequence and DNA resources.

If it were typed out, chromosome 1's huge repository of genetic information would cover 60,000 pages. It is home to more than 3000 genes and is linked to more than 350 known diseases, including conditions as varied as cancer, Parkinson's and Alzheimer's disease, high cholesterol and porphyria (thought to have affected King George III of England).

"The sequence we have generated, like that produced by our collaborators throughout the Human Genome Project, has driven biomedical discovery," said Dr Simon Gregory, Assistant Professor from Duke University, who led the project while at the Sanger Institute. "This moment, the publication of the sequence from the last and largest human chromosome, completes the story of the HGP and marks the growing wave of biological and medical research founded on the human genome sequence".

"Chromosome 1 contains fascinating stories of chromosome biology, of our evolution, and our health, and it's inspiring to have played a part in a programme that will have so much power to understand the essence of human biology."

Human chromosomes are numbered from the largest (chromosome 1) to the smallest (chromosomes 22 and 21). Each is composed of many millions of genetic letters or bases, called A, C, T and G. The first genetic letter of the chromosome 1 sequence, and hence the beginning of our genome, is "C".

The sequence of human chromosome 1 is 223,569,564 bases of genetic code - around 8% of our genome - and contains about twice as many genes as the average chromosome. "The size of chromosome 1 means its landscape spans extremes in gene content, with stretches of millions of bases of gene-rich oases and gene-poor deserts", continued Dr Gregory, "as well as regions of the chromosome that are copied during early and late phases of cell division".

But the sequence must be mined to be of benefit: for example, differences in the sequence between individuals will help develop an understanding of diseases associated with this chromosome. Almost 4500 single-letter changes in the genetic code (called SNPs) were identified that could lead to changes in protein activity. In addition, 90 SNPs were found that would result in a shortened - and possibly inactive - protein. Although some 15 SNPs are already known to be associated with protection from malaria or predisposition to porphyria, the function of the remaining newly located SNPs is yet to be discovered.

"A catalyst for our gene discoveries", is how Dr Brian Schutte, Associate Professor of Pediatrics at the University of Iowa, describes the sequence of chromosome 1. "Prior to the sequencing efforts, we managed to localize the gene for a rare human orofacial clefting disease to a region on chromosome 1. But, we had no clue which genes lay in the region".

"Our collaboration with the Sanger led to much more rapid discovery of the gene involved and now we, and others, have found that normal genetic variation in the same gene contributes 12% risk for the common form of cleft lip and palate. Our experience demonstrates two important issues. Firstly, gene discoveries in rare diseases can contribute directly to the understanding of common diseases. Secondly, sequencing efforts accelerate gene discovery of not only rare genetic disorders, but also common diseases that place the greatest burden on our healthcare system."

The finished sequence of chromosome 1 enabled the team to bring together chromosome-wide information associated with genetic variation from projects such as the HapMap - a leading international study of human genetic variation. Our chromosome pairs 'recombine' with each other, so that regions inherited from our two parents are shuffled when passed on to our children.

Shuffling the deck tends not to disrupt genes. Most of the recombination found on chromosome 1 occurs at a few hotspots and more than 80% of hotspots are in only 15% of the sequence. Fine scale analyses have shown that recombination tends to be near to genes but outside the actual gene structures themselves.

Dr David Bentley, Chief Scientist at Solexa and former Head of Human Genetics at the Wellcome Trust Sanger Institute, said "The sequence of chromosome 1, published today, is part of an exciting and near-complete reference volume of our genome. Freely available in the public domain, researchers all over the world are already adding new information to it, enriching the picture of what it is to be human, for the benefit of others in the future".

Careful analysis also showed how our genome has undergone recent evolutionary selection. The team looked at correlations between the HapMap data and the annotated chromosome 1 sequence to investigate the variation between three human population groups with ancestry in Europe, Africa or Asia.

Genome sequence varies from person to person. New insights are being gained all the time. We now find that genetic differences may be prevalent in one population but rare in, or absent from, another. Some of these, like the gain or loss of large regions, have been recognized only in the past few years as a result of the Human Genome Project.

For example, as well as the fine-grain variation represented by SNPs, the team localized genes to a number of larger 'chunks' of DNA that differed between individuals. These chunks are as large as 1 million bases. Some of the regions have been previously implicated in how we vary in our interaction with the environment around us. For example, variations in the region around the GSTM1 gene can alter our susceptibility to cancer-causing chemicals or toxins and influence the toxicity or efficacy of certain drugs.

Chromosome 1 is particularly susceptible to rearrangement and it is thought that disruption to genes within these rearrangements plays a role in several cancers and in mental retardation. The high-quality sequence has already helped researchers around the world to home in on genes that affect a range of cancers.

Rearrangements, deletions and duplications can tell us about our evolution and our diseases. More than 5% of the chromosome is duplicated and can provide material for the evolution of new functions. In one example, the partial duplication of a gene called NOTCH2 has resulted in a novel protein that is known to be functional in humans and has been implicated in disease. Meanwhile, deletion of regions of chromosome 1p is found in 1/5000 to 1/10,000 live births and may contribute to mental retardation syndromes.

"The Human Genome Project has provided us with a wealth of information about our genes and their many variations," said Dr Mark Walport, Director of the Wellcome Trust. "It is a vital resource for answering important questions about health and disease. We have been a committed partner in the project since 1992 both in supporting the research and ensuring the results are freely accessible to all".

"The completion of the project, with the publication of the Chromosome 1 sequence, is a monumental achievement that will benefit the research community for years to come and is a credit to all involved."

The human genome is essential in understanding disease and the sequence of chromosome 1, together with the sequences produced and analysed throughout the Human Genome Project, will continue to be a foundation to help improve human health.

When seeking funding from the Wellcome Trust for their efforts to sequence the human genome in 1995, the Sanger Institute management wrote: "Sequencing is not an end in itself: it is not the solution of the genome, but merely the baseline information that allows the real aim - the biology - to proceed faster". The chromosome 1 project stands as a reflection of that view. Genome sequence powers research to help us understand the biology of our genome and the medical consequences of sequence variation.

Notes to Editors

Chromosome 1

The finished sequence comprises 223.6 million base-pairs (Mbp), determined to an accuracy of >99.99%. The sequence of chromosome 1 published today includes 99.4% of the gene coding (euchromatin) regions of the chromosome amenable to sequencing with current technologies. Gaps within the sequence (most are due to repetitive sequence) comprise about 1.3 Mbp. The total size of chromosome 1 is estimated to be 237.6 Mbp, which includes the centromere and a large non-coding region (heterochromatin) in the centre of the chromosome.

Sequencing was carried out at the Wellcome Trust Sanger Institute; the University of Washington Genome Center contributed 13% of the sequence finishing. Analysis of the chromosome content was carried out by the Wellcome Trust Sanger Institute.

The Human Genome Project

Throughout the Human Genome Project, sequence data have been released freely to speed biological and biomedical research. For each of our 24 human chromosomes, a peer-reviewed report has been published: the publications describe the attributes of the finished sequence and analysis of the gene content, variation in sequence and other features. The sequence of chromosome 1 is the final report in this series.

Publication details

  • The DNA sequence and biological annotation of human chromosome 1.

    Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CC, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RI, Aubin K, Babbage AK, Bagguley CL, Bailey J, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Buckley D, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dunn M, Earthrowl M, Ellington AG, Errington H, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MR, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ES, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NS, McLaren S, Milne S, Mistry S, Moore MJ, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR, Banerjee R, Bryant SP, Burford DC, Burrill WD, Clegg SM, Dhami P, Dovey O, Faulkner LM, Gribble SM, Langford CF, Pandian RD, Porter KM and Prigmore E

    Nature 2006;441;7091;315-21

Participating Centres

  • Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  • Division of Medical Genetics, Department of Medicine, University of Washington Genome Center, Seattle, WA, USA
  • Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
  • King's College London, Dept of Medical and Molecular Genetics, Guy's Tower, London SE1 9RT, UK
  • HUGO Gene Nomenclature Committee, The Galton Laboratory, Department of Biology, University College London, London NW1 2HE, UK
  • The Duke University Center for Human Genetics, Durham, North Carolina NC27708, USA
  • Solexa Ltd, Chesterford Research Park, Little Chesterford, Essex CB10 1XL, UK


The Wellcome Trust Sanger Institute

The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms and more than 90 pathogen genomes. In October 2005, new funding was awarded by the Wellcome Trust to exploit the wealth of genome data now available to answer important questions about health and disease.


The Wellcome Trust and Its Founder

The Wellcome Trust is the most diverse biomedical research charity in the world, spending about £450 million every year both in the UK and internationally to support and promote research that will improve the health of humans and animals. The Trust was established under the will of Sir Henry Wellcome, and is funded from a private endowment, which is managed with long-term stability and growth in mind.


Sanger Institute Contact Information:

Don Powell Press Officer
Wellcome Trust Sanger Institute Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 496 928
Mobile +44 (0)7753 7753 97
Fax +44 (0)1223 494 919
Email press.office@sanger.ac.uk
