13th June 2007

The Wider View from a Detailed Focus

New project challenges conventional view of genome biology

ENCODE

One of the key lessons that ENCODE has taught us is that not all functional genomic elements are evolutionarily constrained, and that many evolutionarily constrained elements perform functions that are currently unknown.

A major study of the organization and regulation of the human genome published today changes our concept of how our genome works. The integrated study is an exhaustive analysis of 1% of the genome that, for the first time, gives an extensive view of genetic activity alongside the cellular machinery that allows DNA to be read and replicated.

The lead report from the ENCyclopedia Of DNA Elements (ENCODE) Consortium, published in Nature, together with 28 companion papers published in Genome Research, defines in detail which regions of the genome are actively copied in the cell, reveals the location and evolution of elements that control gene activity, and defines the relationship between DNA-associated proteins and gene activity and DNA replication.

The complex tapestry of interwoven elements revealed today suggests that "our perspective of transcription and genes may have to evolve," the researchers state, noting that their research "poses some interesting mechanistic questions" that have yet to be answered.

Our understanding of genome biology from the Human Genome Project gave us an overview of a 3-billion-base genome, peppered with some 22,000 discrete genes and the sequences that regulate their activity. These were estimated to occupy perhaps 3-5% of the genome, though this number is expected to be an underestimate.

" A major surprise was that many of the novel control regions are not shared with other species, but restricted to our human genome "

Dr Manolis Dermitzakis

"The new view transforms our view of the genomic fabric," explained Dr Tim Hubbard, from the Wellcome Trust Sanger Institute, "The majority of the genome is copied, or transcribed, into RNA, which is the active molecule in our cells, relaying information from the archival DNA copy to the cellular machinery. This is a remarkable finding, since most prior research suggested only a fraction of the genome was transcribed."

"But it is our new understanding of regulation of genes that stands out. The integrated approach has helped us to identify new regions of gene regulation and altered our view of how gene regulation occurs."

From the earliest studies of gene activity in bacteria, a picture emerged that suggested control regions were most often located at or near sites from which gene transcription started. The new work identifies many previously unknown control regions and shows that control regions are as likely to lie downstream of a gene's transcription start site as upstream of it.

"Alterations in control regions are increasingly thought to be of significance for human disease," Dr Dermitzakis from the Wellcome Trust Sanger Institute and one of the corresponding authors on the paper explained: "For the first time we can see DNA sequence variation in the context of the biochemical workings of the cell. We can now begin to unravel the consequences that such variations hold for individuals and their susceptibility to disease."

The team showed that transcription of DNA is pervasive across the genome, and that RNA transcripts overlap known genes and are found in what were previously thought to be gene 'deserts'.

"A major surprise was that many of the novel control regions are not shared with other species, but restricted to our human genome," continued Dr Dermitzakis. "We appear to have a reservoir of active elements that seem to provide no specific or direct benefit."

"Our suggestion is that these elements can provide a source for new variation between species and within the human genome. This is our genomic seedcorn for the future."

The scale of the collaboration brings new understanding of the interaction between our genome and the proteins that control gene activity and DNA replication. The results show that histones, the proteins that bind DNA to package it within the cell nucleus, are modified to promote or inhibit gene activity, and that these modifications can be used to better predict the location of novel genes.

"Specific types of modifications of the histone proteins near gene starts are a strong predictor of gene activity," explained Dr Ian Dunham, from the Wellcome Trust Sanger Institute, "whereas further histone modifications at sites away from genes appear to be a signature of regulatory elements that can enhance transcription." A detailed analysis of these effects is also published by the Sanger Institute group in one of the companion papers in Genome Research.

"It is only from a study such as ENCODE that we can obtain such a valuable detailed view of our genome. This project has been a magnificent collaboration amongst some of the world's premier genome scientists, and has revealed many new insights. There is every expectation that a great deal more will be revealed as the project scales to the whole genome."

Although much that is new has been discovered, much yet remains to be understood. Similarity of DNA sequence between species is often a sign of the value of that sequence, yet a function has not been found for many DNA sequences that are conserved. The role of the massive new numbers of RNA transcripts is unknown. And the function of the large number of control elements is yet to be elucidated.

The ENCODE consortium is organized by the National Human Genome Research Institute (NHGRI), whose Director, Francis S Collins, MD PhD, said: "This impressive effort has uncovered many exciting surprises and blazed the way for future efforts to explore the functional landscape of the entire human genome."

"Because of the hard work and keen insights of the ENCODE consortium, the scientific community will need to rethink some long-held views about what genes are and what they do, as well as how the genome's functional elements have evolved. This could have significant implications for efforts to identify the DNA sequences involved in many human diseases."

Notes to Editors

Outline of the Project and Key Findings

The collaborative study involved 80 centres and focused on 44 targets, which together cover about 1% of the human genome sequence, or about 30 million DNA base pairs. The targets were strategically selected to provide a representative cross-section of the entire human genome. All told, the ENCODE consortium generated more than 200 datasets and analysed more than 600 million data points.

The highlights of the study include:

  • Unexpectedly, most of the human genome is shown to be transcribed and many RNA transcripts link new regions to established protein-coding genes;
  • Many novel non-protein-coding transcripts have been identified, many of which overlap protein-coding genes; others are in regions of the genome previously thought to be transcriptionally silent;
  • Many novel transcription start sites have been identified, which commonly show chromatin structure and protein-binding properties similar to those of known promoters;
  • Surprisingly, regulatory sequences are symmetrically distributed around transcription start sites, in contrast to an expected bias towards upstream regions;
  • Chromatin structure and histone modification predict well both the presence and activity of transcription start sites;
  • DNA-replication timing is correlated with chromatin structure;
  • 5% of the bases in the genome are under evolutionary constraint in mammals;
  • Remarkably, not all bases within experimentally-defined functional regions show evidence of evolutionary constraint - their sequences have diverged;
  • In contrast to expectation, the sequences of many functional elements appear not to be constrained across mammalian evolution. These might represent a pool of neutral elements that are biologically active but provide no specific benefit to the organism, and might serve as a reservoir for natural selection.

Publication details

  • Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

    ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, NISC Comparative 
Sequencing Program, Baylor College of Medicine Human Genome Sequencing Center, Washington University Genome Sequencing Center, Broad Institute, Children's Hospital Oakland Research Institute, Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CW, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JN, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PI, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B and de Jong PJ

    Nature 2007;447;7146;799-816

Funding

Much of this work was supported by NHGRI ENCODE Project grants. Other funding support includes: National Institutes of Health, The European Union, Lejeune and ChildCare Foundations, Affymetrix, Inc, Swiss National Science Foundation, the Spanish Ministerio de Educación y Ciencia (RG), Spanish Ministry of Education and Science, CIBERESP, Genome Spain and Generalitat de Catalunya, Ministry of Education, Culture, Sports, Science and Technology of Japan, the NCCR Frontiers in Genetics, the Jérôme Lejeune Foundation, the Childcare Foundation, the Novartis Foundations, the Intramural Program of the National Human Genome Research Institute, the Danish Research Council, the Swedish Research Council, the Knut and Alice Wallenberg Foundation, the Wellcome Trust, the Howard Hughes Medical Institute, the Bio-X Institute, the Riken Institute, the BBSRC and The European Molecular Biology Laboratory.

Participating Centres

A full list of the 80 centres and authors participating in the paper in Nature can be found in the publication (see details above).

The National Institutes of Health

The National Institutes of Health, "The Nation's Medical Research Agency", includes 27 institutes and centres, and is a component of the US Department of Health and Human Services. It is the primary federal agency for conducting and supporting basic, clinical and translational medical research, and it investigates the causes, treatments, and cures for both common and rare diseases. For more, visit http://www.nih.gov.

The Wellcome Trust Sanger Institute

The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms and more than 90 pathogen genomes. In October 2006, new funding was awarded by the Wellcome Trust to exploit the wealth of genome data now available to answer important questions about health and disease.

The Wellcome Trust

The Wellcome Trust is the largest charity in the UK. It funds innovative biomedical research, in the UK and internationally, spending around £500 million each year to support the brightest scientists with the best ideas. The Wellcome Trust supports public debate about biomedical research and its impact on health and wellbeing.

Sanger Institute Contact Information:

Don Powell, Press Officer
Wellcome Trust Sanger Institute, Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 496 928
Mobile +44 (0)7753 7753 97
Fax +44 (0)1223 494 919
Email press.office@sanger.ac.uk
