Google Earth of Biomedical Research

An integrated encyclopaedia of DNA elements in the human genome

ENCyclopaedia Of DNA Elements – ENCODE

The ENCODE Project today announces that most of what was previously considered ‘junk DNA’ in the human genome is actually functional. The project has found that 80 per cent of the human genome sequence is linked to biological function.

The collaborative project mapped more than 4 million regulatory regions, or genetic switches, where proteins specifically interact with the DNA. These findings represent a significant advance in understanding the precise and complex controls over how genes work within a cell type. This information will greatly enhance our understanding of both common and rare diseases that have a genetic component, such as cancers.

The Human Genome Project produced an almost complete list of the 3 billion pairs of chemical letters in the DNA that embodies the genetic code – but nothing about the way this blueprint works. ENCODE wanted to take this vast amount of data and ascribe function to the entire human genome. Now, after five years of concerted effort by more than 440 researchers in 32 labs around the world, working collaboratively in the ENCODE Project, the first holistic view of how the human genome actually does its job has emerged.

“If the Human Genome Project is like an ordnance survey map, then the ENCODE project is like Google Earth. The Human Genome Project gave us a broad overview of our genome, but the ENCODE maps allow researchers to inspect the chromosomes, genes, functional elements and individual nucleotides in the human genome in much the same way as Google Earth magnifies what we see on a map.”

Dr Jennifer Harrow, Principal Investigator from the Wellcome Trust Sanger Institute

By integrating information from genome-wide studies with several datasets from the ENCODE project, it is now possible for researchers to predict variations that may be central to diseases and predict the cell types in which the affected genes might be active. This approach generated functional, biological information for DNA sequences for up to 80 per cent of all previously reported associations.

With this information, researchers can see all single DNA letter variations, what state they are in, and what is happening around them, such as which binding sites are involved and which cell types they are active in. This type of information can potentially provide functional predictions about the genetics behind disease, making it an extremely powerful interpretation tool.

“We’ve come a long way and we have learned an incredible amount by integrating the different types of data that ENCODE produced, which was done at a scale never before achieved in biology. This data integration was one of the keys to the success of the project.”

Dr Ewan Birney of the European Bioinformatics Institute and lead analysis coordinator of the ENCODE data

The coordinated publication set includes one main integrative paper and five other papers in the journal Nature; 18 papers in Genome Research; and six papers in Genome Biology. The ENCODE data are so complex that the three journals have developed a pioneering way to present the information in an integrated form that they call ‘threads.’

Since the same topics were addressed in different ways in different papers, the new website will allow anyone to follow a topic through all of the papers in the ENCODE publication set in which it appears, by clicking on the relevant ‘thread’ at the Nature ENCODE explorer page. For example, thread number one compiles figures, tables, and text relevant to genetic variation and disease from several papers and displays them all on one page. ENCODE scientists believe this will illuminate many biological themes emerging from the analyses.

“The ENCODE project is providing an encyclopaedia to understand how the sequence of the human genome forms the words that tell our bodies how to work at the cellular and molecular level. This will serve as a critical reference for interpreting the relationship between genome variation and disease and in the development of stem cell based therapies. By developing more revolutionary technologies for probing genome function, we expect to accelerate these efforts.”

Dr Tim Hubbard, lead principal investigator from the Wellcome Trust Sanger Institute

More information

Publication details

Details of publications, funding and participating centres can be found on

Selected websites

  • The Wellcome Trust Sanger Institute

    The Wellcome Trust Sanger Institute is one of the world’s leading genome centres. Through its ability to conduct research at scale, it is able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally. Institute research findings, generated through its own research programmes and through its leading role in international consortia, are being used to develop new diagnostics and treatments for human disease.

  • The Wellcome Trust

    The Wellcome Trust is a global charitable foundation dedicated to achieving extraordinary improvements in human and animal health. We support the brightest minds in biomedical research and the medical humanities. Our breadth of support includes public engagement, education and the application of research to improve health. We are independent of both political and commercial interests.