The Ensembl project aims to annotate genome sequences automatically, to integrate these data with other biological information, and to make the results freely available to geneticists, molecular biologists, bioinformaticians and the wider research community. Ensembl is jointly headed by Dr Stephen Searle at the Wellcome Trust Sanger Institute and Dr Paul Flicek at the European Bioinformatics Institute (EBI).



Ensembl was established in 1999, towards the end of the Human Genome Project, in recognition that interpreting the genetic code of organisms is as important as reading it. Purely manual curation of every genome sequence, however, would be impractical, given how labour-intensive and time-consuming such work is. To overcome this problem, the Ensembl team developed software pipelines that automatically generate evidence-based annotation of genome sequences.

Since its inception, the Ensembl project has expanded from annotating the human genome to covering more than 50 vertebrate species. These include many model organisms central to the study of human disease. Ensembl has participated in many genome consortia, producing the annotation used in the initial genomic analyses of newly sequenced organisms.

The project provides an expanding wealth of information for a diverse range of species, including:

  • intron and exon structure for protein-coding and non-coding genes
  • genomic variation and somatic mutations, their consequences for genes, and genotypes in populations and individuals
  • cross-species gene trees and genomic alignments
  • functional genomics data, including regulatory region annotation

Ensembl website

Generating the annotation is just the start. To make the data as useful as possible to researchers, Ensembl offers several means of access, foremost among them the Ensembl website. This is a highly customisable, interactive site that provides a track-based genome browser (the Location view) and many additional displays offering highly integrated views of genomic annotation.

Rapid and open data access

Free and unrestricted access to the information held in Ensembl is one of the project's primary principles; Ensembl was founded with a vision of promoting rapid research into all areas of human disease.

All Ensembl code is open source.

Selected Publications

  • Ensembl 2011.

    Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P, Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Rios D, Ritchie GR, Ruffier M, Schuster M, Sobral D, Spudich G, Tang YA, Trevanion S, Vandrovcova J, Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J, Aken BL, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, Vogel J and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. flicek@ebi.ac.uk

    The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust: 062023, 077198

    Nucleic Acids Research 2011;39(Database issue):D800-6

  • The Ensembl genome database project.

    Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, Stalker J, Stupka E, Ureta-Vidal A, Vastrik I and Clamp M

    The Wellcome Trust Sanger Institute and European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

    Nucleic Acids Research 2002;30(1):38-41

BMC Ensembl Thematic Series 2010 (series of six papers about Ensembl)
