Please note: This page is no longer being updated and was last edited in 2013.

Genome sequences provide a natural index for organising and understanding biological data.

Following the sequencing of the human and other vertebrate genomes, vertebrate genome browsers such as Ensembl have become critical resources, providing biologists with integrated access to the sequence and its associated annotation. The activities of the Vertebrate genome analysis team revolve around generating and presenting core vertebrate genome annotation, particularly in the form of reference genesets, and maintaining the reference genome sequences of human, mouse and zebrafish. As well as contributing to resources used globally, the team is involved in a wide variety of collaborations on genome annotation and on the development of improved methods for analysis and annotation, which have resulted in many publications. Tim Hubbard was the principal investigator of the team until he left the Sanger Institute in 2013 to become Professor of Bioinformatics, Head of the Department of Medical and Molecular Genetics at King's College London and overall Director of Bioinformatics for King's Health Partners/King's College London.


The team includes the Wellcome Trust Sanger Institute part of the Ensembl project (led by Steve Searle) and the Havana annotation group (led by Jen Harrow). Ensembl is a joint project with the European Bioinformatics Institute (EBI). Steve Searle’s EBI counterpart is Paul Flicek, who heads the EBI Vertebrate Genomics Team. Sanger Institute Ensembl consists of the genebuild group (led by Steve Searle), which generates genesets using an automatic pipeline, and the web team (led by Anne Parker), which develops and maintains the Ensembl website.


A major combined activity of Havana and the Ensembl genebuild group is to generate complete, high-accuracy genesets for the high-quality reference genomes of human and mouse. Ensembl generates complete genesets using its automatic pipeline for most of the 40+ genomes that it contains. Human and mouse are exceptions: their genesets are referred to as ‘Ensembl-Havana’ because they combine curated gene structures from Havana with annotation from the Ensembl automatic pipeline. So far only about 50 per cent of the human genome and 30 per cent of the mouse genome have been manually curated. Ultimately the whole of these genesets will be curated. For human this is the objective of the GENCODE project, a scale-up programme of the NHGRI-funded ENCODE project that brings together Havana, Ensembl and seven external groups to generate the reference geneset for the human genome. The Ensembl-Havana geneset incorporates the subset of human and mouse CDS (protein-coding) regions that have been curated and agreed by the CCDS consortium, which includes curators at Havana and NCBI (RefSeq), with computational annotation and assessment from the Ensembl genebuild group and UCSC.

The gene curation carried out by Havana is supported by specialist analysis pipelines and annotation tools provided by the Anacode group (led by James Gilbert). Anacode also develops and maintains many of the software systems that support curation of reference genome sequences and WTSI sequence submission to the EMBL sequence database (the EBI partner of the INSDC database consortium). A key component of the otterlace curation interface, which can be used by annotators anywhere in the world, is the ZMAP genome display engine developed by the Acedb group (led by Ed Griffiths). The Acedb group also continues to support the Acedb database package, used by the model organism database WormBase. The Havana group is involved in the annotation of genes as candidates for knockout in mouse for the Embryonic Stem (ES) Cell Mutagenesis team of Bill Skarnes, as part of the EUCOMM and KOMP projects. Otterlace is also used remotely by KOMP annotators at the Genome Center at Washington University.

The genome of the zebrafish (a key model organism) is being sequenced to reference quality by WTSI. The team includes the Zebrafish analysis group (led by Kerstin Howe), which is responsible for preparing genome assemblies and integrating functional data, such as that from the EU ZF-models project and the Sanger Institute zebrafish mutagenesis project. Kerstin also leads the informatics group of the Sanger Institute’s component of the Genome Reference Consortium (GRC), which is responsible for maintaining the reference genome sequences of human and mouse.




