Background
The team includes the Wellcome Trust Sanger Institute part of the Ensembl project (led by Steve Searle) and the Havana annotation group (led by Jen Harrow). Ensembl is a joint project with the European Bioinformatics Institute (EBI). Steve Searle's EBI counterpart is Paul Flicek, who heads the EBI Vertebrate Genomics Team. Sanger Institute Ensembl consists of the genebuild group (led by Steve Searle), which generates genesets using an automatic pipeline, and the web team (led by James Smith), which develops and maintains the Ensembl website.
Research
A major combined activity of Havana and the Ensembl genebuild group is to generate complete, high-accuracy genesets for the high-quality reference genomes of human and mouse. Ensembl generates complete genesets using its automatic pipeline for most of the 40+ genomes that it contains. Human and mouse are exceptions: their genesets are referred to as 'Ensembl-Havana' because they combine curated gene structures from Havana with annotation from the Ensembl automatic pipeline. So far only about 50 per cent of the human genome and 30 per cent of the mouse genome have been manually curated. Ultimately the whole of these genesets will be curated. For human this is the objective of the GENCODE project, a scale-up programme of the NHGRI-funded ENCODE project. GENCODE brings together Havana, Ensembl and seven external groups to generate the reference geneset for the human genome. The Ensembl-Havana geneset incorporates the subset of human and mouse CDS (protein-coding) regions that have been curated and agreed by the CCDS consortium, which includes curators at Havana and NCBI (RefSeq), with computational annotation and assessment from the Ensembl genebuild group and UCSC.
The gene curation carried out by Havana is supported by specialist analysis pipelines and annotation tools provided by the Anacode group (led by James Gilbert). Anacode also develops and maintains many of the software systems that support curation of reference genome sequences and WTSI sequence submission to the EMBL sequence database (the EBI partner of the INSDC database consortium). A key component of the otterlace curation interface, which can be used by annotators anywhere in the world, is the ZMAP genome display engine developed by the Acedb group (led by Ed Griffiths). The Acedb group continues to support the Acedb database package, which is used by the model organism database WormBase. The Havana group is involved in annotating genes as candidates for knockout in mouse for the Embryonic Stem (ES) Cell Mutagenesis team of Bill Skarnes, as part of the EUCOMM and KOMP projects. Otterlace is also used remotely by KOMP annotators at the Genome Center at Washington University.
The genome of the zebrafish (a key model organism) is being sequenced to reference quality by WTSI. The team includes the Zebrafish analysis group (led by Kerstin Howe), which is responsible for preparing genome assemblies and integrating functional data, such as data from the EU ZF-models project and the Sanger Institute zebrafish mutagenesis project. Kerstin also leads the informatics group of the Sanger Institute's component of the Genome Reference Consortium (GRC), which is responsible for maintaining the reference genome sequences of human and mouse.
Selected Publications
- Ensembl 2009.
  Nucleic acids research 2009;37;Database issue;D690-7
  PUBMED: 19033362; DOI: 10.1093/nar/gkn828; PMC: 2686571
- Petabyte-scale innovations at the European Nucleotide Archive.
  Nucleic acids research 2009;37;Database issue;D19-25
  PUBMED: 18978013; DOI: 10.1093/nar/gkn765; PMC: 2686451
- The Protein Feature Ontology: a tool for the unification of protein feature annotations.
  Bioinformatics (Oxford, England) 2008;24;23;2767-72
  PUBMED: 18936051; DOI: 10.1093/bioinformatics/btn528
- BioJava: an open-source framework for bioinformatics.
  Bioinformatics (Oxford, England) 2008;24;18;2096-7
  PUBMED: 18689808; DOI: 10.1093/bioinformatics/btn397; PMC: 2530884
- Integrating biological data - the Distributed Annotation System.
  BMC bioinformatics 2008;9 Suppl 8;S3
  PUBMED: 18673527; DOI: 10.1186/1471-2105-9-S8-S3; PMC: 2500094




Dr Tim Hubbard