The HAVANA team manually annotates the human, mouse, zebrafish and other vertebrate genomes.
The HAVANA team puts special emphasis on alternatively spliced transcripts and pseudogenes, two areas still underdeveloped in automated annotation systems, as well as on poly-adenylation features. In addition, whereas other systems concentrate on, or are limited to, protein-coding genes, many HAVANA transcripts are annotated without a protein-coding region. These transcripts may function as non-coding RNAs, or they may be incomplete gene fragments for which the coding sequence cannot yet be determined.
All annotated gene structures (transcripts) are supported by transcriptional evidence from cDNA, EST or protein sequences. Consequently, not all annotated transcripts are necessarily complete. Supporting evidence does not need to be locus-specific; it can also be homologous, paralogous or orthologous.
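The evidence rule above can be illustrated with a minimal sketch. It is not HAVANA's software: the class names, field names and the placeholder accession are hypothetical, and the only things taken from the text are the three evidence types and the distinction between locus-specific and homology-based support.

```python
from dataclasses import dataclass, field

# Evidence types named in the text. Everything else here is an assumption
# made for illustration only.
EVIDENCE_TYPES = {"cDNA", "EST", "protein"}


@dataclass
class Evidence:
    accession: str               # placeholder identifier, not a real accession
    kind: str                    # expected to be one of EVIDENCE_TYPES
    locus_specific: bool = True  # False for homology-based support


@dataclass
class Transcript:
    name: str
    evidence: list = field(default_factory=list)
    has_cds: bool = False        # a transcript may lack a coding region


def is_supported(transcript):
    """Return True if at least one piece of evidence has a recognised type."""
    return any(e.kind in EVIDENCE_TYPES for e in transcript.evidence)


# A non-coding transcript backed only by a homologous EST still counts as supported.
example = Transcript("example-001", [Evidence("ACC0001", "EST", locus_specific=False)])
print(is_supported(example))
```

Note that the check deliberately ignores both `locus_specific` and `has_cds`, mirroring the statements that support may be homology-based and that transcripts may be annotated without a protein-coding region.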
While the transcript and protein sequences are the most important pieces of information, HAVANA annotation also takes into account other data, such as CpG islands, gene predictions, repeats and genome signatures. Because the annotation software used is DAS (Distributed Annotation System) aware, the HAVANA team can link to external data sources. Ensembl gene models and data from GENCODE collaborators are some of the DAS sources the HAVANA group uses. HAVANA sources are under constant review and subject to change. For example, the group recently started to use data from new technologies such as RNA-seq and protein mass spectrometry in its annotation efforts.
Like its data sources, HAVANA's annotation guidelines are under constant review and are routinely updated to take into account feedback from collaborators, incorporate new data sources and reflect new trends in genetics, transcriptomics, proteomics and genomics.
Details of our annotation tools can be found at Annosoft.
HAVANA Annotation Guidelines detail our annotation standards.
All of our manual annotation is displayed in the VEGA browser.