Josch13, Pixabay

Tree of Life


The Tree of Life Programme uses DNA sequencing and cellular technologies to investigate the diversity and origins of life on Earth. Our research covers all eukaryotic organisms, which means all complex life with a nucleus in its cells – that is, every animal, plant, fungus and protist on the planet.

At the core of our research is the production of high-quality reference genomes for individual species, which our teams are producing at an unprecedented scale. Alongside this reference genome production, research teams within Tree of Life are exploring questions around ecosystem function, species radiations, reproductive diversity and more.

All our data – including as much work-in-progress as possible – is published openly and available freely to researchers across the world.

Tree of Life’s projects operate in tandem with a global initiative called the Earth BioGenome Project. This is a network of affiliated biodiversity genomics projects that share the aim to “sequence everything”. The goal is to produce reference genomes for all 1.2 million known species on Earth.

We believe this young and growing scientific field, known as Biodiversity Genomics, will transform the way we do biology.


Scientific fields expected to be impacted by the Earth BioGenome Project - biomedicines, climate adaptation, green technologies, next generation genomic science, sustainable fisheries, species reintroduction, biofuels, pollinators, food security, conservation, crop development, soil health and evolutionary biology


Why reference genomes?

Tree of Life aims for the highest standard of genome assemblies wherever possible. This means complete genome sequences that span each chromosome of the nuclear genome (the DNA in the nucleus), plus the genetic material in all organelles (e.g. mitochondria and chloroplasts). Genes are then identified and features annotated to help users make sense of how different parts of the sequence function.

This exacting standard, which we expect to stand the test of future science, has only recently become achievable thanks to the latest long-read and long-range sequencing technologies.

We want the next generation of scientists to operate in a genome-ready world. Currently, a typical PhD researcher might spend up to a year sequencing the genome of their chosen species. That is a quarter to a third of their project, time that would be better spent probing the questions they set out to study – if only the relevant reference genome were available.

There is much ground to cover, with sequence data available for only 0.8% of Earth’s species. The chart below shows how this translates up the taxonomic hierarchy.


Levels of genome sequence data available by taxonomy level as of August 2023: Phylum 80.6%, Class 66.1%, Order 46.9%, Family 21.5%, Genus 4.8% and Species 0.8%
Figure via Genomes on a Tree (GoaT)
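
To make the size of the gap concrete, the figures above can be turned around to show how much of each taxonomic rank still has no sequence data at all. The short Python sketch below is only an illustration: the percentages are copied from the chart, while the names `coverage_percent` and `remaining_gap` are our own and are not part of GoaT.

```python
# Share of taxa at each rank with any genome sequence data (August 2023),
# copied from the chart above (figure via GoaT).
coverage_percent = {
    "Phylum": 80.6,
    "Class": 66.1,
    "Order": 46.9,
    "Family": 21.5,
    "Genus": 4.8,
    "Species": 0.8,
}


def remaining_gap(coverage):
    """Return the percentage of taxa at each rank still lacking sequence data."""
    return {rank: round(100.0 - pct, 1) for rank, pct in coverage.items()}


gaps = remaining_gap(coverage_percent)
for rank, gap in gaps.items():
    print(f"{rank:<8} {gap:5.1f}% without sequence data")
```

Read this way, the coverage thins out quickly below the level of order: roughly four in five families and more than 95% of genera have no sequence data yet.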


Once a reference genome is available, a whole toolkit of other exciting methods and techniques is unlocked. For example, scientists can use resequencing to read the DNA of other individuals within the same species and compare it back to the reference genome. In this way, it becomes quicker, cheaper and more efficient to study organisms’ biology and evolution, support conservation efforts, or search for new biomedicines and other compounds.
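
The idea of comparing back to the reference can be shown with a toy example. The Python sketch below is a deliberate simplification and not how our pipelines work: it assumes the two sequences are already aligned and of equal length, and it only reports single-base differences. Real resequencing analyses rely on dedicated read aligners and variant callers. The function name `find_differences` and both sequences are made up for illustration.

```python
def find_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference, sample), start=1
        )
        if ref_base != sample_base
    ]


# Made-up toy sequences, not real data.
reference = "ACGTACGTAC"
resequenced = "ACGTTCGTAA"

variants = find_differences(reference, resequenced)
print(variants)  # [(5, 'A', 'T'), (10, 'C', 'A')]
```

Because the reference supplies the coordinates, each new individual only needs to be described by where it differs, which is a large part of why resequencing is cheaper than assembling every genome from scratch.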


How do we produce our genomes?

Tree of Life has assembled a world-leading genome production pipeline. This ensures samples are collected ethically and legally, and transported in such a way that the required high-molecular-weight DNA can be extracted.

Once extracted, the genetic material is sequenced using the latest long-read technologies. These large segments of DNA are then assembled and curated by teams of bioinformaticians. Our scientists have developed several powerful automated tools to help do this accurately at scale, but some key stages of the process are still done manually by eye!

Finally, the genome assemblies are submitted to the publicly accessible European Nucleotide Archive (ENA) database. The assembly is also annotated by our partners at EMBL-EBI, and a Genome Note is published to announce the new assembly and how we did it.


The Tree of Life programme's Genome Engine - The pipeline from sampling to gene identification - 1. Sample collected in the field, 2. Sample onboarded at Sanger, 3. DNA and RNA extraction, 4. Genome and transcriptome sequencing, 5. Genome assembly, 6. Genome curation, 7. Submission to public database (European Nucleotide Archive - ENA), 8. Genome Note publication on Wellcome Open Research, 9. Gene finding and feature annotation on the Ensembl database
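
For readers who think in code, the nine stages of the Genome Engine can be modelled as an ordered list that a sample moves through. The Python sketch below is an illustration under simplifying assumptions, not a description of our tracking systems: the stage names are shortened from the figure, the helper functions `next_stage` and `remaining_stages` are hypothetical, and the strictly linear ordering follows the figure's numbering even though in practice some steps can overlap.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The nine Genome Engine stages, numbered as in the figure."""

    SAMPLE_COLLECTED = 1
    SAMPLE_ONBOARDED = 2
    DNA_RNA_EXTRACTION = 3
    SEQUENCING = 4
    ASSEMBLY = 5
    CURATION = 6
    ENA_SUBMISSION = 7
    GENOME_NOTE = 8
    ANNOTATION = 9


def next_stage(stage):
    """Return the stage that follows, or None once the pipeline is complete."""
    if stage is Stage.ANNOTATION:
        return None
    return Stage(stage + 1)


def remaining_stages(stage):
    """List every stage still to come after the given one."""
    return [s for s in Stage if s > stage]


current = Stage.ASSEMBLY
print(next_stage(current).name)
print([s.name for s in remaining_stages(current)])
```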


Tree of Life’s projects

The Tree of Life Programme is a partner in many research projects (see below under ‘Collaborations’). However, the bulk of our teams’ time is focused on a handful of these transformative initiatives.


Darwin Tree of Life

The aim of the Darwin Tree of Life (DToL) project is to produce reference genomes for each of the estimated 70,000 eukaryotic species in Britain and Ireland.

DToL is a collaboration between biodiversity, genomics and analytics partners: Sanger, the Earlham Institute, EMBL-EBI, the Marine Biological Association, the Natural History Museum in London, the Royal Botanic Gardens at Edinburgh and Kew, and the universities of Cambridge, Edinburgh and Oxford.

To find out more, visit the Darwin Tree of Life website or follow DToL on Twitter @darwintreelife.


Aquatic Symbiosis Genomics

The Aquatic Symbiosis Genomics (ASG) project is sequencing the genomes of symbiotic systems. The project seeks to provide the genomic foundations needed by scientists to answer key questions about the ecology and evolution of symbiosis in marine and freshwater species, where at least one partner is a microbe.

ASG is jointly funded by the Wellcome Sanger Institute and the Gordon and Betty Moore Foundation, with ten global partners acting as hubs for different groups of symbiotic organisms.

To find out more, visit the Aquatic Symbiosis Genomics website.



BIOSCAN

Tree of Life’s BIOSCAN project aims to study the genetic diversity of one million flying insects from across the UK. Project partners will collect insects from 100 sites every month for five years, and the samples will then be analysed at Sanger using DNA barcoding. The resulting sequence data will provide a baseline characterisation of insect species diversity over space and time, and thus form a much-needed resource for DNA-based biomonitoring in the UK.
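
A quick back-of-the-envelope calculation shows the scale of sampling this implies. The Python sketch below uses only the numbers quoted above and assumes, purely for illustration, that every site is sampled every month without gaps and that the target is spread evenly across collections. Real catches will vary by site and season.

```python
SITES = 100
COLLECTIONS_PER_YEAR = 12  # one collection per site per month
YEARS = 5
TARGET_INSECTS = 1_000_000

# Total number of site visits over the whole project.
collection_events = SITES * COLLECTIONS_PER_YEAR * YEARS

# Average number of insects each visit would need to contribute.
insects_per_event = TARGET_INSECTS / collection_events

print(collection_events)         # 6000
print(round(insects_per_event))  # 167
```

Under those assumptions, each monthly collection at each site would need to yield around 167 insects on average to reach the project total.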

To find out more, visit the BIOSCAN webpage.


Latest news from Tree of Life

As well as the project websites linked above, you can follow Tree of Life on social media.

Twitter @sangertol

LinkedIn at Sanger Tree of Life Programme

YouTube at Tree of Life

