Tree of Life Informatics Infrastructure

Tree of Life Programme

The Informatics Infrastructure team provides support for the production of reference genome assemblies and large-scale genome analyses in the Tree of Life programme, and helps with the management and use of IT resources.

The Tree of Life projects will generate tens of thousands of high-quality genomes over the coming years – more than have ever been sequenced! It is a challenging and extremely exciting task that will shape the future of biology, and the team’s role is to provide the platform for assembling and analysing those genomes at an unprecedented scale. We are the interface between the Tree of Life teams (assembly production and faculty research) and Sanger’s IT teams, working together with the informatics teams of the other programmes.

The team is organised in three poles.

Data management

Our data curators and managers maintain the integrity, consistency, and quality of multiple databases used in production, including Genomes on a Tree (GoaT), Sample Tracking System (STS), Collaborative Open Plant Omics (COPO), and BioSamples.


Our bioinformaticians develop the suite of analysis pipelines that will run on every genome produced in Tree of Life, providing a central database of core results available for all.


We develop and maintain some of the core systems used in production, including those that execute and track all bioinformatics pipelines and those that deploy third-party web applications for internal use.

The team uses a wide range of technologies, frameworks and programming languages, including Nextflow, Python, Conda, Jira, LSF, Singularity, and Kubernetes. The technology wheel below shows most of their logos. How many can you recognise? Let us know on the Sanger Tree of Life Twitter account.

Core team

Mr Paul Davis

Data Manager

Dr Cibele Sotero-Caio

Genomic Data Curator - Tree of Life Genomics