We are a team of bioinformaticians, software developers and genomic data scientists primarily responsible for the informatics and large-scale sequencing projects of the Durbin and Adams groups.
We have played lead or key roles in the data processing and analysis of large-scale sequencing projects such as 1000 Genomes, the Mouse Genomes Project, UK10K, HipSci and the Haplotype Reference Consortium, among others.
Recently, in collaboration with the Durbin and GRIT groups at the Sanger Institute and a number of external partners, we have joined the Vertebrate Genomes Project and Genome 10K to begin producing genome assemblies for hundreds to thousands of species, using cutting-edge long-read sequencing technologies such as PacBio, Oxford Nanopore and 10x alongside Illumina.
We develop tools and software to meet our data management and analysis needs at scale.
BCFtools is a set of tools for variant calling and manipulating variant data stored in VCF and BCF files. We also contribute to the development of HTSlib and SAMtools.
We develop pipelines and pipeline management systems to track and process our data. The 1000 Genomes and UK10K projects were made possible by the VRPipe and vr-runner systems. With the Sanger Institute recently moving to a cloud-oriented compute infrastructure, we are developing a new workflow runner (wr) system.
As part of our work with the Haplotype Reference Consortium, we have developed a free genotype imputation and phasing service, the Sanger Imputation Service.