Computational Genomics

Archive Page

This page is maintained as a historical record and is no longer being updated.

The Computational Genomics programme ran until October 2016. The computational genomics faculty, teams and research projects have been transferred into the Cellular Genetics and Human Genetics programmes.


In the Computational Genomics programme, we developed novel computational methods for both managing and analysing large datasets. We were interested in population genetics approaches for characterising variation in human genomes, and in computational methods for understanding the functional consequences of this variation.

Computational methods and resources for studying genetic variation:

Since its inception, the Sanger Institute has been a leader in the development of software, methods and resources for the analysis of large-scale DNA sequence data. Many of the techniques that we developed in this area underpin research in other programmes in the Institute as well as elsewhere in the world. Research within Computational Genomics drove forward established programmes in three areas: algorithms, software and data resources for using DNA sequence data to study genetic variation, in conjunction with the Global Alliance for Genomics and Health (GA4GH); reference genome sequences for human and mouse, as part of the Genome Reference Consortium; and the DECIPHER platform for the exchange of clinical rare variant data. Alongside these, we developed novel population genetic analysis methods based on whole genome sequences and applied them to large genomic datasets.

Computational analysis of genome regulation:

A central goal in genomics is to understand how genome function is affected by genetic variation. To achieve this goal, the Sanger Institute strives to develop novel computational and statistical approaches, focusing in particular on non-coding and regulatory sequence. We developed new methods and tools for genomic data analysis to provide new knowledge about genome function: the identification of sequence and chromatin features involved in enhancer activity, the identification of variants and cell types involved in complex traits, and an improved understanding of biological variation and the transcriptional response in single cells.

The Sanger Institute is a global leader in the technology of collecting and processing genomic data, and in the science of understanding and using it. A core requirement for this work is computational: to identify the significant information in each dataset, whether by finding the genetic variation present in a sample or by quantifying measurements, and to relate that information to existing knowledge. The primary tools for analysing sequence data are algorithmic methods for sequence alignment based on string matching, and data representation, including compression, to manage previous data and knowledge. The underlying disciplines are computer science, statistics and genetics. This is very much the domain of Big Data, and it is no surprise that companies such as Google, Amazon and Microsoft are participating alongside science institutions such as the Sanger Institute, the Broad Institute, EBI, NCBI and UC Santa Cruz in the new Global Alliance for Genomics and Health (GA4GH), which supports genomic data exchange to further health and research.
