This page is maintained as a historical record and is no longer being updated.
The Computational Genomics programme ran until October 2016. The computational genomics faculty, teams and research projects have been transferred into the Cellular Genetics and Human Genetics programmes.
Computational methods and resources for studying genetic variation:
Since its inception, the Sanger Institute has been a leader in the development of software, methods and resources for the analysis of large-scale DNA sequence data. Many of the techniques that we developed in this area underpin research in other programmes in the Institute as well as elsewhere in the world. Research within Computational Genomics developed and drove forward established programmes for algorithms, software and data resources for using DNA sequence data to study genetic variation, in conjunction with the Global Alliance for Genomics and Health (GA4GH); for the development of reference genome sequences for human and mouse as part of the Genome Reference Consortium; and for the development of the DECIPHER platform for exchange of clinical rare variant data. Alongside these, we conducted research into novel population genetic analysis methods based on whole genome sequences, and applied them to large genomic data sets.
Computational analysis of genome regulation:
A central goal in genomics is to understand how genome function is affected by genetic variation. To achieve this goal, the Sanger Institute strives to develop novel computational and statistical approaches, focusing in particular on non-coding and regulatory sequence. We developed new methods and tools for genomic data analysis to provide new knowledge about genome function: the identification of sequence and chromatin features involved in enhancer activity, the identification of variants and cell types involved in complex traits, and improved understanding of biological variation and the transcriptional response in single cells.
The Sanger Institute is a global leader in the technology of collecting and processing sequence data, and in the science of understanding and using it. A core requirement for this work is computational: identifying the significant information in each data set, whether by finding the genetic variation present in a sample or by quantifying measurements, and relating it to existing knowledge. The primary tools for analysing sequence data are algorithmic methods for sequence alignment based on string matching, and data representation, including compression, to manage previous data and knowledge. The underlying disciplines are computer science, statistics and genetics. This is very much the domain of Big Data, and it is no surprise that companies such as Google, Amazon and Microsoft are participating alongside science institutions such as the Sanger Institute, the Broad Institute, EBI, NCBI and UC Santa Cruz in the new Global Alliance for Genomics and Health (GA4GH), which supports genomic data exchange to further health and research.
Using outbred genetic variation to understand basic biology
DNA sequence remains at the heart of molecular biology and bioinformatics. The Birney Associate Faculty Research Group at the Sanger ...
Core Software Services
Informatics and Digital Solutions (Web, Web security and Core Bioinformatics)
Core Software Services comprises the Core Web Team, Core Bioinformatics (CoreBio) and Core Web security.
Population and evolutionary genomics, novel computational genomics methods, and related mathematical and statistical models.
Genome Reference Informatics Team
Tree of Life Programme
The Genome Reference Informatics Team analyses genome assemblies to reveal and correct quality issues and to identify and add variation. It ...
Non-coding RNA and epigenetics
We are interested in all aspects of gene regulation by non-coding RNA. Current research themes include: miRNA biology and pathology, miRNA ...
Understanding human DNA function by engineering
Our goal is to mechanistically understand the impact of mutations in human DNA. To do so, we engineer DNA variation in cells, ...
Classification of proteins and RNAs
The Classification of proteins and RNAs group moved to EMBL-EBI (the European Molecular Biology Laboratory's European Bioinformatics Institute) in November 2012. The ...
Quantitative models of gene expression
The Hemberg group is interested in developing quantitative models of gene expression. Our approach is theoretical and we strive to develop ...
Vertebrate Genome Analysis
The activities of the Vertebrate Genome Analysis team revolved around generating and presenting core vertebrate genome annotation, particularly in the form ...
Population genomics of adaptation
High-throughput sequencing opened up a new chapter in the study of molecular evolution and genetics, allowing us to study in ...
Sequence Variation Infrastructure
We developed algorithms and technologies that enable researchers to discover and share genetic variation using next-generation sequencing technologies. We were ...
Genome Reference Consortium
The GRC aims to ensure that the human, mouse and zebrafish reference assemblies are biologically relevant by closing gaps, fixing ...