Computational Genomics

Archive Page

This page is maintained as a historical record and is no longer being updated.

The Computational Genomics programme ran until October 2016. Its faculty, teams and research projects have been transferred into the Cellular Genetics and Human Genetics programmes.

Overview

The Computational Genomics programme developed novel computational methods for managing and analysing large datasets. We were interested in population genetics approaches for characterising variation in human genomes, as well as computational methods for understanding the functional consequences of this variation.

Computational methods and resources for studying genetic variation:

Since its inception, the Sanger Institute has been a leader in the development of software, methods and resources for the analysis of large-scale DNA sequence data. Many of the techniques that we developed in this area underpin research in other programmes in the Institute as well as elsewhere in the world. Research within Computational Genomics drove forward established programmes in three areas: algorithms, software and data resources for using DNA sequence data to study genetic variation, in conjunction with the Global Alliance for Genomics and Health (GA4GH); reference genome sequences for human and mouse, as part of the Genome Reference Consortium; and the DECIPHER platform for the exchange of clinical rare variant data. Alongside these, we developed novel population genetic analysis methods based on whole genome sequences and applied them to large genomic data sets.

Computational analysis of genome regulation:

A central goal in genomics is to understand how genome function is affected by genetic variation. To this end, the Sanger Institute strives to develop novel computational and statistical approaches, focusing in particular on non-coding and regulatory sequence. We developed new methods and tools for genomic data analysis to provide new knowledge about genome function: the identification of sequence and chromatin features involved in enhancer activity, the identification of variants and cell types involved in complex traits, and an improved understanding of biological variation and the transcriptional response in single cells.

The Sanger Institute is a global leader in the technology of collecting and processing genomic data, and in the science of understanding and using it. A core requirement for this work is computational: identifying the significant information in each data set, whether by finding the genetic variation present in a sample or by quantifying measurements, and relating it to existing knowledge. The primary tools for analysing sequence data are algorithmic methods for sequence alignment based on string matching, and data representation, including compression, to manage previous data and knowledge. The underlying disciplines are computer science, statistics and genetics. This is very much the domain of Big Data, and it is no surprise that companies such as Google, Amazon and Microsoft are participating alongside scientific institutions such as the Sanger Institute, the Broad Institute, EBI, NCBI and UC Santa Cruz in the new Global Alliance for Genomics and Health (GA4GH), which supports genomic data exchange to further health and research.

Related groups

Science group

Birney Group

Using outbred genetic variation to understand basic biology

DNA sequence remains at the heart of molecular biology and bioinformatics. The Birney Associate Faculty Research Group at the Sanger ...

Science group

Core Software Services

Informatics and Digital Solutions (Web, Web security and Core Bioinformatics)

Core Software Services comprises the Core Web Team, Core Bioinformatics (CoreBio) and Core Web security.

Science group

Durbin Group

Computational Genomics

Population and evolutionary genomics, novel computational genomics methods, and related mathematical and statistical models.

Science group

Genome Reference Informatics Team

Tree of Life Programme

The Genome Reference Informatics Team analyses genome assemblies to reveal and correct quality issues and to identify and add variation. It ...

Science group

Parts Group

Understanding human DNA function by engineering

Our goal is to mechanistically understand the impact of mutations in human DNA. To do so, we engineer DNA variation in cells, ...

Science group

Trynka Group

Immune Genomics Group

The Trynka group combines experimental and computational approaches to understand how genetic variation shapes immune cell function and contributes to human ...

Science group

Bateman Group

Classification of proteins and RNAs

The Classification of proteins and RNAs group moved to EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute) in November 2012. The ...

Science group

Hemberg Group

Quantitative models of gene expression

The Hemberg group is interested in developing quantitative models of gene expression. Our approach is theoretical and we strive to develop ...

Science group

Hubbard Group

Vertebrate Genome Analysis

The activities of the Vertebrate genome analysis team revolved around generating and presenting core vertebrate genome annotation, particularly in the form ...

Science group

Miska Group

Non-coding RNA and epigenetics

We are interested in all aspects of gene regulation by non-coding RNA. Current research themes include: miRNA biology and pathology, miRNA ...

Science group

Mustonen Group

Population genomics of adaptation

High-throughput sequencing opened up a new chapter in the study of molecular evolution and genetics, allowing us to study in ...

Science group

Sequence Variation Infrastructure

Human Genetics

We developed algorithms and technologies that enable researchers to discover and share genetic variation using next-generation sequencing technologies. We were ...

Science group

Vertebrate Annotation

Human Genetics

This group consists of manual annotators and software developers. The HAVANA team provides the manual annotation of human, mouse, zebrafish and ...

Associated research

Collaborations

Collaboration

Genome Reference Informatics

As the impact of the human reference genome assembly on biomedical research has shown, the availability of a high quality ...

Collaboration

HipSci

Hundreds of induced pluripotent stem cell lines for cellular genetic analysis

Tools & software

Tool

Single-cell Consensus Clustering (SC3)

SC3 is a method for unsupervised clustering of single-cell RNA-seq data. In addition to a graphical user-interface, SC3 provides additional ...

Tool

DECIPHER - Mapping the Clinical Genome

DECIPHER is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants. DECIPHER ...

Data

Data set

Genome Reference Consortium

The GRC aims to ensure that the human, mouse and zebrafish reference assemblies are biologically relevant by closing gaps, fixing ...

Data set

Zebrafish Genome Project

Danio rerio reference genome assemblies and assemblies of additional D. rerio strains and Danio and Danionella species.

Data set

Mouse Genomes Project

The Mouse Genomes Project is an ongoing effort to catalog all forms of genetic variation between the common laboratory mouse strains ...
