New deep learning technique offers a more accurate approach to single-cell genomics

Scientists are using the emerging technology of deep learning to fill in the missing gaps of single-cell genomic analysis

A new ‘deep learning’ method, DeepCpG, has been designed by researchers at the Wellcome Trust Sanger Institute, the European Bioinformatics Institute and the Babraham Institute to help scientists better understand the epigenome – the biochemical activity around the genome. Reported today (11 April) in Genome Biology, DeepCpG leverages ‘deep neural networks’, multi-layered machine learning models inspired by the brain, and provides a valuable tool for research into health and disease.

As a result of projects like 1000 Genomes, scientists now have a ‘book’ of the human genome divided up into chapters and annotated in parts. However, to fully understand how life works, scientists need to decipher both the genome – the set of instructions repeated in every cell – and the epigenome, the part that varies wildly between cells.

To better understand how DNA sequences relate to biological changes, the genomics community is turning to artificial neural networks – a class of machine learning methods first introduced in the 1980s and inspired by the wiring of the brain. More recently, these models have been rebranded as ‘deep neural networks’, which form the field of deep learning. 

Scientists have leveraged the capacity of deep learning to fill in the gaps in single-cell genomics, an emerging technology that offers a close-up view of epigenetics.

A new technique, DeepCpG, has been designed to help scientists learn about the connections between DNA sequences and DNA methylation – a biochemical modification of the genome sequence that can act like an off-switch for individual genes. Methylation plays a key part in important biological processes, including cell development, ageing and cancer progression.  

The new method uses genomic and epigenomic data to predict DNA methylation in single cells. This matters because current technologies provide incomplete information about methylation in individual cells. With DeepCpG, researchers can obtain a more complete picture of DNA methylation. The model can also yield new biological insights, for example into the connection between DNA sequence and methylation.

“DeepCpG actually learns meaningful features in a data-driven manner. It has major advantages over previous methods, including the ability to more accurately predict DNA methylation and to study intercellular differences. By studying the wiring of the learnt network, we can understand how the biology of DNA methylation works. This has allowed us to recover known DNA sequence motifs that are important for methylation changes, as well as to discover new motifs, which are the starting point for future studies.”

Christof Angermueller, PhD candidate at EMBL-EBI

“We have demonstrated that DeepCpG enables us to accurately predict and analyse DNA methylation in single cells. However, DeepCpG is just one example of how we can apply deep learning to genomics and single-cell technologies. It is exciting to see the versatile applications deep learning has already found in genomics. I am looking forward to seeing more deep learning techniques come online. I believe it will make a big difference to how we study biology and has the potential to yield new answers about how life works.” 

Dr Oliver Stegle, Group Leader at EMBL-EBI 

“Single cell epigenomics methods provide exciting insights into cell heterogeneity in development, ageing and disease; however if you are just dealing with two genomes in a single cell, bits of information are often lost during the experiment. This new method recognises patterns of the epigenome in single cells and then reconstructs lost information, returning a data-rich single cell epigenome.” 

Professor Wolf Reik from The Babraham Institute and Associate Faculty member at the Wellcome Trust Sanger Institute

“Deep learning is now the state of the art in many fields. We are exploring its utility for making sense of large-scale biological data. Pioneering studies, such as the one by Angermueller and colleagues, prove that there is a lot to be gained by using deep learning methods in computational biology.”

Dr Leopold Parts, Group Leader at the Sanger Institute

Notes to Editors
  • DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning.

    Angermueller C, Lee HJ, Reik W and Stegle O

    Genome Biology 2017;18;1;67

In a review of deep learning for computational biology, Angermueller, Stegle and their colleagues present different applications of deep neural networks in computational biology. These range from models for understanding the impact of disease mutations to methods for localising and classifying cancer cells in microscopy images.

However, they also point out that deep learning is not the ultimate Swiss Army knife. Instead, the choice of whether to apply deep learning or conventional models depends on the nature of the data and the problem to be solved. Read more about publicly available software in the Molecular Systems Biology Review.

Source articles

Angermueller C, et al. (2016) Deep Learning for computational biology. Mol. Sys. Biol. 12:878; published online 19 July.

Nikhil Buduma. Deep Learning in a nutshell. Blog Post -


Oliver Stegle is supported by the European Molecular Biology Laboratory (EMBL), the Wellcome Trust and the European Union.

Wolf Reik is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC), the Wellcome Trust and the EU.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 635290. 

Contact the Press Office

Dr Samantha Wynne

Media Officer

Wellcome Trust Sanger Institute,
CB10 1SA,

Tel +44 (0)1223 492 368

Mobile +44 (0) 7900 607793

Fax +44 (0)1223 494 919
