Parts Group

Understanding human DNA function by engineering

Our goal is to mechanistically understand the impact of mutations in human DNA. To do so, we engineer DNA variation in cells, measure its impact on assay outputs, and quantitatively model the mechanism in between. In the lab, we develop tools for genetic perturbations, and use genome engineering and synthetic biology to create cell lines for randomization, screening, or evaluation. In the office, we develop probabilistic models as well as software tools to analyse the readouts accurately and efficiently.

Research overview

Our goal is to understand how function arises from human DNA, and how mutations disrupt it. To do so, we combine genetic perturbation and genome engineering methods with assays of cell state, create accurate and useful quantitative models of the readouts, and disentangle causality by experimenting in multiple contexts.
We are a combined computational and laboratory-based research group. In the lab, we adapt, develop, and apply tools for genetic perturbations and genomic assays. We engineer genomes using CRISPR/Cas, prime editing, and recombinase systems to strike the balance of randomization, control, and noise needed to answer our questions. We quantify cell state by growth competition, single-cell RNA sequencing, and sorting. Computationally, we model the salient aspects of data-generating processes to understand the underlying biology. We create generative models of large-scale genetic screens and their outputs, and implement them in software.


The following four statements describe our approach:

1) We get things done. We start projects with clearly defined goals, and publish both positive and negative results from those that pass the pilot stage. We deliver to our collaborators.

2) We work on important problems. We pick projects based on how much they impact our understanding of human cells, characterize the variation of gene function across individuals, or influence how others work.

3) We succeed as a team. We have a diverse mix of backgrounds and skillsets, complementing each other with our strengths.

4) We are excited about science. We read broadly, discuss the latest developments, and keep up to date with both the depth of our field and the entire breadth of genomics.


Deciphering genomes through engineered structural variation

While the functions of protein-coding genes are characterized increasingly well, the importance of the non-coding genome content and organization remains poorly understood. This gap is due to a lack of tools for engineering variants that affect sequence at the scale needed to interrogate gigabases of the human genome.
To bridge this gap, we are developing a toolbox to create deletions, inversions, translocations, and duplications at scale by combining CRISPR prime editing, site-specific recombinases and long-read sequencing. We use these tools to randomize the genome at two scales.
First, we tile enhancer clusters with recombinase sites to understand the grammar of regulatory elements. Second, we target prime editors to repetitive elements to insert hundreds of recombinase sites all across the genome to understand its higher-level organisational principles and dispensability.

Predicting gene editing outcomes

The DNA mutation produced by cellular repair of a CRISPR–Cas9-generated double-strand break determines its phenotypic effect. It is known that the mutational outcomes are not random, but depend on DNA sequence at the targeted location. We systematically study the influence of flanking DNA sequence on repair outcome.
Base editing
CRISPR/Cas base editors promise nucleotide-level control over DNA sequences, but the determinants of their activity remain incompletely understood. We measured base editing frequencies in two human cell lines for two cytosine and two adenine base editors at ∼14 000 target sequences and found that base editing activity is sequence-biased, with the largest effects from nucleotides flanking the target base.
Prime editing
Most short sequences can be precisely written into a selected genomic target using prime editing; however, it remains unclear what factors govern insertion efficiency. We designed a library of 3,604 sequences of various lengths and measured the frequency of their insertion into four genomic sites in three human cell lines, using different prime editor systems in varying DNA repair contexts.

Large scale screens of coding variant effects in humans

We are interested in using genome editing tools, such as base and prime editing, for large-scale pooled variant effect screens in human cells, and in comparing their properties to similar screens using established techniques such as saturation genome editing (SGE). These screens involve generating a pool of cells carrying a library of variants and then subjecting them to a selection pressure related to the gene of interest, which can be intrinsic or artificial, for instance linking activity to fluorescence.
Saturation mutagenesis screens measure the effect of all possible genetic variation in a gene or genomic region in a pooled manner. We are running saturation base editing and prime editing screens, and analysing the data together with available SGE data from other groups at Sanger.
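As a minimal sketch of how a pooled screen of this kind might be scored, the example below assumes that sequencing yields a read count per variant before and after selection, and computes a log2 fold change of normalised frequencies. The function name, the pseudocount, and the toy counts are illustrative assumptions made up for this example; they are not our analysis pipeline or real data.

```python
import math


def log2_fold_changes(before, after, pseudocount=0.5):
    """Return the log2 fold change in read frequency for each variant.

    `before` and `after` map variant names to read counts (hypothetical
    input format). A pseudocount avoids division by zero for variants
    that drop out of the pool after selection.
    """
    n_variants = len(before)
    total_before = sum(before.values()) + pseudocount * n_variants
    total_after = sum(after.get(v, 0) for v in before) + pseudocount * n_variants

    scores = {}
    for variant, count_before in before.items():
        freq_before = (count_before + pseudocount) / total_before
        freq_after = (after.get(variant, 0) + pseudocount) / total_after
        scores[variant] = math.log2(freq_after / freq_before)
    return scores


# Toy counts, invented for illustration only.
before = {"variant_A": 1000, "variant_B": 1000, "variant_C": 1000}
after = {"variant_A": 1200, "variant_B": 1200, "variant_C": 100}

scores = log2_fold_changes(before, after)
for variant, score in sorted(scores.items()):
    print(f"{variant}: {score:+.2f}")
```

In this toy pool, variant_C is depleted under selection and receives a negative score, while the two unaffected variants rise slightly in relative frequency. A real analysis would also need replicates and a model of sampling noise.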

Finding sequence determinants of context-specific regulatory elements

Gene expression levels are precisely modulated by regulatory elements, but it is unclear which genes are regulated by which regulatory elements, what the relative contributions of different regulatory elements are, and which transcription factors cause this regulatory activity. We develop tools using synthetic biology approaches to measure the regulatory activities of sequences at scale. Using these data, we can more systematically analyse how smaller, more controlled DNA changes lead to different regulatory activities. We are exploring how we can use machine learning models to engineer new regulatory elements with defined context-specific activities. We hope that this toolbox expands our understanding of how genes are regulated by their sequence context.

Modifiability of CRISPR perturbation effects in iPSCs with scRNA-seq and growth assays

Understanding the role of genetic variants is important for improving the prediction of an individual’s disease onset, risk, severity and treatment outcome. Recent techniques couple CRISPR-based perturbations with single-cell RNA sequencing (scRNA-seq), offering the ability to detect transcriptome-level changes in single cells due to perturbed genomic elements. These approaches have already been used to characterize cell-specific gene function, identify protein complex membership and predict causal genes. Unlike previous studies, which generally establish proof-of-concept by demonstrating knockout of a handful of genes or focus on a handful of cell lines, we are performing these experiments at scale by knocking down thousands of genes across hundreds of individuals to identify the role of genetic background in phenotype penetrance.

Using CRISPR-based approaches to identify the genomic drivers of neurodegenerative disease

Human genetic analyses such as GWAS, together with the rise of population-scale biobanks, are revealing a growing list of genetic loci associated with neurodegenerative disease. However, even where GWAS loci are implicated in disease, the exact causative gene is often unclear, as is its role in disease etiology. In collaboration with other labs at the Sanger, we are performing the first systematic comparison of the phenotypic consequences of mutating different genetic risk factors contributing to Alzheimer’s disease, Parkinson’s disease and amyotrophic lateral sclerosis in iPSC-derived neurons, astrocytes and microglia. This ongoing analysis will connect disease-associated genotypes to disease-relevant cell types, dissect the genetic architecture within and across diseases, assess the most relevant cellular and transcriptomic phenotypes, and functionally validate targets from GWAS, network analyses and transcriptomic analysis of patient cells.

Core team

Dr Alistair Dunham

Postdoctoral Fellow

Claudia Feng

PhD Student

Mr Gareth Girling

Advanced Research Assistant

Jacob Hepkema

PhD Student

Elin Madli Peets

Advanced Research Assistant

Juliane Weller

PhD Student

Previous team members

Dr Felicity Allen

Postdoctoral Fellow

Luca Crepaldi

Staff Scientist

Dr Michelle McRae

Senior Research Assistant/Laboratory Manager

Danesh Moradigaravand

Senior Bioinformatician

Dr Daniele Muraro

Senior Bioinformatician

Kasia Tilgner

Visiting Scientist

Dr Yan Zhou

Postdoctoral Fellow

Related groups


We work with the following groups


Cancer Dependency Map

The Cancer Dependency Map aims to find a targetable dependency in each cancer cell.


