Background
Since the publication of the human genome sequence, the pace of discovery in human genetics has accelerated dramatically. We have begun to identify which changes in the genome are important for a variety of human diseases and which have occurred during recent human evolution. However, biological interpretation of these results is complicated because most of these changes do not occur inside known genes. In fact, many important genetic changes occur in the non-coding fraction of the genome and are believed to affect the regulation of gene expression.
Understanding how changes in gene regulation alter observable phenotypes is important for:
- understanding the functional basis of genetic disease
- developing more accurate, powerful and specific diagnostics
- interpreting the biological changes that have occurred since we diverged from our common ancestors.
Recent technological developments mean that we can now assay key molecular phenotypes, including protein-coding and noncoding RNA transcription, transcription factor binding and chromatin accessibility, genome-wide and with high accuracy.
Our group studies epigenetic and gene expression variation in human populations. Recently, we have started work in human induced pluripotent stem cells as a model system for disease and development.
Research
Gene expression and regulatory variation in human populations
Part of our group's research uses naturally occurring variation as a model system for testing hypotheses about gene regulation. We look for genetic variants that correlate with differences in gene expression between individuals. The genetic and epigenetic context of these variants can tell us about the biology of gene regulation, and can help pinpoint likely causal disease mutations.
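As a purely illustrative sketch of what "variants that correlate with differences in gene expression" can mean in practice, the snippet below regresses one gene's expression level on genotype dosage (0, 1 or 2 copies of an allele) across individuals. It is not the group's actual analysis pipeline: the function name `eqtl_association`, the dosage coding and the toy numbers are all assumptions made for the example, and real analyses typically add covariates, normalisation and multiple-testing correction.

```python
def eqtl_association(dosages, expression):
    """Simple least-squares regression of expression on genotype dosage.

    dosages    -- allele counts per individual (0, 1 or 2); hypothetical input
    expression -- expression level of one gene in the same individuals
    Returns (slope, intercept, r_squared).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched samples from at least 3 individuals")

    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))

    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")

    slope = sxy / sxx                      # expression change per allele copy
    intercept = mean_e - slope * mean_g
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Toy data (invented): six individuals, two per genotype class.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, intercept, r_squared = eqtl_association(toy_dosages, toy_expression)
```

A slope far from zero, with a large share of variance explained, would flag the variant as a candidate expression QTL for that gene; deciding whether the association is statistically significant requires a formal test that this sketch leaves out.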
Annotating active regulatory elements using next-generation sequencing
Our group uses experimental methods such as DNaseI digestion, chromatin immunoprecipitation and formaldehyde-assisted isolation of regulatory elements (FAIRE) to identify active regulatory regions, and develops computational and statistical methods for interpreting these data.
Collaborations
We collaborate closely with a number of groups both at the Sanger Institute and elsewhere. We are currently working with Ludovic Vallier's lab in Cambridge on annotating regulatory elements in a variety of cell types. We also work with Duncan Odom's groups at the Sanger Institute and Cancer Research UK: Cambridge Research Institute to develop high-throughput methods for regulatory element annotation. We have close links with Ville Mustonen and Carl Anderson's groups at the Sanger Institute.
- Carl Anderson - Statistical genetics, The Wellcome Trust Sanger Institute, Hinxton
- Duncan Odom - Regulatory evolution in mammalian tissues, The Wellcome Trust Sanger Institute, Hinxton
- Ludovic Vallier - Gene expression variation in induced pluripotent stem cells, The Wellcome Trust Centre for Stem Cell Research, Cambridge
Selected Publications
-
Global properties and functional complexity of human gene regulatory variation.
PLoS genetics 2013;9;5;e1003501
PUBMED: 23737752; PMC: 3667745; DOI: 10.1371/journal.pgen.1003501
-
Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis.
Nature genetics 2012;44;10;1137-41
PUBMED: 22961000; PMC: 3459817; DOI: 10.1038/ng.2395
-
DNA sequence-dependent compartmentalization and silencing of chromatin at the nuclear lamina.
Cell 2012;149;7;1474-87
PUBMED: 22726435; DOI: 10.1016/j.cell.2012.04.035
-
DNase I sensitivity QTLs are a major determinant of human expression variation.
Nature 2012;482;7385;390-4
PUBMED: 22307276; PMC: 3501342; DOI: 10.1038/nature10808
-
The contribution of RNA decay quantitative trait loci to inter-individual variation in steady-state gene expression levels.
PLoS genetics 2012;8;10;e1003000
PUBMED: 23071454; PMC: 3469421; DOI: 10.1371/journal.pgen.1003000
-
Controls of nucleosome positioning in the human genome.
PLoS genetics 2012;8;11;e1003036
PUBMED: 23166509; PMC: 3499251; DOI: 10.1371/journal.pgen.1003036
-
Dissecting the regulatory architecture of gene expression QTLs.
Genome biology 2012;13;1;R7
PUBMED: 22293038; PMC: 3334587; DOI: 10.1186/gb-2012-13-1-r7
-
Exon-specific QTLs skew the inferred distribution of expression QTLs detected using gene expression array data.
PloS one 2012;7;2;e30629
PUBMED: 22359548; PMC: 3281037; DOI: 10.1371/journal.pone.0030629
-
False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions.
Bioinformatics (Oxford, England) 2011;27;15;2144-6
PUBMED: 21690102; PMC: 3137225; DOI: 10.1093/bioinformatics/btr354
-
Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data.
Genome research 2011;21;3;447-55
PUBMED: 21106904; PMC: 3044858; DOI: 10.1101/gr.112623.110
-
DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines.
Genome biology 2011;12;1;R10
PUBMED: 21251332; PMC: 3091299; DOI: 10.1186/gb-2011-12-1-r10
-
Alternative splicing is frequent during early embryonic development in mouse.
BMC genomics 2010;11;399
PUBMED: 20573213; PMC: 2898759; DOI: 10.1186/1471-2164-11-399
-
Effect of the assignment of ancestral CpG state on the estimation of nucleotide substitution rates in mammals.
BMC evolutionary biology 2008;8;265
PUBMED: 18826599; PMC: 2576242; DOI: 10.1186/1471-2148-8-265
-
Selective constraints in experimentally defined primate regulatory regions.
PLoS genetics 2008;4;8;e1000157
PUBMED: 18704158; PMC: 2490716; DOI: 10.1371/journal.pgen.1000157
-
Genomic selective constraints in murid noncoding DNA.
PLoS genetics 2006;2;11;e204
PUBMED: 17166057; PMC: 1657059; DOI: 10.1371/journal.pgen.0020204
-
The scale of mutational variation in the murid genome.
Genome research 2005;15;8;1086-94
PUBMED: 16024822; PMC: 1182221; DOI: 10.1101/gr.3895005
-
Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents.
Proceedings of the National Academy of Sciences of the United States of America 2003;100;23;13402-6
PUBMED: 14597721; PMC: 263826; DOI: 10.1073/pnas.2233252100
Team
Team members
- Daniel Gaffney
- CDF Informatics Group Leader
- Andrew Knights
- Senior Research Assistant
- Natsuhiko Kumasaka
- Postdoctoral Fellow
Daniel Gaffney
- CDF Informatics Group Leader
I earned my PhD in evolutionary genetics from Edinburgh University in 2006 under the supervision of Dr Peter Keightley. My graduate research used computational methods to study variation in the mutation rate and natural selection in noncoding DNA. From 2006 to 2008 I was a postdoc with Dr Jacek Majewski at McGill University and Genome Quebec Genome Centre, where I worked on the evolution of transcriptional regulation in primates and the role of alternative splicing in embryonic development. From 2008 until 2011 I worked on population genetic variation in gene expression with Dr Jonathan Pritchard at the University of Chicago.
Research
Our current research is focused on understanding the impact of human genetic variation on molecular phenotypes such as gene transcription, and on other important processes.
Andrew Knights
- Senior Research Assistant
I graduated with a BSc (Hons) in Biochemistry and Microbiology from the University of Sheffield in 1998. I then joined the Sanger Institute Library Construction Group, working on the Human Genome Project. In 2004, I left the Sanger Institute for the Babraham Institute, Cambridge, to carry out a PhD investigating vertebrate and invertebrate G protein-coupled receptors (GPCRs). Following a short post-doctoral appointment within the GPCR field, I returned to the Library Construction Group at the Sanger Institute in early 2010 as a Staff Scientist, with core duties focusing on the generation and optimisation of various transcriptome libraries for the Illumina platform.
Research
In late 2011, I joined Daniel Gaffney’s group. Using my molecular biology background, my objective is to set up the wet laboratory aspect of the project, introducing and optimising assays such as FAIRE-seq, ChIP-seq and DNaseI-seq. In combination with the computational side of the group, these assays are being used to study gene regulation in human populations, currently focusing on variation in iPS cells obtained from separate individuals, as well as in different tissues from within individuals.
Natsuhiko Kumasaka
- Postdoctoral Fellow
I received my doctoral degree from the Graduate School of Science and Technology at Keio University, where my research focused on combining fields such as statistics, data visualization, computer science and graphic design as a means of understanding phenomena hidden behind the data. I developed a new data visualization technique called Textile Plot with Professor Ritei Shibata. After completing my thesis, I spent time developing tools for calling copy number polymorphisms and predicting population structure from SNP genotype data as a postdoc under Dr Naoyuki Kamatani at CGM, RIKEN. I was also involved in several genome-wide association studies at RIKEN.
Research
I'm currently a Postdoctoral Fellow working on a project investigating transcriptional and epigenetic variation in human induced pluripotent stem cells (hiPSCs). My role as a statistician is to develop a novel statistical model, based on negative-binomial regression, to detect differentially expressed genes among hiPSCs derived from different tissue types while correcting for known biological and technical biases in the RNA-seq data. I'm now extending the model within the generalised linear mixed model framework to take account of complex sample correlation structures.

Dr Daniel Gaffney
