Population genomics of adaptation

High-throughput sequencing has opened up a new chapter in the study of molecular evolution and genetics by allowing deep sequencing of whole populations of organisms and cells.

We are now in a unique position to study in detail how the genetic composition of populations changes as they respond to external pressures such as drug therapies. We can ask: What is the role of genetics in a person's susceptibility to developing cancer or another potentially fatal disease? Are the observed differences between individuals mostly a result of neutral evolution, or do they confer a fitness advantage? These questions are not only interesting for understanding evolution but can also make a fundamental contribution to biomedical applications. The promise of personalised medicine will critically depend on finding and understanding molecular disease phenotypes and on developing algorithms that help bring actionable insights to clinics.

Our group contributes to this effort by developing scalable methods for biomedical applications of data. We further use these data to address basic biological research questions such as how drug resistance arises.

[Wellcome Photo Library, Wellcome Images]


Evolving populations can adapt to external changes such as the application of drug therapies. It is now broadly recognised that evolution of drug resistance, as observed in bacteria, viruses, parasites and cancer, is a key challenge for global health.

From the technological side, large-scale -omics data provide a new opportunity to study in detail how resistance evolves. However, data alone will not solve the problem of resistance. The development of cancer or the spread of infection within a host's cell population are dynamic processes. Similarly, therapeutic interventions against them will cause time-dependent responses.

Therefore, new computational methods and ideas grounded in evolutionary theory are needed to analyse these data. These methods can help to characterise the emergence of drug resistance in model systems and to design experiments that ultimately lead to novel approaches to combating resistance.


Our Aims

The main objective of the Population genomics of adaptation group is to increase our understanding of the functional consequences of naturally occurring variation in evolving populations.

Our Approach


New sequencing technologies enable high-resolution monitoring of evolving populations. However, the reads generated by so-called second-generation sequencing platforms require substantial informatics effort, for example assembly and imputation, before they can be used. Multiple groups at Sanger are contributing to this effort so that other investigators (including our group) can focus on downstream analyses, for example functional interpretation of sequence variation and adaptive dynamics.

Population and evolutionary genomics of cancer cells

Individual cells within a cancer cell population share ancestry - a fact which can readily be observed from correlations between their genomic sequences. Elucidating the tempo and mode of change in these correlations is central to better understanding the development of cancer.

Reconstruction of Clonal Heterogeneity... [Genome Research Limited]


For example, it has recently become clear that individual tumours can contain multiple competing lineages, so-called subclones with private and shared mutations, related by their joint evolutionary history going back to the most recent common ancestor. Such heterogeneity poses an obvious challenge to cancer therapies. There is evidence that it can underpin the emergence of resistance and so adversely affect treatment outcomes. Therefore, the ability to track subclonal dynamics and changes in clonal composition can inform therapy. The challenge is that it is still not possible to sequence individual cells routinely to capture the full information about their genotype. Instead, short-read sequencing of cell populations will be the main technology for cancer genomics in the near future. This means that computational methods are needed to reconstruct the subclonal lineages from mixed sequence samples, see Figure 1. We have developed an algorithm, cloneHD, for this reconstruction problem. In the near future we will be analysing larger cancer sample sets to map their subclonal evolution. We can then start to find out systematically how therapies change the evolutionary dynamics.
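To illustrate why bulk sequencing of a mixed sample still carries information about subclones, here is a minimal Python sketch. It is not cloneHD and makes strong simplifying assumptions that are ours, not the group's: a pure tumour sample, diploid loci with no copy-number changes, exactly two subclones with known genotypes, and read counts that follow a binomial distribution. All function names and the toy read counts are hypothetical.

```python
import math

def expected_vaf(fractions, genotypes, ploidy=2):
    """Expected variant allele frequency per locus in a bulk sample.

    fractions: share of cells belonging to each subclone (should sum to 1).
    genotypes: one list per subclone with the mutated copies per cell at each locus.
    """
    n_loci = len(genotypes[0])
    return [
        sum(f * g[j] for f, g in zip(fractions, genotypes)) / ploidy
        for j in range(n_loci)
    ]

def binomial_log_likelihood(alt_reads, depths, vafs):
    """Binomial log-likelihood of the read counts, dropping constant terms."""
    total = 0.0
    for k, n, p in zip(alt_reads, depths, vafs):
        p = min(max(p, 1e-9), 1 - 1e-9)  # keep the logarithms finite
        total += k * math.log(p) + (n - k) * math.log(1 - p)
    return total

def fit_subclone_fraction(alt_reads, depths, genotypes, steps=99):
    """Grid search for the fraction of cells in the second subclone."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.01 ... 0.99
    return max(
        grid,
        key=lambda f: binomial_log_likelihood(
            alt_reads, depths, expected_vaf([1 - f, f], genotypes)
        ),
    )

# Toy data (hypothetical): loci 0-1 are mutated in every cell,
# loci 2-3 only in the second subclone.
GENOTYPES = [[1, 1, 0, 0], [1, 1, 1, 1]]
ALT_READS = [500, 500, 150, 150]
DEPTHS = [1000, 1000, 1000, 1000]

if __name__ == "__main__":
    print(fit_subclone_fraction(ALT_READS, DEPTHS, GENOTYPES))
```

In this toy setting the shared mutations sit at a frequency of one half regardless of the mixture, while the private mutations scale with the size of the subclone that carries them, which is what makes the mixture identifiable. Real samples add normal-cell contamination, copy-number changes and unknown genotypes, which this sketch deliberately ignores.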


We develop scalable methods for biomedical applications of data, e.g., to characterise genetic composition of cancer cell populations. We further address basic biological research questions such as how drug resistance arises.

Selected Publications

  • The value of monitoring to control evolving populations.

    Fischer A, Vázquez-García I and Mustonen V

    Proceedings of the National Academy of Sciences of the United States of America 2015;112;4;1007-12

  • Identifying selection in the within-host evolution of influenza using viral sequence data.

    Illingworth CJ, Fischer A and Mustonen V

    PLoS computational biology 2014;10;7;e1003755

  • High-definition reconstruction of clonal composition in cancer.

    Fischer A, Vázquez-García I, Illingworth CJ and Mustonen V

    Cell reports 2014;7;5;1740-52

  • Computational approaches to identify functional genetic variants in cancer genomes.

    Gonzalez-Perez A, Mustonen V, Reva B, Ritchie GR, Creixell P, Karchin R, Vazquez M, Fink JL, Kassahn KS, Pearson JV, Bader GD, Boutros PC, Muthuswamy L, Ouellette BF, Reimand J, Linding R, Shibata T, Valencia A, Butler A, Dronov S, Flicek P, Shannon NB, Carter H, Ding L, Sander C, Stuart JM, Stein LD, Lopez-Bigas N and International Cancer Genome Consortium Mutation Pathways and Consequences Subgroup of the Bioinformatics Analyses Working Group

    Nature methods 2013;10;8;723-9

  • EMu: probabilistic inference of mutational processes and their localization in the cancer genome.

    Fischer A, Illingworth CJ, Campbell PJ and Mustonen V

    Genome biology 2013;14;4;R39

  • Components of selection in the evolution of the influenza virus: linkage effects beat inherent selection.

    Illingworth CJ and Mustonen V

    PLoS pathogens 2012;8;12;e1003091

  • Quantifying selection acting on a complex trait using allele frequency time series data.

    Illingworth CJ, Parts L, Schiffels S, Liti G and Mustonen V

    Molecular biology and evolution 2012;29;4;1187-97

  • Fitness flux and ubiquity of adaptive evolution.

    Mustonen V and Lässig M

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;9;4248-53

  • From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation.

    Mustonen V and Lässig M

    Trends in genetics : TIG 2009;25;3;111-9

  • Energy-dependent fitness: a quantitative model for the evolution of yeast transcription factor binding sites.

    Mustonen V, Kinney J, Callan CG and Lässig M

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;34;12376-81

  • Molecular evolution under fitness fluctuations.

    Mustonen V and Lässig M

    Physical review letters 2008;100;10;108101

  • Adaptations to fluctuating selection in Drosophila.

    Mustonen V and Lässig M

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;7;2277-82

  • Evolutionary population genetics of promoters: predicting binding sites and functional phylogenies.

    Mustonen V and Lässig M

    Proceedings of the National Academy of Sciences of the United States of America 2005;102;44;15936-41


Team members

Chris Illingworth

I studied mathematics at St. John's College, Cambridge and subsequently completed a PhD at the University of Essex on the topic of flexibility in protein-ligand binding, encompassing issues of protein sequence, protein structure, and both quantum and classical molecular models. I subsequently moved to the University of Oxford, where I applied computational modelling to study electrical polarization in ion channels and to binding in the HIF-1α-pVHL complex, moving from there into a short-term post lecturing in physical chemistry and bioinformatics at the University of Essex. I moved to the Sanger Institute in June 2010.


Improvements in genome sequencing have led to the availability of data describing in detail the evolution of a biological system over a period of time. Such data have the potential to give insight into processes such as the development of drug resistance in bacteria, the adaptation of viruses to combat the human immune system, and the changes which make healthy cells become cancerous. I am working on the development of statistical models with which to best understand these processes, so as to combat the threat posed by cancer and infectious disease.


  • Components of selection in the evolution of the influenza virus: linkage effects beat inherent selection.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom. ci3@sanger.ac.uk

    The influenza virus is an important human pathogen, with a rapid rate of evolution in the human population. The rate of homologous recombination within genes of influenza is essentially zero. As such, where two alleles within the same gene are in linkage disequilibrium, interference between alleles will occur, whereby selection acting upon one allele has an influence upon the frequency of the other. We here measured the relative importance of selection and interference effects upon the evolution of influenza. We considered time-resolved allele frequency data from the global evolutionary history of the haemagglutinin gene of human influenza A/H3N2, conducting an in-depth analysis of sequences collected since 1996. Using a model that accounts for selection-caused interference between alleles in linkage disequilibrium, we estimated the inherent selective benefit of individual polymorphisms in the viral population. These inherent selection coefficients were in turn used to calculate the total selective effect of interference acting upon each polymorphism, considering the effect of the initial background upon which a mutation arose, and the subsequent effect of interference from other alleles that were under selection. Viewing events in retrospect, we estimated the influence of each of these components in determining whether a mutant allele eventually fixed or died in the global viral population. Our inherent selection coefficients, when combined across different regions of the protein, were consistent with previous measurements of dN/dS for the same system. Alleles going on to fix in the global population tended to be under more positive selection, to arise on more beneficial backgrounds, and to avoid strong negative interference from other alleles under selection. However, on average, the fate of a polymorphism was determined more by the combined influence of interference effects than by its inherent selection coefficient.

    Funded by: Wellcome Trust: 098051

    PLoS pathogens 2012;8;12;e1003091

  • Quantifying selection acting on a complex trait using allele frequency time series data.

    Illingworth CJ, Parts L, Schiffels S, Liti G and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    When selection is acting on a large genetically diverse population, beneficial alleles increase in frequency. This fact can be used to map quantitative trait loci by sequencing the pooled DNA from the population at consecutive time points and observing allele frequency changes. Here, we present a population genetic method to analyze time series data of allele frequencies from such an experiment. Beginning with a range of proposed evolutionary scenarios, the method measures the consistency of each with the observed frequency changes. Evolutionary theory is utilized to formulate equations of motion for the allele frequencies, following which likelihoods for having observed the sequencing data under each scenario are derived. Comparison of these likelihoods gives an insight into the prevailing dynamics of the system under study. We illustrate the method by quantifying selective effects from an experiment, in which two phenotypically different yeast strains were first crossed and then propagated under heat stress (Parts L, Cubillos FA, Warringer J, et al. [14 co-authors]. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res). From these data, we discover that about 6% of polymorphic sites evolve nonneutrally under heat stress conditions, either because of their linkage to beneficial (driver) alleles or because they are drivers themselves. We further identify 44 genomic regions containing one or more candidate driver alleles, quantify their apparent selective advantage, obtain estimates of recombination rates within the regions, and show that the dynamics of the drivers display a strong signature of selection going beyond additive models. Our approach is applicable to study adaptation in a range of systems under different evolutionary pressures.

    Funded by: Wellcome Trust: 098051, WT077192/Z/05/Z

    Molecular biology and evolution 2012;29;4;1187-97

  • A method to infer positive selection from marker dynamics in an asexual population.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: The observation of positive selection acting on a mutant indicates that the corresponding mutation has some form of functional relevance. Determining the fitness effects of mutations thus has relevance to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. We here describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data.

    Results: The method accurately reproduces complex marker frequency trajectories. In simulations for which positive selection is close to 5% per generation, we obtain correlations upwards of 0.91 between correct and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to being correct. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability and initial fitness defect.

    Funded by: Wellcome Trust: 098051

    Bioinformatics (Oxford, England) 2012;28;6;831-7

  • Distinguishing driver and passenger mutations in an evolutionary history categorized by interference.

    Illingworth CJ and Mustonen V

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    In many biological scenarios, from the development of drug resistance in pathogens to the progression of healthy cells toward cancer, quantifying the selection acting on observed mutations is a central question. One difficulty in answering this question is the complexity of the background upon which mutations can arise, with multiple potential interactions between genetic loci. We here present a method for discerning selection from a population history that accounts for interference between mutations. Given sequences sampled from multiple time points in the history of a population, we infer selection at each locus by maximizing a likelihood function derived from a multilocus evolution model. We apply the method to the question of distinguishing between loci where new mutations are under positive selection (drivers) and loci that emit neutral mutations (passengers) in a Wright-Fisher model of evolution. Relative to an otherwise equivalent method in which the genetic background of mutations was ignored, our method inferred selection coefficients more accurately for both driver mutations evolving under clonal interference and passenger mutations reaching fixation in the population through genetic drift or hitchhiking. In a population history recorded by 750 sets of sequences of 100 individuals taken at intervals of 100 generations, a set of 50 loci were divided into drivers and passengers with a mean accuracy of >0.95 across a range of numbers of driver loci. The potential application of our model, either in full or in part, to a range of biological systems, is discussed.

    Funded by: Wellcome Trust: 091747

    Genetics 2011;189;3;989-1000
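Several of the abstracts above describe inferring selection by comparing how well different evolutionary scenarios explain allele frequency time series. The Python sketch below illustrates that general idea only. It is not the method of any of the listed papers: it assumes a single locus, a deterministic logistic trajectory for a haploid population with a constant selective advantage s per generation, a known starting frequency, and binomially distributed read counts. The function names and the toy counts are hypothetical.

```python
import math

def selected_frequency(x0, s, t):
    """Deterministic allele frequency after t generations under advantage s."""
    growth = math.exp(s * t)
    return x0 * growth / (1 - x0 + x0 * growth)

def log_likelihood(s, x0, times, alt_reads, depths):
    """Binomial log-likelihood of pooled read counts, dropping constant terms."""
    total = 0.0
    for t, k, n in zip(times, alt_reads, depths):
        p = min(max(selected_frequency(x0, s, t), 1e-9), 1 - 1e-9)
        total += k * math.log(p) + (n - k) * math.log(1 - p)
    return total

def fit_selection(x0, times, alt_reads, depths, s_grid):
    """Return the selection coefficient on the grid with the highest likelihood."""
    return max(s_grid, key=lambda s: log_likelihood(s, x0, times, alt_reads, depths))

# Toy data (hypothetical): reads supporting one parental allele at three
# time points, starting from an even mix of the two parental alleles.
TIMES = [0, 10, 20]            # generations
ALT_READS = [500, 731, 881]    # reads carrying the allele of interest
DEPTHS = [1000, 1000, 1000]    # total reads at the locus
S_GRID = [i / 100 for i in range(-10, 31)]  # -0.10 ... 0.30

if __name__ == "__main__":
    best_s = fit_selection(0.5, TIMES, ALT_READS, DEPTHS, S_GRID)
    # Log-likelihood ratio of the best selected scenario against neutrality (s = 0).
    log_lr = (log_likelihood(best_s, 0.5, TIMES, ALT_READS, DEPTHS)
              - log_likelihood(0.0, 0.5, TIMES, ALT_READS, DEPTHS))
    print(best_s, log_lr)
```

A large log-likelihood ratio favours the selected scenario over the neutral one at this locus. Linkage between loci, genetic drift and sequencing error would all need to be modelled before drawing conclusions from real data.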

* quick link - http://q.sanger.ac.uk/molphen