Genomic archaeology

How genetics is revealing the secrets of our past.

Exploring the human genome is enabling us to look back in time at the evolution of humans.

At the Wellcome Trust Sanger Institute, researchers are taking a range of approaches to understand the mysteries of the human genome. While the main aim is to use this knowledge in a medical context, these studies also provide fascinating insight into the changes in the human genome over time and the evolutionary development of modern humans.

The spread of the human species. The human species first emerged in Africa about 195 thousand years ago. For more than a hundred thousand years those humans remained restricted to that continent. Evidence suggests that the first humans left Africa at least 60 thousand years ago. From then humans made the journey through Europe and Asia, displacing earlier populations such as Neanderthals, and eventually, around 15-25 thousand years ago, they made the journey into the Americas.


Some time around 60 000 years ago, a small group of lithe, upright apes journeyed out of Africa, through the Arabian Peninsula. Some turned left and made their way into the Middle East and Europe; others headed right, through Asia and into Australasia and the Americas. They were following in the footsteps of several of their relatives, who had made similar journeys thousands of years before. Yet this ape was different. Possessed of a particularly sharp mind and an ability to work collectively, it went on to displace all other hominid populations and dominate the planet. That ape was, of course, Homo sapiens - our species.

Charles Darwin skirted the issue of human evolution in On the Origin of Species. The idea of natural selection was incendiary enough without adding extra fuel to the fire by including humans. But he clearly saw a continuity between humans and other animals and knew that natural selection could account for human evolution - a subject he covered in a later book, The Descent of Man.

Fossil evidence, archaeology and comparisons between humans and other primates have given us a picture of early human development. The human lineage originated in Africa, diverging from the chimpanzee lineage around 5-7 million years ago. Various species of hominid are known from sites across Africa. Some 2.5 million years ago the first members of our own genus, Homo, appeared and some left Africa and settled in other sites - H. erectus in Asia, H. neanderthalensis in Europe.

Perhaps the biggest change, though, was the evolution, starting 200 000 years ago, of H. sapiens. Aided by a smart brain and, perhaps crucially, language, it was able to leave Africa and migrate to the four corners of the Earth, inventing agriculture on the way and obliterating all the other hominids it encountered. It is a fascinating story. And one that is now being recounted in ever more detail: for the story of our evolution is written into our genomes.

A history in our genes

Dr Chris Tyler-Smith, leader of the Sanger Institute Human Evolution Group. [Wellcome Library, London]


"Humans are really odd as an ape," points out Chris Tyler-Smith. There are several species and subspecies of gorillas and chimpanzees, and they are all rare. "If you contrast that with humans, there is just one species, no subspecies, little diversity but enormous numbers."

So humans are highly anomalous. "Something biologically very unusual has happened on the human lineage since it split with other apes."

In Africa, our ancestors first showed signs of characteristic 'modern' behaviour. "Between 200 000 years ago and around 50 000 years ago some really key changes happened in our evolution," says Dr Tyler-Smith. "That's seen in the fossil and archaeological record as the appearance of modern anatomy, development of more complex technologies, use of more and a wider selection of materials, transport over larger distances and soon after 100 000 years ago what some people have called the first appearance of art - the widespread use of ochre, including the production of inscribed ochre."

"But by 50-60 000 years ago it was possible for these humans, which were both anatomically and behaviourally modern, to expand out of Africa. After that everything seems to be different in that they expanded enormously into environments where no apes had gone before."

The consequences of our early history are still apparent today. Because humans appeared relatively recently, collectively we are very similar at a genetic level. Moreover, because we passed through a genetic 'bottleneck' on our way out of Africa, all people alive today outside Africa are descended from a remarkably small group and share the same subset of African genetic diversity.

What clues can our genomes provide to our distant past? While archaeological and fossil studies uncover physical evidence of the past, genomic archaeology reconstructs the past from analyses of DNA from modern populations.

Although DNA is passed on from generation to generation fairly faithfully, it occasionally changes (mutates). In addition, segments of DNA are regularly shuffled during the processes that create sex cells. So family relationships can be built up, revealing how segments of DNA have changed over time and what their common ancestors would have looked like. Moreover, because mutations accumulate at a roughly steady rate, it is possible to estimate when related sequences began to diverge.
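The dating logic in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not the method used by any group named in this article: it assumes a strict 'molecular clock', and the function name, the toy sequences and the mutation rate are all hypothetical values chosen to show the arithmetic.

```python
def divergence_time_years(seq_a: str, seq_b: str,
                          mutations_per_site_per_year: float) -> float:
    """Estimate the time since two aligned sequences shared an ancestor.

    Assumes a strict molecular clock: changes accumulate at a constant
    rate along both lineages, so T = d / (2 * rate), where d is the
    proportion of aligned sites that differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    d = differences / len(seq_a)
    return d / (2 * mutations_per_site_per_year)


# Hypothetical toy data: 2 differences across 20 aligned sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACGTACCT"
# The rate below is an illustrative placeholder, not a measured human rate.
estimate = divergence_time_years(seq_a, seq_b, 1e-6)
print(f"Estimated divergence: about {estimate:.0f} years ago")
```

The factor of two reflects the fact that both lineages have been accumulating changes independently since they split. Real analyses use far longer sequences and calibrated rates.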

Other valuable information can also be extracted from DNA analysis. Crucially, some regions seem to have changed faster than chance alone would suggest. These sites are likely to be under 'positive selection' - favoured by natural selection.

Reconstructing the past therefore depends on developing an understanding of genetic differences between individuals or populations today. For a genomic archaeologist, the human genome is thus a potentially rich field site.

Less is more

Image claimed to be Jews, identifiable by their hats, being killed by Crusaders, from a 1250 French Bible.


One surprising finding is that loss of gene function may sometimes be evolutionarily beneficial. "One of the nicest examples is the caspase12 gene," says Dr Tyler-Smith. "Most people have an inactive form with a stop codon in it but some have the active form. When we looked at the pattern of variation around that gene, it showed evidence of very strong Darwinian selection, or positive selection, for the inactive form." [16532395]

Absence of the gene seems to protect against severe sepsis - even today a major threat to health. Yet, points out Dr Tyler-Smith, that raises an odd question: "If it's so advantageous to lose the gene you might wonder why does anyone have it in the first place."

The answer, he suggests, is that when population densities were low and infectious disease less of a problem, the gene did provide some advantage. Once population densities increased and people were exposed to more infections, this benefit was outweighed by the enhanced susceptibility to sepsis.

Dr Tyler-Smith's group has recently identified another example of advantageous gene loss, affecting a gene on the X chromosome known as MAGEE2 [19200524]. This gene seems to have been selected against in Asia, though why remains a mystery: "We have absolutely no idea what the gene does or what the biological mechanism of its selection might have been."

Sometimes the patterns of selection look even more complex - as with the UGT2B17 gene, which codes for an enzyme that metabolises some environmental molecules and steroid hormones such as testosterone. While in Asia there were clear signs of selection for loss of the gene, in Europe things seemed to be different: "What we saw was the maintenance of both the presence and the absence of the gene. That's unusual, suggesting perhaps a different mode of selection - balancing selection - in Europe." [18760392]

In other words, competing selection pressures were at work, promoting both the active and the inactive forms. What is selecting the active form, and why it is only significant in Europe, is not clear.

UGT2B17 illustrates an emerging theme, suggests Dr Tyler-Smith: "It seems that the targets of selection have often been selected in multiple ways or on multiple occasions at different times or places."

It may seem surprising that gene loss is not only tolerated but sometimes positively beneficial. Yet recent research is even more startling, suggesting that 1 in 200 genes can be lost without detrimental effects on our health. A genome-wide search for genes containing 'nonsense' codons - which would lead to the production of defective proteins - identified 169 genes inactivated by nonsense mutations. Individuals carry on average at least 46 such variations. For 99 of the genes, both copies could be lost in adults living a normal existence. [19200524]

From Mongol hordes to Phoenician footprints

Portrait of Genghis Khan produced by a Chinese artist at the Imperial Court. Painting housed at the National Palace Museum, Taiwan.


Research now also has the capacity to tie genetic patterns to historical events. Particularly powerful is analysis of the Y chromosome, which is passed on solely down the male line, and of mitochondrial DNA, which is inherited maternally.

Most of the Y chromosome does not recombine so is passed on relatively faithfully from generation to generation. Small changes are occasionally introduced so family trees can be drawn up showing how variants (known as haplotypes) are related to one another. Haplotypes show a geographical clustering - as one variant arises it tends to stay within its local population as people in the past migrated much less frequently than today.

As a result, an analysis of regional Y chromosomal haplotypes can highlight anomalous features, such as the presence of unexpected haplotypes that have infiltrated a population. "We've found genetic evidence for Crusader lineages in Lebanon," [18374297] says Dr Tyler-Smith, "and for Y chromosome lineages spread by the Phoenicians around the Mediterranean basin 2000-3000 years ago." [18976729]

The Phoenicians were notable traders. From their homeland in the Levant they established colonies and trading posts throughout the Mediterranean, before suddenly disappearing from history. Even so, a comparison of known Phoenician and nearby non-Phoenician sites revealed Y chromosomal signatures specific to the Phoenicians. Although their culture has gone, their genetic legacy lives on.

Y chromosome analysis has revealed more recent historical influences on genetic diversity. "We discovered types of Y chromosome that had undergone quite extraordinary expansions. We could link one of those to Genghis Khan and the Mongol expansion that took place around 800 years ago." [12592608]

The Mongol Empire stretched from the Pacific to the Caspian Sea - the largest land empire ever created - and lasted several generations. Genghis Khan's grandson, Kublai Khan, went on to become Emperor of China. The enormous privileges associated with the male lineage enabled the Y chromosome to be spread far and wide - now 1 in every 200 men carries the Genghis Khan Y chromosome.

A similar story explains the more recent expansion of a Y haplotype in China over the past 400 years - a legacy of the Manchu leader Nurhaci and his grandfather Giocangga, who established a patrilineal elite. More than 3 per cent of east Asian males now carry the Manchu Y chromosome. [16380921]

Even more remarkably, genetic analyses have allowed Dr Tyler-Smith to tie patterns of genetic diversity to geoclimate. "When we studied, in a systematic way, many populations from east Asia, we found a very striking difference. The north east Asian populations began to expand before the southern ones - the north starting to expand before 25 000 years ago and the south after that time." [16489223]

Might environmental conditions provide an answer to this curious difference? This was the period of the last Ice Age and much of northern east Asia was covered by the 'Mammoth Steppe' - high land populated by large mammals, including mammoths.

"What we could guess was that any humans who could exploit those resources would have an advantage and could have started to expand early. In the more southern parts of east Asia there was no Mammoth Steppe, so their expansion came only after the climate started to warm up at the end of the Ice Age."

"So by studying patterns of variation in DNA we could get some insight into prehistoric diet. That was a surprise to me."

The genome in flux

Dr Matt Hurles, leader of the Sanger Institute Genomic mutation and genetic disease group. [Wellcome Library, London]


Genetic variation is alive and well in the human genome, but what form does it take? For many years, the focus of attention was almost exclusively on single base pair changes in the genome (known as SNPs). Less attention was paid to larger rearrangements and changes such as deletions or duplications - collectively known as structural variation.

"We knew it existed," admits Matt Hurles, who is interested in the mutational changes that shape the human genome, "but we didn't have any tools to find it, so we had a kind of collective amnesia in the field."

The main problem was that such changes were identified by analysis of chromosomes under the microscope, which revealed only large changes. An alternative, sequencing DNA directly, provided information only over short lengths.

After the sequencing of the human genome, the gap between the two began to be closed thanks to the development of a new tool - DNA microarrays. These broke down the reference genome into many thousands of fragments attached to a microscope slide in a regular grid. By seeing how many fragments from a test genome bound to each fragment of the reference genome, it became possible to tell whether a region of the genome had been duplicated or lost. "It was no great intellectual advance, more a technical one."
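The principle behind reading such an array can be sketched as follows. This is a simplified illustration that assumes the test signal at each spot is compared with the signal from a reference sample; the function name, the 0.3 threshold and the intensities are hypothetical and are not taken from any published protocol.

```python
import math


def call_copy_number(test_signal: float, reference_signal: float,
                     threshold: float = 0.3) -> str:
    """Classify one array spot as 'gain', 'loss' or 'normal'.

    Uses the log2 ratio of test to reference hybridisation signal.
    The default threshold is an illustrative assumption only.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    log_ratio = math.log2(test_signal / reference_signal)
    if log_ratio > threshold:
        return "gain"   # more test DNA bound than expected: possible duplication
    if log_ratio < -threshold:
        return "loss"   # less test DNA bound than expected: possible deletion
    return "normal"


# Hypothetical (test, reference) intensities for three regions of the genome.
spots = {
    "region_1": (1000.0, 1000.0),
    "region_2": (1500.0, 1000.0),
    "region_3": (500.0, 1000.0),
}
calls = {name: call_copy_number(t, r) for name, (t, r) in spots.items()}
print(calls)
```

In practice, signals are noisy and calls are made across runs of neighbouring spots rather than one spot at a time.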

When this microarray technology was applied to the human genome, it threw up a big surprise: there was far more structural variation in the human genome than anyone had suspected [17122850]. And it seemed certain to be having important consequences. "It can cause the situation where there are different numbers of functional genes in every genome. That was the first observation that really sparked the field."

Working with Manolis Dermitzakis - formerly a Sanger Institute researcher and now at the Department of Genetic Medicine and Development at the University of Geneva - Dr Hurles compared sites of structural and single point variation, finding that they provided complementary information; studying just one type would mean missing potentially important variation [17289997]. The growth of 'personal genomics' and the release of the genome sequences of Craig Venter and Jim Watson further emphasised the importance of structural variation. Although SNPs outnumber sites of structural variation, in terms of the amount of DNA affected, structural variation has the upper hand.

As a result, says Dr Hurles, statistics on relatedness have had to be fundamentally revised. "We underestimated how different we were. We used to say that if we compared one human genome with another we were 99.9% identical. Because of structural variation we'll bring that down to 99.5%. We've greatly increased the number of bases that differ between any two genomes. Also, we used to say we were 99% similar to chimpanzees, and now that's probably gone down to 95 or 96% once all the forms of variation are taken into account."
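To see what those percentages mean in absolute terms, the short calculation below converts them into numbers of differing bases. It assumes a round figure of three billion bases for the human genome; the constant and function names are hypothetical.

```python
# Approximate size of the human genome in bases (a round-figure assumption).
GENOME_SIZE_BP = 3_000_000_000


def differing_bases(percent_identical: float,
                    genome_size: int = GENOME_SIZE_BP) -> int:
    """Number of bases that differ between two genomes at a given % identity."""
    if not 0 <= percent_identical <= 100:
        raise ValueError("percent_identical must be between 0 and 100")
    return round(genome_size * (100 - percent_identical) / 100)


old_estimate = differing_bases(99.9)  # before structural variation was counted
new_estimate = differing_bases(99.5)  # after structural variation is included
print(old_estimate, new_estimate)
```

On these assumptions, the revision from 99.9% to 99.5% identity corresponds to a fivefold increase in the number of bases that differ between two people.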

Structural variation is also highly significant from an evolutionary point of view. Duplication of DNA is thought to be an important mechanism for generating extra copies of genes. While one gene of a pair can continue with its ancestral role, the second can diverge and potentially take on a new function. Just such a change seems to have happened to haemoglobin genes, creating a cluster of closely related genes.

Off and on

Structural variation. The development of DNA microarrays and other new methods made it possible for researchers to detect rearrangements such as deletions, duplications and inversions in DNA sequence - collectively known as structural variation.


What other impact might this variation be having? Much attention has focused on protein-coding sequences, but Manolis Dermitzakis believes another type of DNA deserves close scrutiny: regulatory DNA - short sequences, often clustered around the start of genes, that control when and where a gene is active. Variation in regulatory DNA does not alter the type of protein produced but changes the cell types, tissues or stages of development in which it is made.

To get a handle on the impact of variation in regulatory DNA, Dr Dermitzakis has to assess its effects on gene activity. "It's a relatively simple model," he says. Using cells derived from a variety of HapMap populations, he assesses gene activity across the entire genome. Then he looks for correlations between variation in the genome and differences in gene activity: does variation at a particular point in the genome affect expression of a particular gene?
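The kind of test described above can be illustrated with a toy calculation. The sketch below simply measures the correlation between the number of copies of a variant each individual carries (0, 1 or 2) and the activity of one gene. It is not Dr Dermitzakis's actual analysis, and the function name and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six individuals:
# copies of the variant carried (0, 1 or 2) and gene activity (arbitrary units).
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.1]

r = pearson_r(genotypes, expression)
print(f"correlation between variant and gene activity: {r:.2f}")
```

A correlation close to 1 or -1 would flag the variant as a candidate regulator of that gene. A genome-wide study repeats the test for a very large number of variant-gene pairs, so it needs stringent statistical correction.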

HapMap project

The International HapMap Project is an international collaboration, established in 2002, that has been identifying and mapping sites of variation in the human genome by analysing DNA from 269 individuals in four different populations (African, Han Chinese, Japanese and North Americans of western and northern European descent). The Sanger Institute has been the main UK partner since the project's inception.

The HapMap Project has concentrated on single points of genetic variation - SNPs - nearly four million of which have been mapped to date.

All the data are made freely accessible through public databases. The first phase of data release was completed in 2005, the second in 2007. A third phase of data release was begun early in 2009.

As a standardized resource linked to global genetic diversity, HapMap Project materials have other valuable uses. Dr Tyler-Smith uses them in his evolutionary studies, Dr Hurles based his map of structural variation on them, and Dr Dermitzakis uses cells derived from HapMap samples to study global variation in gene activity. The 1000 Genomes Project will also draw heavily upon HapMap subjects.


  • International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61.

In practice, he is looking for two types of effect: those where the variation is close to the gene, where a direct effect on transcription factor binding is likely; and secondary effects, where a site distant from the gene influences its activity (which probably depends upon a pathway of interactions). [16362079; 18846210]
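The distinction between the two types of effect can be expressed as a simple rule. The sketch below labels nearby associations 'cis' and distant ones 'trans', the usual terms in the field. The one-million-base window, the function name and the coordinates are illustrative assumptions, not values taken from the studies cited.

```python
def classify_association(variant_chrom: str, variant_pos: int,
                         gene_chrom: str, gene_start: int,
                         window_bp: int = 1_000_000) -> str:
    """Label a variant-gene association as 'cis' (nearby) or 'trans' (distant).

    A variant counts as nearby if it lies on the same chromosome as the gene
    and within window_bp bases of the gene's start position.
    """
    same_chromosome = variant_chrom == gene_chrom
    if same_chromosome and abs(variant_pos - gene_start) <= window_bp:
        return "cis"
    return "trans"


# Hypothetical coordinates for a gene starting at position 1,000,000 on chr1.
print(classify_association("chr1", 1_050_000, "chr1", 1_000_000))  # close by
print(classify_association("chr1", 5_000_000, "chr1", 1_000_000))  # same chromosome, far away
print(classify_association("chr2", 1_000_000, "chr1", 1_000_000))  # different chromosome
```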

Ultimately, studies of gene activity provide a link between DNA sequence and a living organism. If DNA is essentially a one-dimensional string of letters, gene expression can map variation to a three-dimensional organism - and even to a fourth dimension, with studies of the regulation of genes over time.

So far, his work has concentrated on one particular cell type - immortalised B cells, which are relatively easy to study. But gene expression will vary between cell types, so he is extending his analysis to other cell types - revealing which variation affects gene expression in all cell types and which in a more tissue-specific manner.

Notably, says Dr Dermitzakis, these studies begin to provide information about regions that are under selective pressure. "That gives us lots of interesting targets, confirming previous signals but also giving us lots of other processes that might have been the target of selection but we didn't necessarily know about." [19091723]

Moreover, he adds, "Many of these functions appear to be evolving relatively fast compared to protein-coding functions." In fact, regulatory regions seem to be a highly dynamic aspect of the human genome. "There's a lot of DNA that has been recently recruited to serve a function. Because it's been recently recruited there's no signature of that selection to maintain the sequence."

Dr Dermitzakis sees this as a fundamental aspect of the human genome. "It's a turnover of regulatory elements. One can even see the turnover happening within species, within humans. In the same way as we can have the birth and death of genes we can have the birth and death of regulatory elements." [12082130]

What might be the consequence of this turnover? The possibility is that regulatory elements are acting in groups, combinatorially, to tweak gene expression patterns - or even to achieve the same effect in multiple ways. "It's possible that you have binding sites that I don't have and I have other binding sites that you don't have but both are functioning to do exactly the same thing."

In evolutionary terms, this kind of action would allow fine-tuning and more subtle changes to biological functions - a 'tinkering' rather than the dramatic change that alteration of a protein might cause. In fact, suggests Dr Dermitzakis, this is a more realistic model of how complex organisms evolve: "I don't believe evolution has operated in jumps. If you have an engine you cannot change 90% of the engine or an important component and try to run it. You have to start changing little pieces here and there and see what happens."

Towards the map of variation

Dr Manolis Dermitzakis, former Sanger Institute researcher in population and comparative genetics.


Despite the great advances in human genetics, a fundamental difficulty remains. While the complete human genome reference sequence has spawned an enormous amount of work, it provides limited information about how people differ genetically.

As a result, genetic studies may identify a region of interest in the genome, but each region may contain a set of variants, any one of which could be the crucial one (the causative or functional variant). This, says Richard Durbin, is a big problem for researchers: "They don't know if it's an 'unknown known' - one of the ones on a list they know about - or one that's not yet even on the list."

Fortunately, the list is about to get a major boost, thanks to the 1000 Genomes Project, which Dr Durbin leads, along with David Altshuler of the Broad Institute in the USA. "It's going to provide a list of positions that vary. Just a catalogue. But that shouldn't be looked down upon. It's a bit like saying the human genome just gave us a list of the sequences of all the genes. Before that you could get the sequence of a gene if you wanted to but it was an effort. Suddenly they were there, they were given. The same will be true of human variants."

Among those most keen to get their hands on this list are the groups tracking down the genetic variants underlying common human diseases. But that won't be its only use: "The primary foundation for the project is medical genetics," says Dr Durbin, "but I think it should also provide a wonderful resource for systematically looking at recent human evolution."

Dr Richard Durbin, co-leader of the 1000 Genomes Project. [Wellcome Library, London]


As its name suggests, the 1000 Genomes Project is analysing in depth the genomes of 1000 different individuals from different populations around the world. A comparison of these sequences will reveal precisely which sites in the genome vary, even if only one in 100 people carries a particular variant. The project is assessing both single point (SNP) and structural variation.
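A rough calculation shows why a sample of 1000 people is enough to capture variants carried by one person in 100. The sketch below assumes individuals are sampled independently and at random. It asks only whether at least one carrier is present in the sample, and says nothing about whether the sequencing itself would detect the variant. The function name is hypothetical.

```python
def prob_at_least_one_carrier(carrier_frequency: float,
                              n_individuals: int) -> float:
    """Probability that a random sample contains at least one carrier.

    Assumes each sampled individual independently carries the variant
    with probability carrier_frequency.
    """
    if not 0 <= carrier_frequency <= 1:
        raise ValueError("carrier_frequency must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must not be negative")
    return 1 - (1 - carrier_frequency) ** n_individuals


# A variant carried by 1 person in 100, in samples of different sizes.
for n in (10, 100, 1000):
    p = prob_at_least_one_carrier(0.01, n)
    print(f"{n} individuals: probability {p:.4f}")
```

On these assumptions, a variant carried by 1% of people is almost certain to appear in a sample of 1000, whereas a sample of 10 would usually miss it.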

The spin-off for evolutionary biology is that the list of variation will provide a much more refined tool for assessing how genome sequences have evolved over time. If earlier studies of variation created the Mappa Mundi of human genetic diversity, the 1000 Genomes Project will create the equivalent of Google Earth. It will underpin much future research at the Sanger Institute and elsewhere.

The Project has been made possible by advances in sequencing technology, which can churn out sequence data at an astonishing rate. Even at this early stage of the endeavour, it is identifying considerable diversity: "There are an awful lot of variants. The first release of the 1000 Genomes data was based on four individuals, the ones done to high-level detail, and there were something like 5 million variants. The second release has around 10 million variants, and those come from about 100 people. There's an enormous amount in there. In the 1000 Genomes Project, I think we'll identify more like 30 million - and that's a lot."

Ian Jones, Isinglass Consultancy Ltd.

  • Gene expression levels are a target of recent natural selection in the human genome.

    Kudaravalli S, Veyrieras JB, Stranger BE, Dermitzakis ET and Pritchard JK

    Department of Human Genetics, The University of Chicago, USA.

    Changes in gene expression may represent an important mode of human adaptation. However, to date, there are relatively few known examples in which selection has been shown to act directly on levels or patterns of gene expression. In order to test whether single nucleotide polymorphisms (SNPs) that affect gene expression in cis are frequently targets of positive natural selection in humans, we analyzed genome-wide SNP and expression data from cell lines associated with the International HapMap Project. Using a haplotype-based test for selection that was designed to detect incomplete selective sweeps, we found that SNPs showing signals of selection are more likely than random SNPs to be associated with gene expression levels in cis. This signal is significant in the Yoruba (which is the population that shows the strongest signals of selection overall) and shows a trend in the same direction in the other HapMap populations. Our results argue that selection on gene expression levels is an important type of human adaptation. Finally, our work provides an analytical framework for tackling a more general problem that will become increasingly important: namely, testing whether selection signals overlap significantly with SNPs that are associated with phenotypes of interest.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: HG002772, R01 HG002772, R01 HG002772-05; Wellcome Trust

    Molecular biology and evolution 2009;26;3;649-58

  • A genome-wide survey of the prevalence and evolutionary forces acting on human nonsense SNPs.

    Yngvadottir B, Xue Y, Searle S, Hunt S, Delgado M, Morrison J, Whittaker P, Deloukas P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK.

    Nonsense SNPs introduce premature termination codons into genes and can result in the absence of a gene product or in a truncated and potentially harmful protein, so they are often considered disadvantageous and are associated with disease susceptibility. As such, we might expect the disrupted allele to be rare and, in healthy people, observed only in a heterozygous state. However, some, like those in the CASP12 and ACTN3 genes, are known to be present at high frequencies and to occur often in a homozygous state and seem to have been advantageous in recent human evolution. To evaluate the selective forces acting on nonsense SNPs as a class, we have carried out a large-scale experimental survey of nonsense SNPs in the human genome by genotyping 805 of them (plus control synonymous SNPs) in 1,151 individuals from 56 worldwide populations. We identified 169 genes containing nonsense SNPs that were variable in our samples, of which 99 were found with both copies inactivated in at least one individual. We found that the sampled humans differ on average by 24 genes (out of about 20,000) because of these nonsense SNPs alone. As might be expected, nonsense SNPs as a class were found to be slightly disadvantageous over evolutionary timescales, but a few nevertheless showed signs of being possibly advantageous, as indicated by unusually high levels of population differentiation, long haplotypes, and/or high frequencies of derived alleles. This study underlines the extent of variation in gene content within humans and emphasizes the importance of understanding this type of variation.

    Funded by: Wellcome Trust: 062023

    American journal of human genetics 2009;84;2;224-34

  • Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean.

    Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, Haber M, Xue Y, Izaabel H, Bosch E, Adams SM, Arroyo E, López-Parra AM, Aler M, Picornell A, Ramon M, Jobling MA, Comas D, Bertranpetit J, Wells RS, Tyler-Smith C and Genographic Consortium

    Lebanese American University, Chouran, Beirut 1102 2801, Lebanon.

    The Phoenicians were the dominant traders in the Mediterranean Sea two thousand to three thousand years ago and expanded from their homeland in the Levant to establish colonies and trading posts throughout the Mediterranean, but then they disappeared from history. We wished to identify their male genetic traces in modern populations. Therefore, we chose Phoenician-influenced sites on the basis of well-documented historical records and collected new Y-chromosomal data from 1330 men from six such sites, as well as comparative data from the literature. We then developed an analytical strategy to distinguish between lineages specifically associated with the Phoenicians and those spread by geographically similar but historically distinct events, such as the Neolithic, Greek, and Jewish expansions. This involved comparing historically documented Phoenician sites with neighboring non-Phoenician sites for the identification of weak but systematic signatures shared by the Phoenician sites that could not readily be explained by chance or by other expansions. From these comparisons, we found that haplogroup J2, in general, and six Y-STR haplotypes, in particular, exhibited a Phoenician signature that contributed > 6% to the modern Phoenician-influenced populations examined. Our methodology can be applied to any historically documented expansion in which contact and noncontact sites can be identified.

    Funded by: Wellcome Trust: 057559

    American journal of human genetics 2008;83;5;633-42

  • High-resolution mapping of expression-QTLs yields insight into human gene regulation.

    Veyrieras JB, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M and Pritchard JK

    Department of Human Genetics, The University of Chicago, Chicago, IL, USA.

    Recent studies of the HapMap lymphoblastoid cell lines have identified large numbers of quantitative trait loci for gene expression (eQTLs). Reanalyzing these data using a novel Bayesian hierarchical model, we were able to create a surprisingly high-resolution map of the typical locations of sites that affect mRNA levels in cis. Strikingly, we found a strong enrichment of eQTLs in the 250 bp just upstream of the transcription end site (TES), in addition to an enrichment around the transcription start site (TSS). Most eQTLs lie either within genes or close to genes; for example, we estimate that only 5% of eQTLs lie more than 20 kb upstream of the TSS. After controlling for position effects, SNPs in exons are approximately 2-fold more likely than SNPs in introns to be eQTLs. Our results suggest an important role for mRNA stability in determining steady-state mRNA levels, and highlight the potential of eQTL mapping as a high-resolution tool for studying the determinants of gene regulation.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: HG002772, HG02585-01; NIGMS NIH HHS: GM077959

    PLoS genetics 2008;4;10;e1000214

  • Adaptive evolution of UGT2B17 copy-number variation.

    Xue Y, Sun D, Daly A, Yang F, Zhou X, Zhao M, Huang N, Zerjal T, Lee C, Carter NP, Hurles ME and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    The human UGT2B17 gene varies in copy number from zero to two per individual and also differs in mean number between populations from Africa, Europe, and East Asia. We show that such a high degree of geographical variation is unusual and investigate its evolutionary history. This required first reinterpreting the reference sequence in this region of the genome, which is misassembled from the two different alleles separated by an artifactual gap. A corrected assembly identifies the polymorphism as a 117 kb deletion arising by nonallelic homologous recombination between approximately 4.9 kb segmental duplications and allows the deletion breakpoint to be identified. We resequenced approximately 12 kb of DNA spanning the breakpoint in 91 humans from three HapMap and one extended HapMap populations and one chimpanzee. Diversity was unusually high and the time to the most recent common ancestor was estimated at approximately 2.4 or approximately 3.0 million years by two different methods, with evidence of balancing selection in Europe. In contrast, diversity was low in East Asia where a single haplotype predominated, suggesting positive selection for the deletion in this part of the world.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;83;3;337-46

  • Y-chromosomal diversity in Lebanon is structured by recent historical events.

    Zalloua PA, Xue Y, Khalife J, Makhoul N, Debiane L, Platt DE, Royyuru AK, Herrera RJ, Hernanz DF, Blue-Smith J, Wells RS, Comas D, Bertranpetit J, Tyler-Smith C and Genographic Consortium

    The Lebanese American University, Chouran, Beirut 1102 2801, Lebanon.

    Lebanon is an eastern Mediterranean country inhabited by approximately four million people with a wide variety of ethnicities and religions, including Muslim, Christian, and Druze. In the present study, 926 Lebanese men were typed with Y-chromosomal SNP and STR markers, and unusually, male genetic variation within Lebanon was found to be more strongly structured by religious affiliation than by geography. We therefore tested the hypothesis that migrations within historical times could have contributed to this situation. Y-haplogroup J*(xJ2) was more frequent in the putative Muslim source region (the Arabian Peninsula) than in Lebanon, and it was also more frequent in Lebanese Muslims than in Lebanese non-Muslims. Conversely, haplogroup R1b was more frequent in the putative Christian source region (western Europe) than in Lebanon and was also more frequent in Lebanese Christians than in Lebanese non-Christians. The most common R1b STR-haplotype in Lebanese Christians was otherwise highly specific for western Europe and was unlikely to have reached its current frequency in Lebanese Christians without admixture. We therefore suggest that the Islamic expansion from the Arabian Peninsula beginning in the seventh century CE introduced lineages typical of this area into those who subsequently became Lebanese Muslims, whereas the Crusader activity in the 11th-13th centuries CE introduced western European lineages into Lebanese Christians.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;82;4;873-82

  • Relative impact of nucleotide and copy number variation on gene expression phenotypes.

    Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavaré S, Deloukas P, Hurles ME and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely, single-nucleotide polymorphisms (SNPs). In recent years, another type of common genetic variation has been characterized, namely, structural variation, including copy number variants (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.

    Funded by: Wellcome Trust: 065535, 076113, 077009, 077014, 077046

    Science (New York, N.Y.) 2007;315;5813;848-53

  • Global variation in copy number in the human genome.

    Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

    Funded by: NHLBI NIH HHS: T32 HL007627; Wellcome Trust: 077008, 077009, 077014

    Nature 2006;444;7118;444-54

  • Male demography in East Asia: a north-south contrast in human population expansion times.

    Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, Xu J, Du R, Fu S, Li P, Hurles ME, Yang H and Tyler-Smith C

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

    The human population has increased greatly in size in the last 100,000 years, but the initial stimuli to growth, the times when expansion started, and their variation between different parts of the world are poorly understood. We have investigated male demography in East Asia, applying a Bayesian full-likelihood analysis to data from 988 men representing 27 populations from China, Mongolia, Korea, and Japan typed with 45 binary and 16 STR markers from the Y chromosome. According to our analysis, the northern populations examined all started to expand in number between 34 (18-68) and 22 (12-39) thousand years ago (KYA), before the last glacial maximum at 21-18 KYA, while the southern populations all started to expand between 18 (6-47) and 12 (1-45) KYA, but then grew faster. We suggest that the northern populations expanded earlier because they could exploit the abundant megafauna of the "Mammoth Steppe," while the southern populations could increase in number only when a warmer and more stable climate led to more plentiful plant resources such as tubers.

    Funded by: Wellcome Trust

    Genetics 2006;172;4;2431-9

  • Spread of an inactive form of caspase-12 in humans is due to recent positive selection.

    Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, Sabeti P, Chen Y, Stalker J, Huckle E, Burton J, Leonard S, Rogers J and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, United Kingdom.

    The human caspase-12 gene is polymorphic for the presence or absence of a stop codon, which results in the occurrence of both active (ancestral) and inactive (derived) forms of the gene in the population. It has been shown elsewhere that carriers of the inactive gene are more resistant to severe sepsis. We have now investigated whether the inactive form has spread because of neutral drift or positive selection. We determined its distribution in a worldwide sample of 52 populations and resequenced the gene in 77 individuals from the HapMap Yoruba, Han Chinese, and European populations. There is strong evidence of positive selection from low diversity, skewed allele-frequency spectra, and the predominance of a single haplotype. We suggest that the inactive form of the gene arose in Africa approximately 100-500 thousand years ago (KYA) and was initially neutral or almost neutral but that positive selection beginning approximately 60-100 KYA drove it to near fixation. We further propose that its selective advantage was sepsis resistance in populations that experienced more infectious diseases as population sizes and densities increased.

    Funded by: Wellcome Trust

    American journal of human genetics 2006;78;4;659-70

  • Genome-wide associations of gene expression variation in humans.

    Stranger BE, Forrest MS, Clark AG, Minichiello MJ, Deutsch S, Lyle R, Hunt S, Kahl B, Antonarakis SE, Tavaré S, Deloukas P and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

    Funded by: NHGRI NIH HHS: HG02790, HG03229; NIGMS NIH HHS: GM065509; Wellcome Trust

    PLoS genetics 2005;1;6;e78

  • Recent spread of a Y-chromosomal lineage in northern China and Mongolia.

    Xue Y, Zerjal T, Bao W, Zhu S, Lim SK, Shu Q, Xu J, Du R, Fu S, Li P, Yang H and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    We have identified a Y-chromosomal lineage that is unusually frequent in northeastern China and Mongolia, in which a haplotype cluster defined by 15 Y short tandem repeats was carried by approximately 3.3% of the males sampled from East Asia. The most recent common ancestor of this lineage lived 590 +/- 340 years ago (mean +/- SD), and it was detected in Mongolians and six Chinese minority populations. We suggest that the lineage was spread by Qing Dynasty (1644-1912) nobility, who were a privileged elite sharing patrilineal descent from Giocangga (died 1582), the grandfather of Manchu leader Nurhaci, and whose documented members formed approximately 0.4% of the minority population by the end of the dynasty.

    Funded by: Wellcome Trust

    American journal of human genetics 2005;77;6;1112-6

  • The genetic legacy of the Mongols.

    Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q, Mohyuddin A, Fu S, Li P, Yuldasheva N, Ruzibakiev R, Xu J, Shu Q, Du R, Yang H, Hurles ME, Robinson E, Gerelsaikhan T, Dashnyam B, Mehdi SQ and Tyler-Smith C

    Department of Biochemistry, University of Oxford, Oxford, United Kingdom.

    We have identified a Y-chromosomal lineage with several unusual features. It was found in 16 populations throughout a large region of Asia, stretching from the Pacific to the Caspian Sea, and was present at high frequency: approximately 8% of the men in this region carry it, and it thus makes up approximately 0.5% of the world total. The pattern of variation within the lineage suggested that it originated in Mongolia approximately 1,000 years ago. Such a rapid spread cannot have occurred by chance; it must have been a result of selection. The lineage is carried by likely male-line descendants of Genghis Khan, and we therefore propose that it has spread by a novel form of social selection resulting from their behavior.

    American journal of human genetics 2003;72;3;717-21

  • Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover.

    Dermitzakis ET and Clark AG

    Department of Biology, Institute of Molecular Evolutionary Genetics, Pennsylvania State University, USA.

    Comparisons between human and rodent DNA sequences are widely used for the identification of regulatory regions (phylogenetic footprinting), and the importance of such intergenomic comparisons for promoter annotation is expanding. The efficacy of such comparisons for the identification of functional regulatory elements hinges on the evolutionary dynamics of promoter sequences. Although it is widely appreciated that conservation of sequence motifs may provide a suggestion of function, it is not known as to what proportion of the functional binding sites in humans is conserved in distant species. In this report, we present an analysis of the evolutionary dynamics of transcription factor binding sites whose function had been experimentally verified in promoters of 51 human genes and compare their sequence to homologous sequences in other primate species and rodents. Our results show that there is extensive divergence within the nucleotide sequence of transcription factor binding sites. Using direct experimental data from functional studies in both human and rodents for 20 of the regulatory regions, we estimate that 32%-40% of the human functional sites are not functional in rodents. This is evidence that there is widespread turnover of transcription factor binding sites. These results have important implications for the efficacy of phylogenetic footprinting and the interpretation of the pattern of evolution in regulatory sequences.

    Molecular biology and evolution 2002;19;7;1114-21

Further reading

  • Human Evolutionary Genetics: Origins, Peoples and Disease.

    Jobling MA, Hurles ME and Tyler-Smith C

    Garland Science 2004

Contact the Press Office

Mark Thomson, Senior Media and Public Relations Officer
Wellcome Trust Sanger Institute, Hinxton, Cambs, CB10 1SA, UK

Tel +44 (0)1223 492 384
Mobile +44 (0)7753 775 397
Fax +44 (0)1223 494 919
