Wellcome Sanger Institute

African Genome Variation Project

Genetic studies of human disease are more challenging to perform in sub-Saharan Africa because genetic diversity there is greater than in other parts of the world. This pilot will increase our understanding of African genome variation and enable the design of large-scale genetic association studies in the region.


Studies into the genetic basis of disease in European populations have made major advances in the past few years, yet similar studies in sub-Saharan Africa have been slower to develop. The high level of genetic diversity that exists in populations from sub-Saharan Africa makes genetic associations with disease more difficult to identify.

The African Genome Variation Project aims to collect essential information about the structure of African genomes to provide a basic framework for genetic disease studies in Africa.


Genetic studies of human disease are more challenging to perform in Africa

Over the past few years, there have been major advances in studies that investigate the genetic basis of human disease in European populations. Projects such as the International HapMap Project have revolutionised genetic studies in European and East Asian populations. Three main factors have contributed to these advances: the availability of high-accuracy, high-throughput genotyping technologies; large sample sizes; and a better understanding of human genome sequence variation. However, similar studies in sub-Saharan Africa have been slower to develop.

When humans reproduce, ancestral chromosomes are broken up and shuffled by recombination events in each generation. However, some segments of DNA are not fragmented and are shared between multiple individuals. These segments are called haplotypes and can be used to look for genes associated with a specific disease. Haplotypes vary in length and can be associated with either protection against a disease or an increased risk of it.
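As a rough illustration of why haplotype length depends on how many generations of recombination have passed, the toy simulation below drops random breakpoints on a single chromosome and reports the average length of the unbroken segments that remain. It is a sketch only: the function name `mean_segment_length`, the chromosome length and the one-crossover-per-generation rate are assumptions made for this example, not parameters or methods used by the project.

```python
import random


def mean_segment_length(chrom_length, generations, crossovers_per_generation, rng):
    """Average length of unbroken segments after a number of generations.

    Toy model (assumption): every generation adds a fixed number of
    breakpoints at uniformly random positions, and breakpoints are
    never removed.
    """
    breakpoints = set()
    for _ in range(generations * crossovers_per_generation):
        breakpoints.add(rng.randrange(1, chrom_length))
    # n distinct breakpoints cut the chromosome into n + 1 segments
    return chrom_length / (len(breakpoints) + 1)


rng = random.Random(42)  # fixed seed so the run is repeatable
recent = mean_segment_length(1_000_000, generations=100,
                             crossovers_per_generation=1, rng=rng)
ancient = mean_segment_length(1_000_000, generations=5_000,
                              crossovers_per_generation=1, rng=rng)

print(f"mean segment after 100 generations:  {recent:,.0f} positions")
print(f"mean segment after 5000 generations: {ancient:,.0f} positions")
```

In this simplified model, the more generations that have passed, the shorter the surviving segments become.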

Humans today are descended from ancestors who lived in Africa over 150,000 years ago. As human populations migrated out of Africa, they carried with them part, but not all, of the ancestral genetic variation. As a result, the genetic variants seen outside Africa tend to be subsets of those seen in Africa, and genetic diversity, or heterogeneity, is higher in Africa than in Europe. The long demographic history of African populations, and the variability within and between them, means that there are more haplotypes, and shorter ones, to analyse than in European populations. Many Europeans share a disease haplotype regardless of where they are from. In contrast, the frequency of a disease-associated haplotype in Africa may depend on an individual's country and ethnic group. The long conserved haplotypes seen in European populations make it easier to identify those associated with disease risk or protection than in African populations. The drawback is that it is harder to pinpoint the gene conferring this risk or protection within a European haplotype, because the region to analyse is longer.

In addition, because European populations are genetically very similar, it has been relatively straightforward to combine data from different studies into a data set large enough for well-powered meta-analyses. The diversity both within and between African populations means that combining data from studies of these populations is more difficult. Finally, most of the commercial chips used for genotyping have been developed using samples that are overwhelmingly of non-African descent, and they are unlikely to cover a significant proportion of the common genetic variants in African populations.
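The idea that variants seen outside Africa tend to be a subset of those seen within it can be sketched with a small sampling experiment. In the hypothetical example below, a small migrating group is drawn at random from a larger source population, and the distinct variants each group carries are compared. The population sizes, the number of variants and all variable names are assumptions chosen for illustration; the sketch also ignores new mutations that arise after migration, which is why the text says "tend to be".

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical source population: 1,000 chromosomes, each labelled
# with one of 200 possible variants.
source_population = [rng.randrange(200) for _ in range(1_000)]

# A small migrating group is a random sample of the source population.
migrants = rng.sample(source_population, 50)

source_variants = set(source_population)
migrant_variants = set(migrants)

print(f"distinct variants in source population: {len(source_variants)}")
print(f"distinct variants carried by migrants:  {len(migrant_variants)}")
print(f"migrant variants are a subset of source: "
      f"{migrant_variants <= source_variants}")
```

Because the migrants are sampled from the source population, every variant they carry is also present in the source, but many source variants are left behind.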

Our Project

Principal components analysis of 10 sub-Saharan African populations using the Illumina 2.5M genotype array. The data highlight the marked diversity among populations in Africa. (Image credit: Genome Research Limited)

The African Genome Variation Project is providing a basic framework for genetic disease studies in Africa

To enable studies into the genetic basis of disease in Africa, it is essential that we understand how African genome structure differs from that in Europe. As part of the African Genome Variation Project, we are genotyping 2.5 million genetic variants in 100 individuals each from over 10 ethnic groups across sub-Saharan Africa. This will provide a dense body of new information about genetic and genomic structure in African populations and ethnic groups. We are also assessing how well currently available commercial genotyping chip platforms perform in African populations, which is important for the development of new platforms that give better coverage of this wide genetic diversity.

The African Genome Variation Project operates within a wide collaborative network of scientists, primarily from the African Partnership for Chronic Disease Research. The majority of ethnolinguistic groups are being genotyped at the Sanger Institute, and the 1000 Genomes Project is also contributing data for analysis. The genotyping data generated at the Sanger Institute will be submitted to the European Genome-phenome Archive (EGA) and made available as a public resource to facilitate future larger-scale genome-wide association studies. We aim to generate a valuable resource for the scientific community, promote collaboration and synergies among contributing parties, and provide a research framework for additional analyses.

We envisage that the data generated by the African Genome Variation Project will increase our currently limited understanding of genetic variation in African populations and allow us to assess the feasibility of using existing commercial genotyping platforms. In turn, we hope this will support the design of wider-scale experiments, making next-generation sequencing association studies in Africa a reality. The African Genome Variation Project also has a strong capacity-building component. For example, we have already conducted a two-week genetics workshop for analysts from all contributing centres, with an emphasis on developing statistical genetics skills.



External partners and funders


This work is supported by the Wellcome Trust.