Background
The availability of resources such as the HapMap, with over 4 million mapped variants, coupled with advances in array technology, has enabled large-scale association studies to identify disease genes. We have set up a high-throughput facility for genotyping and expression analysis, which undertakes genetic studies of common diseases such as type 2 diabetes, cardiovascular disease, obesity and malaria, as well as pharmacogenetic studies of the anticoagulant drug warfarin.
Whole-genome association studies in large, well-phenotyped collections are finding disease genes; we have reported several new loci as part of the Wellcome Trust Case Control Consortium. Population samples of healthy individuals on which multiple phenotypic measurements have been collected offer a unique resource for mapping quantitative traits. In that context, gene expression can also be analysed, providing a further link between genotype and measured trait. Research interests revolve around both the technical and analytical optimisation of such studies and the deployment of molecular tools to further characterise the genomic regions they identify; identification of the actual causative variant requires functional analysis.
Understanding the basis of complex traits also requires identifying the relevant environmental factors and how they interact with genetic factors.
Research
The Wellcome Trust Case Control Consortium
The Wellcome Trust Case Control Consortium (WTCCC) was formed in 2004 with the aim of exploring the utility, design, execution and analysis of genome-wide association (GWA) studies. It comprises over 50 UK research groups working on the genetics of common human diseases, collectively covering clinical research, genotyping, informatics and statistical analysis.
The Consortium has undertaken three experiments so far:
1. GWA studies of 2000 cases for each of seven complex human diseases of major public health importance, compared with 3000 shared controls: bipolar disorder (BD), coronary artery disease (CAD), Crohn's disease (CD), hypertension (HT), rheumatoid arthritis (RA), type 1 diabetes (T1D), and type 2 diabetes (T2D).
2. A GWA study for tuberculosis in 1500 cases and 1500 controls, sampled from The Gambia.
3. An association study of 1500 common controls with 1000 cases for each of breast cancer, multiple sclerosis, ankylosing spondylitis, and autoimmune thyroid disease, all typed at around 15,000 mainly non-synonymous SNPs.
By simultaneously studying seven diseases with differing aetiologies, we hoped to develop insights, not only into the specific genetic contributions to each of the diseases, but also into differences in allelic architecture across the diseases. A further major aim was to address important methodological issues of relevance to all GWA studies, such as quality control, design and analysis. In addition to our main association results, we address several of these issues below, including the choice of controls for genetic studies, the extent of population structure within the UK, sample sizes necessary to detect genetic effects of varying sizes, and improvements in genotype calling algorithms and analytical methods.

Scan of an individual's DNA with an array harbouring a genome-wide set of 550,000 tag SNP markers (Illumina).
Genotyping
Genetic studies in both haploid and diploid organisms rely heavily on our ability to accurately interrogate polymorphic sites in the genome (single base positions or sequence segments that occur as two or more alleles in the population). Bi-allelic markers such as single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) have largely replaced microsatellites, as they are amenable to automation and a high level of multiplexing.
We set up a high-throughput facility with the aim of identifying and implementing a combination of robust genotyping platforms to undertake large-scale genetic analysis. In addition to accuracy, we select platforms on the basis of throughput, cost efficiency, and DNA consumption. The latter is very important in studies with irreplaceable clinical samples available in finite quantities. Research activities focus on issues surrounding sample quality and optimisation of calling algorithms, both of paramount importance since the advent of genome-wide genotyping based on array technology developed by Affymetrix and Illumina. Disease association studies aim to identify variants whose frequencies differ between cases and controls, and these differences are often small. Thus any bias in genotype calling introduced by DNA quality, the calling algorithm, or both can lead to false positive associations. Our informatics team is developing tools for automating data handling and quality control as well as data storage and visualisation.
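To illustrate why small calling biases matter, the comparison of allele frequencies between cases and controls can be sketched as a basic allelic chi-square test on a 2x2 table. This is a minimal illustration only, not the analysis pipeline used by the facility or the Consortium: the function names and the genotype counts in the usage note are hypothetical, and real studies apply further quality control and more elaborate statistical models.

```python
import math


def allele_counts(genotype_counts):
    """Convert (AA, AB, BB) genotype counts into (A, B) allele counts."""
    aa, ab, bb = genotype_counts
    return 2 * aa + ab, 2 * bb + ab


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    Both arguments are (AA, AB, BB) genotype counts for one bi-allelic SNP.
    Returns (chi_square_statistic, p_value).
    """
    a_case, b_case = allele_counts(case_genotypes)
    a_ctrl, b_ctrl = allele_counts(control_genotypes)

    table = [[a_case, b_case], [a_ctrl, b_ctrl]]
    row_totals = [a_case + b_case, a_ctrl + b_ctrl]
    col_totals = [a_case + a_ctrl, b_case + b_ctrl]
    total = sum(row_totals)

    # A monomorphic SNP or an empty group carries no information.
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `allelic_chi_square((500, 1000, 500), (600, 1500, 900))` compares hypothetical counts from 2000 cases and 3000 controls. Because the test responds to modest shifts in allele counts at these sample sizes, a calling bias that affects cases and controls differently can produce a spurious signal in the same way a true association would.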
Our facility runs multiple genotyping platforms including Illumina (GoldenGate and Infinium assays), Affymetrix (GeneChip), Sequenom (iPLEX and homogeneous MassEXTEND assays) and TaqMan (ABI). We have made substantial contributions to major international projects such as those undertaken by The SNP Consortium and the HapMap consortium.
Genotyping quality control
This document outlines aspects of the process and quality control implemented in the genotyping pipeline. Many details are given, but it should be noted that these may vary depending on the nature of each project.
Selected publications
- The DNA sequence and comparative analysis of human chromosome 10.
  Nature 2004;429;6990;375-81
  PUBMED: 15164054; DOI: 10.1038/nature02462
- The DNA sequence and comparative analysis of human chromosome 20.
  Nature 2001;414;6866;865-71
  PUBMED: 11780052; DOI: 10.1038/414865a

Dr Panos Deloukas