Analytical Genomics of Complex Traits

The Analytical Genomics of Complex Traits group, led by Eleftheria Zeggini, aims to help identify the genetic determinants of complex human traits by using next-generation association studies to detect novel disease loci.

The overarching goal of our research is to elucidate the aetiopathological underpinnings of complex human disease. We carry out large-scale studies to investigate the genetic architecture of complex traits, with a primary focus on cardiometabolic and musculoskeletal phenotypes. In doing so, we identify and address statistical genetics challenges by designing, evaluating and proposing analytical strategies.


Background

Advances in high-throughput genotyping and sequencing, coupled with the availability of large sample sets and a better understanding of human genome sequence variation, have made next-generation genetic studies feasible. It is widely accepted that, in the area of complex trait association studies, technology is in danger of outstripping our capacity to analyse and interpret the results obtained. The Analytical Genomics of Complex Traits group conducts next-generation association studies for complex phenotypes, such as type 2 diabetes, obesity and related metabolic traits, and develops appropriate robust methodologies to analyse and interpret the data where necessary.

Our research aims to:

  • Identify novel complex trait loci by carrying out association studies;
  • Develop, extend and make publicly available analytical tools;
  • Understand the molecular mechanisms underpinning disease pathogenesis through functional genomics studies.

Team members

  • Arthur Gilly
  • Konstantinos Hatzikotoulas
  • Laura Huckins
  • Britt Kilian
  • Kalliope Panoutsopoulou
  • Bram Prins
  • Will Rayner
  • Graham Ritchie
  • Loz Southam
  • Julia Steinberg
  • Daniel Suveges
  • Ioanna Tachmazidou
  • Eleftheria Zeggini
  • Our group is supported by Paris Litterick and Danielle Walker (Research Administrator)

Research

Our team conducts next-generation genetic association studies to identify complex disease loci, and establishes robust analytical strategies to achieve this. We study the role of common, low-frequency and rare sequence variants using different approaches. For example, we explore powerful ways to make use of the genetic homogeneity that characterises population isolates in order to identify low-frequency variant associations. We also conduct studies to enhance our understanding of the allelic architecture and genetic heterogeneity that characterise populations of African descent, for complex trait signal fine mapping and de novo discovery. Finally, we employ functional genomics approaches to better understand the molecular landscape of disease pathogenesis and the pathways involved.

Representative list of ongoing projects:

Genomics of population isolates

Population isolates have well-documented characteristics that can facilitate the detection of rare variants associated with complex traits, including reduced phenotypic, environmental and genetic heterogeneity. We are generating whole genome sequence data across well-phenotyped founder population cohorts (e.g. HELIC study) with a focus on medically relevant metabolic traits.

Genomics of African populations

Building on The African Genome Variation Project, which was set up to facilitate GWAS in African populations, the Uganda 2000 Genomes project has produced whole-genome sequence data on 2000 individuals and dense genotype data on 5000 individuals from Uganda with phenotypic information on more than 50 traits, including cardiometabolic, anthropometric, haematological, liver, renal function and infectious disease traits.

Metabolic traits

Body weight and fat distribution measures are associated with increased risk of cardiometabolic disease. As part of the UK10K study, we are exploring the contribution of low frequency variation to 12 anthropometric traits in up to 63,000 individuals with whole genome sequence or imputed data.

Musculoskeletal disease

We carry out large-scale GWAS in diseases like osteoarthritis (e.g. arcOGEN study) and developmental dysplasia of the hip, and investigate the effect of rare variants through exome chip studies. We also examine radiograph-derived joint morphology traits, and have instigated novel patient collections with access to a wide range of relevant endophenotypes. We have applied functional genomics technologies (quantitative proteomics, RNA sequencing and methylation arrays) to articular chondrocytes from osteoarthritis patients undergoing total joint replacement, with the aim of further characterising the genomic basis of disease and informing the interpretation of association study results.

Method development

We have an active interest in developing methods for the analysis of rare variants and for estimating genome-wide significance thresholds in sequence-based studies and in studies of African populations. We also develop methods and tools to interpret the functional consequences of sequence variants, including the GWAVA algorithm for annotating non-coding variants. We have been refining a variant calling and imputation pipeline for very low-depth sequence data, and are currently developing a meta-analysis method that accommodates sample relatedness or overlap.

Collaborations

We are part of a wide collaborative network and are actively involved in several national and international consortia, including:

  • 1000 Genomes Project
  • 10001 Dalmatians
  • arcOGEN
  • AGVP: African Genome Variation Project
  • ARGO: Arthroplasty and the Genetics of Osteoarthritis
  • DIAGRAM: Diabetes Genetics Replication and Meta-analysis consortium
  • EGG: Early Growth Genetics consortium
  • ENDGAME: Enhancing Development of Genome-wide Association Methods
  • ENGAGE
  • GCAN: Genetics Consortium for Anorexia Nervosa
  • GDC: Global Diabetes Consortium
  • GIANT: Genome-wide International ANThropometrics consortium
  • GlobalBPGen: Global Blood Pressure Genetics consortium
  • GOMAP: Genetic Overlap between Metabolic And Psychiatric diseases
  • HELIC: HELlenic Isolated Cohorts
  • INCHARGE: INternational CHildhood ARthritis GEnetics consortium
  • International T2D 1q Consortium
  • MAGIC: Meta-analysis of Glucose and Insulin traits Consortium
  • QTGEN: QT interval GENetics consortium
  • SILC: Sequencing Isolates Consortium
  • TreatOA
  • UK10K
  • UK Exome Chip consortium
  • UKRAG: UK Rheumatoid Arthritis Genetics consortium
  • UKT2DGC: UK Type 2 Diabetes Genetics Consortium
  • WTCCC: Wellcome Trust Case Control Consortium
  • WTCCC+
  • WTCCC2
  • WTCCC3

Selected Publications

  • The African Genome Variation Project shapes medical genetics in Africa.

    Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, Karthikeyan S, Iles L, Pollard MO, Choudhury A, Ritchie GR, Xue Y, Asimit J, Nsubuga RN, Young EH, Pomilla C, Kivinen K, Rockett K, Kamali A, Doumatey AP, Asiki G, Seeley J, Sisay-Joof F, Jallow M, Tollman S, Mekonnen E, Ekong R, Oljira T, Bradman N, Bojang K, Ramsay M, Adeyemo A, Bekele E, Motala A, Norris SA, Pirie F, Kaleebu P, Kwiatkowski D, Tyler-Smith C, Rotimi C, Zeggini E and Sandhu MS

    Nature 2015;517;7534;327-32

  • Functional annotation of noncoding sequence variants.

    Ritchie GR, Dunham I, Zeggini E and Flicek P

    Nature methods 2014;11;3;294-6

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Nature communications 2014;5;5345

  • In search of low-frequency and rare variants affecting complex traits.

    Panoutsopoulou K, Tachmazidou I and Zeggini E

    Human molecular genetics 2013;22;R1;R16-21

  • A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

    Tachmazidou I, Dedoussis G, Southam L, Farmaki AE, Ritchie GR, Xifara DK, Matchan A, Hatzikotoulas K, Rayner NW, Chen Y, Pollin TI, O'Connell JR, Yerges-Armstrong LM, Kiagiadaki C, Panoutsopoulou K, Schwartzentruber J, Moutsianas L, UK10K consortium, Tsafantakis E, Tyler-Smith C, McVean G, Xue Y and Zeggini E

    Nature communications 2013;4;2872

  • Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study.

    arcOGEN Consortium, arcOGEN Collaborators, Zeggini E, Panoutsopoulou K, Southam L, Rayner NW, Day-Williams AG, Lopes MC, Boraska V, Esko T, Evangelou E, Hoffman A, Houwing-Duistermaat JJ, Ingvarsson T, Jonsdottir I, Jonnson H, Kerkhof HJ, Kloppenburg M, Bos SD, Mangino M, Metrustry S, Slagboom PE, Thorleifsson G, Raine EV, Ratnayake M, Ricketts M, Beazley C, Blackburn H, Bumpstead S, Elliott KS, Hunt SE, Potter SC, Shin SY, Yadav VK, Zhai G, Sherburn K, Dixon K, Arden E, Aslam N, Battley PK, Carluke I, Doherty S, Gordon A, Joseph J, Keen R, Koller NC, Mitchell S, O'Neill F, Paling E, Reed MR, Rivadeneira F, Swift D, Walker K, Watkins B, Wheeler M, Birrell F, Ioannidis JP, Meulenbelt I, Metspalu A, Rai A, Salter D, Stefansson K, Stykarsdottir U, Uitterlinden AG, van Meurs JB, Chapman K, Deloukas P, Ollier WE, Wallis GA, Arden N, Carr A, Doherty M, McCaskie A, Willkinson JM, Ralston SH, Valdes AM, Spector TD and Loughlin J

    Lancet 2012;380;9844;815-23

  • A variant in MCF2L is associated with osteoarthritis.

    Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, Keis A, Kerkhof HJ, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Loughlin J, McCaskie A, Ollier WE, Ralston SH, Spector TD, Wallis GA, Wilkinson JM, Aslam N, Birell F, Carluke I, Joseph J, Rai A, Reed M, Walker K, arcOGEN Consortium, Doherty SA, Jonsdottir I, Maciewicz RA, Muir KR, Metspalu A, Rivadeneira F, Stefansson K, Styrkarsdottir U, Uitterlinden AG, van Meurs JB, Zhang W, Valdes AM, Doherty M and Zeggini E

    American journal of human genetics 2011;89;3;446-50

  • Next-generation association studies for complex traits.

    Zeggini E

    Nature genetics 2011;43;4;287-8

  • Synthetic associations in the context of genome-wide association scan signals.

    Orozco G, Barrett JC and Zeggini E

    Human molecular genetics 2010;19;R2;R137-44

  • Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis.

    Voight BF, Scott LJ, Steinthorsdottir V, Morris AP, Dina C, Welch RP, Zeggini E, Huth C, Aulchenko YS, Thorleifsson G, McCulloch LJ, Ferreira T, Grallert H, Amin N, Wu G, Willer CJ, Raychaudhuri S, McCarroll SA, Langenberg C, Hofmann OM, Dupuis J, Qi L, Segrè AV, van Hoek M, Navarro P, Ardlie K, Balkau B, Benediktsson R, Bennett AJ, Blagieva R, Boerwinkle E, Bonnycastle LL, Bengtsson Boström K, Bravenboer B, Bumpstead S, Burtt NP, Charpentier G, Chines PS, Cornelis M, Couper DJ, Crawford G, Doney AS, Elliott KS, Elliott AL, Erdos MR, Fox CS, Franklin CS, Ganser M, Gieger C, Grarup N, Green T, Griffin S, Groves CJ, Guiducci C, Hadjadj S, Hassanali N, Herder C, Isomaa B, Jackson AU, Johnson PR, Jørgensen T, Kao WH, Klopp N, Kong A, Kraft P, Kuusisto J, Lauritzen T, Li M, Lieverse A, Lindgren CM, Lyssenko V, Marre M, Meitinger T, Midthjell K, Morken MA, Narisu N, Nilsson P, Owen KR, Payne F, Perry JR, Petersen AK, Platou C, Proença C, Prokopenko I, Rathmann W, Rayner NW, Robertson NR, Rocheleau G, Roden M, Sampson MJ, Saxena R, Shields BM, Shrader P, Sigurdsson G, Sparsø T, Strassburger K, Stringham HM, Sun Q, Swift AJ, Thorand B, Tichet J, Tuomi T, van Dam RM, van Haeften TW, van Herpt T, van Vliet-Ostaptchouk JV, Walters GB, Weedon MN, Wijmenga C, Witteman J, Bergman RN, Cauchi S, Collins FS, Gloyn AL, Gyllensten U, Hansen T, Hide WA, Hitman GA, Hofman A, Hunter DJ, Hveem K, Laakso M, Mohlke KL, Morris AD, Palmer CN, Pramstaller PP, Rudan I, Sijbrands E, Stein LD, Tuomilehto J, Uitterlinden A, Walker M, Wareham NJ, Watanabe RM, Abecasis GR, Boehm BO, Campbell H, Daly MJ, Hattersley AT, Hu FB, Meigs JB, Pankow JS, Pedersen O, Wichmann HE, Barroso I, Florez JC, Frayling TM, Groop L, Sladek R, Thorsteinsdottir U, Wilson JF, Illig T, Froguel P, van Duijn CM, Stefansson K, Altshuler D, Boehnke M, McCarthy MI, MAGIC investigators and GIANT Consortium

    Nature genetics 2010;42;7;579-89

  • An evaluation of statistical approaches to rare variant analysis in genetic association studies.

    Morris AP and Zeggini E

    Genetic epidemiology 2010;34;2;188-93

  • Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes.

    Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PI, Abecasis GR, Almgren P, Andersen G, Ardlie K, Boström KB, Bergman RN, Bonnycastle LL, Borch-Johnsen K, Burtt NP, Chen H, Chines PS, Daly MJ, Deodhar P, Ding CJ, Doney AS, Duren WL, Elliott KS, Erdos MR, Frayling TM, Freathy RM, Gianniny L, Grallert H, Grarup N, Groves CJ, Guiducci C, Hansen T, Herder C, Hitman GA, Hughes TE, Isomaa B, Jackson AU, Jørgensen T, Kong A, Kubalanza K, Kuruvilla FG, Kuusisto J, Langenberg C, Lango H, Lauritzen T, Li Y, Lindgren CM, Lyssenko V, Marvelle AF, Meisinger C, Midthjell K, Mohlke KL, Morken MA, Morris AD, Narisu N, Nilsson P, Owen KR, Palmer CN, Payne F, Perry JR, Pettersen E, Platou C, Prokopenko I, Qi L, Qin L, Rayner NW, Rees M, Roix JJ, Sandbaek A, Shields B, Sjögren M, Steinthorsdottir V, Stringham HM, Swift AJ, Thorleifsson G, Thorsteinsdottir U, Timpson NJ, Tuomi T, Tuomilehto J, Walker M, Watanabe RM, Weedon MN, Willer CJ, Wellcome Trust Case Control Consortium, Illig T, Hveem K, Hu FB, Laakso M, Stefansson K, Pedersen O, Wareham NJ, Barroso I, Hattersley AT, Collins FS, Groop L, McCarthy MI, Boehnke M and Altshuler D

    Nature genetics 2008;40;5;638-45

Team

Team members

Arthur Gilly
Statistical Geneticist
Konstantinos Hatzikotoulas
Postdoctoral Fellow
Laura Huckins
PhD Student
Britt Kilian
Informatics & Data Manager
Kalliope Panoutsopoulou
Arthritis Research UK Career Development Fellow
Bram Prins
Statistical Geneticist
Graham Ritchie
Postdoctoral Fellow
Loz Southam
Senior Staff Scientist
Julia Steinberg
Postdoctoral Fellow
Ioanna Tachmazidou
Statistical Geneticist, it3@sanger.ac.uk

Arthur Gilly

- Statistical Geneticist

I graduated from the Grenoble INP - ENSIMAG School of Engineering with a BSc in Engineering and an MSc in Applied Mathematics. First employed in the financial sector, where I dealt with pricing models of complex derivatives, I turned to bioinformatics in 2012 with a position at the CEA/Genoscope in Evry, France.

I joined the Sanger Institute in 2013.

Research

My research revolves around the statistical analysis of high-throughput, high-dimensional biological data. Within the group, my work mainly focuses on low-depth sequence data and the ways it can illuminate our understanding of the aetiology of complex traits.

References

  • TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

    Gilly A, Etcheverry M, Madoui MA, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V and Aury JM

    Background: Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.

    Results: We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.

    Conclusions: We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.

    BMC bioinformatics 2014;15;1;377

  • Using population isolates in genetic association studies.

    Hatzikotoulas K, Gilly A and Zeggini E

    The use of genetically isolated populations can empower next-generation association studies. In this review, we discuss the advantages of this approach and review study design and analytical considerations of genetic association studies focusing on isolates. We cite successful examples of using population isolates in association studies and outline potential ways forward.

    Funded by: Wellcome Trust: 098051

    Briefings in functional genomics 2014;13;5;371-7

  • Mapping the epigenetic basis of complex traits.

    Cortijo S, Wardenaar R, Colomé-Tatché M, Gilly A, Etcheverry M, Labadie K, Caillieux E, Hospital F, Aury JM, Wincker P, Roudier F, Jansen RC, Colot V and Johannes F

    Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique (CNRS), UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U 1024, Paris F-75005, France.

    Quantifying the impact of heritable epigenetic variation on complex traits is an emerging challenge in population genetics. Here, we analyze a population of isogenic Arabidopsis lines that segregate experimentally induced DNA methylation changes at hundreds of regions across the genome. We demonstrate that several of these differentially methylated regions (DMRs) act as bona fide epigenetic quantitative trait loci (QTL(epi)), accounting for 60 to 90% of the heritability for two complex traits, flowering time and primary root length. These QTL(epi) are reproducible and can be subjected to artificial selection. Many of the experimentally induced DMRs are also variable in natural populations of this species and may thus provide an epigenetic basis for Darwinian evolution independently of DNA sequence changes.

    Science (New York, N.Y.) 2014;343;6175;1145-8

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10(-26)). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10(-36)). We demonstrate significant power gains in detecting medical trait associations.

    Funded by: European Research Council: 280559; NHGRI NIH HHS: U41HG006941; Wellcome Trust: 098051

    Nature communications 2014;5;5345

Konstantinos Hatzikotoulas

- Postdoctoral Fellow

I graduated from the Department of Physical Education and Sport Science at Aristotle University of Thessaloniki, Greece, in 1999. I received my M.Sc. degree in Coaching and Exercise Physiology from the Inter University Graduate Program of the Department of Physical Education and Sport Science at Aristotle University of Thessaloniki in 2002. In 2008 I received my Ph.D. degree in Neuromuscular Control from the Aristotle University of Thessaloniki, Greece. I joined the Analytical Genomics of Complex Traits group in 2012.

Research

I am currently a postdoctoral fellow, and my principal role at the Sanger Institute is to carry out large-scale association studies of complex diseases and traits. I am mainly involved in the HELIC, Anorexia Nervosa, African Genome Variation and Developmental Dysplasia of the Hip studies, but I also contribute to other projects in the team.

References

  • The African Genome Variation Project shapes medical genetics in Africa.

    Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, Karthikeyan S, Iles L, Pollard MO, Choudhury A, Ritchie GR, Xue Y, Asimit J, Nsubuga RN, Young EH, Pomilla C, Kivinen K, Rockett K, Kamali A, Doumatey AP, Asiki G, Seeley J, Sisay-Joof F, Jallow M, Tollman S, Mekonnen E, Ekong R, Oljira T, Bradman N, Bojang K, Ramsay M, Adeyemo A, Bekele E, Motala A, Norris SA, Pirie F, Kaleebu P, Kwiatkowski D, Tyler-Smith C, Rotimi C, Zeggini E and Sandhu MS

    [1] Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK [2] Department of Public Health and Primary Care, University of Cambridge, 2 Wort's Causeway, Cambridge, CB1 8RN, UK.

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

    Funded by: Intramural NIH HHS: Z01 HG200362-01, ZIA HG200362-02, ZIA HG200362-03, ZIA HG200362-04, ZIA HG200362-05, ZIA HG200362-06; Medical Research Council: G0600718, G0801566, G0901213-92157, MR/K013491/1; NHGRI NIH HHS: Z01HG200362; Wellcome Trust: 100891, WT077383/Z/05/Z

    Nature 2015;517;7534;327-32

  • Using population isolates in genetic association studies.

    Hatzikotoulas K, Gilly A and Zeggini E

    The use of genetically isolated populations can empower next-generation association studies. In this review, we discuss the advantages of this approach and review study design and analytical considerations of genetic association studies focusing on isolates. We cite successful examples of using population isolates in association studies and outline potential ways forward.

    Funded by: Wellcome Trust: 098051

    Briefings in functional genomics 2014;13;5;371-7

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10(-26)). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10(-36)). We demonstrate significant power gains in detecting medical trait associations.

    Funded by: European Research Council: 280559; NHGRI NIH HHS: U41HG006941; Wellcome Trust: 098051

    Nature communications 2014;5;5345

  • A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

    Tachmazidou I, Dedoussis G, Southam L, Farmaki AE, Ritchie GR, Xifara DK, Matchan A, Hatzikotoulas K, Rayner NW, Chen Y, Pollin TI, O'Connell JR, Yerges-Armstrong LM, Kiagiadaki C, Panoutsopoulou K, Schwartzentruber J, Moutsianas L, UK10K consortium, Tsafantakis E, Tyler-Smith C, McVean G, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipids traits. We find genome-wide significant evidence for association between R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglycerides levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.

    Funded by: Department of Health: NF-SI-0510-10268; NHLBI NIH HHS: K01 HL116770, R01 HL104193, U01 HL072515, U01 HL105198; NIDDK NIH HHS: P30 DK072488; Wellcome Trust: 090532, 098051, WT091310

    Nature communications 2013;4;2872

Laura Huckins

- PhD Student

I graduated from Imperial College London in 2011 with an MEng in Biomedical Engineering. My studies covered a broad range of electrical and computational approaches, applied to biological data or clinical challenges. My master's project at Imperial focused on computational neuroscience, and integrated knowledge of programming and statistical theory with investigations into the complex wiring and electrical processes of the cerebral cortex.

Research

My research at the Sanger Institute focuses on Psychiatric Genetics, with a specific focus on the genetics and epigenetics underlying Anorexia Nervosa (AN). I plan to perform a comprehensive study of candidate genes associated with AN, and with diseases showing significant co-morbidity. This study comprises a range of statistical analyses on next-generation sequencing results, and analysis of mechanistic function in a mouse knock-out model. This project should reveal new genes associated with eating disorders, and provide a functional assessment of the mechanisms and pathways involved.

References

  • A genome-wide association study of anorexia nervosa.

    Boraska V, Franklin CS, Floyd JA, Thornton LM, Huckins LM, Southam L, Rayner NW, Tachmazidou I, Klump KL, Treasure J, Lewis CM, Schmidt U, Tozzi F, Kiezebrink K, Hebebrand J, Gorwood P, Adan RA, Kas MJ, Favaro A, Santonastaso P, Fernández-Aranda F, Gratacos M, Rybakowski F, Dmitrzak-Weglarz M, Kaprio J, Keski-Rahkonen A, Raevuori A, Van Furth EF, Slof-Op 't Landt MC, Hudson JI, Reichborn-Kjennerud T, Knudsen GP, Monteleone P, Kaplan AS, Karwautz A, Hakonarson H, Berrettini WH, Guo Y, Li D, Schork NJ, Komaki G, Ando T, Inoko H, Esko T, Fischer K, Männik K, Metspalu A, Baker JH, Cone RD, Dackor J, DeSocio JE, Hilliard CE, O'Toole JK, Pantel J, Szatkiewicz JP, Taico C, Zerwas S, Trace SE, Davis OS, Helder S, Bühren K, Burghardt R, de Zwaan M, Egberts K, Ehrlich S, Herpertz-Dahlmann B, Herzog W, Imgart H, Scherag A, Scherag S, Zipfel S, Boni C, Ramoz N, Versini A, Brandys MK, Danner UN, de Kovel C, Hendriks J, Koeleman BP, Ophoff RA, Strengman E, van Elburg AA, Bruson A, Clementi M, Degortes D, Forzan M, Tenconi E, Docampo E, Escaramís G, Jiménez-Murcia S, Lissowska J, Rajewski A, Szeszenia-Dabrowska N, Slopien A, Hauser J, Karhunen L, Meulenbelt I, Slagboom PE, Tortorella A, Maj M, Dedoussis G, Dikeos D, Gonidakis F, Tziouvas K, Tsitsika A, Papezova H, Slachtova L, Martaskova D, Kennedy JL, Levitan RD, Yilmaz Z, Huemer J, Koubek D, Merl E, Wagner G, Lichtenstein P, Breen G, Cohen-Woods S, Farmer A, McGuffin P, Cichon S, Giegling I, Herms S, Rujescu D, Schreiber S, Wichmann HE, Dina C, Sladek R, Gambaro G, Soranzo N, Julia A, Marsal S, Rabionet R, Gaborieau V, Dick DM, Palotie A, Ripatti S, Widén E, Andreassen OA, Espeseth T, Lundervold A, Reinvang I, Steen VM, Le Hellard S, Mattingsdal M, Ntalla I, Bencko V, Foretova L, Janout V, Navratilova M, Gallinger S, Pinto D, Scherer SW, Aschauer H, Carlberg L, Schosser A, Alfredsson L, Ding B, Klareskog L, Padyukov L, Courtet P, Guillaume S, Jaussent I, Finan C, Kalsi G, Roberts M, Logan DW, Peltonen L, Ritchie GR, Barrett JC, Wellcome Trust Case Control Consortium 3, Estivill X, Hinney A, Sullivan PF, Collier DA, Zeggini E and Bulik CM

    [1] Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK [2] University of Split School of Medicine, Split, Croatia.

    Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2907 cases with AN from 14 countries (15 sites) and 14 860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery data sets. Seventy-six (72 independent) single nucleotide polymorphisms were taken forward for in silico (two data sets) or de novo (13 data sets) replication genotyping in 2677 independent AN cases and 8629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication data sets comprised 5551 AN cases and 21 080 controls. AN subtype analyses (1606 AN restricting; 1445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01 × 10(-7)) in SOX2OT and rs17030795 (P=5.84 × 10(-6)) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76 × 10(-6)) between CUL3 and FAM124B and rs1886797 (P=8.05 × 10(-6)) near SPATA13. Comparing discovery with replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4 × 10(-6)), strongly suggesting that true findings exist but our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.

    Funded by: Department of Health: NF-SI-0512-10074; Medical Research Council: MR/J006742/1, MR/K500999/1; NCATS NIH HHS: UL1TR000083; NCI NIH HHS: 3P50CA093459, 5P50CA097007, 5R01CA133996, U24 CA074783; NCRR NIH HHS: U54 RR0252204-01; NIAAA NIH HHS: AA-00145, AA-09203, AA-12502, AA15416, K02 AA018755, K02AA018755; NICHD NIH HHS: K12HD001441; NIEHS NIH HHS: 5R01ES011740, R01 ES011740; NIMH NIH HHS: MH066117, MH066122, MH066145, MH066146, MH066147, MH066193, MH0662, MH066287, MH066288, MH066296, R01 MH066117, R01 MH066122, R01 MH066145, R01 MH066146, R01 MH066147, R01 MH066193, R01 MH066287, R01 MH066288, R01 MH066296, T32 MH076694; Wellcome Trust: 090532, WT088827/Z/09, WT088984

    Molecular psychiatry 2014;19;10;1085-94

  • Using ancestry-informative markers to identify fine structure across 15 populations of European origin.

    Huckins LM, Boraska V, Franklin CS, Floyd JA, Southam L, GCAN, WTCCC3, Sullivan PF, Bulik CM, Collier DA, Tyler-Smith C, Zeggini E, Tachmazidou I, GCAN and WTCCC3

    The Wellcome Trust Sanger Institute (WTSI), Hinxton, UK.

    The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification, and suggest a list of markers that may help to counter this problem. It is usual to identify population structure in such studies using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that instead all SNPs with MAF >1% may be used. We established informative axes of variation identified via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can provide information regarding the design and interpretation of association results in international consortia.

    Funded by: Department of Health: NF-SI-0510-10214, NF-SI-0512-10074; Medical Research Council: MR/J006742/1, MR/J500355/1, MR/K500999/1; NIA NIH HHS: U19 AG023122; Wellcome Trust: 090532, 098051

    European journal of human genetics : EJHG 2014;22;10;1190-200

  • Olfaction and olfactory-mediated behaviour in psychiatric disease models.

    Huckins LM, Logan DW and Sánchez-Andrade G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Rats and mice are the most widely used species for modelling psychiatric disease. Assessment of these rodent models typically involves the analysis of aberrant behaviour with behavioural interactions often being manipulated to generate the model. Rodents rely heavily on their excellent sense of smell and almost all their social interactions have a strong olfactory component. Therefore, experimental paradigms that exploit these olfactory-mediated behaviours are among the most robust available and are highly prevalent in psychiatric disease research. These include tests of aggression and maternal instinct, foraging, olfactory memory and habituation and the establishment of social hierarchies. An appreciation of the way that rodents regulate these behaviours in an ethological context can assist experimenters to generate better data from their models and to avoid common pitfalls. We describe some of the more commonly used behavioural paradigms from a rodent olfactory perspective and discuss their application in existing models of psychiatric disease. We introduce the four olfactory subsystems that integrate to mediate the behavioural responses and the types of sensory cue that promote them and discuss their control and practical implementation to improve experimental outcomes. In addition, because smell is critical for normal behaviour in rodents and yet olfactory dysfunction is often associated with neuropsychiatric disease, we introduce some tests for olfactory function that can be applied to rodent models of psychiatric disorders as part of behavioural analysis.

    Cell and tissue research 2013;354;1;69-80
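
The direction-of-effect argument in the anorexia nervosa GWAS abstract above (76% of effects concordant between discovery and replication) can be illustrated with a one-sided exact binomial sign test. The following is a minimal sketch, not the authors' analysis: the function name and the counts (55 concordant out of 72 markers) are hypothetical, chosen only to approximate the reported proportion, and the abstract does not state which test or counts were used.

```python
from math import comb


def sign_test_p(concordant: int, total: int) -> float:
    """One-sided exact binomial P(X >= concordant) when each effect
    is equally likely to point in either direction (p = 0.5)."""
    # Count the outcomes with at least `concordant` matching directions.
    tail = sum(comb(total, k) for k in range(concordant, total + 1))
    return tail / 2 ** total


# Hypothetical counts for illustration only (about 76% concordance).
p_value = sign_test_p(55, 72)
print(f"one-sided sign-test p = {p_value:.2e}")
```

A small p-value here only says that the excess of concordant directions is unlikely under the null of no true effects; it does not identify which individual markers are real.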

Britt Kilian

- Informatics & Data Manager

I graduated from the University of Dortmund, Germany, with a Diploma degree (Dipl.Inf.) in Computer Science and a minor in Medical Science. With a strong interest in bioinformatics, I joined the RZPD, a not-for-profit service centre for genomics and proteomics research, in Berlin. In 2007 I started a position as a bioinformatician at the Sanger Institute, initially in the Zebrafish Genome Analysis team and later as part of the Genome Reference Informatics team.

Research

Since 2014 I have been part of the Analytical Genomics of Complex Traits team, where I provide informatics and data management support to the Zeggini Group. I work primarily on the HELIC and UKHLS projects but am also responsible for operational support.

Kalliope Panoutsopoulou

- Arthritis Research UK Career Development Fellow

2013 - present Arthritis Research UK Career Development Fellow Wellcome Trust Sanger Institute, Hinxton, UK

2011 - 2013 Staff Scientist Wellcome Trust Sanger Institute, Hinxton, UK

2008 - 2011 Postdoctoral Research Fellow Wellcome Trust Centre for Human Genetics, Oxford, UK & Wellcome Trust Sanger Institute, Hinxton, UK

2002 - 2005 Research Associate UMIST, Manchester, UK

2002 PhD in Genetics University of Manchester, UK

1998 BSc (Hons) in Biochemistry and Applied Molecular Biology UMIST, Manchester, UK

Research

My research interests span the field of common complex disease genetics with an emphasis on the genetics of osteoarthritis (OA) and OA-related traits. OA is a highly heterogeneous disease characterized by various clinical manifestations. The aim of my current research is to identify the genetic determinants of OA development using broad definitions of OA as well as an expanded set of radiographically-derived phenotype definitions closer to the biology of the disease. To achieve this I am using large-scale, high-throughput genotyping approaches followed by genome-wide association studies with a focus on studying the role of low-frequency and rare variants in OA.

References

  • The effect of FTO variation on increased osteoarthritis risk is mediated through body mass index: a Mendelian randomisation study.

    Panoutsopoulou K, Metrustry S, Doherty SA, Laslett LL, Maciewicz RA, Hart DJ, Zhang W, Muir KR, Wheeler M, Cooper C, Spector TD, Cicuttini FM, Jones G, Arden NK, Doherty M, Zeggini E, Valdes AM and arcOGEN Consortium

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Objective: Variation in the fat mass and obesity-associated (FTO) gene influences susceptibility to obesity. A variant in the FTO gene has been implicated in genetic risk to osteoarthritis (OA). We examined the role of the FTO polymorphism rs8044769 in risk of knee and hip OA in cases and controls incorporating body mass index (BMI) information.

    Methods: 5409 knee OA patients, 4355 hip OA patients and up to 5362 healthy controls from 7 independent cohorts from the UK and Australia were genotyped for rs8044769. The association of the FTO variant with OA was investigated in case/control analyses with and without BMI adjustment and in analyses matched for BMI category. A mendelian randomisation approach was employed using the FTO variant as the instrumental variable to evaluate the role of overweight on OA.

    Results: In the meta-analysis of all overweight (BMI≥25) samples versus normal-weight controls irrespective of OA status the association of rs8044769 with overweight is highly significant (OR[CIs] for allele G=1.14 [1.08 to 1.19], p=7.5×10(-7)). A significant association with knee OA is present in the analysis without BMI adjustment (OR[CIs]=1.08[1.02 to 1.14], p=0.009) but the signal fully attenuates after BMI adjustment (OR[CIs]=0.99[0.93 to 1.05], p=0.666). We observe no evidence for association in the BMI-matched meta-analyses. Using mendelian randomisation approaches we confirm the causal role of overweight on OA.

    Conclusions: Our data highlight the contribution of genetic risk to overweight in defining risk to OA but the association is exclusively mediated by the effect on BMI. This is consistent with what is known of the biology of the FTO gene and supports the causative role of high BMI in OA.

    Funded by: Medical Research Council: MC_U122886349, MC_UP_A620_1014

    Annals of the rheumatic diseases 2014;73;12;2082-6

  • Meta-analysis identifies loci affecting levels of the potential osteoarthritis biomarkers sCOMP and uCTX-II with genome wide significance.

    Ramos YF, Metrustry S, Arden N, Bay-Jensen AC, Beekman M, de Craen AJ, Cupples LA, Esko T, Evangelou E, Felson DT, Hart DJ, Ioannidis JP, Karsdal M, Kloppenburg M, Lafeber F, Metspalu A, Panoutsopoulou K, Slagboom PE, Spector TD, van Spil EW, Uitterlinden AG, Zhu Y, arcOGEN Consortium, TreatOA Collaborators, Valdes AM, van Meurs JB and Meulenbelt I

    Department of Molecular Epidemiology, LUMC, Leiden, The Netherlands; The Netherlands Genomics Initiative-Sponsored Netherlands Consortium for Healthy Aging, Leiden and Rotterdam, The Netherlands.

    Background: Research for the use of biomarkers in osteoarthritis (OA) is promising, however, adequate discrimination between patients and controls may be hampered due to innate differences. We set out to identify loci influencing levels of serum cartilage oligomeric protein (sCOMP) and urinary C-telopeptide of type II collagen (uCTX-II).

    Methods: Meta-analysis of genome-wide association studies was applied to standardised residuals of sCOMP (N=3316) and uCTX-II (N=4654) levels available in 6 and 7 studies, respectively, from TreatOA. Effects were estimated using a fixed-effects model. Six promising signals were followed up by de novo genotyping in the Cohort Hip and Cohort Knee study (N = 964). Subsequently, their role in OA susceptibility was investigated in large-scale genome-wide association studies meta-analyses for OA. Differential expression of annotated genes was assessed in cartilage.

    Results: Genome-wide significant association with sCOMP levels was found for a SNP within MRC1 (rs691461, p = 1.7 × 10(-12)) and a SNP within CSMD1 associated with variation in uCTX-II levels with borderline genome-wide significance (rs1983474, p = 8.5 × 10(-8)). Indication for association with sCOMP levels was also found for a locus close to the COMP gene itself (rs10038, p = 7.1 × 10(-6)). The latter SNP was subsequently found to be associated with hip OA whereas COMP expression appeared responsive to the OA pathophysiology in cartilage.

    Conclusions: We have identified genetic loci affecting either uCTX-II or sCOMP levels. The genome wide significant association of MRC1 with sCOMP levels was found likely to act independent of OA subtypes. Increased sensitivity of biomarkers with OA may be accomplished by taking genetic variation into account.

    Funded by: Arthritis Research UK: 18030; Wellcome Trust: 091746/Z/10/Z, ref. 079771

    Journal of medical genetics 2014;51;9;596-604

  • Assessment of osteoarthritis candidate genes in a meta-analysis of nine genome-wide association studies.

    Rodriguez-Fontenla C, Calaza M, Evangelou E, Valdes AM, Arden N, Blanco FJ, Carr A, Chapman K, Deloukas P, Doherty M, Esko T, Garcés Aletá CM, Gomez-Reino Carnota JJ, Helgadottir H, Hofman A, Jonsdottir I, Kerkhof HJ, Kloppenburg M, McCaskie A, Ntzani EE, Ollier WE, Oreiro N, Panoutsopoulou K, Ralston SH, Ramos YF, Riancho JA, Rivadeneira F, Slagboom PE, Styrkarsdottir U, Thorsteinsdottir U, Thorleifsson G, Tsezou A, Uitterlinden AG, Wallis GA, Wilkinson JM, Zhai G, Zhu Y, arcOGEN Consortium, Felson DT, Ioannidis JP, Loughlin J, Metspalu A, Meulenbelt I, Stefansson K, van Meurs JB, Zeggini E, Spector TD and Gonzalez A

    Hospital Clinico Universitario de Santiago, Santiago de Compostela, Spain.

    Objective: To assess candidate genes for association with osteoarthritis (OA) and identify promising genetic factors and, secondarily, to assess the candidate gene approach in OA.

    Methods: A total of 199 candidate genes for association with OA were identified using Human Genome Epidemiology (HuGE) Navigator. All of their single-nucleotide polymorphisms (SNPs) with an allele frequency of >5% were assessed by fixed-effects meta-analysis of 9 genome-wide association studies (GWAS) that included 5,636 patients with knee OA and 16,972 control subjects and 4,349 patients with hip OA and 17,836 control subjects of European ancestry. An additional 5,921 individuals were genotyped for significantly associated SNPs in the meta-analysis. After correction for the number of independent tests, P values less than 1.58 × 10(-5) were considered significant.

    Results: SNPs at only 2 of the 199 candidate genes (COL11A1 and VEGF) were associated with OA in the meta-analysis. Two SNPs in COL11A1 showed association with hip OA in the combined analysis: rs4907986 (P = 1.29 × 10(-5) , odds ratio [OR] 1.12, 95% confidence interval [95% CI] 1.06-1.17) and rs1241164 (P = 1.47 × 10(-5) , OR 0.82, 95% CI 0.74-0.89). The sex-stratified analysis also showed association of COL11A1 SNP rs4908291 in women (P = 1.29 × 10(-5) , OR 0.87, 95% CI 0.82-0.92); this SNP showed linkage disequilibrium with rs4907986. A single SNP of VEGF, rs833058, showed association with hip OA in men (P = 1.35 × 10(-5) , OR 0.85, 95% CI 0.79-0.91). After additional samples were genotyped, association at one of the COL11A1 signals was reinforced, whereas association at VEGF was slightly weakened.

    Conclusion: Two candidate genes, COL11A1 and VEGF, were significantly associated with OA in this focused meta-analysis. The remaining candidate genes were not associated.

    Funded by: Arthritis Research UK: 20231; Department of Health: NF-SI-0611-10216

    Arthritis & rheumatology (Hoboken, N.J.) 2014;66;4;940-9

  • Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes.

    Ayub Q, Moutsianas L, Chen Y, Panoutsopoulou K, Colonna V, Pagani L, Prokopenko I, Ritchie GR, Tyler-Smith C, McCarthy MI, Zeggini E and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.

    We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.

    Funded by: Department of Health: NF-SI-0611-10099; Wellcome Trust: 090532, 098051, 098381, WT090367MA

    American journal of human genetics 2014;94;2;176-85

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10(-26)). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10(-36)). We demonstrate significant power gains in detecting medical trait associations.

    Funded by: European Research Council: 280559; NHGRI NIH HHS: U41HG006941; Wellcome Trust: 098051

    Nature communications 2014;5;5345

  • The DOT1L rs12982744 polymorphism is associated with osteoarthritis of the hip with genome-wide statistical significance in males.

    Evangelou E, Valdes AM, Castano-Betancourt MC, Doherty M, Doherty S, Esko T, Ingvarsson T, Ioannidis JP, Kloppenburg M, Metspalu A, Ntzani EE, Panoutsopoulou K, Slagboom PE, Southam L, Spector TD, Styrkarsdottir U, Stefanson K, Uitterlinden AG, Wheeler M, Zeggini E, Meulenbelt I, van Meurs JB and arcOGEN consortium, the TREAT-OA consortium

    Funded by: Arthritis Research UK: 18030, 19542; Wellcome Trust: 098051

    Annals of the rheumatic diseases 2013;72;7;1264-5

  • A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

    Tachmazidou I, Dedoussis G, Southam L, Farmaki AE, Ritchie GR, Xifara DK, Matchan A, Hatzikotoulas K, Rayner NW, Chen Y, Pollin TI, O'Connell JR, Yerges-Armstrong LM, Kiagiadaki C, Panoutsopoulou K, Schwartzentruber J, Moutsianas L, UK10K consortium, Tsafantakis E, Tyler-Smith C, McVean G, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipids traits. We find genome-wide significant evidence for association between R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglycerides levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.

    Funded by: Department of Health: NF-SI-0510-10268; NHLBI NIH HHS: K01 HL116770, R01 HL104193, U01 HL072515, U01 HL105198; NIDDK NIH HHS: P30 DK072488; Wellcome Trust: 090532, 098051, WT091310

    Nature communications 2013;4;2872

  • Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study.

    arcOGEN Consortium, arcOGEN Collaborators, Zeggini E, Panoutsopoulou K, Southam L, Rayner NW, Day-Williams AG, Lopes MC, Boraska V, Esko T, Evangelou E, Hoffman A, Houwing-Duistermaat JJ, Ingvarsson T, Jonsdottir I, Jonnson H, Kerkhof HJ, Kloppenburg M, Bos SD, Mangino M, Metrustry S, Slagboom PE, Thorleifsson G, Raine EV, Ratnayake M, Ricketts M, Beazley C, Blackburn H, Bumpstead S, Elliott KS, Hunt SE, Potter SC, Shin SY, Yadav VK, Zhai G, Sherburn K, Dixon K, Arden E, Aslam N, Battley PK, Carluke I, Doherty S, Gordon A, Joseph J, Keen R, Koller NC, Mitchell S, O'Neill F, Paling E, Reed MR, Rivadeneira F, Swift D, Walker K, Watkins B, Wheeler M, Birrell F, Ioannidis JP, Meulenbelt I, Metspalu A, Rai A, Salter D, Stefansson K, Stykarsdottir U, Uitterlinden AG, van Meurs JB, Chapman K, Deloukas P, Ollier WE, Wallis GA, Arden N, Carr A, Doherty M, McCaskie A, Willkinson JM, Ralston SH, Valdes AM, Spector TD and Loughlin J

    Wellcome Trust Sanger Institute, Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. eleftheria@sanger.ac.uk

    Background: Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people. The health economic burden of osteoarthritis is increasing commensurate with obesity prevalence and longevity. Osteoarthritis has a strong genetic component but the success of previous genetic studies has been restricted due to insufficient sample sizes and phenotype heterogeneity.

    Methods: We undertook a large genome-wide association study (GWAS) in 7410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent.

    Findings: We identified five genome-wide significant loci (binomial test p≤5·0×10(-8)) for association with osteoarthritis and three loci just below this threshold. The strongest association was on chromosome 3 with rs6976 (odds ratio 1·12 [95% CI 1·08-1·16]; p=7·24×10(-11)), which is in perfect linkage disequilibrium with rs11177. This SNP encodes a missense polymorphism within the nucleostemin-encoding gene GNL3. Levels of nucleostemin were raised in chondrocytes from patients with osteoarthritis in functional studies. Other significant loci were on chromosome 9 close to ASTN2, chromosome 6 between FILIP1 and SENP6, chromosome 12 close to KLHDC5 and PTHLH, and in another region of chromosome 12 close to CHST11. One of the signals close to genome-wide significance was within the FTO gene, which is involved in regulation of bodyweight, a strong risk factor for osteoarthritis. All risk variants were common in frequency and exerted small effects.

    Interpretation: Our findings provide insight into the genetics of arthritis and identify new pathways that might be amenable to future therapeutic intervention.

    Funding: arcOGEN was funded by a special purpose grant from Arthritis Research UK.

    Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461, MC_U122886349

    Lancet 2012;380;9844;815-23

  • A variant in MCF2L is associated with osteoarthritis.

    Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, Keis A, Kerkhof HJ, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Loughlin J, McCaskie A, Ollier WE, Ralston SH, Spector TD, Wallis GA, Wilkinson JM, Aslam N, Birell F, Carluke I, Joseph J, Rai A, Reed M, Walker K, arcOGEN Consortium, Doherty SA, Jonsdottir I, Maciewicz RA, Muir KR, Metspalu A, Rivadeneira F, Stefansson K, Styrkarsdottir U, Uitterlinden AG, van Meurs JB, Zhang W, Valdes AM, Doherty M and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Osteoarthritis (OA) is a prevalent, heritable degenerative joint disease with a substantial public health impact. We used a 1000-Genomes-Project-based imputation in a genome-wide association scan for osteoarthritis (3177 OA cases and 4894 controls) to detect a previously unidentified risk locus. We discovered a small disease-associated set of variants on chromosome 13. Through large-scale replication, we establish a robust association with SNPs in MCF2L (rs11842874, combined odds ratio [95% confidence interval] 1.17 [1.11-1.23], p = 2.1 × 10(-8)) across a total of 19,041 OA cases and 24,504 controls of European descent. This risk locus represents the third established signal for OA overall. MCF2L regulates a nerve growth factor (NGF), and treatment with a humanized monoclonal antibody against NGF is associated with reduction in pain and improvement in function for knee OA patients.

    Funded by: Medical Research Council: G0100594, G0901461, MC_U122886349

    American journal of human genetics 2011;89;3;446-50

  • Insights into the genetic architecture of osteoarthritis from stage 1 of the arcOGEN study.

    Panoutsopoulou K, Southam L, Elliott KS, Wrayner N, Zhai G, Beazley C, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Doherty M, McCaskie A, Ollier WE, Ralston SH, Spector TD, Valdes AM, Wallis GA, Wilkinson JM, Arden E, Battley K, Blackburn H, Blanco FJ, Bumpstead S, Cupples LA, Day-Williams AG, Dixon K, Doherty SA, Esko T, Evangelou E, Felson D, Gomez-Reino JJ, Gonzalez A, Gordon A, Gwilliam R, Halldorsson BV, Hauksson VB, Hofman A, Hunt SE, Ioannidis JP, Ingvarsson T, Jonsdottir I, Jonsson H, Keen R, Kerkhof HJ, Kloppenburg MG, Koller N, Lakenberg N, Lane NE, Lee AT, Metspalu A, Meulenbelt I, Nevitt MC, O'Neill F, Parimi N, Potter SC, Rego-Perez I, Riancho JA, Sherburn K, Slagboom PE, Stefansson K, Styrkarsdottir U, Sumillera M, Swift D, Thorsteinsdottir U, Tsezou A, Uitterlinden AG, van Meurs JB, Watkins B, Wheeler M, Mitchell S, Zhu Y, Zmuda JM, arcOGEN Consortium, Zeggini E and Loughlin J

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.

    Objectives: The genetic aetiology of osteoarthritis has not yet been elucidated. To enable a well-powered genome-wide association study (GWAS) for osteoarthritis, the authors have formed the arcOGEN Consortium, a UK-wide collaborative effort aiming to scan genome-wide over 7500 osteoarthritis cases in a two-stage genome-wide association scan. Here the authors report the findings of the stage 1 interim analysis.

    Methods: The authors have performed a genome-wide association scan for knee and hip osteoarthritis in 3177 cases and 4894 population-based controls from the UK. Replication of promising signals was carried out in silico in five further scans (44,449 individuals), and de novo in 14 534 independent samples, all of European descent.

    Results: None of the association signals the authors identified reach genome-wide levels of statistical significance, therefore stressing the need for corroboration in sample sets of a larger size. Application of analytical approaches to examine the allelic architecture of disease to the stage 1 genome-wide association scan data suggests that osteoarthritis is a highly polygenic disease with multiple risk variants conferring small effects.

    Conclusions: Identifying loci conferring susceptibility to osteoarthritis will require large-scale sample sizes and well-defined phenotypes to minimise heterogeneity.

    Funded by: Arthritis Research UK: 17489; Medical Research Council: G0901461, MC_U122886349; NIAMS NIH HHS: K24 AR048841, R01 AR052000

    Annals of the rheumatic diseases 2011;70;5;864-7
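
Several of the abstracts above combine per-study results with a fixed-effects meta-analysis. A minimal sketch of how such a combination is commonly computed, assuming the standard inverse-variance weighting (the abstracts do not specify the exact software or weighting used), is given below. The function name and the input values are hypothetical and are not taken from any of the cited studies.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors, in the same order
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = erfc(abs(z) / sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical log odds ratios and standard errors from three studies.
beta, se, z, p = fixed_effects_meta([0.10, 0.14, 0.08], [0.05, 0.06, 0.04])
print(f"pooled beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The fixed-effects model assumes that all studies estimate one shared effect; when effects differ between cohorts, a random-effects model is the usual alternative.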

Bram Prins

- Statistical Geneticist

I completed a Master’s degree in Medical Biology at the University of Groningen in 2008. I then spent one and a half years at the Institut Pasteur Korea studying host-pathogen interactions (M. tuberculosis). In 2010, I commenced my PhD in the Unit of Genetic Epidemiology and Bioinformatics, University Medical Center Groningen (UMCG). My projects mainly involved investigating the role of common variants in determining cytokine levels using genome-wide approaches. In 2013 I took up a postdoctoral position in the Cardiogenetics Lab at St George's University of London, aiming to identify rare variants involved in ECG traits and arrhythmias.

Research

I am currently working on the genetic component of the Understanding Society (https://www.understandingsociety.ac.uk) project, where my role is to identify (rare) variants underpinning levels of various biomarkers in blood. Additionally, through my involvement in the Genetic Overlap between Metabolic and Psychiatric diseases (GOMAP) project, I aim to identify shared genetic architecture between schizophrenia and type 2 diabetes. I also contribute genetic analyses to the consortia in which our group is involved, using the cohort data resources that we have in-house, and I perform quality control on newly generated genotype data.

References

  • An in silico Post-GWAS Analysis of C-Reactive Protein Loci Suggests an Important Role for Interferons.

    Vaez A, Jansen R, Prins BP, Hottenga JJ, de Geus EJ, Boomsma DI, Penninx BW, Nolte IM, Snieder H and Alizadeh BZ

    Department of Epidemiology, University of Groningen & University Medical Center Groningen, Groningen, the Netherlands a.vaez@umcg.nl.

    Background: Genome-wide association studies (GWASs) have successfully identified a number of Single Nucleotide Polymorphisms (SNPs) associated with serum levels of C-reactive protein (CRP). An important limitation of GWASs is that the identified variants merely flag the nearby genomic region and do not necessarily provide a direct link to the biological mechanisms underlying their corresponding phenotype. Here we apply a bioinformatics-based approach to uncover the functional characteristics of the 18 SNPs that had previously been associated with CRP at a genome-wide significant level.

    Methods and results: In the first phase of 'in silico' sequencing we explore the vicinity of GWAS SNPs to identify all linked variants. In the second phase of eQTL analysis, we attempt to identify all nearby genes whose expression levels are associated with the corresponding GWAS SNPs. These two phases generate a number of relevant genes that serve as input to the next phase of functional network analysis. Our in silico sequencing analysis using 1000 Genomes Project data identified seven non-synonymous SNPs which are in moderate to high LD (r(2)>0.5) with the GWAS SNPs. Our eQTL analysis, which was based on one of the largest single datasets of genome-wide expression probes (n>5,000) identified 23 significantly associated expression probes belonging to 15 genes (FDR<0.01). The final phase of functional network analysis revealed 93 significantly enriched biological processes (FDR<0.01).

    Conclusions: Our post-GWAS analysis of CRP GWAS SNPs confirmed the previously known overlap between CRP and lipids biology. Additionally, it suggested an important role for interferons in the metabolism of CRP.

    Circulation. Cardiovascular genetics 2015

  • Role of common and rare variants in SCN10A: results from the Brugada syndrome QRS locus gene discovery collaborative study.

    Behr ER, Savio-Galimberti E, Barc J, Holst AG, Petropoulou E, Prins BP, Jabbari J, Torchio M, Berthet M, Mizusawa Y, Yang T, Nannenberg EA, Dagradi F, Weeke P, Bastiaenan R, Ackerman MJ, Haunso S, Leenhardt A, Kääb S, Probst V, Redon R, Sharma S, Wilde A, Tfelt-Hansen J, Schwartz P, Roden DM, Bezzina CR, Olesen M, Darbar D, Guicheney P, Crotti L, UK10K Consortium and Jamshidi Y

    Human Genetics Research Centre, ICCS, St George's University of London, London SW17 0RE, UK. yjamshid@sgul.ac.uk, ebehr@sgul.ac.uk

    Aims: Brugada syndrome (BrS) remains genetically heterogeneous and is associated with slowed cardiac conduction. We aimed to identify genetic variation in BrS cases at loci associated with QRS duration.

    Methods and results: A multi-centre study sequenced seven candidate genes (SCN10A, HAND1, PLN, CASQ2, TKT, TBX3, and TBX5) in 156 Caucasian SCN5A mutation-negative BrS patients (80% male; mean age 48) with symptoms (64%) and/or a family history of sudden death (47%) or BrS (18%). Forty-nine variants were identified: 18 were rare (MAF <1%) and non-synonymous; and 11/18 (61.1%), mostly in SCN10A, were predicted as pathogenic using multiple bioinformatics tools. Allele frequencies were compared with the Exome Sequencing and UK10K Projects. SKAT methods tested rare variation in SCN10A finding no statistically significant difference between cases and controls. Co-segregation analysis was possible for four of seven probands carrying a novel pathogenic variant. Only one pedigree (I671V/G1299A in SCN10A) showed co-segregation. The SCN10A SNP V1073 was, however, associated strongly with BrS [66.9 vs. 40.1% (UK10K) OR (95% CI) = 3.02 (2.35-3.87), P = 8.07 × 10(-19)]. Voltage-clamp experiments for NaV1.8 were performed for SCN10A common variants V1073, A1073, and rare variants of interest: A200V and I671V. V1073, A200V and I671V, demonstrated significant reductions in peak INa compared with ancestral allele A1073 (rs6795970).

    Conclusion: Rare variants in the screened QRS-associated genes (including SCN10A) are not responsible for a significant proportion of SCN5A mutation negative BrS. The common SNP SCN10A V1073 was strongly associated with BrS and demonstrated loss of NaV1.8 function, as did rare variants in isolated patients.

    Funded by: British Heart Foundation: PG/12/38/29615; NHLBI NIH HHS: R01 HL092217

    Cardiovascular research 2015

  • Sequencing of SCN5A identifies rare and common variants associated with cardiac conduction: Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium.

    Magnani JW, Brody JA, Prins BP, Arking DE, Lin H, Yin X, Liu CT, Morrison AC, Zhang F, Spector TD, Alonso A, Bis JC, Heckbert SR, Lumley T, Sitlani CM, Cupples LA, Lubitz SA, Soliman EZ, Pulit SL, Newton-Cheh C, O'Donnell CJ, Ellinor PT, Benjamin EJ, Muzny DM, Gibbs RA, Santibanez J, Taylor HA, Rotter JI, Lange LA, Psaty BM, Jackson R, Rich SS, Boerwinkle E, Jamshidi Y, Sotoodehnia N, CHARGE Consortium, NHLBI Exome Sequencing Project (ESP) and UK10K

    Background: The cardiac sodium channel SCN5A regulates atrioventricular and ventricular conduction. Genetic variants in this gene are associated with PR and QRS intervals. We sought to characterize further the contribution of rare and common coding variation in SCN5A to cardiac conduction.

    Methods and results: In Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium Targeted Sequencing Study, we performed targeted exonic sequencing of SCN5A (n=3699, European ancestry individuals) and identified 4 common (minor allele frequency >1%) and 157 rare variants. Common and rare SCN5A coding variants were examined for association with PR and QRS intervals through meta-analysis of European ancestry participants from CHARGE, National Heart, Lung, and Blood Institute's Exome Sequencing Project (n=607), and the UK10K (n=1275) and by examining Exome Sequencing Project African ancestry participants (n=972). Rare coding SCN5A variants in aggregate were associated with PR interval in European and African ancestry participants (P=1.3×10(-3)). Three common variants were associated with PR and QRS interval duration among European ancestry participants and one among African ancestry participants. These included 2 well-known missense variants: rs1805124 (H558R) was associated with PR and QRS shortening in European ancestry participants (P=6.25×10(-4) and P=5.2×10(-3), respectively) and rs7626962 (S1102Y) was associated with PR shortening in those of African ancestry (P=2.82×10(-3)). Among European ancestry participants, 2 novel synonymous variants, rs1805126 and rs6599230, were associated with cardiac conduction. Our top signal, rs1805126 was associated with PR and QRS lengthening (P=3.35×10(-7) and P=2.69×10(-4), respectively) and rs6599230 was associated with PR shortening (P=2.67×10(-5)).

    Conclusions: By sequencing SCN5A, we identified novel common and rare coding variants associated with cardiac conduction.

    Funded by: British Heart Foundation: PG/12/38/29615; Department of Health: NF-SI-0510-10268; NCATS NIH HHS: UL1 TR000124; NHGRI NIH HHS: U54 HG003273; NHLBI NIH HHS: 1K24HL105780, 1R01 HL102214L, 1R01HL092577, 1R01HL104156, 1RC1HL101056, 5RC2HL102419, HHSN268200800007C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, HHSN268201200036C, HL080295, HL087652, HL105756, K23 HL114724, K23HL11472, K24 HL105780, N01-HC-25195, N01HC25195, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, N02 HL64278, N02-HL-6-4278, R01 HL080295, R01 HL087652, R01 HL092577, R01 HL102214, R01 HL104156, R01 HL105756, RC1 HL101056, RC2 HL-102923, RC2 HL-102924, RC2 HL-102925, RC2 HL-102926, RC2 HL-103010, RC2 HL102419, RC2 HL102923, RC2 HL102924, RC2 HL102925, RC2 HL102926, RC2 HL103010, U01 HL080295; NIA NIH HHS: 1R03AG045075, AG023629, R01 AG023629, R03 AG045075, R56 AG023629; NIDA NIH HHS: 5R21DA027021, R21 DA027021; NIDDK NIH HHS: P30 DK063491; NINDS NIH HHS: 6R01-NS 17950NIH, R01 NS017950; PHS HHS: HHSN268200800007C, HHSN2682011000010C, HHSN2682011000011C, HHSN2682011000012C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201200036C; Wellcome Trust: 100140, HEALTH-F4-2007-201413, WT091310

    Circulation. Cardiovascular genetics 2014;7;3;365-73

  • QCGWAS: A flexible R package for automated quality control of genome-wide association results.

    van der Most PJ, Vaez A, Prins BP, Munoz ML, Snieder H, Alizadeh BZ and Nolte IM

    Department of Epidemiology, University of Groningen, University Medical Center Groningen, P.O. box 30.001, 9700 RB Groningen, The Netherlands and Cardiogenetics Lab, Human Genetics Research Centre, St. George's Hospital Medical School, London SW17 0RE, UK.

    Summary: QCGWAS is an R package that automates the quality control of genome-wide association result files. Its main purpose is to facilitate the quality control of a large number of such files before meta-analysis. Alternatively, it can be used by individual cohorts to check their own result files. QCGWAS is flexible and has a wide range of options, allowing rapid generation of high-quality input files for meta-analysis of genome-wide association studies.

    Availability: http://cran.r-project.org/web/packages/QCGWAS

    Contact: i.m.nolte@umcg.nl

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2014

  • Genetics of coronary artery disease: genome-wide association studies and beyond.

    Prins BP, Lagou V, Asselbergs FW, Snieder H and Fu J

    Unit of Genetic Epidemiology and Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

    Genome-wide association (GWA) studies on coronary artery disease (CAD) have been very successful, identifying a total of 32 susceptibility loci so far. Although these loci have provided valuable insights into the etiology of CAD, their cumulative effect explains surprisingly little of the total CAD heritability. In this review, we first highlight and describe the type of genetic variants potentially underlying the missing heritability of CAD: single nucleotide polymorphisms (SNPs) or structural variants, each of which may either be common or rare. Although finding missing heritability is important, we further argue in this review that it constitutes only a first step towards a fuller understanding of the etiology of CAD development. To close the gap between the genotype and phenotype, we propose a systems genetics approach in the post-GWA study era. This approach that integrates genetic, epigenetic, transcriptomic, proteomic, metabolic and intermediate outcome variables has potential to significantly aid the understanding of CAD etiology.

    Atherosclerosis 2012;225;1;1-10

  • Discovery and fine mapping of serum protein loci through transethnic meta-analysis.

    Franceschini N, van Rooij FJ, Prins BP, Feitosa MF, Karakas M, Eckfeldt JH, Folsom AR, Kopp J, Vaez A, Andrews JS, Baumert J, Boraska V, Broer L, Hayward C, Ngwa JS, Okada Y, Polasek O, Westra HJ, Wang YA, Del Greco M F, Glazer NL, Kapur K, Kema IP, Lopez LM, Schillert A, Smith AV, Winkler CA, Zgaga L, LifeLines Cohort Study, Bandinelli S, Bergmann S, Boban M, Bochud M, Chen YD, Davies G, Dehghan A, Ding J, Doering A, Durda JP, Ferrucci L, Franco OH, Franke L, Gunjaca G, Hofman A, Hsu FC, Kolcic I, Kraja A, Kubo M, Lackner KJ, Launer L, Loehr LR, Li G, Meisinger C, Nakamura Y, Schwienbacher C, Starr JM, Takahashi A, Torlak V, Uitterlinden AG, Vitart V, Waldenberger M, Wild PS, Kirin M, Zeller T, Zemunik T, Zhang Q, Ziegler A, Blankenberg S, Boerwinkle E, Borecki IB, Campbell H, Deary IJ, Frayling TM, Gieger C, Harris TB, Hicks AA, Koenig W, O'Donnell CJ, Fox CS, Pramstaller PP, Psaty BM, Reiner AP, Rotter JI, Rudan I, Snieder H, Tanaka T, van Duijn CM, Vollenweider P, Waeber G, Wilson JF, Witteman JC, Wolffenbuttel BH, Wright AF, Wu Q, Liu Y, Jenny NS, North KE, Felix JF, Alizadeh BZ, Cupples LA, Perry JR and Morris AP

    Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27599, USA. noraf@unc.edu

    Many disorders are associated with altered serum protein concentrations, including malnutrition, cancer, and cardiovascular, kidney, and inflammatory diseases. Although these protein concentrations are highly heritable, relatively little is known about their underlying genetic determinants. Through transethnic meta-analysis of European-ancestry and Japanese genome-wide association studies, we identified six loci at genome-wide significance (p < 5 × 10(-8)) for serum albumin (HPN-SCN1B, GCKR-FNDC4, SERPINF2-WDR81, TNFRSF11A-ZCCHC2, FRMD5-WDR76, and RPS11-FCGRT, in up to 53,190 European-ancestry and 9,380 Japanese individuals) and three loci for total protein (TNFRS13B, 6q21.3, and ELL2, in up to 25,539 European-ancestry and 10,168 Japanese individuals). We observed little evidence of heterogeneity in allelic effects at these loci between groups of European and Japanese ancestry but obtained substantial improvements in the resolution of fine mapping of potential causal variants by leveraging transethnic differences in the distribution of linkage disequilibrium. We demonstrated a functional role for the most strongly associated serum albumin locus, HPN, for which Hpn knockout mice manifest low plasma albumin concentrations. Other loci associated with serum albumin harbor genes related to ribosome function, protein translation, and proteasomal degradation, whereas those associated with serum total protein include genes related to immune function. Our results highlight the advantages of transethnic meta-analysis for the discovery and fine mapping of complex trait loci and have provided initial insights into the underlying genetic architecture of serum protein concentrations and their association with human disease.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Chief Scientist Office: CZB/4/710; Medical Research Council: G0700704, MC_PC_U127561128, MC_U127561128; NHGRI NIH HHS: U01HG004803; NHLBI NIH HHS: R01 HL105756, R01HL089651; NIDDK NIH HHS: P30 DK063491; NIGMS NIH HHS: T32 GM074905; PHS HHS: AHA0675001N; Wellcome Trust: 090532, 098017, WT064890, WT081682, WT090532, WT098017

    American journal of human genetics 2012;91;4;744-53

  • Beyond genome-wide association studies: new strategies for identifying genetic determinants of hypertension.

    Wang X, Prins BP, Sõber S, Laan M and Snieder H

    Georgia Prevention Institute, Department of Pediatrics, Medical College of Georgia, Augusta, GA, USA. xwang@georgiahealth.edu

    Genetic linkage and association methods have long been the most important tools for gene identification in humans. These approaches can either be hypothesis-based (i.e., candidate-gene studies) or hypothesis-free (i.e., genome-wide studies). The first part of this review offers an overview of the latest successes in gene finding for blood pressure (BP) and essential hypertension using these DNA sequence-based discovery techniques. We further emphasize the importance of post-genome-wide association study (post-GWAS) analysis, which aims to prioritize genetic variants for functional follow-up. Whole-genome next-generation sequencing will eventually be necessary to provide a more comprehensive picture of all DNA variants affecting BP and hypertension. The second part of this review discusses promising novel approaches that move beyond the DNA sequence and aim to discover BP genes that are differentially regulated by epigenetic mechanisms, including microRNAs, histone modification, and methylation.

    Funded by: NHLBI NIH HHS: R01 HL104125

    Current hypertension reports 2011;13;6;442-51

  • Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma.

    Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, Van der Harst P, Holm H, Sanna S, Kavousi M, Baumeister SE, Coin LJ, Deng G, Gieger C, Heard-Costa NL, Hottenga JJ, Kühnel B, Kumar V, Lagou V, Liang L, Luan J, Vidal PM, Mateo Leach I, O'Reilly PF, Peden JF, Rahmioglu N, Soininen P, Speliotes EK, Yuan X, Thorleifsson G, Alizadeh BZ, Atwood LD, Borecki IB, Brown MJ, Charoen P, Cucca F, Das D, de Geus EJ, Dixon AL, Döring A, Ehret G, Eyjolfsson GI, Farrall M, Forouhi NG, Friedrich N, Goessling W, Gudbjartsson DF, Harris TB, Hartikainen AL, Heath S, Hirschfield GM, Hofman A, Homuth G, Hyppönen E, Janssen HL, Johnson T, Kangas AJ, Kema IP, Kühn JP, Lai S, Lathrop M, Lerch MM, Li Y, Liang TJ, Lin JP, Loos RJ, Martin NG, Moffatt MF, Montgomery GW, Munroe PB, Musunuru K, Nakamura Y, O'Donnell CJ, Olafsson I, Penninx BW, Pouta A, Prins BP, Prokopenko I, Puls R, Ruokonen A, Savolainen MJ, Schlessinger D, Schouten JN, Seedorf U, Sen-Chowdhry S, Siminovitch KA, Smit JH, Spector TD, Tan W, Teslovich TM, Tukiainen T, Uitterlinden AG, Van der Klauw MM, Vasan RS, Wallace C, Wallaschofski H, Wichmann HE, Willemsen G, Würtz P, Xu C, Yerges-Armstrong LM, Alcohol Genome-wide Association (AlcGen) Consortium, Diabetes Genetics Replication and Meta-analyses (DIAGRAM+) Study, Genetic Investigation of Anthropometric Traits (GIANT) Consortium, Global Lipids Genetics Consortium, Genetics of Liver Disease (GOLD) Consortium, International Consortium for Blood Pressure (ICBP-GWAS), Meta-analyses of Glucose and Insulin-Related Traits Consortium (MAGIC), Abecasis GR, Ahmadi KR, Boomsma DI, Caulfield M, Cookson WO, van Duijn CM, Froguel P, Matsuda K, McCarthy MI, Meisinger C, Mooser V, Pietiläinen KH, Schumann G, Snieder H, Sternberg MJ, Stolk RP, Thomas HC, Thorsteinsdottir U, Uda M, Waeber G, Wareham NJ, Waterworth DM, Watkins H, Whitfield JB, Witteman JC, Wolffenbuttel BH, Fox CS, Ala-Korpela M, Stefansson K, Vollenweider P, Völzke H, Schadt EE, Scott J, Järvelin MR, Elliott P and Kooner JS

    Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London, UK. john.chambers@ic.ac.uk

    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10(-8) to P = 10(-190)). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.

    Funded by: British Heart Foundation: FS/10/011/27881, PG/09/002/26056, PG/09/023/26806, RG/07/008/23674; Cancer Research UK: 14136; Department of Health: PHCS/C4/4/016; Intramural NIH HHS: Z99 DK999999, ZIA DK075013-05, ZIA DK075013-07; Medical Research Council: G0100222, G0401527, G0601653, G0601966, G0700342, G0700931, G0701863, G0801056B, G0902037, G1000143, G19/35, G8802774, G9521010, G9817803B, MC_PC_U127561128, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647; NIAAA NIH HHS: K05 AA017688; NIDDK NIH HHS: T32 DK007191; Wellcome Trust: 090532

    Nature genetics 2011;43;11;1131-8

  • Poor replication of candidate genes for major depressive disorder using genome-wide association data.

    Bosker FJ, Hartman CA, Nolte IM, Prins BP, Terpstra P, Posthuma D, van Veen T, Willemsen G, DeRijk RH, de Geus EJ, Hoogendijk WJ, Sullivan PF, Penninx BW, Boomsma DI, Snieder H and Nolen WA

    Department of Psychiatry, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands. f.j.bosker@psy.umcg.nl

    Data from the Genetic Association Information Network (GAIN) genome-wide association study (GWAS) in major depressive disorder (MDD) were used to explore previously reported candidate gene and single-nucleotide polymorphism (SNP) associations in MDD. A systematic literature search of candidate genes associated with MDD in case-control studies was performed before the results of the GAIN MDD study became available. Measured and imputed candidate SNPs and genes were tested in the GAIN MDD study encompassing 1738 cases and 1802 controls. Imputation was used to increase the number of SNPs from the GWAS and to improve coverage of SNPs in the candidate genes selected. Tests were carried out for individual SNPs and the entire gene using different statistical approaches, with permutation analysis as the final arbiter. In all, 78 papers reporting on 57 genes were identified, from which 92 SNPs could be mapped. In the GAIN MDD study, two SNPs were associated with MDD: C5orf20 (rs12520799; P=0.038; odds ratio (OR) AT=1.10, 95% CI 0.95-1.29; OR TT=1.21, 95% confidence interval (CI) 1.01-1.47) and NPY (rs16139; P=0.034; OR C allele=0.73, 95% CI 0.55-0.97), constituting a direct replication of previously identified SNPs. At the gene level, TNF (rs76917; OR T=1.35, 95% CI 1.13-1.63; P=0.0034) was identified as the only gene for which the association with MDD remained significant after correction for multiple testing. For SLC6A2 (norepinephrine transporter (NET)) significantly more SNPs (19 out of 100; P=0.039) than expected were associated while accounting for the linkage disequilibrium (LD) structure. Thus, we found support for involvement in MDD for only four genes. However, given the number of candidate SNPs and genes that were tested, even these significant may well be false positives. The poor replication may point to publication bias and false-positive findings in previous candidate gene studies, and may also be related to heterogeneity of the MDD phenotype as well as contextual genetic or environmental factors.

    Molecular psychiatry 2011;16;5;516-32

  • Restoring E-cadherin-mediated cell-cell adhesion increases PTEN protein level and stability in human breast carcinoma cells.

    Li Z, Wang L, Zhang W, Fu Y, Zhao H, Hu Y, Prins BP and Zha X

    Key Laboratory of Glycoconjugate Research, Ministry of Health, Department of Biochemistry and Molecular Biology, Shanghai Medical College, Fudan University, 138 Yi Xue Yuan Road, Shanghai 200032, China.

    The phosphatase and tensin homolog deleted on chromosome 10 (PTEN) is a well-characterized tumor suppressor that negatively regulates cell growth and survival. Despite the critical role of PTEN in cell signaling, the mechanisms of its regulation are still under investigation. We reported here that PTEN expression could be controlled by overexpression or knock-down of E-cadherin in several mammary carcinoma cell lines. Furthermore, we showed that the accumulation of PTEN protein in E-cadherin overexpressing cells was due to increased PTEN protein stability rather than the regulation of its transcription. The proteasome-dependent PTEN degradation pathway was impaired after restoring E-cadherin expression. Moreover, maintenance of E-cadherin mediated cell-cell adhesion was necessary for its regulating PTEN. Altogether, our results suggested that E-cadherin mediated cell-cell adhesion was essential for preventing the proteasome degradation of PTEN, which might explain how breast carcinoma cells which lost cell-cell contact proliferate rapidly and are prone to metastasis.

    Biochemical and biophysical research communications 2007;363;1;165-70

Graham Ritchie

- Postdoctoral Fellow

I originally trained in Computer Science at St Andrews, and then completed a Master's in Artificial Intelligence and a PhD in Informatics in Edinburgh. After a brief stint in the civil service I moved into bioinformatics, initially at the Sanger Institute, where I worked in the Vertebrate Genome Analysis team on tools and pipelines for genome annotation. I then moved to the EBI to work in the Ensembl variation team, where I focussed on developing tools to annotate genetic variation.

Research

I currently hold a joint postdoctoral fellowship between the Sanger Institute and the EBI under the ESPOD scheme. I work on developing methods and tools to interpret genomic sequence variation, with a particular focus on non-coding variation. I recently developed the GWAVA algorithm, which aims to identify likely functional regulatory variants. I also contribute to variant annotation and interpretation for other projects in the team. I have contributed to several international consortia focussed on identifying and characterising genetic variation in humans, including the 1000 Genomes and UK10K projects.

References

  • The Ensembl REST API: Ensembl Data for Any Language.

    Yates A, Beal K, Keenan S, McLaren W, Pignatelli M, Ritchie GR, Ruffier M, Taylor K, Vullo A and Flicek P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language.

    Availability and implementation: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.

    Funded by: Wellcome Trust: 095908, WT095908

    Bioinformatics (Oxford, England) 2015;31;1;143-5

  • Functional annotation of noncoding sequence variants.

    Ritchie GR, Dunham I, Zeggini E and Flicek P

    [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK. [2] Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.

    Funded by: Wellcome Trust: 095908, 098051

    Nature methods 2014;11;3;294-6

  • Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes.

    Ayub Q, Moutsianas L, Chen Y, Panoutsopoulou K, Colonna V, Pagani L, Prokopenko I, Ritchie GR, Tyler-Smith C, McCarthy MI, Zeggini E and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.

    We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.

    Funded by: Department of Health: NF-SI-0611-10099; Wellcome Trust: 090532, 098051, 098381, WT090367MA

    American journal of human genetics 2014;94;2;176-85

  • Computational approaches to interpreting genomic sequence variation.

    Ritchie GR and Flicek P

    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.

    Funded by: Wellcome Trust: 095908

    Genome medicine 2014;6;10;87

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10(-26)). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10(-36)). We demonstrate significant power gains in detecting medical trait associations.

    Funded by: European Research Council: 280559; NHGRI NIH HHS: U41HG006941; Wellcome Trust: 098051

    Nature communications 2014;5;5345

  • Integrative annotation of variants from 1092 humans: application to cancer genomics.

    Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüs ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GR, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, 1000 Genomes Project Consortium, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C and Gerstein M

    Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.

    Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.

    Funded by: NCATS NIH HHS: UL1 TR000457; NCI NIH HHS: CA167824, R01 CA166661, R01CA152057, U01 CA111275; NCRR NIH HHS: G12 RR003050; NHGRI NIH HHS: HG005718, HG007000, R01 HG002898, R01HG4719, U01 HG005718, U01HG6513, U41 HG007000, U54 HG003079; NIGMS NIH HHS: GM104424; NIMHD NIH HHS: G12 MD007579; Wellcome Trust: 085532, 090532, 095908, 098051, WT085532, WT095908

    Science (New York, N.Y.) 2013;342;6154;1235587

  • Computational approaches to identify functional genetic variants in cancer genomes.

    Gonzalez-Perez A, Mustonen V, Reva B, Ritchie GR, Creixell P, Karchin R, Vazquez M, Fink JL, Kassahn KS, Pearson JV, Bader GD, Boutros PC, Muthuswamy L, Ouellette BF, Reimand J, Linding R, Shibata T, Valencia A, Butler A, Dronov S, Flicek P, Shannon NB, Carter H, Ding L, Sander C, Stuart JM, Stein LD, Lopez-Bigas N and International Cancer Genome Consortium Mutation Pathways and Consequences Subgroup of the Bioinformatics Analyses Working Group

    Research Unit on Biomedical Informatics, University Pompeu Fabra, Barcelona, Spain.

    The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.

    Funded by: NCI NIH HHS: R01 CA180778; NHGRI NIH HHS: U01 HG006517, U54 HG003079; Wellcome Trust: 095908

    Nature methods 2013;10;8;723-9

  • A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

    Tachmazidou I, Dedoussis G, Southam L, Farmaki AE, Ritchie GR, Xifara DK, Matchan A, Hatzikotoulas K, Rayner NW, Chen Y, Pollin TI, O'Connell JR, Yerges-Armstrong LM, Kiagiadaki C, Panoutsopoulou K, Schwartzentruber J, Moutsianas L, UK10K consortium, Tsafantakis E, Tyler-Smith C, McVean G, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipids traits. We find genome-wide significant evidence for association between R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglycerides levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.

    Funded by: Department of Health: NF-SI-0510-10268; NHLBI NIH HHS: K01 HL116770, R01 HL104193, U01 HL072515, U01 HL105198; NIDDK NIH HHS: P30 DK072488; Wellcome Trust: 090532, 098051, WT091310

    Nature communications 2013;4;2872

  • An integrated map of genetic variation from 1,092 human genomes.

    1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT and McVean GA

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I021213/1, BB/I02593X/1; British Heart Foundation: RG/09/012/28096, RG/09/12/28096; Howard Hughes Medical Institute; Medical Research Council: G0701805, G0801823, G0900747, G0900747(91070); NCI NIH HHS: R01 CA166661, R01CA166661; NCRR NIH HHS: G12 RR003050, UL1RR024131; NHGRI NIH HHS: P01 HG004120, P01HG4120, P41HG2371, P41HG4221, R01 HG002898, R01 HG004960, R01 HG007022, R01HG2898, R01HG3698, R01HG4719, R01HG4960, R01HG5701, RC2HG5552, RC2HG5581, U01 HG005728, U01 HG006513, U01HG5208, U01HG5209, U01HG5211, U01HG5214, U01HG5715, U01HG5725, U01HG5728, U01HG6513, U01HG6569, U41HG4568, U54 HG003273, U54HG3067, U54HG3079, U54HG3273; NHLBI NIH HHS: HL078885, R01HL95045, RC2HL102925, T32HL94284; NIA NIH HHS: P30 AG038072; NIAID NIH HHS: AI077439, AI2009061; NIEHS NIH HHS: ES015794; NIGMS NIH HHS: R01GM59290, T32GM7748, T32GM8283; NIH HHS: DP2OD6514; NIMH NIH HHS: R01MH84698; NIMHD NIH HHS: G12 MD007579; NLM NIH HHS: T15 LM007056, T15LM7033; PHS HHS: HHSN268201100040C; Wellcome Trust: 086084, 090532, 095908, WT085475/Z/08/Z, WT085532AIA, WT086084/Z/08/Z, WT089250/Z/09/Z, WT090532/Z/09/Z, WT095552/Z/11/Z, WT098051

    Nature 2012;491;7422;56-65

  • A combined functional annotation score for non-synonymous variants.

    Lopes MC, Joyce C, Ritchie GR, John SL, Cunningham F, Asimit J and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK. ml10@sanger.ac.uk

    Aims: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants.

    Methods: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP: 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls.

    Results: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage.

    Conclusion: The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.

    Funded by: Wellcome Trust: 095908, 098051, WT088885/Z/09/Z

    Human heredity 2012;73;1;47-51
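
The CAROL abstract above names a weighted Z method for combining the PolyPhen-2 and SIFT scores but does not give the formula. As a rough illustration only, the sketch below shows the generic weighted Z (Stouffer) combination of one-sided p-values. The function name, the example inputs and the weights are hypothetical, and this is not the CAROL implementation.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the generic weighted Z (Stouffer) method.

    Each p-value must lie strictly between 0 and 1. The weights are
    illustrative assumptions, not values taken from the CAROL paper.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    # Convert each p-value to a Z score (small p gives large positive Z).
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    # Weighted sum of Z scores, rescaled so the result is again standard normal.
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Convert the combined Z score back to a one-sided p-value.
    return 1.0 - _STD_NORMAL.cdf(combined_z)


# Two hypothetical tool-level p-values, weighted equally.
combined = weighted_z_combine([0.05, 0.05], [1.0, 1.0])
print(combined)
```

With equal weights the combined value is smaller than either input, which is the sense in which agreement between two tools strengthens a prediction.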

Loz Southam

- Senior Staff Scientist

I graduated from Bradford University in 1995 with a first-class honours BSc in Biomedical Science. I then worked for Oxford University for 15 years, including one year at the Medical Research Council, Harwell, Oxfordshire. During this time I was primarily laboratory based, with my research revolving around the genetics of complex diseases, mainly type 2 diabetes and osteoarthritis. After becoming involved in a genome-wide association analysis of osteoarthritis, I developed a keen interest in applied statistical genetics research on complex traits, and I joined the Sanger in 2010 to pursue this.

Research

My research here at the Sanger is diverse, involving the analysis of many complex disease traits in different populations. I have contributed results to numerous international consortia aiming to identify both common and rare variants that predispose to complex diseases, including UK10K, GIANT, GIANTexome, CHARGE and MAGIC. Currently my key research interest is isolated populations, in particular our HELIC cohorts, which are founder populations from Greece with a broad spectrum of phenotypes available. My current analysis focuses on identifying low-frequency and rare variants involved in the aetiology of complex disease.

References

  • Using ancestry-informative markers to identify fine structure across 15 populations of European origin.

    Huckins LM, Boraska V, Franklin CS, Floyd JA, Southam L, GCAN, WTCCC3, Sullivan PF, Bulik CM, Collier DA, Tyler-Smith C, Zeggini E, Tachmazidou I, GCAN and WTCCC3

    The Wellcome Trust Sanger Institute (WTSI), Hinxton, UK.

    The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification, and suggest a list of markers that may help to counter this problem. It is usual to identify population structure in such studies using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that instead all SNPs with MAF >1% may be used. We established informative axes of variation identified via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can provide information regarding the design and interpretation of association results in the International Consortia.

    Funded by: Department of Health: NF-SI-0510-10214, NF-SI-0512-10074; Medical Research Council: MR/J006742/1, MR/J500355/1, MR/K500999/1; NIA NIH HHS: U19 AG023122; Wellcome Trust: 090532, 098051

    European journal of human genetics : EJHG 2014;22;10;1190-200

  • A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans.

    Timpson NJ, Walter K, Min JL, Tachmazidou I, Malerba G, Shin SY, Chen L, Futema M, Southam L, Iotchkova V, Cocca M, Huang J, Memari Y, McCarthy S, Danecek P, Muddyman D, Mangino M, Menni C, Perry JR, Ring SM, Gaye A, Dedoussis G, Farmaki AE, Burton P, Talmud PJ, Gambaro G, Spector TD, Smith GD, Durbin R, Richards JB, Humphries SE, Zeggini E, Soranzo N and UK10K Consortium Members

    MRC Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK.

    The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (-1.43 s.d. (s.e.=0.27) per minor allele, P-value=8.0 × 10(-8)) discovered in 3,202 individuals with low read-depth, whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (-1.0 s.d. (s.e.=0.173), P-value=7.32 × 10(-9)). This is consistent with an effect between 0.5 and 1.5 mmol l(-1) dependent on population. We show that a single predicted splice donor variant is responsible for association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large effect variant identified from whole-genome sequencing at a population scale.

    Funded by: British Heart Foundation: PG008/08; Medical Research Council: G1001799, G9815508, MC_UU_12012/5/B, MC_UU_12013/1, MC_UU_12013/1-9, MC_UU_12013/3; Wellcome Trust: 076113, 091310, 091551, 092731, 095219, 095515, 098051, 100574, 102215, WT091310, WT095219MA, WT098051

    Nature communications 2014;5;4871

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10(-26)). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10(-36)). We demonstrate significant power gains in detecting medical trait associations.

    Funded by: European Research Council: 280559; NHGRI NIH HHS: U41HG006941; Wellcome Trust: 098051

    Nature communications 2014;5;5345

  • Replication of established common genetic variants for adult BMI and childhood obesity in Greek adolescents: the TEENAGE study.

    Ntalla I, Panoutsopoulou K, Vlachou P, Southam L, Rayner NW, Zeggini E and Dedoussis GV

    Harokopio University of Athens, Department of Nutrition and Dietetics, 17671 Athens, Greece. iontalla@hua.gr

    Multiple genetic loci have been associated with body mass index (BMI) and obesity. The aim of this study was to investigate the effects of established adult BMI and childhood obesity loci in a Greek adolescent cohort. For this purpose, 34 variants were selected for investigation in 707 (55.9% females) adolescents of Greek origin aged 13.42 ± 0.88 years. Cumulative effects of variants were assessed by calculating a genetic risk score (GRS-34) for each subject. Variants at the FTO, TMEM18, FAIM2, RBJ, ZNF608 and QPCTL loci yielded nominal evidence for association with BMI and/or overweight risk (p < 0.05). Variants at TFAP2B and NEGR1 loci showed nominal association (p < 0.05) with BMI and/or overweight risk in males and females respectively. Even though we did not detect any genome-wide significant associations, 27 out of 34 variants yielded directionally consistent effects with those reported by large-scale meta-analyses (binomial sign p = 0.0008). The GRS-34 was associated with both BMI (beta = 0.17 kg/m(2) /allele; p < 0.001) and overweight risk (OR = 1.09/allele; 95% CI: 1.04-1.16; p = 0.001). In conclusion, we replicate associations of established BMI and childhood obesity variants in a Greek adolescent cohort and confirm directionally consistent effects for most of them.

    Funded by: Wellcome Trust: 098051

    Annals of human genetics 2013;77;3;268-74

  • A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

    Tachmazidou I, Dedoussis G, Southam L, Farmaki AE, Ritchie GR, Xifara DK, Matchan A, Hatzikotoulas K, Rayner NW, Chen Y, Pollin TI, O'Connell JR, Yerges-Armstrong LM, Kiagiadaki C, Panoutsopoulou K, Schwartzentruber J, Moutsianas L, UK10K consortium, Tsafantakis E, Tyler-Smith C, McVean G, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipids traits. We find genome-wide significant evidence for association between R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglycerides levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.

    Funded by: Department of Health: NF-SI-0510-10268; NHLBI NIH HHS: K01 HL116770, R01 HL104193, U01 HL072515, U01 HL105198; NIDDK NIH HHS: P30 DK072488; Wellcome Trust: 090532, 098051, WT091310

    Nature communications 2013;4;2872

  • Genome-wide meta-analysis of common variant differences between men and women.

    Boraska V, Jerončić A, Colonna V, Southam L, Nyholt DR, Rayner NW, Perry JR, Toniolo D, Albrecht E, Ang W, Bandinelli S, Barbalic M, Barroso I, Beckmann JS, Biffar R, Boomsma D, Campbell H, Corre T, Erdmann J, Esko T, Fischer K, Franceschini N, Frayling TM, Girotto G, Gonzalez JR, Harris TB, Heath AC, Heid IM, Hoffmann W, Hofman A, Horikoshi M, Zhao JH, Jackson AU, Hottenga JJ, Jula A, Kähönen M, Khaw KT, Kiemeney LA, Klopp N, Kutalik Z, Lagou V, Launer LJ, Lehtimäki T, Lemire M, Lokki ML, Loley C, Luan J, Mangino M, Mateo Leach I, Medland SE, Mihailov E, Montgomery GW, Navis G, Newnham J, Nieminen MS, Palotie A, Panoutsopoulou K, Peters A, Pirastu N, Polasek O, Rehnström K, Ripatti S, Ritchie GR, Rivadeneira F, Robino A, Samani NJ, Shin SY, Sinisalo J, Smit JH, Soranzo N, Stolk L, Swinkels DW, Tanaka T, Teumer A, Tönjes A, Traglia M, Tuomilehto J, Valsesia A, van Gilst WH, van Meurs JB, Smith AV, Viikari J, Vink JM, Waeber G, Warrington NM, Widen E, Willemsen G, Wright AF, Zanke BW, Zgaga L, Wellcome Trust Case Control Consortium, Boehnke M, d'Adamo AP, de Geus E, Demerath EW, den Heijer M, Eriksson JG, Ferrucci L, Gieger C, Gudnason V, Hayward C, Hengstenberg C, Hudson TJ, Järvelin MR, Kogevinas M, Loos RJ, Martin NG, Metspalu A, Pennell CE, Penninx BW, Perola M, Raitakari O, Salomaa V, Schreiber S, Schunkert H, Spector TD, Stumvoll M, Uitterlinden AG, Ulivi S, van der Harst P, Vollenweider P, Völzke H, Wareham NJ, Wichmann HE, Wilson JF, Rudan I, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. vboraska@mefst.hr

    The male-to-female sex ratio at birth is constant across world populations with an average of 1.06 (106 male to 100 female live births) for populations of European descent. The sex ratio is considered to be affected by numerous biological and environmental factors and to have a heritable component. The aim of this study was to investigate the presence of common allele modest effects at autosomal and chromosome X variants that could explain the observed sex ratio at birth. We conducted a large-scale genome-wide association scan (GWAS) meta-analysis across 51 studies, comprising overall 114 863 individuals (61 094 women and 53 769 men) of European ancestry and 2 623 828 common (minor allele frequency >0.05) single-nucleotide polymorphisms (SNPs). Allele frequencies were compared between men and women for directly-typed and imputed variants within each study. Forward-time simulations for unlinked, neutral, autosomal, common loci were performed under the demographic model for European populations with a fixed sex ratio and a random mating scheme to assess the probability of detecting significant allele frequency differences. We do not detect any genome-wide significant (P < 5 × 10(-8)) common SNP differences between men and women in this well-powered meta-analysis. The simulated data provided results entirely consistent with these findings. This large-scale investigation across ~115 000 individuals shows no detectable contribution from common genetic variants to the observed skew in the sex ratio. The absence of sex-specific differences is useful in guiding genetic association study design, for example when using mixed controls for sex-biased traits.

    Funded by: Canadian Institutes of Health Research: MOP-82893; Cancer Research UK; Chief Scientist Office: CZB/4/710; Medical Research Council: G0401527, G0801056B, G1000143, G1001799, MC_PC_U127561128, MC_U106179471, MC_U127561128; NCRR NIH HHS: RR018787, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: HL65234, HL67466, R01HL086694, R01HL087641, R01HL59367; NIA NIH HHS: N.1-AG-1-1, N.1-AG-1-2111, N01-AG-1-2100, N01-AG-5-0002; NIAAA NIH HHS: AA07535, AA10248, AA13320, AA13321, AA13326, AA14041, K05 AA017688; NIDDK NIH HHS: DK062370; NIMH NIH HHS: MH081802, MH66206, R01 MH059160, U24 MH068457-06; NLM NIH HHS: LM010098; PHS HHS: HHSN268200625226C, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C; Wellcome Trust: 076113, 089062/Z/09/Z, 092447/Z/10/Z, 095831, 098051, 89061/Z/09/Z

    Human molecular genetics 2012;21;21;4805-15

  • Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study.

    arcOGEN Consortium, arcOGEN Collaborators, Zeggini E, Panoutsopoulou K, Southam L, Rayner NW, Day-Williams AG, Lopes MC, Boraska V, Esko T, Evangelou E, Hoffman A, Houwing-Duistermaat JJ, Ingvarsson T, Jonsdottir I, Jonnson H, Kerkhof HJ, Kloppenburg M, Bos SD, Mangino M, Metrustry S, Slagboom PE, Thorleifsson G, Raine EV, Ratnayake M, Ricketts M, Beazley C, Blackburn H, Bumpstead S, Elliott KS, Hunt SE, Potter SC, Shin SY, Yadav VK, Zhai G, Sherburn K, Dixon K, Arden E, Aslam N, Battley PK, Carluke I, Doherty S, Gordon A, Joseph J, Keen R, Koller NC, Mitchell S, O'Neill F, Paling E, Reed MR, Rivadeneira F, Swift D, Walker K, Watkins B, Wheeler M, Birrell F, Ioannidis JP, Meulenbelt I, Metspalu A, Rai A, Salter D, Stefansson K, Stykarsdottir U, Uitterlinden AG, van Meurs JB, Chapman K, Deloukas P, Ollier WE, Wallis GA, Arden N, Carr A, Doherty M, McCaskie A, Willkinson JM, Ralston SH, Valdes AM, Spector TD and Loughlin J

    Wellcome Trust Sanger Institute, Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. eleftheria@sanger.ac.uk

    Background: Osteoarthritis is the most common form of arthritis worldwide and is a major cause of pain and disability in elderly people. The health economic burden of osteoarthritis is increasing commensurate with obesity prevalence and longevity. Osteoarthritis has a strong genetic component but the success of previous genetic studies has been restricted due to insufficient sample sizes and phenotype heterogeneity.

    Methods: We undertook a large genome-wide association study (GWAS) in 7410 unrelated and retrospectively and prospectively selected patients with severe osteoarthritis in the arcOGEN study, 80% of whom had undergone total joint replacement, and 11,009 unrelated controls from the UK. We replicated the most promising signals in an independent set of up to 7473 cases and 42,938 controls, from studies in Iceland, Estonia, the Netherlands, and the UK. All patients and controls were of European descent.

    Findings: We identified five genome-wide significant loci (binomial test p≤5·0×10(-8)) for association with osteoarthritis and three loci just below this threshold. The strongest association was on chromosome 3 with rs6976 (odds ratio 1·12 [95% CI 1·08-1·16]; p=7·24×10(-11)), which is in perfect linkage disequilibrium with rs11177. This SNP encodes a missense polymorphism within the nucleostemin-encoding gene GNL3. Levels of nucleostemin were raised in chondrocytes from patients with osteoarthritis in functional studies. Other significant loci were on chromosome 9 close to ASTN2, chromosome 6 between FILIP1 and SENP6, chromosome 12 close to KLHDC5 and PTHLH, and in another region of chromosome 12 close to CHST11. One of the signals close to genome-wide significance was within the FTO gene, which is involved in regulation of bodyweight, a strong risk factor for osteoarthritis. All risk variants were common in frequency and exerted small effects.

    Interpretation: Our findings provide insight into the genetics of arthritis and identify new pathways that might be amenable to future therapeutic intervention.

    Funding: arcOGEN was funded by a special purpose grant from Arthritis Research UK.

    Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461, MC_U122886349

    Lancet 2012;380;9844;815-23

  • A variant in MCF2L is associated with osteoarthritis.

    Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, Helgadottir HT, Hofman A, Ingvarsson T, Jonsson H, Keis A, Kerkhof HJ, Thorleifsson G, Arden NK, Carr A, Chapman K, Deloukas P, Loughlin J, McCaskie A, Ollier WE, Ralston SH, Spector TD, Wallis GA, Wilkinson JM, Aslam N, Birell F, Carluke I, Joseph J, Rai A, Reed M, Walker K, arcOGEN Consortium, Doherty SA, Jonsdottir I, Maciewicz RA, Muir KR, Metspalu A, Rivadeneira F, Stefansson K, Styrkarsdottir U, Uitterlinden AG, van Meurs JB, Zhang W, Valdes AM, Doherty M and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK.

    Osteoarthritis (OA) is a prevalent, heritable degenerative joint disease with a substantial public health impact. We used a 1000-Genomes-Project-based imputation in a genome-wide association scan for osteoarthritis (3177 OA cases and 4894 controls) to detect a previously unidentified risk locus. We discovered a small disease-associated set of variants on chromosome 13. Through large-scale replication, we establish a robust association with SNPs in MCF2L (rs11842874, combined odds ratio [95% confidence interval] 1.17 [1.11-1.23], p = 2.1 × 10(-8)) across a total of 19,041 OA cases and 24,504 controls of European descent. This risk locus represents the third established signal for OA overall. MCF2L regulates a nerve growth factor (NGF), and treatment with a humanized monoclonal antibody against NGF is associated with reduction in pain and improvement in function for knee OA patients.

    Funded by: Medical Research Council: G0100594, G0901461, MC_U122886349

    American journal of human genetics 2011;89;3;446-50

  • The effect of genome-wide association scan quality control on imputation outcome for common variants.

    Southam L, Panoutsopoulou K, Rayner NW, Chapman K, Durrant C, Ferreira T, Arden N, Carr A, Deloukas P, Doherty M, Loughlin J, McCaskie A, Ollier WE, Ralston S, Spector TD, Valdes AM, Wallis GA, Wilkinson JM, arcOGEN consortium, Marchini J and Zeggini E

    Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.

    Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Directly typed SNP quality control (QC) is thought to affect imputation quality. It is, therefore, common practice to use quality-controlled (QCed) data as an input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset consisting of 3177 samples with Illumina 610 k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps balancing elimination of poorly imputed SNPs and information loss. We found that the difference between the unQCed data and the fully QCed data on imputation outcome was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low frequency and rare variants.

    Funded by: Arthritis Research UK: 18030; Medical Research Council: G0100594, G0901461; Wellcome Trust: 079557, 088885, 090532, WT079557MA, WT088885/Z/09/Z

    European journal of human genetics : EJHG 2011;19;5;610-4

  • Association of a functional microsatellite within intron 1 of the BMP5 gene with susceptibility to osteoarthritis.

    Wilkins JM, Southam L, Mustafa Z, Chapman K and Loughlin J

    University of Oxford, Institute of Musculoskeletal Sciences, Botnar Research Centre, Nuffield Orthopaedic Centre, Oxford, OX3 7LD, UK. james_wilkins@hms.harvard.edu

    Background: In a previous study carried out by our group, the genotyping of 36 microsatellite markers from within a narrow interval of chromosome 6p12.3-q13 generated evidence for linkage and for association to female hip osteoarthritis (OA), with the most compelling association found for a marker within intron 1 of the bone morphogenetic protein 5 gene (BMP5). In this study, we aimed to further categorize the association of variants within intron 1 of BMP5 with OA through an expanded genetic association study of the intron and subsequent functional analysis of associated polymorphisms.

    Methods: We genotyped 18 common polymorphisms, comprising 8 microsatellites, 9 single nucleotide polymorphisms (SNPs) and 1 insertion/deletion (INDEL), from within regions of intron 1 of BMP5 that are highly conserved between human and mouse. These markers were then tested for association with OA using a two-stage approach: the polymorphisms were initially genotyped in a case-control cohort comprising 361 individuals, and associated polymorphisms (P ≤ 0.05) were then genotyped in a second case-control cohort comprising 1185 individuals.

    Results: Two BMP5 intron 1 polymorphisms demonstrated association in the combined case-control cohort of 1546 individuals (765 cases and 781 controls): microsatellite D6S1276 (P = 0.018) and SNP rs921126 (P = 0.013). Functional analyses in osteoblastic, chondrocytic, and adipocytic cell lines indicated that allelic variants of D6S1276 have significant effects on the transcriptional activity of the BMP5 promoter in vitro.

    Conclusion: Variability in gene expression of BMP5 may be an important contributor to OA genetic susceptibility.

    Funded by: Arthritis Research UK: 16239

    BMC medical genetics 2009;10;141
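
The imputation QC abstract above (EJHG 2011) describes comparing imputed genotypes with directly typed genotypes. The abstract does not say which accuracy metric was used, so the following is only a minimal sketch of one common choice, per-SNP concordance between the typed call and the best-guess imputed genotype. The function names and the toy genotypes are illustrative assumptions, not taken from the study.

```python
def best_guess(dosage):
    """Return the allele count (0, 1 or 2) closest to an imputed dosage.

    Ties are resolved towards the smaller count.
    """
    return min((0, 1, 2), key=lambda count: abs(count - dosage))


def concordance(typed, imputed_dosages):
    """Fraction of samples whose best-guess imputed genotype matches the typed call.

    `typed` holds allele counts (0/1/2), or None for a missing call;
    `imputed_dosages` holds expected allele counts in the range [0, 2].
    Samples with a missing typed call are skipped.
    """
    pairs = [
        (t, best_guess(d))
        for t, d in zip(typed, imputed_dosages)
        if t is not None
    ]
    if not pairs:
        return None  # nothing to compare
    return sum(t == g for t, g in pairs) / len(pairs)


# Made-up genotypes for one SNP across six samples (illustration only).
typed = [0, 1, 2, 1, None, 0]
imputed = [0.1, 0.9, 1.8, 0.2, 1.0, 0.0]
print(concordance(typed, imputed))
```

In practice such a comparison is usually made by masking typed SNPs before imputation and is often reported alongside other measures such as squared correlation; the sketch covers only the simplest case.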

Julia Steinberg

- Postdoctoral Fellow

I graduated from the University of Oxford with a Master in Mathematics in 2010 (double first and Gibb's Prize). I then completed a 4-year DPhil in Genomic Medicine and Statistics, also at Oxford. My focus was on finding biological pathways and gene networks that contribute to complex human disorders. I both developed new methods and applied them to study autism, schizophrenia, bipolar disorder, and intellectual disability. Amongst others, I developed approaches to 1) test multiple-hit models for disorders, 2) predict which human genes are haploinsufficient, and 3) integrate various genomic datasets into gene networks tuned to a disorder or tissue.

Research

My main interest is to gain insights into complex human disease by identifying the underlying biological pathways and gene networks. Recently developed next-generation technologies offer unprecedented opportunities to study human tissue from different angles. For example, we can look at genetic variation, gene expression, methylation and protein levels. The integration of these diverse data is crucial to gain a full picture of disease processes, but still a major challenge. Addressing this challenge is the main focus of my work at the Wellcome Trust Sanger Institute. In particular, I am currently looking into osteoarthritis, as the diseased tissue is readily accessible.

References

  • GeneNet Toolbox for MATLAB: a flexible platform for the analysis of gene connectivity in biological networks.

    Taylor A, Steinberg J, Andrews TS and Webber C

    MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK and The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.

    Summary: We present GeneNet Toolbox for MATLAB (also available as a set of standalone applications for Linux). The toolbox, available as command-line or with a graphical user interface, enables biologists to assess connectivity among a set of genes of interest ('seed-genes') within a biological network of their choosing. Two methods are implemented for calculating the significance of connectivity among seed-genes: 'seed randomization' and 'network permutation'. Options include restricting analyses to a specified subnetwork of the primary biological network, and calculating connectivity from the seed-genes to a second set of interesting genes. Pre-analysis tools help the user choose the best connectivity-analysis algorithm for their network. The toolbox also enables visualization of the connections among seed-genes. GeneNet Toolbox functions execute in reasonable time for very large networks (∼10 million edges) on a desktop computer.

    Availability and implementation: GeneNet Toolbox is open source and freely available from http://avigailtaylor.github.io/gntat14.

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Contact: avigail.taylor@dpag.ox.ac.uk.

    Funded by: Medical Research Council: MC_UP_A320_1004, MC_UU_12021/4

    Bioinformatics (Oxford, England) 2015;31;3;442-4

  • The roles of FMRP-regulated genes in autism spectrum disorder: single- and multiple-hit genetic etiologies.

    Steinberg J and Webber C

    Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3QX, UK; The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.

    Autism spectrum disorder (ASD) is a highly heritable complex neurodevelopmental condition characterized by impairments in social interaction and communication and restricted and repetitive behaviors. Although roles for both de novo and familial genetic variation have been documented, the underlying disease mechanisms remain poorly elucidated. In this study, we defined and explored distinct etiologies of genetic variants that affect genes regulated by Fragile-X mental retardation protein (FMRP), thought to play a key role in neuroplasticity and neuronal translation, in ASD-affected individuals. In particular, we developed the Trend test, a pathway-association test that is able to robustly detect multiple-hit etiologies and is more powerful than existing approaches. Exploiting detailed spatiotemporal maps of gene expression within the human brain, we identified four discrete FMRP-target subpopulations that exhibit distinct functional biases and contribute to ASD via different types of genetic variation. We also demonstrated that FMRP target genes are more likely than other genes with similar expression patterns to contribute to disease. We developed the hypothesis that FMRP targets contribute to ASD via two distinct etiologies: (1) ultra-rare and highly penetrant single disruptions of embryonically upregulated FMRP targets ("single-hit etiology") or (2) the combination of multiple less penetrant disruptions of nonembryonic, synaptic FMRP targets ("multiple-hit etiology"). The Trend test provides rigorous support for a multiple-hit genetic etiology in a subset of autism cases and is easily extendible to combining information from multiple types of genetic variation (i.e., copy-number and exome variants), increasing its value to next-generation sequencing approaches.

    Funded by: Medical Research Council: MC_UP_A320_1004; NIMH NIH HHS: 1U24MH081810; Wellcome Trust: 090532/Z/09/Z, 093941/Z/10/Z

    American journal of human genetics 2013;93;5;825-39
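
The GeneNet Toolbox abstract above names 'seed randomization' as one way to assess whether a set of seed genes is more connected than expected. The sketch below is a simplified, assumed version of that idea in Python rather than the toolbox's MATLAB code: it counts edges among the seed genes and compares that count with random gene sets of the same size. The function names, the uniform sampling scheme and the toy network are assumptions for illustration; the toolbox itself may sample differently (for example, matching on node degree).

```python
import random


def edges_within(genes, edges):
    """Count network edges whose two endpoints are both in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def seed_randomization_p(seed_genes, edges, n_random=1000, rng_seed=0):
    """Empirical p-value for connectivity among the seed genes.

    Random gene sets of the same size are drawn uniformly from all genes
    in the network; the p-value is the (add-one corrected) fraction of
    random sets that are at least as connected as the seed set.
    """
    rng = random.Random(rng_seed)
    all_genes = sorted({gene for edge in edges for gene in edge})
    observed = edges_within(seed_genes, edges)
    hits = 0
    for _ in range(n_random):
        random_set = rng.sample(all_genes, len(seed_genes))
        if edges_within(random_set, edges) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Toy network (illustration only): genes A-D form a clique, E-L form pairs.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "F"), ("G", "H"), ("I", "J"), ("K", "L"),
]
observed, p_value = seed_randomization_p(["A", "B", "C", "D"], toy_edges)
print(observed, p_value)
```

The add-one correction keeps the empirical p-value above zero, which is a common convention for permutation-style tests.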

Ioanna Tachmazidou

it3@sanger.ac.uk - Statistical Geneticist

I graduated from the Aristotle University of Thessaloniki in Greece in 2002 with a BSc in Mathematics and completed an MSc in Statistics at University College London in 2003. I then undertook a 4-year PhD in Bioinformatics at Imperial College London, in the first year of which I completed an MSc in Bioinformatics. I earned my PhD in Statistical Genetics in 2008. Subsequently, I took up a Career Development Fellowship at the Biostatistics Unit of the MRC in Cambridge. I joined the Analytical Genomics of Complex Traits group as a staff scientist in 2011.

Research

My research interests span all aspects of statistical genetics, in particular the genetic etiology of common disease. My current work is primarily focused on analyzing whole-genome sequence data for metabolic-related traits, as well as developing statistical methodology and software for the discovery of disease susceptibility loci. My research interests include: fine-scale mapping, design and analysis of sequence studies, rare variant analysis, multivariate analysis of sequence data, Bayesian survival analysis, Bayesian inference and model selection.

I am involved in a number of international consortia, including the AGVP, UK10K, SCOOP, HELIC, and the INCHARGE projects.

References

  • The African Genome Variation Project shapes medical genetics in Africa.

    Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, Karthikeyan S, Iles L, Pollard MO, Choudhury A, Ritchie GR, Xue Y, Asimit J, Nsubuga RN, Young EH, Pomilla C, Kivinen K, Rockett K, Kamali A, Doumatey AP, Asiki G, Seeley J, Sisay-Joof F, Jallow M, Tollman S, Mekonnen E, Ekong R, Oljira T, Bradman N, Bojang K, Ramsay M, Adeyemo A, Bekele E, Motala A, Norris SA, Pirie F, Kaleebu P, Kwiatkowski D, Tyler-Smith C, Rotimi C, Zeggini E and Sandhu MS

    [1] Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK [2] Department of Public Health and Primary Care, University of Cambridge, 2 Wort's Causeway, Cambridge, CB1 8RN, UK.

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

    Funded by: Intramural NIH HHS: Z01 HG200362-01, ZIA HG200362-02, ZIA HG200362-03, ZIA HG200362-04, ZIA HG200362-05, ZIA HG200362-06; Medical Research Council: G0600718, G0801566, G0901213-92157, MR/K013491/1; NHGRI NIH HHS: Z01HG200362; Wellcome Trust: 100891, WT077383/Z/05/Z

    Nature 2015;517;7534;327-32

  • Using ancestry-informative markers to identify fine structure across 15 populations of European origin.

    Huckins LM, Boraska V, Franklin CS, Floyd JA, Southam L, GCAN, WTCCC3, Sullivan PF, Bulik CM, Collier DA, Tyler-Smith C, Zeggini E and Tachmazidou I

    The Wellcome Trust Sanger Institute (WTSI), Hinxton, UK.

    The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification, and suggest a list of markers that may help to counter this problem. It is usual to identify population structure in such studies using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that instead all SNPs with MAF >1% may be used. We established informative axes of variation identified via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can inform the design of association studies and the interpretation of their results in international consortia.

    Funded by: Department of Health: NF-SI-0510-10214, NF-SI-0512-10074; Medical Research Council: MR/J006742/1, MR/J500355/1, MR/K500999/1; NIA NIH HHS: U19 AG023122; Wellcome Trust: 090532, 098051

    European journal of human genetics : EJHG 2014;22;10;1190-200

  • Estimating genome-wide significance for whole-genome sequencing studies.

    Xu C, Tachmazidou I, Walter K, Ciampi A, Zeggini E, Greenwood CM and UK10K Consortium

    Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada.

    Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10(-8) and 8 × 10(-8) for our analytic choices in window-based testing, and thresholds of 0.6 × 10(-8) to 1.5 × 10(-8) for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.

    Funded by: Canadian Institutes of Health Research: MOP-115110; Department of Health: NF-SI-0510-10268; Wellcome Trust: WT091310, WT098051

    Genetic epidemiology 2014;38;4;281-90

  • A genome-wide association study and biological pathway analysis of epilepsy prognosis in a prospective cohort of newly treated epilepsy.

    Speed D, Hoggart C, Petrovski S, Tachmazidou I, Coffey A, Jorgensen A, Eleftherohorinou H, De Iorio M, Todaro M, De T, Smith D, Smith PE, Jackson M, Cooper P, Kellett M, Howell S, Newton M, Yerra R, Tan M, French C, Reuber M, Sills GE, Chadwick D, Pirmohamed M, Bentley D, Scheffer I, Berkovic S, Balding D, Palotie A, Marson A, O'Brien TJ and Johnson MR

    UCL Genetics Institute, University College London WC1E 6BT, UK.

    We present the analysis of a prospective multicentre study to investigate genetic effects on the prognosis of newly treated epilepsy. Patients with a new clinical diagnosis of epilepsy requiring medication were recruited and followed up prospectively. The clinical outcome was defined as freedom from seizures for a minimum of 12 months in accordance with the consensus statement from the International League Against Epilepsy (ILAE). Genetic effects on remission of seizures after starting treatment were analysed with and without adjustment for significant clinical prognostic factors, and the results from each cohort were combined using a fixed-effects meta-analysis. After quality control (QC), we analysed 889 newly treated epilepsy patients using 472 450 genotyped and 6.9 × 10(6) imputed single-nucleotide polymorphisms. Suggestive evidence for association (defined as Pmeta < 5.0 × 10(-7)) with remission of seizures after starting treatment was observed at three loci: 6p12.2 (rs492146, Pmeta = 2.1 × 10(-7), OR[G] = 0.57), 9p23 (rs72700966, Pmeta = 3.1 × 10(-7), OR[C] = 2.70) and 15q13.2 (rs143536437, Pmeta = 3.2 × 10(-7), OR[C] = 1.92). Genes of biological interest at these loci include PTPRD and ARHGAP11B (encoding functions implicated in neuronal development) and GSTA4 (a phase II biotransformation enzyme). Pathway analysis using two independent methods implicated a number of pathways in the prognosis of epilepsy, including KEGG categories 'calcium signaling pathway' and 'phosphatidylinositol signaling pathway'. Through a series of power curves, we conclude that it is unlikely any single common variant explains >4.4% of the variation in the outcome of newly treated epilepsy.

    Funded by: Department of Health: NF-SI-0512-10064; Medical Research Council: 25105, G0901388, MR/L006758/1; Wellcome Trust: WT066056

    Human molecular genetics 2014;23;1;247-58

  • A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans.

    Timpson NJ, Walter K, Min JL, Tachmazidou I, Malerba G, Shin SY, Chen L, Futema M, Southam L, Iotchkova V, Cocca M, Huang J, Memari Y, McCarthy S, Danecek P, Muddyman D, Mangino M, Menni C, Perry JR, Ring SM, Gaye A, Dedoussis G, Farmaki AE, Burton P, Talmud PJ, Gambaro G, Spector TD, Smith GD, Durbin R, Richards JB, Humphries SE, Zeggini E, Soranzo N and UK10K Consortium Members

    MRC Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK.

    The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (-1.43 s.d. (s.e.=0.27) per minor allele, P-value=8.0 × 10(-8)) discovered in 3,202 individuals with low read-depth, whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (-1.0 s.d. (s.e.=0.173), P-value=7.32 × 10(-9)). This is consistent with an effect between 0.5 and 1.5 mmol l(-1) dependent on population. We show that a single predicted splice donor variant is responsible for association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large effect variant identified from whole-genome sequencing at a population scale.

    Funded by: British Heart Foundation: PG008/08; Medical Research Council: G1001799, G9815508, MC_UU_12012/5/B, MC_UU_12013/1, MC_UU_12013/1-9, MC_UU_12013/3; Wellcome Trust: 076113, 091310, 091551, 092731, 095219, 095515, 098051, 100574, 102215, WT091310, WT095219MA, WT098051

    Nature communications 2014;5;4871

  • Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.

    Panoutsopoulou K, Hatzikotoulas K, Xifara DK, Colonna V, Farmaki AE, Ritchie GR, Southam L, Gilly A, Tachmazidou I, Fatumo S, Matchan A, Rayner NW, Ntalla I, Mezzavilla M, Chen Y, Kiagiadaki C, Zengini E, Mamakou V, Athanasiadis A, Giannakopoulou M, Kariakli VE, Nsubuga RN, Karabarinde A, Sandhu M, McVean G, Tyler-Smith C, Tsafantakis E, Karaleftheri M, Xue Y, Dedoussis G and Zeggini E

    Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton CB10 1HH, UK.

    Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10(-26)). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10(-36)). We demonstrate significant power gains in detecting medical trait associations.

    Funded by: European Research Council: 280559; NHGRI NIH HHS: U41HG006941; Wellcome Trust: 098051

    Nature communications 2014;5;5345

  • In search of low-frequency and rare variants affecting complex traits.

    Panoutsopoulou K, Tachmazidou I and Zeggini E

    The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common frequency and rare variants. Targeted genotyping arrays and next-generation sequencing technologies at the whole-genome sequencing (WGS) and whole-exome scales (WES) are increasingly employed to access sequence variation across the full minor allele frequency (MAF) spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.

    Funded by: Arthritis Research UK: 19542; Wellcome Trust: 098051

    Human molecular genetics 2013;22;R1;R16-21

  • A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

    Tachmazidou I, Dedoussis G, Southam L, Farmaki AE, Ritchie GR, Xifara DK, Matchan A, Hatzikotoulas K, Rayner NW, Chen Y, Pollin TI, O'Connell JR, Yerges-Armstrong LM, Kiagiadaki C, Panoutsopoulou K, Schwartzentruber J, Moutsianas L, UK10K consortium, Tsafantakis E, Tyler-Smith C, McVean G, Xue Y and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipid traits. We find genome-wide significant evidence for association of R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglyceride levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.

    Funded by: Department of Health: NF-SI-0510-10268; NHLBI NIH HHS: K01 HL116770, R01 HL104193, U01 HL072515, U01 HL105198; NIDDK NIH HHS: P30 DK072488; Wellcome Trust: 090532, 098051, WT091310

    Nature communications 2013;4;2872

  • Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI.

    Vimaleswaran KS, Tachmazidou I, Zhao JH, Hirschhorn JN, Dudbridge F and Loos RJ

    MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, UK.

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10(-7). Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.

    Funded by: Medical Research Council: G1000718, MC_U106188470

    Human molecular genetics 2012;21;20;4537-42

  • Rare variant association testing for next-generation sequencing data via hierarchical clustering.

    Tachmazidou I, Morris A and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton, UK. ioanna.tachmazidou@sanger.ac.uk

    Objectives: It is thought that a proportion of the genetic susceptibility to complex diseases is due to low-frequency and rare variants. Next-generation sequencing in large populations facilitates the detection of rare variant associations to disease risk. In order to achieve adequate power to detect association at low-frequency and rare variants, locus-specific statistical methods are being developed that combine information across variants within a functional unit and test for association with this enriched signal through so-called burden tests.

    Methods: We propose a hierarchical clustering approach and a similarity kernel-based association test for continuous phenotypes. This method clusters individuals into groups, within which samples are assumed to be genetically similar, and subsequently tests the group effects among the different clusters.

    Results: The power of this approach is comparable to that of collapsing methods when causal variants have the same direction of effect, but its power is significantly higher compared to burden tests when both protective and risk variants are present in the region of interest. Overall, we observe that the Sequence Kernel Association Test (SKAT) is the most powerful approach under the allelic architectures considered.

    Conclusions: In our overall comparison, we find the analytical framework within which SKAT operates to yield higher power and to control type I error appropriately.

    Funded by: Wellcome Trust: 098051, WT090532, WT098017

    Human heredity 2012;74;3-4;165-71
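
The hierarchical clustering abstract above notes that burden (collapsing) tests lose power when a region contains both protective and risk variants. The toy sketch below illustrates why, under loud assumptions: it is not the authors' clustering method and not SKAT, the genotypes and phenotypes are invented, and a plain Pearson correlation stands in for a regression-based association test.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def burden_scores(genotypes):
    """Collapse each individual's rare-allele counts in a region into one score."""
    return [sum(individual) for individual in genotypes]


# Invented data: 8 individuals, 2 rare variants in one region.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
burden = burden_scores(genotypes)

# Case 1: both variants raise the trait. Case 2: they act in opposite directions.
pheno_same = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
pheno_opposite = [1.0, 1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0]

variant_1 = [individual[0] for individual in genotypes]

print(pearson(burden, pheno_same))         # strong burden signal
print(pearson(burden, pheno_opposite))     # opposite effects cancel in the burden score
print(pearson(variant_1, pheno_opposite))  # the single variant still carries signal
```

With opposite effect directions the collapsed score is uncorrelated with the trait in this toy data, even though each variant on its own is associated. That is the situation in which the abstract reports its approach and kernel-based tests outperforming burden tests.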

Group leader

Professor Eleftheria Zeggini

Software

  • AMELIA - allele matching empirical locus-specific integrated association test
  • ARIEL - accumulation of rare variants integrated and extended locus-specific test
  • CAROL - a combined functional annotation score of non-synonymous coding variants
  • CCRaVAT - rare variant case-control analysis tool
  • GLIDERS - HapMap based long-range LD search engine
  • GGSD - open-source, web-based and relational database driven data management software for large-scale genetic studies
  • GWAVA - a functional annotation tool for non-coding sequence variation
  • KATE - a program that analyses the effects of low frequency and rare variants on quantitative traits within a chromosomal region
  • QuTie - rare variant quantitative trait analysis tool
* quick link - http://q.sanger.ac.uk/appstgen