Human evolution

The Human evolution team uses information on genetic variation in modern humans and apes to answer questions about our species' past. This allows us to understand more about the genetic influences on our current health and disease.

We study human genetic variation, including both single nucleotide polymorphisms (SNPs) and structural variants, in diverse human populations, and also variation in closely related species. With this information, we investigate human origins, expansions and migrations, and how natural selection has shaped our species.

Background

We are one of the great apes, but we differ from orangutans, gorillas, chimpanzees and bonobos in our enormous numbers and worldwide distribution, combined with surprisingly low genetic diversity that is spread rather evenly among populations. All of these human-specific characteristics have a simple explanation: the recent expansion of modern humans from a small population in Africa within the last 100,000 years. All human populations therefore share most of their genetic variants and susceptibilities, because these were present in the ancestral population. Populations nevertheless differ slightly, because random genetic drift and natural selection affected them differently during the expansions into new environments over the last 50,000 years.
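As a rough illustration of how random genetic drift alone can make populations differ, the sketch below simulates a neutral variant under a simple Wright-Fisher model. The function name, population size, starting frequency and number of generations are illustrative assumptions, not values taken from the team's work.

```python
import random


def wright_fisher(p0, pop_size, generations, seed):
    """Simulate neutral drift of one allele in a diploid population of constant size.

    Hypothetical helper for illustration only; returns the allele frequency
    in each generation, starting with p0.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size  # diploid: two gene copies per individual
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current gene pool (no selection, no mutation).
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    # Two populations founded with the same allele frequency drift apart by chance.
    for seed in (1, 2):
        traj = wright_fisher(p0=0.3, pop_size=500, generations=200, seed=seed)
        print(f"population {seed}: final frequency {traj[-1]:.3f}")
```

Running the script with different seeds stands in for separate populations that split from one ancestral gene pool; smaller values of pop_size make the frequencies diverge faster.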

One view of the expansion of anatomically and behaviourally modern humans out of Africa around 50 thousand years ago (KYA). Times and routes are very uncertain. [Genome Research Limited]

With the availability of genomic sequences from humans and apes and accumulation of extensive information about the variation within humans, we can now begin to reconstruct these expansions and search directly for the functional genetic variants that have contributed to the characteristics of modern humans. Most DNA variants are evolutionarily neutral (they have no effect on fitness) but provide information on past population sizes and migrations, and we continue to investigate these, particularly using the Y chromosome and mitochondrial DNA. A few variants increase fitness and are of particular interest. We can recognise these from the patterns of variation in the surrounding DNA, or by carrying out functional studies. We would like to catalogue the positively selected regions in the human genome and understand the basis for their selection.

Disease-associated alleles are generally expected to decrease fitness, so why are they present at all and not eliminated by negative selection? New disease variants arise continually by mutation, and while some are eliminated rapidly, those that confer only a small decrease in fitness may persist in the population for many generations. Indeed, if the disease develops only after an individual has reproduced, the causal variant may be, in evolutionary terms, neutral. Occasionally, a disease-associated allele may actually confer a fitness advantage in certain circumstances and be positively selected, as the sickle allele has been in malaria-endemic regions. An evolutionary perspective can thus help us to understand our disease susceptibilities more fully.
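The persistence of mildly deleterious variants can be sketched with a textbook mutation-selection balance calculation. The example below assumes a deliberately simplified haploid model; the function names and the values of the selection coefficient s and mutation rate u are illustrative assumptions, not estimates from the source.

```python
def next_frequency(q, s, u):
    """One generation of selection followed by mutation (simplified haploid model).

    q: current frequency of the deleterious allele
    s: selection coefficient (relative fitness of carriers is 1 - s)
    u: per-generation mutation rate from the normal to the deleterious allele
    """
    q_after_selection = q * (1 - s) / (1 - s * q)
    return q_after_selection + u * (1 - q_after_selection)


def equilibrium_frequency(s, u, generations=10000):
    """Iterate from q = 0 until the frequency settles near its balance point."""
    q = 0.0
    for _ in range(generations):
        q = next_frequency(q, s, u)
    return q


if __name__ == "__main__":
    u = 1e-5  # assumed mutation rate, for illustration only
    for s in (0.1, 0.01, 0.001):
        q = equilibrium_frequency(s, u)
        print(f"s = {s}: equilibrium frequency about {q:.2e} (u/s = {u / s:.2e})")
```

In this model the balance point is u/s, so the weaker the selection against a variant, the higher the frequency at which new mutations maintain it.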

By exploring the genetic signals left in our gene pool in these ways we can reconstruct human evolutionary history and advance our understanding of what makes us human, what makes populations differ from one another, and why we suffer from some diseases.

Research

Previous projects

  • Gene number variation and human evolution
  • Population differentiation and human evolution
  • Y-chromosomal variation and human evolution

Publications

Team publications 2013

  • FOXP2 Targets Show Evidence of Positive Selection in European Populations.

    Ayub Q, Yngvadottir B, Chen Y, Xue Y, Hu M, Vernes SC, Fisher SE and Tyler-Smith C

    American journal of human genetics 2013

  • Y-chromosome and mtDNA genetics reveal significant contrasts in affinities of modern Middle Eastern populations with European and African populations.

    Badro DA, Douaihy B, Haber M, Youhanna SC, Salloum A, Ghassibe-Sabbagh M, Johnsrud B, Khazen G, Matisoo-Smith E, Soria-Hernanz DF, Wells RS, Tyler-Smith C, Platt DE, Zalloua PA and Genographic Consortium

    PloS one 2013;8;1;e54616

  • Genome-wide diversity in the Levant reveals recent structuring by culture.

    Haber M, Gauguier D, Youhanna S, Patterson N, Moorjani P, Botigué LR, Platt DE, Matisoo-Smith E, Soria-Hernanz DF, Wells RS, Bertranpetit J, Tyler-Smith C, Comas D and Zalloua PA

    PLoS genetics 2013;9;2;e1003316

  • Genetic basis of Y-linked hearing impairment.

    Wang Q, Xue Y, Zhang Y, Long Q, Asan, Yang F, Turner DJ, Fitzgerald T, Ng BL, Zhao Y, Chen Y, Liu Q, Yang W, Han D, Quail MA, Swerdlow H, Burton J, Fahey C, Ning Z, Hurles ME, Carter NP, Yang H and Tyler-Smith C

    American journal of human genetics 2013;92;2;301-6

  • A calibrated human Y-chromosomal phylogeny based on resequencing.

    Wei W, Ayub Q, Chen Y, McCarthy S, Hou Y, Carbone I, Xue Y and Tyler-Smith C

    Genome research 2013;23;2;388-95

Team publications 2012

  • An integrated map of genetic variation from 1,092 human genomes.

    1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT and McVean GA

    Nature 2012;491;7422;56-65

  • Population differentiation of southern Indian male lineages correlates with agricultural expansions predating the caste system.

    Arunkumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, Ashokan KS, Gandhirajan KT, Vijayakumar K, Narayanan M, Jayalakshmi M, Ziegle JS, Royyuru AK, Parida L, Wells RS, Renfrew C, Schurr TG, Smith CT, Platt DE, Pitchappan R and Genographic Consortium

    PloS one 2012;7;11;e50269

  • Genome-wide meta-analysis of common variant differences between men and women.

    Boraska V, Jerončić A, Colonna V, Southam L, Nyholt DR, Rayner NW, Perry JR, Toniolo D, Albrecht E, Ang W, Bandinelli S, Barbalic M, Barroso I, Beckmann JS, Biffar R, Boomsma D, Campbell H, Corre T, Erdmann J, Esko T, Fischer K, Franceschini N, Frayling TM, Girotto G, Gonzalez JR, Harris TB, Heath AC, Heid IM, Hoffmann W, Hofman A, Horikoshi M, Zhao JH, Jackson AU, Hottenga JJ, Jula A, Kähönen M, Khaw KT, Kiemeney LA, Klopp N, Kutalik Z, Lagou V, Launer LJ, Lehtimäki T, Lemire M, Lokki ML, Loley C, Luan J, Mangino M, Mateo Leach I, Medland SE, Mihailov E, Montgomery GW, Navis G, Newnham J, Nieminen MS, Palotie A, Panoutsopoulou K, Peters A, Pirastu N, Polasek O, Rehnström K, Ripatti S, Ritchie GR, Rivadeneira F, Robino A, Samani NJ, Shin SY, Sinisalo J, Smit JH, Soranzo N, Stolk L, Swinkels DW, Tanaka T, Teumer A, Tönjes A, Traglia M, Tuomilehto J, Valsesia A, van Gilst WH, van Meurs JB, Smith AV, Viikari J, Vink JM, Waeber G, Warrington NM, Widen E, Willemsen G, Wright AF, Zanke BW, Zgaga L, Wellcome Trust Case Control Consortium, Boehnke M, d'Adamo AP, de Geus E, Demerath EW, den Heijer M, Eriksson JG, Ferrucci L, Gieger C, Gudnason V, Hayward C, Hengstenberg C, Hudson TJ, Järvelin MR, Kogevinas M, Loos RJ, Martin NG, Metspalu A, Pennell CE, Penninx BW, Perola M, Raitakari O, Salomaa V, Schreiber S, Schunkert H, Spector TD, Stumvoll M, Uitterlinden AG, Ulivi S, van der Harst P, Vollenweider P, Völzke H, Wareham NJ, Wichmann HE, Wilson JF, Rudan I, Xue Y and Zeggini E

    Human molecular genetics 2012;21;21;4805-15

  • 'Sifting the significance from the data' - the impact of high-throughput genomic technologies on human genetics and health care.

    Clarke AJ, Cooper DN, Krawczak M, Tyler-Smith C, Wallace HM, Wilkie AO, Raymond FL, Chadwick R, Craddock N, John R, Gallacher J and Chiano M

    Human genomics 2012;6;11

  • IFITM3 restricts the morbidity and mortality associated with influenza.

    Everitt AR, Clare S, Pertel T, John SP, Wash RS, Smith SE, Chin CR, Feeley EM, Sims JS, Adams DJ, Wise HM, Kane L, Goulding D, Digard P, Anttila V, Baillie JK, Walsh TS, Hume DA, Palotie A, Xue Y, Colonna V, Tyler-Smith C, Dunning J, Gordon SB, GenISIS Investigators, MOSAIC Investigators, Smyth RL, Openshaw PJ, Dougan G, Brass AL and Kellam P

    Nature 2012;484;7395;519-23

  • Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events.

    Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B, Douaihy B, Ghassibe-Sabbagh M, Rafatpanah H, Ghanbari M, Whale J, Balanovsky O, Wells RS, Comas D, Tyler-Smith C, Zalloua PA and Genographic Consortium

    PloS one 2012;7;3;e34288

  • Exploration of signals of positive selection derived from genotype-based human genome scans using re-sequencing data.

    Hu M, Ayub Q, Guerra-Assunção JA, Long Q, Ning Z, Huang N, Romero IG, Mamanova L, Akan P, Liu X, Coffey AJ, Turner DJ, Swerdlow H, Burton J, Quail MA, Conrad DF, Enright AJ, Tyler-Smith C and Xue Y

    Human genetics 2012;131;5;665-74

  • A systematic survey of loss-of-function variants in human protein-coding genes.

    MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, 1000 Genomes Project Consortium, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB and Tyler-Smith C

    Science (New York, N.Y.) 2012;335;6070;823-8

  • High altitude adaptation in Daghestani populations from the Caucasus.

    Pagani L, Ayub Q, MacArthur DG, Xue Y, Baillie JK, Chen Y, Kozarewa I, Turner DJ, Tofanelli S, Bulayeva K, Kidd K, Paoli G and Tyler-Smith C

    Human genetics 2012;131;3;423-33

  • Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool.

    Pagani L, Kivisild T, Tarekegn A, Ekong R, Plaster C, Gallego Romero I, Ayub Q, Mehdi SQ, Thomas MG, Luiselli D, Bekele E, Bradman N, Balding DJ and Tyler-Smith C

    American journal of human genetics 2012;91;1;83-96

  • Impact of restricted marital practices on genetic variation in an endogamous Gujarati group.

    Pemberton TJ, Li FY, Hanson EK, Mehta NU, Choi S, Ballantyne J, Belmont JW, Rosenberg NA, Tyler-Smith C and Patel PI

    American journal of physical anthropology 2012;149;1;92-103

  • Evolutionary genetics of the human Rh blood group system.

    Perry GH, Xue Y, Smith RS, Meyer WK, Çalışkan M, Yanez-Cuna O, Lee AS, Gutiérrez-Arcelus M, Ober C, Hollox EJ, Tyler-Smith C and Lee C

    Human genetics 2012;131;7;1205-16

  • Insights into hominid evolution from the gorilla genome sequence.

    Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, McCarthy S, Montgomery SH, Schwalie PC, Tang YA, Ward MC, Xue Y, Yngvadottir B, Alkan C, Andersen LN, Ayub Q, Ball EV, Beal K, Bradley BJ, Chen Y, Clee CM, Fitzgerald S, Graves TA, Gu Y, Heath P, Heger A, Karakoc E, Kolb-Kokocinski A, Laird GK, Lunter G, Meader S, Mort M, Mullikin JC, Munch K, O'Connor TD, Phillips AD, Prado-Martinez J, Rogers AS, Sajjadian S, Schmidt D, Shaw K, Simpson JT, Stenson PD, Turner DJ, Vigilant L, Vilella AJ, Whitener W, Zhu B, Cooper DN, de Jong P, Dermitzakis ET, Eichler EE, Flicek P, Goldman N, Mundy NI, Ning Z, Odom DT, Ponting CP, Quail MA, Ryder OA, Searle SM, Warren WC, Wilson RK, Schierup MH, Rogers J, Tyler-Smith C and Durbin R

    Nature 2012;483;7388;169-75

  • A British approach to sampling.

    Tyler-Smith C and Xue Y

    European journal of human genetics : EJHG 2012;20;2;129-30

  • Sibling rivalry among paralogs promotes evolution of the human brain.

    Tyler-Smith C and Xue Y

    Cell 2012;149;4;737-9

  • Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing.

    Xue Y, Chen Y, Ayub Q, Huang N, Ball EV, Mort M, Phillips AD, Shaw K, Stenson PD, Cooper DN, Tyler-Smith C and 1000 Genomes Project Consortium

    American journal of human genetics 2012;91;6;1022-32

Team publications 2011

  • Dindel: accurate indel calls from short-read data.

    Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH and Durbin R

    Genome research 2011;21;6;961-73

  • Comprehensive comparison of three commercial human whole-exome capture platforms.

    Asan, Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H and Zhang X

    Genome biology 2011;12;9;R95

  • Male lineages in the Himalayan foothills: a commentary on Y-chromosome haplogroup diversity in the sub-Himalayan Terai and Duars populations of East India.

    Ayub Q

    Journal of human genetics 2011;56;12;813-4

  • Parallel evolution of genes and languages in the Caucasus region.

    Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, Pocheshkhova E, Haber M, Platt D, Schurr T, Haak W, Kuznetsova M, Radzhabov M, Balaganskaya O, Romanov A, Zakharova T, Soria Hernanz DF, Zalloua P, Koshel S, Ruhlen M, Renfrew C, Wells RS, Tyler-Smith C, Balanovska E and Genographic Consortium

    Molecular biology and evolution 2011;28;10;2905-20

  • Gene inactivation and its implications for annotation in the era of personal genomics.

    Balasubramanian S, Habegger L, Frankish A, MacArthur DG, Harte R, Tyler-Smith C, Harrow J and Gerstein M

    Genes & development 2011;25;1;1-10

  • Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sex-specific admixture.

    Chaubey G, Metspalu M, Choi Y, Mägi R, Romero IG, Soares P, van Oven M, Behar DM, Rootsi S, Hudjashov G, Mallick CB, Karmin M, Nelis M, Parik J, Reddy AG, Metspalu E, van Driem G, Xue Y, Tyler-Smith C, Thangaraj K, Singh L, Remm M, Richards MB, Lahr MM, Kayser M, Villems R and Kivisild T

    Molecular biology and evolution 2011;28;2;1013-24

  • A world in a grain of sand: human history from genetic data.

    Colonna V, Pagani L, Xue Y and Tyler-Smith C

    Genome biology 2011;12;11;234

  • Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing.

    de Gruijter JM, Lao O, Vermeulen M, Xue Y, Woodwark C, Gillson CJ, Coffey AJ, Ayub Q, Mehdi SQ, Kayser M and Tyler-Smith C

    Investigative genetics 2011;2;1;24

  • Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon.

    Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA, Youhanna SC, Saade S, Soria-Hernanz DF, Royyuru A, Wells RS, Tyler-Smith C, Zalloua PA and Genographic Consortium

    European journal of human genetics : EJHG 2011;19;3;334-40

  • Y-chromosome R-M343 African lineages and sickle cell disease reveal structured assimilation in Lebanon.

    Haber M, Platt DE, Khoury S, Badro DA, Abboud M, Tyler-Smith C and Zalloua PA

    Journal of human genetics 2011;56;1;29-33

  • A worldwide analysis of beta-defensin copy number variation suggests recent selection of a high-expressing DEFB103 gene copy in East Asia.

    Hardwick RJ, Machado LR, Zuccherato LW, Antolinos S, Xue Y, Shawa N, Gilman RH, Cabrera L, Berg DE, Tyler-Smith C, Kelly P, Tarazona-Santos E and Hollox EJ

    Human mutation 2011;32;7;743-50

  • PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing.

    Long Q, Jeffares DC, Zhang Q, Ye K, Nizhynska V, Ning Z, Tyler-Smith C and Nordborg M

    PloS one 2011;6;1;e15292

  • The functional spectrum of low-frequency coding variation.

    Marth GT, Yu F, Indap AR, Garimella K, Gravel S, Leong WF, Tyler-Smith C, Bainbridge M, Blackwell T, Zheng-Bradley X, Chen Y, Challis D, Clarke L, Ball EV, Cibulskis K, Cooper DN, Fulton B, Hartl C, Koboldt D, Muzny D, Smith R, Sougnez C, Stewart C, Ward A, Yu J, Xue Y, Altshuler D, Bustamante CD, Clark AG, Daly M, DePristo M, Flicek P, Gabriel S, Mardis E, Palotie A, Gibbs R and 1000 Genomes Project

    Genome biology 2011;12;9;R84

  • Indian Siddis: African descendants with Indian admixture.

    Shah AM, Tamang R, Moorjani P, Rani DS, Govindaraj P, Kulkarni G, Bhattacharya T, Mustak MS, Bhaskar LV, Reddy AG, Gadhvi D, Gai PB, Chaubey G, Patterson N, Reich D, Tyler-Smith C, Singh L and Thangaraj K

    American journal of human genetics 2011;89;1;154-61

  • Response to the comment on "The hare and the tortoise: One small step for four SNPs, one giant leap for SNP-kind".

    Xue Y and Tyler-Smith C

    Forensic science international. Genetics 2011;5;4;361-2

  • α-Actinin-3 deficiency is associated with reduced bone mass in human and mouse.

    Yang N, Schindeler A, McDonald MM, Seto JT, Houweling PJ, Lek M, Hogarth M, Morse AR, Raftery JM, Balasuriya D, MacArthur DG, Berman Y, Quinlan KG, Eisman JA, Nguyen TV, Center JR, Prince RL, Wilson SG, Zhu K, Little DG and North KN

    Bone 2011;49;4;790-8

  • Replication of the association of a MET variant with autism in a Chinese Han population.

    Zhou X, Xu Y, Wang J, Zhou H, Liu X, Ayub Q, Wang X, Tyler-Smith C, Wu L and Xue Y

    PloS one 2011;6;11;e27428

Team publications 2010

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    Nature 2010;467;7319;1061-73

  • A predominantly Neolithic origin for European paternal lineages.

    Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A, Demaine AG, Barbujani G, Previderè C, Wilson IJ, Tyler-Smith C and Jobling MA

    PLoS biology 2010;8;1;e1000285

  • Origins and functional impact of copy number variation in the human genome.

    Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, MacArthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW and Hurles ME

    Nature 2010;464;7289;704-12

  • Traces of sub-Saharan and Middle Eastern lineages in Indian Muslim populations.

    Eaaswarkhanth M, Haque I, Ravesh Z, Romero IG, Meganathan PR, Dubey B, Khan FA, Chaubey G, Kivisild T, Tyler-Smith C, Singh L and Thangaraj K

    European journal of human genetics : EJHG 2010;18;3;354-63

  • Loss-of-function variants in the genomes of healthy humans.

    MacArthur DG and Tyler-Smith C

    Human molecular genetics 2010;19;R2;R125-30

  • Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing.

    Park H, Kim JI, Ju YS, Gokcumen O, Mills RE, Kim S, Lee S, Suh D, Hong D, Kang HP, Yoo YJ, Shin JY, Kim HJ, Yavartanoo M, Chang YW, Ha JS, Chong W, Hwang GR, Darvishi K, Kim H, Yang SJ, Yang KS, Kim H, Hurles ME, Scherer SW, Carter NP, Tyler-Smith C, Lee C and Seo JS

    Nature genetics 2010;42;5;400-5

  • A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations.

    Shi W, Ayub Q, Vermeulen M, Shao RG, Zuniga S, van der Gaag K, de Knijff P, Kayser M, Xue Y and Tyler-Smith C

    Molecular biology and evolution 2010;27;2;385-93

  • Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a.

    Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky LA, King RJ, Lin AA, Chow CE, Semino O, Battaglia V, Kutuev I, Järve M, Chaubey G, Ayub Q, Mohyuddin A, Mehdi SQ, Sengupta S, Rogaev EI, Khusnutdinova EK, Pshenichnov A, Balanovsky O, Balanovska E, Jeran N, Augustin DH, Baldovic M, Herrera RJ, Thangaraj K, Singh V, Singh L, Majumder P, Rudan P, Primorac D, Villems R and Kivisild T

    European journal of human genetics : EJHG 2010;18;4;479-84

  • Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.

    Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, Robson S, Vukcevic D, Barnes C, Conrad DF, Giannoulatou E, Holmes C, Marchini JL, Stirrups K, Tobin MD, Wain LV, Yau C, Aerts J, Ahmad T, Andrews TD, Arbury H, Attwood A, Auton A, Ball SG, Balmforth AJ, Barrett JC, Barroso I, Barton A, Bennett AJ, Bhaskar S, Blaszczyk K, Bowes J, Brand OJ, Braund PS, Bredin F, Breen G, Brown MJ, Bruce IN, Bull J, Burren OS, Burton J, Byrnes J, Caesar S, Clee CM, Coffey AJ, Connell JM, Cooper JD, Dominiczak AF, Downes K, Drummond HE, Dudakia D, Dunham A, Ebbs B, Eccles D, Edkins S, Edwards C, Elliot A, Emery P, Evans DM, Evans G, Eyre S, Farmer A, Ferrier IN, Feuk L, Fitzgerald T, Flynn E, Forbes A, Forty L, Franklyn JA, Freathy RM, Gibbs P, Gilbert P, Gokumen O, Gordon-Smith K, Gray E, Green E, Groves CJ, Grozeva D, Gwilliam R, Hall A, Hammond N, Hardy M, Harrison P, Hassanali N, Hebaishi H, Hines S, Hinks A, Hitman GA, Hocking L, Howard E, Howard P, Howson JM, Hughes D, Hunt S, Isaacs JD, Jain M, Jewell DP, Johnson T, Jolley JD, Jones IR, Jones LA, Kirov G, Langford CF, Lango-Allen H, Lathrop GM, Lee J, Lee KL, Lees C, Lewis K, Lindgren CM, Maisuria-Armer M, Maller J, Mansfield J, Martin P, Massey DC, McArdle WL, McGuffin P, McLay KE, Mentzer A, Mimmack ML, Morgan AE, Morris AP, Mowat C, Myers S, Newman W, Nimmo ER, O'Donovan MC, Onipinla A, Onyiah I, Ovington NR, Owen MJ, Palin K, Parnell K, Pernet D, Perry JR, Phillips A, Pinto D, Prescott NJ, Prokopenko I, Quail MA, Rafelt S, Rayner NW, Redon R, Reid DM, Renwick, Ring SM, Robertson N, Russell E, St Clair D, Sambrook JG, Sanderson JD, Schuilenburg H, Scott CE, Scott R, Seal S, Shaw-Hawkins S, Shields BM, Simmonds MJ, Smyth DJ, Somaskantharajah E, Spanova K, Steer S, Stephens J, Stevens HE, Stone MA, Su Z, Symmons DP, Thompson JR, Thomson W, Travers ME, Turnbull C, Valsesia A, Walker M, Walker NM, Wallace C, Warren-Perry M, Watkins NA, Webster J, Weedon MN, Wilson AG, Woodburn M, Wordsworth BP, Young AH, Zeggini E, Carter NP, Frayling TM, Lee C, McVean G, Munroe PB, Palotie A, Sawcer SJ, Scherer SW, Strachan DP, Tyler-Smith C, Brown MA, Burton PR, Caulfield MJ, Compston A, Farrall M, Gough SC, Hall AS, Hattersley AT, Hill AV, Mathew CG, Pembrey M, Satsangi J, Stratton MR, Worthington J, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand W, Parkes M, Rahman N, Todd JA, Samani NJ and Donnelly P

    Nature 2010;464;7289;713-20

  • Distinct variants at LIN28B influence growth in height from birth to adulthood.

    Widén E, Ripatti S, Cousminer DL, Surakka I, Lappalainen T, Järvelin MR, Eriksson JG, Raitakari O, Salomaa V, Sovio U, Hartikainen AL, Pouta A, McCarthy MI, Osmond C, Kajantie E, Lehtimäki T, Viikari J, Kähönen M, Tyler-Smith C, Freimer N, Hirschhorn JN, Peltonen L and Palotie A

    American journal of human genetics 2010;86;5;773-82

  • The hare and the tortoise: one small step for four SNPs, one giant leap for SNP-kind.

    Xue Y and Tyler-Smith C

    Forensic science international. Genetics 2010;4;2;59-61

Team publications 2009

  • Genetic variation in South Asia: assessing the influences of geography, language and ethnicity for understanding history and disease risk.

    Ayub Q and Tyler-Smith C

    Briefings in functional genomics & proteomics 2009;8;5;395-404

  • Genomic complexity of the Y-STR DYS19: inversions, deletions and founder lineages carrying duplications.

    Balaresque P, Parkin EJ, Roewer L, Carvalho-Silva DR, Mitchell RJ, van Oorschot RA, Henke J, Stoneking M, Nasidze I, Wetton J, de Knijff P, Tyler-Smith C and Jobling MA

    International journal of legal medicine 2009;123;1;15-23

  • A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia.

    Dhandapany PS, Sadayappan S, Xue Y, Powell GT, Rani DS, Nallari P, Rai TS, Khullar M, Soares P, Bahl A, Tharkan JM, Vaideeswar P, Rathinavel A, Narasimhan C, Ayapati DR, Ayub Q, Mehdi SQ, Oppenheimer S, Richards MB, Price AL, Patterson N, Reich D, Singh L, Tyler-Smith C and Thangaraj K

    Nature genetics 2009;41;2;187-91

  • Geographical structure of the Y-chromosomal genetic landscape of the Levant: a coastal-inland contrast.

    El-Sibai M, Platt DE, Haber M, Xue Y, Youhanna SC, Wells RS, Izaabel H, Sanyoura MF, Harmanani H, Bonab MA, Behbehani J, Hashwa F, Tyler-Smith C, Zalloua PA and Genographic Consortium

    Annals of human genetics 2009;73;Pt 6;568-81

  • TSPY1 copy number variation influences spermatogenesis and shows differences among Y lineages.

    Giachini C, Nuti F, Turner DJ, Laface I, Xue Y, Daguin F, Forti G, Tyler-Smith C and Krausz C

    The Journal of clinical endocrinology and metabolism 2009;94;10;4016-22

  • Geographical affinities of the HapMap samples.

    He M, Gitschier J, Zerjal T, de Knijff P, Tyler-Smith C and Xue Y

    PloS one 2009;4;3;e4684

  • The peopling of Korea revealed by analyses of mitochondrial DNA and Y-chromosomal markers.

    Jin HJ, Tyler-Smith C and Kim W

    PloS one 2009;4;1;e4210

  • Phenotypic variation within European carriers of the Y-chromosomal gr/gr deletion is independent of Y-chromosomal background.

    Krausz C, Giachini C, Xue Y, O'Bryan MK, Gromoll J, Rajpert-de Meyts E, Oliva R, Aknin-Seifer I, Erdei E, Jorgensen N, Simoni M, Ballescà JL, Levy R, Balercia G, Piomboni P, Nieschlag E, Forti G, McLachlan R and Tyler-Smith C

    Journal of medical genetics 2009;46;1;21-31

  • HI: haplotype improver using paired-end short reads.

    Long Q, MacArthur D, Ning Z and Tyler-Smith C

    Bioinformatics (Oxford, England) 2009;25;18;2436-7

  • Biology of Genomes: making sense of sequence.

    MacArthur DG

    Genome medicine 2009;1;6;61

  • Genetic structure of nomadic Bedouin from Kuwait.

    Mohammad T, Xue Y, Evison M and Tyler-Smith C

    Heredity 2009;103;5;425-33

  • A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium.

    Soranzo N, Spector TD, Mangino M, Kühnel B, Rendon A, Teumer A, Willenborg C, Wright B, Chen L, Li M, Salo P, Voight BF, Burns P, Laskowski RA, Xue Y, Menzel S, Altshuler D, Bradley JR, Bumpstead S, Burnett MS, Devaney J, Döring A, Elosua R, Epstein SE, Erber W, Falchi M, Garner SF, Ghori MJ, Goodall AH, Gwilliam R, Hakonarson HH, Hall AS, Hammond N, Hengstenberg C, Illig T, König IR, Knouff CW, McPherson R, Melander O, Mooser V, Nauck M, Nieminen MS, O'Donnell CJ, Peltonen L, Potter SC, Prokisch H, Rader DJ, Rice CM, Roberts R, Salomaa V, Sambrook J, Schreiber S, Schunkert H, Schwartz SM, Serbanovic-Canic J, Sinisalo J, Siscovick DS, Stark K, Surakka I, Stephens J, Thompson JR, Völker U, Völzke H, Watkins NA, Wells GA, Wichmann HE, Van Heel DA, Tyler-Smith C, Thein SL, Kathiresan S, Perola M, Reilly MP, Stewart AF, Erdmann J, Samani NJ, Meisinger C, Greinacher A, Deloukas P, Ouwehand WH and Gieger C

    Nature genetics 2009;41;11;1182-90

  • A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation.

    Tarpey PS, Smith R, Pleasance E, Whibley A, Edkins S, Hardy C, O'Meara S, Latimer C, Dicks E, Menzies A, Stephens P, Blow M, Greenman C, Xue Y, Tyler-Smith C, Thompson D, Gray K, Andrews J, Barthorpe S, Buck G, Cole J, Dunmore R, Jones D, Maddison M, Mironenko T, Turner R, Turrell K, Varian J, West S, Widaa S, Wray P, Teague J, Butler A, Jenkinson A, Jia M, Richardson D, Shepherd R, Wooster R, Tejada MI, Martinez F, Carvill G, Goliath R, de Brouwer AP, van Bokhoven H, Van Esch H, Chelly J, Raynaud M, Ropers HH, Abidi FE, Srivastava AK, Cox J, Luo Y, Mallya U, Moon J, Parnau J, Mohammed S, Tolmie JL, Shoubridge C, Corbett M, Gardner A, Haan E, Rujirabanjerd S, Shaw M, Vandeleur L, Fullston T, Easton DF, Boyle J, Partington M, Hackett A, Field M, Skinner C, Stevenson RE, Bobrow M, Turner G, Schwartz CE, Gecz J, Raymond FL, Futreal PA and Stratton MR

    Nature genetics 2009;41;5;535-43

  • The will-o'-the-wisp of genetics--hunting for the azoospermia factor gene.

    Tyler-Smith C and Krausz C

    The New England journal of medicine 2009;360;9;925-7

  • Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms.

    Vermeulen M, Wollstein A, van der Gaag K, Lao O, Xue Y, Wang Q, Roewer L, Knoblauch H, Tyler-Smith C, de Knijff P and Kayser M

    Forensic science international. Genetics 2009;3;4;205-13

  • Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree.

    Xue Y, Wang Q, Long Q, Ng BL, Swerdlow H, Burton J, Skuce C, Taylor R, Abdellah Z, Zhao Y, Asan, MacArthur DG, Quail MA, Carter NP, Yang H and Tyler-Smith C

    Current biology : CB 2009;19;17;1453-7

  • Population differentiation as an indicator of recent positive selection in humans: an empirical evaluation.

    Xue Y, Zhang X, Huang N, Daly A, Gillson CJ, MacArthur DG, Yngvadottir B, Nica AC, Woodwark C, Chen Y, Conrad DF, Ayub Q, Mehdi SQ, Li P and Tyler-Smith C

    Genetics 2009;183;3;1065-77

  • The promise and reality of personal genomics.

    Yngvadottir B, MacArthur DG, Jin H and Tyler-Smith C

    Genome biology 2009;10;9;237

  • A genome-wide survey of the prevalence and evolutionary forces acting on human nonsense SNPs.

    Yngvadottir B, Xue Y, Searle S, Hunt S, Delgado M, Morrison J, Whittaker P, Deloukas P and Tyler-Smith C

    American journal of human genetics 2009;84;2;224-34

Team publications 2008

  • Dynamic nature of the proximal AZFc region of the human Y chromosome: multiple independent deletion and duplication events revealed by microsatellite analysis.

    Balaresque P, Bowden GR, Parkin EJ, Omran GA, Heyer E, Quintana-Murci L, Roewer L, Stoneking M, Nasidze I, Carvalho-Silva DR, Tyler-Smith C, de Knijff P and Jobling MA

    Human mutation 2008;29;10;1171-80

  • A novel 154-bp deletion in the human mitochondrial DNA control region in healthy individuals.

    Behar DM, Blue-Smith J, Soria-Hernanz DF, Tzur S, Hadid Y, Bormans C, Moen A, Tyler-Smith C, Quintana-Murci L, Wells RS and Genographic Consortium

    Human mutation 2008;29;12;1387-91

  • The dawn of human matrilineal diversity.

    Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, Metspalu E, Scozzari R, Makkan H, Tzur S, Comas D, Bertranpetit J, Quintana-Murci L, Tyler-Smith C, Wells RS, Rosset S and Genographic Consortium

    American journal of human genetics 2008;82;5;1130-40

  • The functional impact of structural variation in humans.

    Hurles ME, Dermitzakis ET and Tyler-Smith C

    Trends in genetics : TIG 2008;24;5;238-45

  • Copy number variation and evolution in humans and chimpanzees.

    Perry GH, Yang F, Marques-Bonet T, Murphy C, Fitzgerald T, Lee AS, Hyland C, Stone AC, Hurles ME, Tyler-Smith C, Eichler EE, Carter NP, Lee C and Redon R

    Genome research 2008;18;11;1698-710

  • Maximum-likelihood estimation of site-specific mutation rates in human mitochondrial DNA from partial phylogenetic classification.

    Rosset S, Wells RS, Soria-Hernanz DF, Tyler-Smith C, Royyuru AK, Behar DM and Genographic Consortium

    Genetics 2008;180;3;1511-24

  • Maternal footprints of Southeast Asians in North India.

    Thangaraj K, Chaubey G, Kivisild T, Selvi Rani D, Singh VK, Ismail T, Carvalho-Silva D, Metspalu M, Bhaskar LV, Reddy AG, Chandra S, Pande V, Prathap Naidu B, Adarsh N, Verma A, Jyothi IA, Mallick CB, Shrivastava N, Devasena R, Kumari B, Singh AK, Dwivedi SK, Singh S, Rao G, Gupta P, Sonvane V, Kumari K, Basha A, Bhargavi KR, Lalremruata A, Gupta AK, Kaur G, Reddy KK, Rao AP, Villems R, Tyler-Smith C and Singh L

    Human heredity 2008;66;1;1-9

  • Long-range, high-throughput haplotype determination via haplotype-fusion PCR and ligation haplotyping.

    Turner DJ, Tyler-Smith C and Hurles ME

    Nucleic acids research 2008;36;13;e82

  • An evolutionary perspective on Y-chromosomal variation and male infertility.

    Tyler-Smith C

    International journal of andrology 2008;31;4;376-82

  • Variation of the oxytocin/neurophysin I (OXT) gene in four human populations.

    Xu Y, Xue Y, Asan, Daly A, Wu L and Tyler-Smith C

    Journal of human genetics 2008;53;7;637-43

  • Adaptive evolution of UGT2B17 copy-number variation.

    Xue Y, Sun D, Daly A, Yang F, Zhou X, Zhao M, Huang N, Zerjal T, Lee C, Carter NP, Hurles ME and Tyler-Smith C

    American journal of human genetics 2008;83;3;337-46

  • Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean.

    Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, Haber M, Xue Y, Izaabel H, Bosch E, Adams SM, Arroyo E, López-Parra AM, Aler M, Picornell A, Ramon M, Jobling MA, Comas D, Bertranpetit J, Wells RS, Tyler-Smith C and Genographic Consortium

    American journal of human genetics 2008;83;5;633-42

  • Y-chromosomal diversity in Lebanon is structured by recent historical events.

    Zalloua PA, Xue Y, Khalife J, Makhoul N, Debiane L, Platt DE, Royyuru AK, Herrera RJ, Hernanz DF, Blue-Smith J, Wells RS, Comas D, Bertranpetit J, Tyler-Smith C and Genographic Consortium

    American journal of human genetics 2008;82;4;873-82

Team

Team members

Qasim Ayub
Yuan Chen
Senior Computer Biologist
Vincenza Colonna
Postdoctoral Fellow
Jose Espinosa
Visiting Undergraduate Student
Min Hu
PhD student
Daniel MacArthur
Visiting Scientist
Luca Pagani
Visiting PhD Student
Wei Wei
Visiting PhD student
Yali Xue
Staff Scientist
Bryndis Yngvadottir
by1@sanger.ac.uk

Qasim Ayub

I graduated from the Khyber Medical College, Peshawar, Pakistan and obtained my Ph.D. from the University of North Texas, USA in 1992 on a Thomas Jefferson Fellowship. Back in Pakistan I joined the Biomedical and Genetic Engineering Laboratories that became the focal point for the Human Genome Diversity Project's South Asian sample collection. Over the last decade I have analyzed DNA variation in ethnic and linguistic groups from Pakistan, in order to understand their genetic origins and relatedness with world populations. In 2006 I was awarded the President of Pakistan's Order of Imtiaz for contributions to science.

Research

I joined the Human Evolution Team in 2008. I am responsible for the team's wet lab and am part of the Analysis Group of the 1000 Genomes Project. My research focuses on the analysis of DNA variation in humans and primates in order to understand molecular evolutionary processes. I am also carrying out targeted re-sequencing of several human Y chromosomes from different parts of the world in order to refine the Y phylogeny. Using this approach we have identified a common set of primers, which are available to investigators interested in particular Y-chromosomal lineages.

References

  • Population differentiation as an indicator of recent positive selection in humans: an empirical evaluation.

    Xue Y, Zhang X, Huang N, Daly A, Gillson CJ, Macarthur DG, Yngvadottir B, Nica AC, Woodwark C, Chen Y, Conrad DF, Ayub Q, Mehdi SQ, Li P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    We have evaluated the extent to which SNPs identified by genomewide surveys as showing unusually high levels of population differentiation in humans have experienced recent positive selection, starting from a set of 32 nonsynonymous SNPs in 27 genes highlighted by the HapMap1 project. These SNPs were genotyped again in the HapMap samples and in the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) panel of 52 populations representing worldwide diversity; extended haplotype homozygosity was investigated around all of them, and full resequence data were examined for 9 genes (5 from public sources and 4 from new data sets). For 7 of the genes, genotyping errors were responsible for an artifactual signal of high population differentiation and for 2, the population differentiation did not exceed our significance threshold. For the 18 genes with confirmed high population differentiation, 3 showed evidence of positive selection as measured by unusually extended haplotypes within a population, and 7 more did in between-population analyses. The 9 genes with resequence data included 7 with high population differentiation, and 5 showed evidence of positive selection on the haplotype carrying the nonsynonymous SNP from skewed allele frequency spectra; in addition, 2 showed evidence of positive selection on unrelated haplotypes. Thus, in humans, high population differentiation is (apart from technical artifacts) an effective way of enriching for recently selected genes, but is not an infallible pointer to recent positive selection supported by other lines of evidence.

    Funded by: Wellcome Trust

    Genetics 2009;183;3;1065-77

  • Genetic variation in South Asia: assessing the influences of geography, language and ethnicity for understanding history and disease risk.

    Ayub Q and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. qa1@sanger.ac.uk

    South Asia is home to more than 1.5 billion humans representing many diverse ethnicities, linguistic and religious groups and representing almost one-quarter of humanity. Modern humans arrived here soon after their departure from Africa approximately 50,000-70,000 years before present (YBP) and several subsequent human migrations and invasions, as well as the unique social structure of the region, have helped shape the pattern of genetic diversity currently observed in these populations. Over the last few decades population geneticists and molecular anthropologists have analyzed DNA variation in indigenous populations from this region in order to catalog their genetic relationships and histories. The emphasis is gradually shifting from the study of population origins to high resolution surveys of DNA variation to address issues of population stratification and genetic susceptibility or resistance to diseases in genome-wide association surveys. We present a historical overview of the genetic studies carried out on populations from this region in order to understand the influence of geographic, linguistic and religious factors on population diversity in this region, and discuss future prospects in light of developments in high throughput genotyping and next generation sequencing technologies.

    Funded by: Wellcome Trust

    Briefings in functional genomics & proteomics 2009;8;5;395-404

  • A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia.

    Dhandapany PS, Sadayappan S, Xue Y, Powell GT, Rani DS, Nallari P, Rai TS, Khullar M, Soares P, Bahl A, Tharkan JM, Vaideeswar P, Rathinavel A, Narasimhan C, Ayapati DR, Ayub Q, Mehdi SQ, Oppenheimer S, Richards MB, Price AL, Patterson N, Reich D, Singh L, Tyler-Smith C and Thangaraj K

    Department of Biochemistry, Madurai Kamaraj University, Madurai 625 021, India.

    Heart failure is a leading cause of mortality in South Asians. However, its genetic etiology remains largely unknown. Cardiomyopathies due to sarcomeric mutations are a major monogenic cause for heart failure (MIM600958). Here, we describe a deletion of 25 bp in the gene encoding cardiac myosin binding protein C (MYBPC3) that is associated with heritable cardiomyopathies and an increased risk of heart failure in Indian populations (initial study OR = 5.3 (95% CI = 2.3-13), P = 2 x 10(-6); replication study OR = 8.59 (3.19-25.05), P = 3 x 10(-8); combined OR = 6.99 (3.68-13.57), P = 4 x 10(-11)) and that disrupts cardiomyocyte structure in vitro. Its prevalence was found to be high (approximately 4%) in populations of Indian subcontinental ancestry. The finding of a common risk factor implicated in South Asian subjects with cardiomyopathy will help in identifying and counseling individuals predisposed to cardiac diseases in this region.

    Funded by: NHGRI NIH HHS: R01 HG006399-02; Wellcome Trust: 077009

    Nature genetics 2009;41;2;187-91

  • Y-chromosomal evidence for a limited Greek contribution to the Pathan population of Pakistan.

    Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, Underhill PA and Ayub Q

    Biomedical and Genetic Engineering Division, Dr. AQ Khan Research Laboratories, Islamabad, Pakistan.

    Three Pakistani populations residing in northern Pakistan, the Burusho, Kalash and Pathan claim descent from Greek soldiers associated with Alexander's invasion of southwest Asia. Earlier studies have excluded a substantial Greek genetic input into these populations, but left open the question of a smaller contribution. We have now typed 90 binary polymorphisms and 16 multiallelic, short-tandem-repeat (STR) loci mapping to the male-specific portion of the human Y chromosome in 952 males, including 77 Greeks in order to re-investigate this question. In pairwise comparisons between the Greeks and the three Pakistani populations using genetic distance measures sensitive to recent events, the lowest distances were observed between the Greeks and the Pathans. Clade E3b1 lineages, which were frequent in the Greeks but not in Pakistan, were nevertheless observed in two Pathan individuals, one of whom shared a 16 Y-STR haplotype with the Greeks. The worldwide distribution of a shortened (9 Y-STR) version of this haplotype, determined from database information, was concentrated in Macedonia and Greece, suggesting an origin there. Although based on only a few unrelated descendants, this provides strong evidence for a European origin for a small proportion of the Pathan Y chromosomes.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2007;15;1;121-6

  • Detection of novel Y SNPs provides further insights into Y chromosomal variation in Pakistan.

    Mohyuddin A, Ayub Q, Underhill PA, Tyler-Smith C and Mehdi SQ

    Biomedical and Genetic Engineering Laboratories, G. P. O Box 2891, 44000, Islamabad, Pakistan.

    Biallelic polymorphisms on the Y chromosome have been extensively used to study the history, evolution, and migration patterns of world populations. In this study we screened 8.5 kb of Y chromosomal DNA for single nucleotide polymorphisms (SNPs) in a panel of 95 male individuals belonging to different haplogroups. Five novel Y-SNPs (PK1-5) were identified, four in the Pakistani sample and one in an African sample. The ancestral state of each SNP was determined in two chimpanzee samples and a variety of Pakistani ethnic groups. In addition to these novel Y-SNPs 77 additional markers on the Y chromosome were analyzed to place the SNPs on the phylogenetic tree of Y chromosomal lineages and to further investigate extant human Y chromosomal variation within Pakistan. BATWING analysis gave an estimate of between 2,500 and 7,300 YBP for population expansion in Pakistan which coincides with the period of the Indus Valley civilizations.

    Journal of human genetics 2006;51;4;375-8

  • Investigation of the Greek ancestry of populations from northern Pakistan.

    Mansoor A, Mazhar K, Khaliq S, Hameed A, Rehman S, Siddiqi S, Papaioannou M, Cavalli-Sforza LL, Mehdi SQ and Ayub Q

    Biomedical and Genetic Engineering Division, Dr. A.Q. Khan Research Laboratories, G.P.O. Box 2891, 44000 Islamabad, Pakistan.

    Three populations from northern Pakistan, the Burusho, Kalash, and Pathan, claim descent from soldiers left behind by Alexander the Great after his invasion of the Indo-Pak subcontinent. In order to investigate their genetic relationships, we analyzed nine Alu insertion polymorphisms and 113 autosomal microsatellites in the extant Pakistani and Greek populations. Principal component, phylogenetic, and structure analyses show that the Kalash are genetically distinct, and that the Burusho and Pathan populations are genetically close to each other and the Greek population. Admixture estimates suggest a small Greek contribution to the genetic pool of the Burusho and Pathan and demonstrate that these two northern Pakistani populations share a common Indo-European gene pool that probably predates Alexander's invasion. The genetically isolated Kalash population may represent the genetic pool of ancestral Eurasian populations of Central Asia or early Indo-European nomadic pastoral tribes that became sequestered in the valleys of the Hindu Kush Mountains.

    Human genetics 2004;114;5;484-90

  • Reconstruction of human evolutionary tree using polymorphic autosomal microsatellites.

    Ayub Q, Mansoor A, Ismail M, Khaliq S, Mohyuddin A, Hameed A, Mazhar K, Rehman S, Siddiqi S, Papaioannou M, Piazza A, Cavalli-Sforza LL and Mehdi SQ

    Biomedical and Genetic Engineering Division, Dr. A.Q. Khan Research Laboratories, Islamabad 44000, Pakistan.

    Allelic frequencies of 182 tri- and tetra-autosomal microsatellites were used to examine phylogenetic relationships among 19 extant human populations. In particular, because the languages of the Basques and Hunza Burusho have been suggested to have an ancient relationship, this study sought to explore the genetic relationship between these two major language isolate populations and to compare them with other human populations. The work presented here shows that the microsatellite allelic diversity and the number of unique alleles were highest in sub-Saharan Africans. Neighbor-joining trees based on genetic distances and principal component analyses separated populations from different continents, and are consistent with an African origin for modern humans. For the first time, with biparentally transmitted markers, the microsatellite tree also shows that the San are the first branch of the human tree before the branch leading to all other Africans. In contrast to an earlier study, these results provided no evidence of a genetic relationship among the two language isolate groups. Genetic relationships, as ascertained by these microsatellites, are dictated primarily by geographic proximity rather than by remote linguistic origin, Mantel test, R(0) = 0.484, g = 3.802 (critical g value = 1.645; P = 0.05).

    American journal of physical anthropology 2003;122;3;259-68

  • The genetic legacy of the Mongols.

    Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q, Mohyuddin A, Fu S, Li P, Yuldasheva N, Ruzibakiev R, Xu J, Shu Q, Du R, Yang H, Hurles ME, Robinson E, Gerelsaikhan T, Dashnyam B, Mehdi SQ and Tyler-Smith C

    Department of Biochemistry, University of Oxford, Oxford, United Kingdom.

    We have identified a Y-chromosomal lineage with several unusual features. It was found in 16 populations throughout a large region of Asia, stretching from the Pacific to the Caspian Sea, and was present at high frequency: approximately 8% of the men in this region carry it, and it thus makes up approximately 0.5% of the world total. The pattern of variation within the lineage suggested that it originated in Mongolia approximately 1,000 years ago. Such a rapid spread cannot have occurred by chance; it must have been a result of selection. The lineage is carried by likely male-line descendants of Genghis Khan, and we therefore propose that it has spread by a novel form of social selection resulting from their behavior.

    American journal of human genetics 2003;72;3;717-21

  • Y-chromosomal DNA variation in Pakistan.

    Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T, Tyler-Smith C and Mehdi SQ

    Biomedical and Genetic Engineering Division, Dr. A. Q. Khan Research Laboratories, Islamabad, Pakistan.

    Eighteen binary polymorphisms and 16 multiallelic, short-tandem-repeat (STR) loci from the nonrecombining portion of the human Y chromosome were typed in 718 male subjects belonging to 12 ethnic groups of Pakistan. These identified 11 stable haplogroups and 503 combination binary marker/STR haplotypes. Haplogroup frequencies were generally similar to those in neighboring geographical areas, and the Pakistani populations speaking a language isolate (the Burushos), a Dravidian language (the Brahui), or a Sino-Tibetan language (the Balti) resembled the Indo-European-speaking majority. Nevertheless, median-joining networks of haplotypes revealed considerable substructuring of Y variation within Pakistan, with many populations showing distinct clusters of haplotypes. These patterns can be accounted for by a common pool of Y lineages, with substantial isolation between populations and drift in the smaller ones. Few comparative genetic or historical data are available for most populations, but the results can be compared with oral traditions about origins. The Y data support the well-established origin of the Parsis in Iran, the suggested descent of the Hazaras from Genghis Khan's army, and the origin of the Negroid Makrani in Africa, but do not support traditions of Tibetan, Syrian, Greek, or Jewish origins for other populations.

    American journal of human genetics 2002;70;5;1107-24

  • Identification and characterisation of novel human Y-chromosomal microsatellites from sequence database information.

    Ayub Q, Mohyuddin A, Qamar R, Mazhar K, Zerjal T, Mehdi SQ and Tyler-Smith C

    Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK.

    1.33 Mb of sequence from the human Y chromosome was searched for tri- to hexanucleotide microsatellites. Twenty loci containing a stretch of eight or more repeat units with complete repeat sequence homogeneity were found, 18 of which were novel. Six loci (one tri-, four tetra- and one pentanucleotide) were assembled into a single multiplex reaction and their degree of polymorphism was investigated in a sample of 278 males from Pakistan. Diversities of the individual loci ranged from 0.064 to 0.727 in Pakistan, while the haplotype diversity was 0.971. One population, the Hazara, showed particularly low diversity, with predominantly two haplotypes. As the sequence builds up in the databases, direct methods such as this will replace more biased and technically demanding indirect methods for the isolation of microsatellites.

    Nucleic acids research 2000;28;2;e8
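
The database search described in the last reference above (tri- to hexanucleotide microsatellites with eight or more perfectly repeated units) can be illustrated with a short sketch. This is not the authors' published method: the function name, its parameters and the regular-expression approach are assumptions made purely for illustration.

```python
import re


def find_perfect_microsatellites(seq, min_unit=3, max_unit=6, min_repeats=8):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper: scans a DNA string for repeat units of min_unit..max_unit
    bases that occur at least min_repeats times in a row with no interruptions.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit of unit_len bases followed by at least min_repeats - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter motif (e.g. AAAA, ATAT),
            # so that a trinucleotide array is not also reported as a hexanucleotide one.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: ten perfect GATA repeats flanked by unrelated bases.
    example = "TT" + "GATA" * 10 + "CC"
    print(find_perfect_microsatellites(example))
```

A real search over megabases of Y-chromosomal sequence would also need to handle the reverse strand, ambiguous bases and overlapping candidates, none of which this sketch attempts.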

Yuan Chen

- Senior Computer Biologist

From 1980 to 1985 I studied for a Bachelor of Medicine (equivalent to M.B., Ch.B.) at the Department of Medicine, Tong-Ji Medical University, Wuhan, P.R. China, and then worked for one year as a physician at a hospital in Wuhan. I gained an MSc in Bio-computing / Bioinformatics at the University of Manchester in 1989 and worked there as a research assistant on molecular modelling projects. I joined the Sanger Centre in 1998, where I investigated SNP detection using overlapping clones from chromosome 22. About three years later I moved to the European Bioinformatics Institute (EBI), where I worked on the variation database in the Ensembl project for about nine years.

Research

In May 2010 I rejoined the Sanger Institute as a member of the Human Evolution Group, where I develop pipelines using Perl and MySQL databases to provide data for analysis in various projects, such as the 1000 Genomes, Gorilla, 500 Exomes and Gene Selection Detection projects.

References

  • A calibrated human Y-chromosomal phylogeny based on resequencing.

    Wei W, Ayub Q, Chen Y, McCarthy S, Hou Y, Carbone I, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We have identified variants present in high-coverage complete sequences of 36 diverse human Y chromosomes from Africa, Europe, South Asia, East Asia, and the Americas, representing eight major haplogroups. After restricting our analysis to 8.97 Mb of the unique male-specific Y sequence, we identified 6662 high-confidence variants, including single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), and indels. We constructed phylogenetic trees using these variants, or subsets of them, and recapitulated the known structure of the tree. Assuming a male mutation rate of 1 × 10(-9) per base pair per year, the time depth of the tree (haplogroups A3-R) was ~101,000-115,000 yr, and the lineages found outside Africa dated to 57,000-74,000 yr, both as expected. In addition, we dated a striking Paleolithic male lineage expansion to 41,000-52,000 yr ago and the node representing the major European Y lineage, R1b, to 4000-13,000 yr ago, supporting a Neolithic origin for these modern European Y chromosomes. In all, we provide a nearly 10-fold increase in the number of Y markers with phylogenetic information, and novel historical insights derived from placing them on a calibrated phylogenetic tree.

    Funded by: Wellcome Trust: 098051

    Genome research 2013;23;2;388-95

  • Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing.

    Xue Y, Chen Y, Ayub Q, Huang N, Ball EV, Mort M, Phillips AD, Shaw K, Stenson PD, Cooper DN, Tyler-Smith C and 1000 Genomes Project Consortium

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    We have assessed the numbers of potentially deleterious variants in the genomes of apparently healthy humans by using (1) low-coverage whole-genome sequence data from 179 individuals in the 1000 Genomes Pilot Project and (2) current predictions and databases of deleterious variants. Each individual carried 281-515 missense substitutions, 40-85 of which were homozygous, predicted to be highly damaging. They also carried 40-110 variants classified by the Human Gene Mutation Database (HGMD) as disease-causing mutations (DMs), 3-24 variants in the homozygous state, and many polymorphisms putatively associated with disease. Whereas many of these DMs are likely to represent disease-allele-annotation errors, between 0 and 8 DMs (0-1 homozygous) per individual are predicted to be highly damaging, and some of them provide information of medical relevance. These analyses emphasize the need for improved annotation of disease alleles both in mutation databases and in the primary literature; some HGMD mutation data have been recategorized on the basis of the present findings, an iterative process that is both necessary and ongoing. Our estimates of deleterious-allele numbers are likely to be subject to both overcounting and undercounting. However, our current best mean estimates of ~400 damaging variants and ~2 bona fide disease mutations per individual are likely to increase rather than decrease as sequencing studies ascertain rare variants more effectively and as additional disease alleles are discovered.

    Funded by: Wellcome Trust: WT098051

    American journal of human genetics 2012;91;6;1022-32

  • An integrated map of genetic variation from 1,092 human genomes.

    1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT and McVean GA

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I021213/1; British Heart Foundation: RG/09/12/28096; Howard Hughes Medical Institute; Medical Research Council: G0900747(91070); NCI NIH HHS: R01 CA166661, R01CA166661; NCRR NIH HHS: UL1RR024131; NHGRI NIH HHS: P01HG4120, P41HG2371, P41HG4221, R01 HG002898, R01HG2898, R01HG3698, R01HG4719, R01HG4960, R01HG5701, RC2HG5552, RC2HG5581, U01 HG005728, U01HG5208, U01HG5209, U01HG5211, U01HG5214, U01HG5715, U01HG5725, U01HG5728, U01HG6513, U01HG6569, U41HG4568, U54HG3067, U54HG3079, U54HG3273; NHLBI NIH HHS: HL078885, R01HL95045, RC2HL102925, T32HL94284; NIAID NIH HHS: AI077439, AI2009061; NIEHS NIH HHS: ES015794; NIGMS NIH HHS: R01GM59290, T32GM7748, T32GM8283; NIH HHS: DP2OD6514; NIMH NIH HHS: R01MH84698; NLM NIH HHS: T15LM7033; PHS HHS: HHSN268201100040C; Wellcome Trust: WT085475/Z/08/Z, WT085532AIA, WT086084/Z/08/Z, WT089250/Z/09/Z, WT090532/Z/09/Z, WT095552/Z/11/Z, WT098051

    Nature 2012;491;7422;56-65

  • High altitude adaptation in Daghestani populations from the Caucasus.

    Pagani L, Ayub Q, MacArthur DG, Xue Y, Baillie JK, Chen Y, Kozarewa I, Turner DJ, Tofanelli S, Bulayeva K, Kidd K, Paoli G and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, UK. lp8@sanger.ac.uk

    We have surveyed 15 high-altitude adaptation candidate genes for signals of positive selection in North Caucasian highlanders using targeted re-sequencing. A total of 49 unrelated Daghestani from three ethnic groups (Avars, Kubachians, and Laks) living in ancient villages located at around 2,000 m above sea level were chosen as the study population. Caucasian (Adygei living at sea level, N = 20) and CEU (CEPH Utah residents with ancestry from northern and western Europe; N = 20) were used as controls. Candidate genes were compared with 20 putatively neutral control regions resequenced in the same individuals. The regions of interest were amplified by long-PCR, pooled according to individual, indexed by adding an eight-nucleotide tag, and sequenced using the Illumina GAII platform. 1,066 SNPs were called using false discovery and false negative thresholds of ~6%. The neutral regions provided an empirical null distribution to compare with the candidate genes for signals of selection. Two genes stood out. In Laks, a non-synonymous variant within HIF1A already known to be associated with improvement in oxygen metabolism was rediscovered, and in Kubachians a cluster of 13 SNPs located in a conserved intronic region within EGLN1 showing high population differentiation was found. These variants illustrate both the common pathways of adaptation to high altitude in different populations and features specific to the Daghestani populations, showing how even a mildly hypoxic environment can lead to genetic adaptation.

    Funded by: Wellcome Trust

    Human genetics 2012;131;3;423-33

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73

  • A database and API for variation, dense genotyping and resequencing data.

    Rios D, McLaren WM, Chen Y, Birney E, Stabenau A, Flicek P and Cunningham F

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

    Background: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data necessitating the redesign of existing genome-wide bioinformatics resources.

    Results: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments.

    Conclusions: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.

    Funded by: Medical Research Council; Wellcome Trust

    BMC bioinformatics 2010;11;238

  • Ensembl variation resources.

    Chen Y, Cunningham F, Rios D, McLaren WM, Smith J, Pritchard B, Spudich GM, Brent S, Kulesha E, Marin-Garcia P, Smedley D, Birney E and Flicek P

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics.

    Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl.

    Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.

    Funded by: Medical Research Council; Wellcome Trust

    BMC genomics 2010;11;293

  • Locus Reference Genomic sequences: an improved basis for describing human DNA variants.

    Dalgleish R, Flicek P, Cunningham F, Astashyn A, Tully RE, Proctor G, Chen Y, McLaren WM, Larsson P, Vaughan BW, Béroud C, Dobson G, Lehväslaiho H, Taschner PE, den Dunnen JT, Devereau A, Birney E, Brookes AJ and Maglott DR

    Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK. raymond.dalgleish@le.ac.uk.

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org.

    Genome medicine 2010;2;4;24

  • A first-generation linkage disequilibrium map of human chromosome 22.

    Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, Dibling T, Tinsley E, Kirby S, Carter D, Papaspyridonos M, Livingstone S, Ganske R, Lõhmussaar E, Zernant J, Tõnisson N, Remm M, Mägi R, Puurand T, Vilo J, Kurg A, Rice K, Deloukas P, Mott R, Metspalu A, Bentley DR, Cardon LR and Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    DNA sequence variants in specific genes or regions of the human genome are responsible for a variety of phenotypes such as disease risk or variable drug response. These variants can be investigated directly, or through their non-random associations with neighbouring markers (called linkage disequilibrium (LD)). Here we report measurement of LD along the complete sequence of human chromosome 22. Duplicate genotyping and analysis of 1,504 markers in Centre d'Etude du Polymorphisme Humain (CEPH) reference families at a median spacing of 15 kilobases (kb) reveals a highly variable pattern of LD along the chromosome, in which extensive regions of nearly complete LD up to 804 kb in length are interspersed with regions of little or no detectable LD. The LD patterns are replicated in a panel of unrelated UK Caucasians. There is a strong correlation between high LD and low recombination frequency in the extant genetic map, suggesting that historical and contemporary recombination rates are similar. This study demonstrates the feasibility of developing genome-wide maps of LD.

    Nature 2002;418;6897;544-8

  • A SNP resource for human chromosome 22: extracting dense clusters of SNPs from the genomic sequence.

    Dawson E, Chen Y, Hunt S, Smink LJ, Hunt A, Rice K, Livingston S, Bumpstead S, Bruskiewich R, Sham P, Ganske R, Adams M, Kawasaki K, Shimizu N, Minoshima S, Roe B, Bentley D and Dunham I

    The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The recent publication of the complete sequence of human chromosome 22 provides a platform from which to investigate genomic sequence variation. We report the identification and characterization of 12,267 potential variants (SNPs and other small insertions/deletions) of human chromosome 22, discovered in the overlaps of 460 clones used for the chromosome sequencing. We found, on average, 1 potential variant every 1.07 kb and approximately 18% of the potential variants involve insertions/deletions. The SNPs have been positioned both relative to each other, and to genes, predicted genes, repeat sequences, other genetic markers, and the 2730 SNPs previously identified on the chromosome. A subset of the SNPs were verified experimentally using either PCR-RFLP or genomic Invader assays. These experiments confirmed 92% of the potential variants in a panel of 92 individuals. [Details of the SNPs and RFLP assays can be found at http://www.sanger.ac.uk and in dbSNP.]

    Genome research 2001;11;1;170-8
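The linkage disequilibrium (LD) described in the chromosome 22 map above can be illustrated with a small calculation. What follows is a minimal sketch, not the method used in that paper: it computes the standard D, |D'| and r² statistics for two biallelic markers from phased haplotypes. The function name `ld_stats`, the allele coding ("A"/"a" at the first marker, "B"/"b" at the second) and the example haplotypes are assumptions made for illustration.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, abs_D_prime, r2) for two biallelic markers.

    haplotypes: list of two-character strings such as "AB", "Ab", "aB", "ab".
    This allele coding is a hypothetical convention for the example only.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Allele and haplotype frequencies estimated from the sample.
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n
    p_ab = counts["AB"] / n

    # D: excess of the AB haplotype over its expectation under independence.
    d = p_ab - p_a * p_b

    # |D'|: D scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between alleles at the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, abs_d_prime, r2


# Hypothetical samples: complete LD, then no LD.
complete = ld_stats(["AB"] * 5 + ["ab"] * 5)
independent = ld_stats(["AB", "Ab", "aB", "ab"])
```

In the first sample only two of the four possible haplotypes occur, so the markers are in complete LD; in the second all four occur at equal frequency, so no association is detectable.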

Vincenza Colonna

- Postdoctoral Fellow

I took my PhD at the University of Naples in Italy. During my PhD I collected and genotyped DNA samples from isolated populations in southern Italy. I analysed these data to evaluate the extent of isolation of these populations and to demonstrate their reduced genetic heterogeneity.

Subsequently I spent two years at the University of Ferrara as a postdoctoral fellow focusing on population genetics and as a lecturer in genetics.

Currently, I am a Junior Researcher at the National Research Council in Italy.

Research

I am interested in understanding the processes leading to the current levels and distribution of genomic variation in humans. My current work focuses mainly on population genetic analyses of the “1000 Genomes Project” data. In addition, I continue to work on population isolates.

References

  • Genetic affinity and admixture of northern Thai people along their migration route in northern Thailand: evidence from autosomal STR loci.

    Kutanan W, Kampuansai J, Colonna V, Nakbunlung S, Lertvicha P, Seielstad M, Bertorelle G and Kangwanpong D

    Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand.

    The Khon Mueang (KM) are the largest group of northern Thai people. Our previous mtDNA studies have suggested an admixture process among the KM with the earlier Mon-Khmer-speaking inhabitants of this region. In this study, we evaluate genetic affinities and admixture among 10 KM populations in northern Thailand lying along the historical Yuan migration route, and 10 neighboring populations belonging to 7 additional ethnic groups: Lawa, Mon (Mon-Khmer-speaking groups), Shan, Yuan, Lue, Khuen and Yong (Tai-speaking groups) by analyzing 15 hypervariable autosomal short tandem repeat loci. The KM exhibited close relationships with neighboring populations, especially the Tai-speaking groups, reflecting an admixed origin of the KM. Admixture proportions were observed in all KM populations, which had a higher contribution from the parental Tai than the Mon-Khmer groups. Different admixture patterns of the KM along the migration route might indicate high heterogeneity among the KM. These patterns were not directly associated with geographical proximity, suggesting other factors, like variation in the timing of admixture with the existing populations may have had an important role. More genetic data from different marker systems solely transmitted through the male or female lineages are needed to complete the description of genetic admixture and population history of the KM.

    Journal of human genetics 2011;56;2;130-7

  • A world in a grain of sand: human history from genetic data.

    Colonna V, Pagani L, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Genome-wide genotypes and sequences are enriching our understanding of the past 50,000 years of human history and providing insights into earlier periods largely inaccessible to mitochondrial DNA and Y-chromosomal studies. "To see a world in a grain of sand ..." (William Blake, Auguries of Innocence).

    Funded by: Wellcome Trust

    Genome biology 2011;12;11;234

  • Human genome diversity: frequently asked questions.

    Barbujani G and Colonna V

    Department of Biology and Evolution, University of Ferrara, 44121 Ferrara, Italy. g.barbujani@unife.it

    Despite our relatively large population size, humans are genetically less variable than other primates. Many allele frequencies and statistical descriptors of genome diversity form broad gradients, tracing the main expansion from Africa, local migrations, and sometimes adaptation. However, this continuous variation is discordant across loci, and principally seems to reflect different blends of common and often cosmopolitan alleles rather than the presence of distinct gene pools in different regions of the world. The elusive structure of human populations could lead to spurious associations if the effects of shared ancestry are not properly dealt with; indeed, this is among the causes (although not the only one) of the difficulties encountered in discovering the loci responsible for quantitative traits and complex diseases. However, the rapidly growing body of data on our genomic diversity has already cast new light on human population history and is now revealing intricate biological relationships among individuals and populations of our species.

    Trends in genetics : TIG 2010;26;7;285-95

  • Long-range comparison between genes and languages based on syntactic distances.

    Colonna V, Boattini A, Guardiano C, Dall'ara I, Pettener D, Longobardi G and Barbujani G

    Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Ferrara, Italy.

    Objective: To propose a new approach for comparing genetic and linguistic diversity in populations belonging to distantly related groups.

    Background: Comparisons of linguistic and genetic differences have proved powerful tools to reconstruct human demographic history. Current models assume on both sides that similarities reflect either descent from common ancestry or the balance between isolation and contact. Most linguistic phylogenies are ultimately based on lexical evidence (roughly, words and morphemes with their sounds and meanings). However, measures of lexical divergence are reliable only for closely related languages, thus large-scale comparisons of genetic and linguistic diversity have appeared problematic so far.

    Methods: Syntax (abstract rules to combine words into sentences) appears more measurable, universally comparable, and stable than the lexicon, and hence certain syntactic similarities might reflect deeper linguistic relationships, such as those between distant language families. In this study, we for the first time compared genetic data to a matrix of syntactic differences among selected populations of three continents.

    Results: Comparing two databases of microsatellite (Short Tandem Repeat) markers and Single Nucleotide Polymorphisms (SNPs), with a linguistic matrix based on the values of 62 grammatical parameters, we show that there is indeed a correlation of syntactic and genetic distances. We also identified a few outliers and suggest a possible interpretation of the overall pattern.

    Conclusions: These results strongly support the possibility of better investigating population history by combining genetic data with linguistic information of a new type, provided by a theoretically more sophisticated method to assess the relationships between distantly related languages and language families.

    Human heredity 2010;70;4;245-54

  • Comparing population structure as inferred from genealogical versus genetic information.

    Colonna V, Nutile T, Ferrucci RR, Fardella G, Aversano M, Barbujani G and Ciullo M

    Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Ferrara, Italy.

    Algorithms for inferring population structure from genetic data (ie, population assignment methods) have been shown to effectively recognize genetic clusters in human populations. However, their performance in identifying groups of genealogically related individuals, especially in scanty-differentiated populations, has not been tested empirically thus far. For this study, we had access to both genealogical and genetic data from two closely related, isolated villages in southern Italy. We found that nearly all living individuals were included in a single pedigree, with multiple inbreeding loops. Despite F(st) between villages being a low 0.008, genetic clustering analysis identified two clusters roughly corresponding to the two villages. Average kinship between individuals (estimated from genealogies) increased at increasing values of group membership (estimated from the genetic data), showing that the observed genetic clusters represent individuals who are more closely related to each other than to random members of the population. Further, average kinship within clusters and F(st) between clusters increase with increasingly stringent membership threshold requirements. We conclude that a limited number of genetic markers is sufficient to detect structuring, and that the results of genetic analyses faithfully mirror the structuring inferred from detailed analyses of population genealogies, even when F(st) values are low, as in the case of the two villages. We then estimate the impact of observed levels of population structure on association studies using simulated data.

    European journal of human genetics : EJHG 2009;17;12;1635-41

  • Comparing models on the genealogical relationships among Neandertal, Cro-Magnoid and modern Europeans by serial coalescent simulations.

    Belle EM, Benazzo A, Ghirotto S, Colonna V and Barbujani G

    Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Via Borsari 46, Ferrara, Italy.

    Populations of anatomically archaic (Neandertal) and early modern (Cro-Magnoid) humans are jointly documented in the European fossil record, in the period between 40 000 and 25 000 years BP, but the large differences between their cultures, morphologies and DNAs suggest that the two groups were not close relatives. However, it is still unclear whether any genealogical continuity between them can be ruled out. Here, we simulated a broad range of demographic scenarios by means of a serial coalescence algorithm in which Neandertals, Cro-Magnoids and modern Europeans were either part of the same mitochondrial genealogy or of two separate genealogies. Mutation rates, population sizes, population structure and demographic growth rates varied across simulations. All models in which anatomically modern (that is, Cro-Magnoid and current) Europeans belong to a distinct genealogy performed better than any model in which the three groups were assigned to the same mitochondrial genealogy. The maximum admissible level of gene flow between Neandertals and the ancestors of current Europeans is 0.001% per generation, one order of magnitude lower than estimated in previous studies not considering genetic data on Cro-Magnoid people.

    Heredity 2009;102;3;218-25

  • Identification and replication of a novel obesity locus on chromosome 1q24 in isolated populations of Cilento.

    Ciullo M, Nutile T, Dalmasso C, Sorice R, Bellenguez C, Colonna V, Persico MG and Bourgain C

    Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Via Pietro Castellino, 111, 80131 Naples, Italy. ciullo@igb.cnr.it

    Objective: Obesity is a complex trait with a variety of genetic susceptibility variants. Several loci linked to obesity and/or obesity-related traits have been identified, and relatively few regions have been replicated. Studying isolated populations can be a useful approach to identify rare variants that will not be detected with whole-genome association studies in large populations.

    Methods: Random individuals were sampled from Campora, an isolated village of the Cilento area in South Italy, phenotyped for BMI, and genotyped using a dense microsatellite marker map. An efficient pedigree-breaking strategy was applied to perform genome-wide linkage analyses of both BMI and obesity. Significance was assessed with ad hoc simulations for the two traits and with an original local false discovery rate approach to quantitative trait linkage analysis for BMI. A genealogy-corrected association test was performed for a single nucleotide polymorphism located in one of the linkage regions. A replication study was conducted in the neighboring village of Gioi.

    Results: A new locus on chr1q24 significantly linked to BMI was identified in Campora. Linkage at the same locus is suggested with obesity. Three additional loci linked to BMI were also detected, including one containing the INSIG2 gene region. No evidence of association between the rs7566605 variant and BMI or obesity was found. In Gioi, the linkage on chr1q24 was replicated with both BMI and obesity.

    Conclusions: Overall, our results confirm that successful linkage studies can be accomplished in these populations both to replicate known linkages and to identify novel quantitative trait linkages.

    Diabetes 2008;57;3;783-90

  • Campora: a young genetic isolate in South Italy.

    Colonna V, Nutile T, Astore M, Guardiola O, Antoniol G, Ciullo M and Persico MG

    Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR Naples, Naples, Italy. colonna@igb.cnr.it

    Genetic isolates have been successfully used in the study of complex traits, mainly because their features allow a reduction in the complexity of the genetic models underlying the trait. The aim of the present study is to describe the population of Campora, a village in the South of Italy, highlighting its properties as a genetic isolate. Both historical evidence and multi-locus genetic data (genomic and mitochondrial DNA polymorphisms) have been taken into account in the analyses. The extension of linkage disequilibrium (LD) regions has been evaluated on autosomes and on a region of the X chromosome. We defined a study sample population on the basis of the genealogy and exogamy data. We found in this population a few different mitochondrial and Y chromosome haplotypes and we ascertained that, similarly to other isolated populations, in Campora LD extends over wider regions compared to large and genetically heterogeneous populations. These findings indicate a conspicuous genetic homogeneity in the genome. Finally, we found evidence for a recent population bottleneck that we propose to interpret as a demographic crisis determined by the plague of the 17th century. Overall our findings demonstrate that Campora displays the genetic characteristics of a young isolate.

    Human heredity 2007;64;2;123-35

  • New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate.

    Ciullo M, Bellenguez C, Colonna V, Nutile T, Calabria A, Pacente R, Iovino G, Trimarco B, Bourgain C and Persico MG

    Institute of Genetics and Biophysics, A. Buzzati-Traverso, CNR Naples, Italy. ciullo@igb.cnr.it

    Essential hypertension (EH) affects a large proportion of the adult population in Western countries and is a major risk factor for cardiovascular diseases. EH is a multifactorial disease with a complex genetic component. To tackle the complexity of this genetic component, we have initiated a study of Campora, an isolated village in South Italy. A random sample of 389 adults was genotyped for a very dense microsatellite genome scan and phenotyped for EH. Of this sample, 173 affected individuals were all related through a 2,180-member pedigree and could be integrated within a linkage analysis. The complexity of the pedigree prevented its direct use for a non-parametric linkage (NPL) analysis. Therefore, the method proposed by Falchi et al. [2004, Am. J. Hum. Genet., 75, 1015-1031] was used for automatic pedigree-breaking. We identified a new locus for EH on chromosome 8q22-23 and detected linkage with two known loci for EH: 1q42-43 and 4p16. Simulations showed that the linkage with 8q22-23 is highly genome-wide significant, even when accounting for the breaking of the pedigree. An extension to qualitative traits of another pedigree-breaking approach [Pankratz et al., 2001, Genet. Epidemiol., 21 (Suppl. 1), S258-S263] also detected a significant linkage on 8q22-23 using a remarkably different set of sub-pedigrees and helped to refine the location of the linkage signal. This work both identifies a new locus strongly linked to hypertension and shows that the power of linkage analysis can be improved by the appropriate use of efficient pedigree-breaking strategies.

    Human molecular genetics 2006;15;10;1735-43

  • The C. elegans pvf-1 gene encodes a PDGF/VEGF-like factor able to bind mammalian VEGF receptors and to induce angiogenesis.

    Tarsitano M, De Falco S, Colonna V, McGhee JD and Persico MG

    Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy.

    Members of the platelet-derived growth factor/vascular endothelial growth factor (PDGF/VEGF) family have been implicated in a variety of functions in vertebrates, especially angiogenesis. Here we identify and characterize a PDGF/VEGF-like factor (named PVF-1) from the nematode C. elegans. We show that PVF-1 has biochemical properties similar to vertebrate PDGF/VEGF growth factors. More important, PVF-1 binds to the human receptors VEGFR-1 (Flt-1) and VEGFR-2 (KDR) and is able to induce angiogenesis in two model systems derived from vertebrates. Our results highlight the widespread evolutionary conservation of this important class of growth factors and raise the possibility that C. elegans can provide a simple experimental system in which to investigate how these factors function.

    FASEB journal : official publication of the Federation of American Societies for Experimental Biology 2006;20;2;227-33
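The F(st) statistic quoted in the genealogical-versus-genetic comparison above (0.008 between the two villages) measures how much of the total genetic diversity is due to differences between populations. As a minimal sketch, and not the estimator used in that paper, Wright's F(st) for a single biallelic marker in two populations can be computed from allele frequencies as (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S the mean expected heterozygosity within populations. The function name `fst_biallelic`, the assumption of equal population sizes and the example frequencies are illustrative assumptions.

```python
def fst_biallelic(p1, p2):
    """Wright's F(st) for one biallelic marker in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumes equal population sizes and ignores sampling error (a
    simplification for illustration, not a published estimator).
    """
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)

    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2

    if h_total == 0:
        return 0.0  # marker is monomorphic overall, no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies.
identical = fst_biallelic(0.5, 0.5)   # same frequency in both populations
slight = fst_biallelic(0.45, 0.55)    # small frequency difference
fixed = fst_biallelic(1.0, 0.0)       # populations fixed for different alleles
```

A frequency difference of 0.45 versus 0.55 already gives an F(st) of about 0.01, which shows how small the between-village differences behind a value such as 0.008 are.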

Jose Espinosa

- Visiting Undergraduate Student

I did my undergraduate degree in genomic sciences at the National Autonomous University of Mexico (UNAM).

My first research work was at the UNAM Biotechnology Institute, where I analysed microRNA expression levels in bean using NGS RNA libraries. I joined the Sanger Institute first in 2010 and again in early 2012, and conducted a project on structural variation in Y chromosomes from different populations. During this period I also collaborated with the National Institute of Genomic Medicine in Mexico, analysing copy number variation in glioblastoma samples.

Research

A deep understanding of the relationship between genotype and phenotype is one of the major scientific challenges of the years to come. If we ever come to comprehend this complex interplay in fine detail, not only will the medical and biological implications be huge, but we might also be able to tell, at that same level of detail, what it is in our genome that has contributed to making us human. The study of structural variation undoubtedly plays an important role in that.

I'm studying structural variation in Y chromosomes from the 1000 Genomes Project.

Min Hu

- PhD student

Before I came to the UK in 2008, I obtained my undergraduate degree in life sciences at Peking University in China.

Research

My research focuses on looking for regions in the human genome that have been positively selected during modern human evolution. I am using statistical approaches on sequencing data from multiple populations, aiming to understand: 1) what types of selective sweeps we can detect using current models and statistical tests; 2) which genes and other functional elements in the human genome have been favored by positive natural selection since modern humans emerged about 200,000 years ago.

References

  • Exploration of signals of positive selection derived from genotype-based human genome scans using re-sequencing data.

    Hu M, Ayub Q, Guerra-Assunção JA, Long Q, Ning Z, Huang N, Romero IG, Mamanova L, Akan P, Liu X, Coffey AJ, Turner DJ, Swerdlow H, Burton J, Quail MA, Conrad DF, Enright AJ, Tyler-Smith C and Xue Y

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.

    We have investigated whether regions of the genome showing signs of positive selection in scans based on haplotype structure also show evidence of positive selection when sequence-based tests are applied, whether the target of selection can be localized more precisely, and whether such extra evidence can lead to increased biological insights. We used two tools: simulations under neutrality or selection, and experimental investigation of two regions identified by the HapMap2 project as putatively selected in human populations. Simulations suggested that neutral and selected regions should be readily distinguished and that it should be possible to localize the selected variant to within 40 kb at least half of the time. Re-sequencing of two ~300 kb regions (chr4:158Mb and chr10:22Mb) lacking known targets of selection in HapMap CHB individuals provided strong evidence for positive selection within each and suggested the micro-RNA gene hsa-miR-548c as the best candidate target in one region, and changes in regulation of the sperm protein gene SPAG6 in the other.

    Funded by: Wellcome Trust: 077009

    Human genetics 2012;131;5;665-74

  • A systematic survey of loss-of-function variants in human protein-coding genes.

    MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, 1000 Genomes Project Consortium, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB and Tyler-Smith C

    Wellcome Trust Sanger Institute, Hinxton, UK. macarthur@atgu.mgh.harvard.edu

    Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

    Funded by: Wellcome Trust: 090532, 090532/Z/09/Z, 098051

    Science (New York, N.Y.) 2012;335;6070;823-8

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73

  • Origins and functional impact of copy number variation in the human genome.

    Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, MacArthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK.

    Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

    Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: HG004221; NIGMS NIH HHS: GM081533; Wellcome Trust: 077006/Z/05/Z, 077008, 077009, 077014

    Nature 2010;464;7289;704-12
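
The 1000 Genomes pilot abstract above reports a de novo germline substitution rate of roughly 10⁻⁸ per base pair per generation, estimated directly from two trios. The arithmetic behind such an estimate can be sketched as below. This is only an illustration, not the consortium's pipeline: the function name `de_novo_rate` and the counts passed to it are hypothetical, and the sketch assumes that every callable site is transmitted once by each parent.

```python
def de_novo_rate(n_mutations: int, callable_bases: int, n_trios: int = 1) -> float:
    """Per-base, per-generation substitution rate from trio data.

    n_mutations    : validated de novo substitutions summed over all children
    callable_bases : haploid positions where a de novo call could be made
                     (assumed to be the same in every trio)
    n_trios        : number of mother-father-child trios

    Each child receives two haploid genomes, so the number of transmitted
    base pairs per trio is 2 * callable_bases.
    """
    if callable_bases <= 0 or n_trios <= 0:
        raise ValueError("callable_bases and n_trios must be positive")
    if n_mutations < 0:
        raise ValueError("n_mutations cannot be negative")
    return n_mutations / (2 * callable_bases * n_trios)


# Hypothetical counts, chosen only to illustrate the order of magnitude.
rate = de_novo_rate(n_mutations=50, callable_bases=2_500_000_000)
print(f"{rate:.1e}")  # prints 1.0e-08
```

In practice the denominator is the hard part: only sites with enough coverage in all three family members count as callable, so the estimate depends strongly on how that set is defined.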

Daniel MacArthur

- Visiting Scientist

I completed my PhD at the Institute for Neuromuscular Research in Sydney, Australia. My PhD focused on the genetics of human athletic performance, and specifically on the effect of variation in the ACTN3 gene on muscle function. During my PhD I generated and analysed a knockout mouse model of ACTN3 and analysed its recent evolutionary history in humans.

I moved to the Sanger Institute in September 2008. For the last two years I have been funded by an Australian National Health and Medical Research Council Overseas Biomedical Fellowship.

Research

My current research is focused on predicting the functional effects of genetic variants. I coordinated the functional annotation of genetic variants in the 1000 Genomes pilot projects, and have also led an international collaboration investigating the impact of "loss-of-function" variants - sequence changes that are predicted to severely damage the function of protein-coding genes. We have identified over 800 completely novel variants of this kind as part of the 1000 Genomes Project, and explored their effects on gene expression, complex disease risk, and recent human evolution.

References

  • α-Actinin-3 deficiency is associated with reduced bone mass in human and mouse.

    Yang N, Schindeler A, McDonald MM, Seto JT, Houweling PJ, Lek M, Hogarth M, Morse AR, Raftery JM, Balasuriya D, MacArthur DG, Berman Y, Quinlan KG, Eisman JA, Nguyen TV, Center JR, Prince RL, Wilson SG, Zhu K, Little DG and North KN

    Institute for Neuroscience and Muscle Research, The Children's Hospital at Westmead, Sydney 2145, NSW, Australia. nan.yang@persongen.com

    Bone mineral density (BMD) is a complex trait that is the single best predictor of the risk of osteoporotic fractures. Candidate gene and genome-wide association studies have identified genetic variations in approximately 30 genetic loci associated with BMD variation in humans. α-Actinin-3 (ACTN3) is highly expressed in fast skeletal muscle fibres. There is a common null-polymorphism R577X in human ACTN3 that results in complete deficiency of the α-actinin-3 protein in approximately 20% of Eurasians. Absence of α-actinin-3 does not cause any disease phenotypes in muscle because of compensation by α-actinin-2. However, α-actinin-3 deficiency has been shown to be detrimental to athletic sprint/power performance. In this report we reveal additional functions for α-actinin-3 in bone. α-Actinin-3 but not α-actinin-2 is expressed in osteoblasts. The Actn3(-/-) mouse displays significantly reduced bone mass, with reduced cortical bone volume (-14%) and trabecular number (-61%) seen by microCT. Dynamic histomorphometry indicated this was due to a reduction in bone formation. In a cohort of postmenopausal Australian women, ACTN3 577XX genotype was associated with lower BMD in an additive genetic model, with the R577X genotype contributing 1.1% of the variance in BMD. Microarray analysis of cultured osteoprogenitors from Actn3(-/-) mice showed alterations in expression of several genes regulating bone mass and osteoblast/osteoclast activity, including Enpp1, Opg and Wnt7b. Our studies suggest that ACTN3 likely contributes to the regulation of bone mass through alterations in bone turnover. Given the high frequency of R577X in the general population, the potential role of ACTN3 R577X as a factor influencing variations in BMD in elderly humans warrants further study.

    Bone 2011;49;4;790-8

  • Deficiency of α-actinin-3 is associated with increased susceptibility to contraction-induced damage and skeletal muscle remodeling.

    Seto JT, Lek M, Quinlan KG, Houweling PJ, Zheng XF, Garton F, MacArthur DG, Raftery JM, Garvey SM, Hauser MA, Yang N, Head SI and North KN

    Institute for Neuroscience and Muscle Research, The Children's Hospital at Westmead, Locked Bag 4001, Sydney, NSW 2145, Australia.

    Sarcomeric α-actinins (α-actinin-2 and -3) are a major component of the Z-disk in skeletal muscle, where they crosslink actin and other structural proteins to maintain an ordered myofibrillar array. Homozygosity for the common null polymorphism (R577X) in ACTN3 results in the absence of fast fiber-specific α-actinin-3 in ∼20% of the general population. α-Actinin-3 deficiency is associated with decreased force generation and is detrimental to sprint and power performance in elite athletes, suggesting that α-actinin-3 is necessary for optimal forceful repetitive muscle contractions. Since Z-disks are the structures most vulnerable to eccentric damage, we sought to examine the effects of α-actinin-3 deficiency on sarcomeric integrity. Actn3 knockout mouse muscle showed significantly increased force deficits following eccentric contraction at 30% stretch, suggesting that α-actinin-3 deficiency results in an increased susceptibility to muscle damage at the extremes of muscle performance. Microarray analyses demonstrated an increase in muscle remodeling genes, which we confirmed at the protein level. The loss of α-actinin-3 and up-regulation of α-actinin-2 resulted in no significant changes to the total pool of sarcomeric α-actinins, suggesting that alterations in fast fiber Z-disk properties may be related to differences in functional protein interactions between α-actinin-2 and α-actinin-3. In support of this, we demonstrated that the Z-disk proteins, ZASP, titin and vinculin preferentially bind to α-actinin-2. Thus, the loss of α-actinin-3 changes the overall protein composition of fast fiber Z-disks and alters their elastic properties, providing a mechanistic explanation for the loss of force generation and increased susceptibility to eccentric damage in α-actinin-3-deficient individuals.

    Human molecular genetics 2011;20;15;2914-27

  • Dindel: accurate indel calls from short-read data.

    Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, United Kingdom. caa@sanger.ac.uk

    Small insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most of the focus of studies of sequence variation is on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow identification of indels just as SNVs. However, inference of indels from next-generation sequence data is challenging, and so far methods for identifying indels lag behind methods for calling SNVs in terms of sensitivity and specificity. We propose a Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included. In our probabilistic realignment model we account for base-calling errors, mapping errors, and also, importantly, for increased sequencing error indel rates in long homopolymer runs. We show that our method is sensitive and achieves low false discovery rates on simulated and real data sets, although challenges remain. The algorithm is implemented in the program Dindel, which has been used in the 1000 Genomes Project call sets.

    Funded by: British Heart Foundation: RG/09/012/28096; Wellcome Trust: 086084, WT089088/Z/09/Z

    Genome research 2011;21;6;961-73

  • Gene inactivation and its implications for annotation in the era of personal genomics.

    Balasubramanian S, Habegger L, Frankish A, MacArthur DG, Harte R, Tyler-Smith C, Harrow J and Gerstein M

    Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

    The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.

    Funded by: Wellcome Trust

    Genes & development 2011;25;1;1-10

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73

  • Loss-of-function variants in the genomes of healthy humans.

    MacArthur DG and Tyler-Smith C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. dm8@sanger.ac.uk

    Genetic variants predicted to seriously disrupt the function of human protein-coding genes, so-called loss-of-function (LOF) variants, have traditionally been viewed in the context of severe Mendelian disease. However, recent large-scale sequencing and genotyping projects have revealed a surprisingly large number of these variants in the genomes of apparently healthy individuals (at least 100 per genome, including more than 30 in a homozygous state), suggesting a previously unappreciated level of variation in functional gene content between humans. These variants are mostly found at low frequency, suggesting that they are enriched for mildly deleterious polymorphisms suppressed by negative natural selection, and thus represent an attractive set of candidate variants for complex disease susceptibility. However, they are also enriched for sequencing and annotation artefacts, so overall present serious challenges for clinical sequencing projects seeking to identify severe disease genes amidst the 'noise' of technical error and benign genetic polymorphism. Systematic, high-quality catalogues of LOF variants present in the genomes of healthy individuals, built from the output of large-scale sequencing studies such as the 1000 Genomes Project, will help to distinguish between benign and disease-causing LOF variants, and will provide valuable resources for clinical genomics.

    Funded by: Wellcome Trust

    Human molecular genetics 2010;19;R2;R125-30

  • Origins and functional impact of copy number variation in the human genome.

    Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, MacArthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK.

    Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

    Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: HG004221; NIGMS NIH HHS: GM081533; Wellcome Trust: 077006/Z/05/Z, 077008, 077009, 077014

    Nature 2010;464;7289;704-12

  • An Actn3 knockout mouse provides mechanistic insights into the association between alpha-actinin-3 deficiency and human athletic performance.

    MacArthur DG, Seto JT, Chan S, Quinlan KG, Raftery JM, Turner N, Nicholson MD, Kee AJ, Hardeman EC, Gunning PW, Cooney GJ, Head SI, Yang N and North KN

    Institute for Neuromuscular Research, The Children's Hospital at Westmead, Sydney 2145, NSW, Australia.

    A common nonsense polymorphism (R577X) in the ACTN3 gene results in complete deficiency of the fast skeletal muscle fiber protein alpha-actinin-3 in an estimated one billion humans worldwide. The XX null genotype is under-represented in elite sprint athletes, associated with reduced muscle strength and sprint performance in non-athletes, and is over-represented in endurance athletes, suggesting that alpha-actinin-3 deficiency increases muscle endurance at the cost of power generation. Here we report that muscle from Actn3 knockout mice displays reduced force generation, consistent with results from human association studies. Detailed analysis of knockout mouse muscle reveals reduced fast fiber diameter, increased activity of multiple enzymes in the aerobic metabolic pathway, altered contractile properties, and enhanced recovery from fatigue, suggesting a shift in the properties of fast fibers towards those characteristic of slow fibers. These findings provide the first mechanistic explanation for the reported associations between R577X and human athletic performance and muscle function.

    Human molecular genetics 2008;17;8;1076-86

  • Loss of ACTN3 gene function alters mouse muscle metabolism and shows evidence of positive selection in humans.

    MacArthur DG, Seto JT, Raftery JM, Quinlan KG, Huttley GA, Hook JW, Lemckert FA, Kee AJ, Edwards MR, Berman Y, Hardeman EC, Gunning PW, Easteal S, Yang N and North KN

    Institute for Neuromuscular Research, Children's Hospital at Westmead, Sydney, New South Wales 2145, Australia.

    More than a billion humans worldwide are predicted to be completely deficient in the fast skeletal muscle fiber protein alpha-actinin-3 owing to homozygosity for a premature stop codon polymorphism, R577X, in the ACTN3 gene. The R577X polymorphism is associated with elite athlete status and human muscle performance, suggesting that alpha-actinin-3 deficiency influences the function of fast muscle fibers. Here we show that loss of alpha-actinin-3 expression in a knockout mouse model results in a shift in muscle metabolism toward the more efficient aerobic pathway and an increase in intrinsic endurance performance. In addition, we demonstrate that the genomic region surrounding the 577X null allele shows low levels of genetic variation and recombination in individuals of European and East Asian descent, consistent with strong, recent positive selection. We propose that the 577X allele has been positively selected in some human populations owing to its effect on skeletal muscle metabolism.

    Nature genetics 2007;39;10;1261-5

  • ACTN3 genotype is associated with human elite athletic performance.

    Yang N, MacArthur DG, Gulbin JP, Hahn AG, Beggs AH, Easteal S and North K

    Institute for Neuromuscular Research, Children's Hospital at Westmead, Sydney, Australia.

    There is increasing evidence for strong genetic influences on athletic performance and for an evolutionary "trade-off" between performance traits for speed and endurance activities. We have recently demonstrated that the skeletal-muscle actin-binding protein alpha-actinin-3 is absent in 18% of healthy white individuals because of homozygosity for a common stop-codon polymorphism in the ACTN3 gene, R577X. alpha-Actinin-3 is specifically expressed in fast-twitch myofibers responsible for generating force at high velocity. The absence of a disease phenotype secondary to alpha-actinin-3 deficiency is likely due to compensation by the homologous protein, alpha-actinin-2. However, the high degree of evolutionary conservation of ACTN3 suggests function(s) independent of ACTN2. Here, we demonstrate highly significant associations between ACTN3 genotype and athletic performance. Both male and female elite sprint athletes have significantly higher frequencies of the 577R allele than do controls. This suggests that the presence of alpha-actinin-3 has a beneficial effect on the function of skeletal muscle in generating forceful contractions at high velocity, and provides an evolutionary advantage because of increased sprint performance. There is also a genotype effect in female sprint and endurance athletes, with higher than expected numbers of 577RX heterozygotes among sprint athletes and lower than expected numbers among endurance athletes. The lack of a similar effect in males suggests that the ACTN3 genotype affects athletic performance differently in males and females. The differential effects in sprint and endurance athletes suggest that the R577X polymorphism may have been maintained in the human population by balancing natural selection.

    American journal of human genetics 2003;73;3;627-31
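
Several of the ACTN3 abstracts above quote the share of people who are homozygous for the 577X null allele (18% in one cohort, about 20% in others). A homozygote frequency can be turned into an approximate allele frequency if one assumes Hardy-Weinberg equilibrium. That assumption is ours and is not stated in the papers, and the helper name `hwe_from_homozygote_freq` is hypothetical. A minimal sketch:

```python
import math


def hwe_from_homozygote_freq(f_xx: float) -> dict:
    """Allele and genotype frequencies implied by a homozygote frequency.

    Assumes Hardy-Weinberg equilibrium, under which the XX genotype
    frequency equals q**2, where q is the frequency of the X allele.
    """
    if not 0.0 <= f_xx <= 1.0:
        raise ValueError("f_xx must lie between 0 and 1")
    q = math.sqrt(f_xx)  # frequency of the 577X allele
    p = 1.0 - q          # frequency of the 577R allele
    return {
        "q_577X": q,
        "p_577R": p,
        "RR": p * p,      # expected 577RR homozygotes
        "RX": 2 * p * q,  # expected 577RX heterozygotes
        "XX": q * q,      # expected 577XX homozygotes
    }


# 18% XX homozygotes, the figure quoted in the 2003 abstract.
freqs = hwe_from_homozygote_freq(0.18)
for name, value in freqs.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption an 18% homozygote frequency corresponds to a 577X allele frequency of about 0.42. Observed genotype counts that depart from these expectations, such as the excess of heterozygotes reported among female sprint athletes, are the kind of signal the association studies test for.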

Luca Pagani

- Visiting PhD Student

I received my B.A. and MSci in Molecular Biology from the Scuola Normale Superiore of Pisa, Italy, in 2007 and 2009 respectively. I first came to Sanger in 2009 through an international exchange program (Erasmus); my current involvement with the institute is part of an ongoing PhD project at the Biological Anthropology department of the University of Cambridge.

Research

I have always been fascinated by the migration events that brought a single African species to colonize the whole planet. Although with some delay, I finally understood that biology could help to retrace the migration routes followed by our ancestors on their way out of Africa. My current PhD project concerns the human populations now inhabiting Eastern Africa. Its aim is to better understand the demographic dynamics that occurred in the area during the last 200,000 years, in order to clarify the processes that led to our expansion out of Africa.

References

  • The dual origin of Tati-speakers from Dagestan as written in the genealogy of uniparental variants.

    Bertoncini S, Bulayeva K, Ferri G, Pagani L, Caciagli L, Taglioli L, Semyonov I, Bulayev O, Paoli G and Tofanelli S

    Department of Biology, University of Pisa, Pisa, Italy. stef.bertoncini@gmail.com

    Objectives: Tat language is classified in an Iranian subbranch of the Indo-European family. It is spoken in the Caucasus and in the West Caspian region by populations with heterogeneous cultural traditions and religion whose ancestry is unknown. The aim of this study is to get a first insight about the genetic history of this peculiar linguistic group.

    Methods: We investigated the uniparental gene pools, defined by NRY and mtDNA high-resolution markers, in two Tati-speaking communities from Dagestan: Mountain Jews or Juhur, who speak the Judeo-Tat dialect, and the Tats, who speak the Muslim-Tat dialect. The samples have been collected in monoethnic rural villages and selected on the basis of genealogical relationships. A novel approach aimed at resolving cryptic cases in the recent history of human populations, which combines the properties of uniparental genetic markers with the potential of "forward-in-time" computer simulations, is presented.

    Results: Judeo-Tats emerged as a group with tight matrilineal genetic legacy who separated early from other Jewish communities. Tats exhibited genetic signals of a much longer in situ evolution, which appear as substantially unlinked with other Indo-Iranian enclaves in the Caucasus.

    Conclusions: The independent demographic histories of the two samples, with mutually reversed profiles at paternally and maternally transmitted genetic systems, suggest that geographic proximity and linguistic assimilation of Tati-speakers from Dagestan do not reflect a common ancestry.

    American journal of human biology : the official journal of the Human Biology Council 2012;24;4;391-9

  • High altitude adaptation in Daghestani populations from the Caucasus.

    Pagani L, Ayub Q, MacArthur DG, Xue Y, Baillie JK, Chen Y, Kozarewa I, Turner DJ, Tofanelli S, Bulayeva K, Kidd K, Paoli G and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, UK. lp8@sanger.ac.uk

    We have surveyed 15 high-altitude adaptation candidate genes for signals of positive selection in North Caucasian highlanders using targeted re-sequencing. A total of 49 unrelated Daghestani from three ethnic groups (Avars, Kubachians, and Laks) living in ancient villages located at around 2,000 m above sea level were chosen as the study population. Caucasian (Adygei living at sea level, N = 20) and CEU (CEPH Utah residents with ancestry from northern and western Europe; N = 20) were used as controls. Candidate genes were compared with 20 putatively neutral control regions resequenced in the same individuals. The regions of interest were amplified by long-PCR, pooled according to individual, indexed by adding an eight-nucleotide tag, and sequenced using the Illumina GAII platform. 1,066 SNPs were called using false discovery and false negative thresholds of ~6%. The neutral regions provided an empirical null distribution to compare with the candidate genes for signals of selection. Two genes stood out. In Laks, a non-synonymous variant within HIF1A already known to be associated with improvement in oxygen metabolism was rediscovered, and in Kubachians a cluster of 13 SNPs located in a conserved intronic region within EGLN1 showing high population differentiation was found. These variants illustrate both the common pathways of adaptation to high altitude in different populations and features specific to the Daghestani populations, showing how even a mildly hypoxic environment can lead to genetic adaptation.

    Funded by: Wellcome Trust

    Human genetics 2012;131;3;423-33

  • A world in a grain of sand: human history from genetic data.

    Colonna V, Pagani L, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Genome-wide genotypes and sequences are enriching our understanding of the past 50,000 years of human history and providing insights into earlier periods largely inaccessible to mitochondrial DNA and Y-chromosomal studies. "To see a world in a grain of sand ..." William Blake, Auguries of Innocence.

    Funded by: Wellcome Trust

    Genome biology 2011;12;11;234

  • The key role of patrilineal inheritance in shaping the genetic variation of Dagestan highlanders.

    Caciagli L, Bulayeva K, Bulayev O, Bertoncini S, Taglioli L, Pagani L, Paoli G and Tofanelli S

    Dipartimento di Biologia, Università di Pisa, Pisa, Italy.

    The Caucasus region is a complex cultural and ethnic mosaic, comprising populations that speak Caucasian, Indo-European and Altaic languages. Isolated mountain villages (auls) in Dagestan still preserve a high level of genetic and cultural diversity and have patriarchal societies with a long history of isolation. The aim of this study was to understand the genetic history of five Dagestan highland auls with distinct ethnic affiliation (Avars, Chechens-Akkins, Kubachians, Laks, Tabasarans) using markers on the male-specific region of the Y chromosome. The groups analyzed here are all Muslims but speak different languages all belonging to the Nakh-Dagestanian linguistic family. The results show that the Dagestan ethnic groups share a common Y-genetic background, with deep-rooted genealogies and rare alleles, dating back to an early phase in the post-glacial recolonization of Europe. Geography and stochastic factors, such as founder effect and long-term genetic drift, driven by the rigid structuring of societies in groups of patrilineal descent, most likely acted as mutually reinforcing key factors in determining the high degree of Y-genetic divergence among these ethnic groups.

    Journal of human genetics 2009;54;12;689-94
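
The high-altitude adaptation study listed above compares candidate genes against an empirical null distribution built from 20 putatively neutral regions. One common way to make such a comparison is an empirical p-value: the fraction of neutral regions whose statistic is at least as extreme as the candidate's. The sketch below illustrates that general idea and is not the authors' actual procedure. The function name `empirical_p_value`, the choice of a differentiation statistic, and all numbers are hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided empirical p-value of `observed` against a null sample.

    Counts null values that are >= observed and applies the usual +1
    correction, so the result is never exactly zero with a finite null.
    """
    if len(null_values) == 0:
        raise ValueError("null_values must not be empty")
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)


# Hypothetical differentiation statistics for 20 neutral control regions.
neutral_stats = [
    0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.05, 0.05, 0.06,
    0.06, 0.07, 0.08, 0.08, 0.09, 0.10, 0.11, 0.12, 0.14, 0.16,
]
candidate_stat = 0.30  # hypothetical value for one candidate gene

p = empirical_p_value(candidate_stat, neutral_stats)
print(f"empirical p = {p:.4f}")
```

With only 20 neutral regions the smallest achievable p-value is 1/21, so a null sample of this size can flag outliers but cannot support very strong significance claims on its own.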

Wei Wei

- Visiting PhD Student

I am a third-year PhD student at the Institute of Forensic Medicine, Sichuan University, Chengdu, China, and joined the Human Evolution team in September 2011 as a visiting student.

Research

My PhD project started with identifying informative Y-chromosomal markers for populations in China and applying them in forensic science using traditional PCR-based methods. I am now extending my research by using publicly available whole Y-chromosome resequencing data, such as those from Complete Genomics, to refine the Y-chromosome phylogenetic tree by identifying new Y markers, and to understand human male history through population genetic analysis.

References

  • Exploring of new Y-chromosome SNP loci using Pyrosequencing and the SNaPshot methods.

    Wei W, Luo HB, Yan J and Hou YP

    Department of Forensic Genetics, West China School of Basic Science and Forensic Medicine, Sichuan University (West China University of Medical Sciences), Chengdu, 610041, Sichuan, China. weiwei090818@163.com

    The single nucleotide polymorphisms on the Y chromosome (Y-SNP) have been considered to be important in forensic casework. However, Y-SNP loci were mostly population specific and lacked biallelic polymorphisms in the Asian population. In this study, we developed a strategy for seeking and genotyping new Y-SNP markers based on both Pyrosequencing and the SNaPshot methods. As a result, 34 new biallelic markers were observed to be polymorphic in the Chinese Han population by estimation of allele frequencies of 103 candidate Y-SNP loci in DNA pools using Pyrosequencing technology. Then, a multiplex system with 20 Y-SNP loci was genotyped using the SNaPshot™ multiplex kit. Twenty Y-SNP loci defined 56 different haplotypes, and the haplotype diversity was estimated to be 0.9539. Our results demonstrated that the strategy could be used as an efficient tool to search and genotype biallelic markers from a large number of candidate loci. In addition, 20 Y-SNP loci constructed a multiplex system, which could provide supplementary information for forensic identification.

    International journal of legal medicine 2012;126;6;825-33
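
The abstract above reports a haplotype diversity of 0.9539 for 56 haplotypes defined by 20 Y-SNP loci. The abstract does not say which estimator was used; a widely used one is Nei's unbiased estimator, H = n/(n-1) × (1 - Σ pᵢ²), where n is the sample size and pᵢ is the frequency of haplotype i. The sketch below assumes that estimator, and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def haplotype_diversity(haplotypes: Iterable[Hashable]) -> float:
    """Nei's unbiased haplotype diversity for a sample of haplotype labels.

    H = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the sample
    frequency of haplotype i and n is the number of sampled chromosomes.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of 10 Y chromosomes carrying 4 distinct haplotypes.
sample = ["H1"] * 4 + ["H2"] * 3 + ["H3"] * 2 + ["H4"]
h = haplotype_diversity(sample)
print(f"haplotype diversity = {h:.4f}")
```

The value approaches 1 when almost every sampled chromosome carries a different haplotype, which is why a high diversity is desirable for a forensic marker panel.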

Yali Xue

- Staff Scientist

I studied public health as an undergraduate, epidemiology for my master’s degree, and medical and population genetics for my Ph.D. at Harbin Medical University, China. I collected samples from different ethnic groups in China and established cell lines from them, some of which are now included in the HGDP panel. In all, I studied human genetic diversity in China for 8 years, also making visits to Oxford University, UK and Cleveland University, US during this period. I received the national scientific research award in 2005.

Research

I joined Sanger in 2004, working initially on Y-chromosomal diversity, including involvement in the Genographic project. Subsequently, I focused more on identifying signatures of positive selection in the human genome. Since 2008, I have concentrated on applying new sequencing technology to address human evolution and population genetics questions, e.g. directly measuring the Y-chromosomal mutation rate. I am involved in the 1000 Genomes Project, including work on Y-chromosomal diversity, a genome-wide scan for positive selection, identifying disease variants in the general population, and functional prediction of the consequences of variants. I also coordinate two major team projects for the new quinquennium: Native American and Himalayan population genetics studies.

References

  • Response to the comment on "The hare and the tortoise: One small step for four SNPs, one giant leap for SNP-kind".

    Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs. CB10 1SA, UK.

    The possibility of introducing new sequencing technologies into forensic genetics raises questions that go beyond the choice between SNPs and STRs as the preferred genetic markers. We suggest that many of the novel methodological and technical issues could be incorporated into the likelihood ratio frameworks currently used by forensic scientists. However, changes to ethical and legal structures may be needed before the new information could be used.

    Forensic science international. Genetics 2011;5;4;361-2

  • A worldwide analysis of beta-defensin copy number variation suggests recent selection of a high-expressing DEFB103 gene copy in East Asia.

    Hardwick RJ, Machado LR, Zuccherato LW, Antolinos S, Xue Y, Shawa N, Gilman RH, Cabrera L, Berg DE, Tyler-Smith C, Kelly P, Tarazona-Santos E and Hollox EJ

    Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.

    Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region.

    Funded by: Medical Research Council: GO801123; Wellcome Trust: 067948, 077009, 087663

    Human mutation 2011;32;7;743-50

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73

  • A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations.

    Shi W, Ayub Q, Vermeulen M, Shao RG, Zuniga S, van der Gaag K, de Knijff P, Kayser M, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs., United Kingdom.

    We have investigated human male demographic history using 590 males from 51 populations in the Human Genome Diversity Project - Centre d'Etude du Polymorphisme Humain worldwide panel, typed with 37 Y-chromosomal Single Nucleotide Polymorphisms and 65 Y-chromosomal Short Tandem Repeats and analyzed with the program Bayesian Analysis of Trees With Internal Node Generation. The general patterns we observe show a gradient from the oldest population time to the most recent common ancestors (TMRCAs) and expansion times together with the largest effective population sizes in Africa, to the youngest times and smallest effective population sizes in the Americas. These parameters are significantly negatively correlated with distance from East Africa, and the patterns are consistent with most other studies of human variation and history. In contrast, growth rate showed a weaker correlation in the opposite direction. Y-lineage diversity and TMRCA also decrease with distance from East Africa, supporting a model of expansion with serial founder events starting from this source. A number of individual populations diverge from these general patterns, including previously documented examples such as recent expansions of the Yoruba in Africa, Basques in Europe, and Yakut in Northern Asia. However, some unexpected demographic histories were also found, including low growth rates in the Hazara and Kalash from Pakistan and recent expansion of the Mozabites in North Africa.

    Molecular biology and evolution 2010;27;2;385-93

  • Population differentiation as an indicator of recent positive selection in humans: an empirical evaluation.

    Xue Y, Zhang X, Huang N, Daly A, Gillson CJ, Macarthur DG, Yngvadottir B, Nica AC, Woodwark C, Chen Y, Conrad DF, Ayub Q, Mehdi SQ, Li P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    We have evaluated the extent to which SNPs identified by genomewide surveys as showing unusually high levels of population differentiation in humans have experienced recent positive selection, starting from a set of 32 nonsynonymous SNPs in 27 genes highlighted by the HapMap1 project. These SNPs were genotyped again in the HapMap samples and in the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) panel of 52 populations representing worldwide diversity; extended haplotype homozygosity was investigated around all of them, and full resequence data were examined for 9 genes (5 from public sources and 4 from new data sets). For 7 of the genes, genotyping errors were responsible for an artifactual signal of high population differentiation and for 2, the population differentiation did not exceed our significance threshold. For the 18 genes with confirmed high population differentiation, 3 showed evidence of positive selection as measured by unusually extended haplotypes within a population, and 7 more did in between-population analyses. The 9 genes with resequence data included 7 with high population differentiation, and 5 showed evidence of positive selection on the haplotype carrying the nonsynonymous SNP from skewed allele frequency spectra; in addition, 2 showed evidence of positive selection on unrelated haplotypes. Thus, in humans, high population differentiation is (apart from technical artifacts) an effective way of enriching for recently selected genes, but is not an infallible pointer to recent positive selection supported by other lines of evidence.

    Funded by: Wellcome Trust

    Genetics 2009;183;3;1065-77
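
Population differentiation at a single SNP is commonly summarised with Wright's FST, the fraction of total heterozygosity that is due to allele-frequency differences between populations. The abstract above does not name the statistic it used, so the following is only a minimal sketch under that assumption, with equally weighted populations; the function name and the example frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Assumption: populations are weighted equally and no sample-size
    correction is applied.
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity if all populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# An allele that is rare in one population and common in another.
high_differentiation = fst_biallelic([0.1, 0.9])
```

With frequencies of 0.1 and 0.9 the statistic is about 0.64, while identical frequencies give 0; SNPs in the upper tail of the genome-wide distribution are the kind of candidates the study re-examined.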

  • Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree.

    Xue Y, Wang Q, Long Q, Ng BL, Swerdlow H, Burton J, Skuce C, Taylor R, Abdellah Z, Zhao Y, Asan, MacArthur DG, Quail MA, Carter NP, Yang H and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, UK. ylx@sanger.ac.uk

    Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. In the past, estimates of mutation rates have generally been inferred from phenotypic observations or comparisons of homologous sequences among closely related species. Here, we apply new sequencing technology to measure directly one mutation rate, that of base substitutions on the human Y chromosome. The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to an average depth of 11x or 20x, respectively. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in approximately 10.15 Mb; eight of these had occurred in vitro and four in vivo. The latter could be placed in different positions on the pedigree and led to a mutation-rate measurement of 3.0 x 10(-8) mutations/nucleotide/generation (95% CI: 8.9 x 10(-9)-7.0 x 10(-8)), consistent with estimates of 2.3 x 10(-8)-6.3 x 10(-8) mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons depending on the generation and split times assumed.

    Funded by: Wellcome Trust

    Current biology : CB 2009;19;17;1453-7
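
The point estimate in the abstract above can be checked with simple arithmetic: four in vivo mutations, about 10.15 Mb of sequence, and 13 generations separating the two men. The sketch below assumes the rate is simply mutations divided by (bases × generations); the published analysis placed the mutations on the pedigree and also reports a confidence interval, which this check does not attempt to reproduce.

```python
def mutation_rate(n_mutations, bases_examined, generations):
    """Mutations per nucleotide per generation (simple ratio estimate)."""
    return n_mutations / (bases_examined * generations)


# Figures taken from the abstract: 4 in vivo mutations, ~10.15 Mb, 13 generations.
rate = mutation_rate(4, 10.15e6, 13)
print(f"{rate:.2e}")  # prints 3.03e-08
```

The result agrees with the reported 3.0 x 10(-8) mutations/nucleotide/generation.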

  • A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation.

    Tarpey PS, Smith R, Pleasance E, Whibley A, Edkins S, Hardy C, O'Meara S, Latimer C, Dicks E, Menzies A, Stephens P, Blow M, Greenman C, Xue Y, Tyler-Smith C, Thompson D, Gray K, Andrews J, Barthorpe S, Buck G, Cole J, Dunmore R, Jones D, Maddison M, Mironenko T, Turner R, Turrell K, Varian J, West S, Widaa S, Wray P, Teague J, Butler A, Jenkinson A, Jia M, Richardson D, Shepherd R, Wooster R, Tejada MI, Martinez F, Carvill G, Goliath R, de Brouwer AP, van Bokhoven H, Van Esch H, Chelly J, Raynaud M, Ropers HH, Abidi FE, Srivastava AK, Cox J, Luo Y, Mallya U, Moon J, Parnau J, Mohammed S, Tolmie JL, Shoubridge C, Corbett M, Gardner A, Haan E, Rujirabanjerd S, Shaw M, Vandeleur L, Fullston T, Easton DF, Boyle J, Partington M, Hackett A, Field M, Skinner C, Stevenson RE, Bobrow M, Turner G, Schwartz CE, Gecz J, Raymond FL, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Large-scale systematic resequencing has been proposed as the key future strategy for the discovery of rare, disease-causing sequence variants across the spectrum of human complex disease. We have sequenced the coding exons of the X chromosome in 208 families with X-linked mental retardation (XLMR), the largest direct screen for constitutional disease-causing mutations thus far reported. The screen has discovered nine genes implicated in XLMR, including SYP, ZNF711 and CASK reported here, confirming the power of this strategy. The study has, however, also highlighted issues confronting whole-genome sequencing screens, including the observation that loss of function of 1% or more of X-chromosome genes is compatible with apparently normal existence.

    Funded by: NICHD NIH HHS: HD26202; Wellcome Trust: 077012

    Nature genetics 2009;41;5;535-43

  • A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia.

    Dhandapany PS, Sadayappan S, Xue Y, Powell GT, Rani DS, Nallari P, Rai TS, Khullar M, Soares P, Bahl A, Tharkan JM, Vaideeswar P, Rathinavel A, Narasimhan C, Ayapati DR, Ayub Q, Mehdi SQ, Oppenheimer S, Richards MB, Price AL, Patterson N, Reich D, Singh L, Tyler-Smith C and Thangaraj K

    Department of Biochemistry, Madurai Kamaraj University, Madurai 625 021, India.

    Heart failure is a leading cause of mortality in South Asians. However, its genetic etiology remains largely unknown. Cardiomyopathies due to sarcomeric mutations are a major monogenic cause for heart failure (MIM600958). Here, we describe a deletion of 25 bp in the gene encoding cardiac myosin binding protein C (MYBPC3) that is associated with heritable cardiomyopathies and an increased risk of heart failure in Indian populations (initial study OR = 5.3 (95% CI = 2.3-13), P = 2 x 10(-6); replication study OR = 8.59 (3.19-25.05), P = 3 x 10(-8); combined OR = 6.99 (3.68-13.57), P = 4 x 10(-11)) and that disrupts cardiomyocyte structure in vitro. Its prevalence was found to be high (approximately 4%) in populations of Indian subcontinental ancestry. The finding of a common risk factor implicated in South Asian subjects with cardiomyopathy will help in identifying and counseling individuals predisposed to cardiac diseases in this region.

    Funded by: NHGRI NIH HHS: R01 HG006399-02; Wellcome Trust: 077009

    Nature genetics 2009;41;2;187-91

  • A genome-wide survey of the prevalence and evolutionary forces acting on human nonsense SNPs.

    Yngvadottir B, Xue Y, Searle S, Hunt S, Delgado M, Morrison J, Whittaker P, Deloukas P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK.

    Nonsense SNPs introduce premature termination codons into genes and can result in the absence of a gene product or in a truncated and potentially harmful protein, so they are often considered disadvantageous and are associated with disease susceptibility. As such, we might expect the disrupted allele to be rare and, in healthy people, observed only in a heterozygous state. However, some, like those in the CASP12 and ACTN3 genes, are known to be present at high frequencies and to occur often in a homozygous state and seem to have been advantageous in recent human evolution. To evaluate the selective forces acting on nonsense SNPs as a class, we have carried out a large-scale experimental survey of nonsense SNPs in the human genome by genotyping 805 of them (plus control synonymous SNPs) in 1,151 individuals from 56 worldwide populations. We identified 169 genes containing nonsense SNPs that were variable in our samples, of which 99 were found with both copies inactivated in at least one individual. We found that the sampled humans differ on average by 24 genes (out of about 20,000) because of these nonsense SNPs alone. As might be expected, nonsense SNPs as a class were found to be slightly disadvantageous over evolutionary timescales, but a few nevertheless showed signs of being possibly advantageous, as indicated by unusually high levels of population differentiation, long haplotypes, and/or high frequencies of derived alleles. This study underlines the extent of variation in gene content within humans and emphasizes the importance of understanding this type of variation.

    Funded by: Wellcome Trust: 062023

    American journal of human genetics 2009;84;2;224-34

  • Adaptive evolution of UGT2B17 copy-number variation.

    Xue Y, Sun D, Daly A, Yang F, Zhou X, Zhao M, Huang N, Zerjal T, Lee C, Carter NP, Hurles ME and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    The human UGT2B17 gene varies in copy number from zero to two per individual and also differs in mean number between populations from Africa, Europe, and East Asia. We show that such a high degree of geographical variation is unusual and investigate its evolutionary history. This required first reinterpreting the reference sequence in this region of the genome, which is misassembled from the two different alleles separated by an artifactual gap. A corrected assembly identifies the polymorphism as a 117 kb deletion arising by nonallelic homologous recombination between approximately 4.9 kb segmental duplications and allows the deletion breakpoint to be identified. We resequenced approximately 12 kb of DNA spanning the breakpoint in 91 humans from three HapMap and one extended HapMap populations and one chimpanzee. Diversity was unusually high and the time to the most recent common ancestor was estimated at approximately 2.4 or approximately 3.0 million years by two different methods, with evidence of balancing selection in Europe. In contrast, diversity was low in East Asia where a single haplotype predominated, suggesting positive selection for the deletion in this part of the world.

    Funded by: Wellcome Trust

    American journal of human genetics 2008;83;3;337-46

Bryndis Yngvadottir

by1@sanger.ac.uk

I received my B.A. in Social Anthropology from the University of Iceland in 2001 and my M.A. in Biological Anthropology from the same university in 2004. I gained my Ph.D. from the University of Cambridge in 2008, after undertaking a four-year Ph.D. programme at the Wellcome Trust Sanger Institute. My doctoral project was in the field of Evolutionary Genetics under the supervision of Dr. Chris Tyler-Smith. Subsequently, I joined the Human Evolution team as a postdoctoral fellow.

Research

My primary research interests are in the field of human evolution. Specifically, they include the subjects of genetic variation in humans and non-human great apes, natural selection, cultural history and genome-wide comparison of closely related species. My current work is focused on analysing genetic variation in modern gorillas to make inferences about their demographic past. To this end I am using the de novo assembly of Kamilah, a western lowland gorilla, as well as reduced representation sequence data from additional individuals representing both the eastern and western species.

References

  • Population differentiation as an indicator of recent positive selection in humans: an empirical evaluation.

    Xue Y, Zhang X, Huang N, Daly A, Gillson CJ, Macarthur DG, Yngvadottir B, Nica AC, Woodwark C, Chen Y, Conrad DF, Ayub Q, Mehdi SQ, Li P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    We have evaluated the extent to which SNPs identified by genomewide surveys as showing unusually high levels of population differentiation in humans have experienced recent positive selection, starting from a set of 32 nonsynonymous SNPs in 27 genes highlighted by the HapMap1 project. These SNPs were genotyped again in the HapMap samples and in the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) panel of 52 populations representing worldwide diversity; extended haplotype homozygosity was investigated around all of them, and full resequence data were examined for 9 genes (5 from public sources and 4 from new data sets). For 7 of the genes, genotyping errors were responsible for an artifactual signal of high population differentiation and for 2, the population differentiation did not exceed our significance threshold. For the 18 genes with confirmed high population differentiation, 3 showed evidence of positive selection as measured by unusually extended haplotypes within a population, and 7 more did in between-population analyses. The 9 genes with resequence data included 7 with high population differentiation, and 5 showed evidence of positive selection on the haplotype carrying the nonsynonymous SNP from skewed allele frequency spectra; in addition, 2 showed evidence of positive selection on unrelated haplotypes. Thus, in humans, high population differentiation is (apart from technical artifacts) an effective way of enriching for recently selected genes, but is not an infallible pointer to recent positive selection supported by other lines of evidence.

    Funded by: Wellcome Trust

    Genetics 2009;183;3;1065-77

  • A genome-wide survey of the prevalence and evolutionary forces acting on human nonsense SNPs.

    Yngvadottir B, Xue Y, Searle S, Hunt S, Delgado M, Morrison J, Whittaker P, Deloukas P and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK.

    Nonsense SNPs introduce premature termination codons into genes and can result in the absence of a gene product or in a truncated and potentially harmful protein, so they are often considered disadvantageous and are associated with disease susceptibility. As such, we might expect the disrupted allele to be rare and, in healthy people, observed only in a heterozygous state. However, some, like those in the CASP12 and ACTN3 genes, are known to be present at high frequencies and to occur often in a homozygous state and seem to have been advantageous in recent human evolution. To evaluate the selective forces acting on nonsense SNPs as a class, we have carried out a large-scale experimental survey of nonsense SNPs in the human genome by genotyping 805 of them (plus control synonymous SNPs) in 1,151 individuals from 56 worldwide populations. We identified 169 genes containing nonsense SNPs that were variable in our samples, of which 99 were found with both copies inactivated in at least one individual. We found that the sampled humans differ on average by 24 genes (out of about 20,000) because of these nonsense SNPs alone. As might be expected, nonsense SNPs as a class were found to be slightly disadvantageous over evolutionary timescales, but a few nevertheless showed signs of being possibly advantageous, as indicated by unusually high levels of population differentiation, long haplotypes, and/or high frequencies of derived alleles. This study underlines the extent of variation in gene content within humans and emphasizes the importance of understanding this type of variation.

    Funded by: Wellcome Trust: 062023

    American journal of human genetics 2009;84;2;224-34

  • The promise and reality of personal genomics.

    Yngvadottir B, Macarthur DG, Jin H and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

    The publication of the highest-quality and best-annotated personal genome yet tells us much about sequencing technology, something about genetic ancestry, but still little of medical relevance.

    Funded by: Wellcome Trust

    Genome biology 2009;10;9;237

  • Insights into modern disease from our distant evolutionary past.

    Yngvadottir B

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. bya@sanger.ac.uk

    An EMBO workshop entitled 'Human Evolution and Disease' was held recently (6-9 December 2006, Hyderabad, India) where 141 scientists from many disciplines came together to discuss recent studies of human variation, origins and dispersal, natural selection and disease susceptibility. The meeting tackled the subject of human evolution and disease from the different perspectives of archaeology, linguistics, genetics and genomics based on both new and publicly available data sets. In this report, we highlight the latest fashion crazes in the discipline, in particular, the use of large public data sets and new methods to analyse modern human variation and the links between human evolution and disease susceptibility.

    European journal of human genetics : EJHG 2007;15;5;603-6

  • A shared Y-chromosomal heritage between Muslims and Hindus in India.

    Gutala R, Carvalho-Silva DR, Jin L, Yngvadottir B, Avadhanula V, Nanne K, Singh L, Chakraborty R and Tyler-Smith C

    Department of Medicine, University of Texas Health Science Center, San Antonio, TX, USA.

    Arab forces conquered the Indus Delta region in 711 AD and, although a Muslim state was established there, their influence was barely felt in the rest of South Asia at that time. By the end of the tenth century, Central Asian Muslims moved into India from the northwest and expanded throughout the subcontinent. Muslim communities are now the largest minority religion in India, comprising more than 138 million people in a predominantly Hindu population of over one billion. It is unclear whether the Muslim expansion in India was a purely cultural phenomenon or had a genetic impact on the local population. To address this question from a male perspective, we typed eight microsatellite loci and 16 binary markers from the Y chromosome in 246 Muslims from Andhra Pradesh, and compared them to published data on 4,204 males from East Asia, Central Asia, other parts of India, Sri Lanka, Pakistan, Iran, the Middle East, Turkey, Egypt and Morocco. We find that the Muslim populations in general are genetically closer to their non-Muslim geographical neighbors than to other Muslims in India, and that there is a highly significant correlation between genetics and geography (but not religion). Our findings indicate that, despite the documented practice of marriage between Muslim men and Hindu women, Islamization in India did not involve large-scale replacement of Hindu Y chromosomes. The Muslim expansion in India was predominantly a cultural change and was not accompanied by significant gene flow, as seen in other places, such as China and Central Asia.

    Funded by: Wellcome Trust: 077009

    Human genetics 2006;120;4;543-51

  • mtDNA variation in Inuit populations of Greenland and Canada: migration history and population structure.

    Helgason A, Pálsson G, Pedersen HS, Angulalik E, Gunnarsdóttir ED, Yngvadóttir B and Stefánsson K

    deCODE Genetics, Inc., 101 Reykjavik, Iceland. agnar@decode.is

    We examined 395 mtDNA control-region sequences from Greenlandic Inuit and Canadian Kitikmeot Inuit with the aim of shedding light on the migration history that underlies the present geographic patterns of genetic variation at this locus in the Arctic. In line with previous studies, we found that Inuit populations carry only sequences belonging to haplotype clusters A2 and D3. However, a comparison of Arctic populations from Siberia, Canada, and Greenland revealed considerable differences in the frequencies of these haplotypes. Moreover, large sample sizes and regional information about birthplaces of maternal grandmothers permitted the detection of notable differences in the distribution of haplotypes among subpopulations within Greenland. Our results cast doubt on the prevailing hypothesis that contemporary Inuit trace all of their ancestry to so-called Thule groups that expanded from Alaska about 800-1,000 years ago. In particular, discrepancies in mutational divergence between the Inuit populations and their putative source mtDNA pool in Siberia/Alaska for the two predominant haplotype clusters, A2a and A2b, are more consistent with the possibility that expanding Thule groups encountered and interbred with existing Dorset populations in Canada and Greenland.

    American journal of physical anthropology 2006;130;1;123-34

  • Spread of an inactive form of caspase-12 in humans is due to recent positive selection.

    Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, Sabeti P, Chen Y, Stalker J, Huckle E, Burton J, Leonard S, Rogers J and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, United Kingdom.

    The human caspase-12 gene is polymorphic for the presence or absence of a stop codon, which results in the occurrence of both active (ancestral) and inactive (derived) forms of the gene in the population. It has been shown elsewhere that carriers of the inactive gene are more resistant to severe sepsis. We have now investigated whether the inactive form has spread because of neutral drift or positive selection. We determined its distribution in a worldwide sample of 52 populations and resequenced the gene in 77 individuals from the HapMap Yoruba, Han Chinese, and European populations. There is strong evidence of positive selection from low diversity, skewed allele-frequency spectra, and the predominance of a single haplotype. We suggest that the inactive form of the gene arose in Africa approximately 100-500 thousand years ago (KYA) and was initially neutral or almost neutral but that positive selection beginning approximately 60-100 KYA drove it to near fixation. We further propose that its selective advantage was sepsis resistance in populations that experienced more infectious diseases as population sizes and densities increased.

    Funded by: Wellcome Trust

    American journal of human genetics 2006;78;4;659-70

  • An Icelandic example of the impact of population structure on association studies.

    Helgason A, Yngvadóttir B, Hrafnkelsson B, Gulcher J and Stefánsson K

    deCODE Genetics, Sturlugata 8, 101 Reykjavík, Iceland. agnar@decode.is

    The impact of population structure on association studies undertaken to identify genetic variants underlying common human diseases is an issue of growing interest. Spurious associations of alleles with disease phenotypes may be obtained or true associations overlooked when allele frequencies differ notably among subpopulations that are not represented equally among cases and controls. Population structure influences even carefully designed studies and can affect the validity of association results. Most study designs address this problem by sampling cases and controls from groups that share the same nationality or self-reported ethnic background, with the implicit assumption that no substructure exists within such groups. We examined population structure in the Icelandic gene pool using extensive genealogical and genetic data. Our results indicate that sampling strategies need to take account of substructure even in a relatively homogenous genetic isolate. This will probably be even more important in larger populations.

    Nature genetics 2005;37;1;90-5
