Medical genomics

The aim of the Medical genomics research group is to elucidate the genetic basis of common human disease with a particular focus on autoimmunity.

Jeffrey Barrett's team aims to unravel the relationships between variations in DNA sequence among individuals and their disease risk. The complexity of this problem, when applied across the entire human genome, requires a number of specialised statistical and computational approaches to handling the huge volumes of data and extracting meaningful connections to disease.

Background

For most common human diseases it is impossible to predict with certainty whether any particular person will become sick, or even to explain the causes of an illness after it has occurred. A person's total risk for common, complex diseases such as cancer or heart disease is a combination of their unique genetic make-up and their environment and behaviour. The challenge of human disease genetics is to translate genetic experiments into an understanding of the underlying biology of human disease.

Human disease genetics has historically focused on diseases that follow a simple inheritance pattern implicating a single, controlling gene. Family-based studies have identified single genes that cause disease in over 2,400 cases, but they have contributed disappointingly little to understanding the causes of complex disease.

A number of developments, beginning around the year 2000, have changed that picture and have driven the recent explosive growth in insight into how genetic variation contributes to common disease risk. These include the completion of the whole human genome sequence, followed by large-scale efforts to identify single nucleotide polymorphisms (SNPs), and the HapMap project. Crucially, these efforts provided a genome-wide 'catalogue of variation', as well as a map of the correlations between nearby variants. Furthermore, inexpensive SNP technology has made it feasible to study nearly all common variation in thousands of disease cases and healthy controls. Together, these developments have enabled dozens of genome-wide association studies, which have revealed hundreds of bona fide genetic associations with complex diseases.

Research

Our aims

The aim of the Medical genomics team is to identify specific genetic variants which are linked to common diseases, and to understand the biological function of those variants.

Our approach

We have been involved in a number of first-generation genome-wide association studies (GWAS), including the Wellcome Trust Case Control Consortium (WTCCC). Because the number of samples involved in a GWAS is directly linked to the power to detect weak association signals, we have more recently undertaken a number of meta-analyses of multiple GWAS of the same trait. Recent successes include studies in both Crohn's disease and Type 1 diabetes, each of which yielded twenty new associations. We hope to take advantage of ever larger datasets and also to combine information across different observable disease characteristics.
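
To illustrate why pooling studies of the same trait increases power, the sketch below combines per-study effect estimates by fixed-effect inverse-variance weighting, one common approach to GWAS meta-analysis. The text above does not say which method was used in these studies, so this is an assumption, and the function name and effect sizes are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates (e.g. log odds ratios)
    by inverse-variance weighting. Illustrative helper, not the
    team's own code."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)  # pooled standard error shrinks as studies are added
    z = beta / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical log odds ratios and standard errors from three studies.
betas = [0.10, 0.12, 0.08]
ses = [0.05, 0.05, 0.05]

beta, se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

With three equally sized studies the pooled standard error is smaller than any single study's by a factor of the square root of three, so a signal that is only weakly significant in each study alone becomes much stronger in combination.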

In addition to this work, we are actively involved in extending human disease genetics into the new era of fast, inexpensive DNA sequencing. We are working both on direct sequencing of cases to discover rare variants, and on using statistical methods to predict these variants from reference sets such as those generated by the 1000 Genomes Project. These approaches will help unlock a set of rare, more highly penetrant mutations that are difficult to detect using GWAS.

While the discovery of so many associations with common human disease has generated a great deal of excitement, we still face the critical challenge of translating that information into a deeper understanding of biology. We are working both on methods to find causal mutations in associated regions (which are usually implicated by a nearby variant in linkage disequilibrium) and on collaborative work within the Institute to understand what happens to experimental organisms when the newly discovered loci are disrupted.
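
Linkage disequilibrium between two variants is often summarised by the squared correlation r² between their alleles. The sketch below computes it from phased haplotypes at two biallelic SNPs. The function name and the toy haplotypes are hypothetical and serve only to illustrate the measure.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, alleles coded 0 or 1 at
    the first and second SNP on each phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # allele 1 frequency at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # allele 1 frequency at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

# Toy data: the two SNPs always travel together on the same chromosome.
perfect = [(1, 1)] * 3 + [(0, 0)] * 7
# Toy data: all four allele combinations are equally frequent.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(perfect), ld_r_squared(independent))
```

When r² is close to 1, an association signal at a genotyped variant cannot be distinguished statistically from one at its neighbour, which is why the associated variant is usually only a proxy for the causal mutation.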

Team publications

  • Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis.

    International Multiple Sclerosis Genetics Consortium (IMSGC), Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C, Shah TS, Spencer C, Booth D, Goris A, Oturai A, Saarela J, Fontaine B, Hemmer B, Martin C, Zipp F, D'Alfonso S, Martinelli-Boneschi F, Taylor B, Harbo HF, Kockum I, Hillert J, Olsson T, Ban M, Oksenberg JR, Hintzen R, Barcellos LF, Wellcome Trust Case Control Consortium 2 (WTCCC2), International IBD Genetics Consortium (IIBDGC), Agliardi C, Alfredsson L, Alizadeh M, Anderson C, Andrews R, Søndergaard HB, Baker A, Band G, Baranzini SE, Barizzone N, Barrett J, Bellenguez C, Bergamaschi L, Bernardinelli L, Berthele A, Biberacher V, Binder TM, Blackburn H, Bomfim IL, Brambilla P, Broadley S, Brochet B, Brundin L, Buck D, Butzkueven H, Caillier SJ, Camu W, Carpentier W, Cavalla P, Celius EG, Coman I, Comi G, Corrado L, Cosemans L, Cournu-Rebeix I, Cree BA, Cusi D, Damotte V, Defer G, Delgado SR, Deloukas P, di Sapio A, Dilthey AT, Donnelly P, Dubois B, Duddy M, Edkins S, Elovaara I, Esposito F, Evangelou N, Fiddes B, Field J, Franke A, Freeman C, Frohlich IY, Galimberti D, Gieger C, Gourraud PA, Graetz C, Graham A, Grummel V, Guaschino C, Hadjixenofontos A, Hakonarson H, Halfpenny C, Hall G, Hall P, Hamsten A, Harley J, Harrower T, Hawkins C, Hellenthal G, Hillier C, Hobart J, Hoshi M, Hunt SE, Jagodic M, Jelčić I, Jochim A, Kendall B, Kermode A, Kilpatrick T, Koivisto K, Konidari I, Korn T, Kronsbein H, Langford C, Larsson M, Lathrop M, Lebrun-Frenay C, Lechner-Scott J, Lee MH, Leone MA, Leppä V, Liberatore G, Lie BA, Lill CM, Lindén M, Link J, Luessi F, Lycke J, Macciardi F, Männistö S, Manrique CP, Martin R, Martinelli V, Mason D, Mazibrada G, McCabe C, Mero IL, Mescheriakova J, Moutsianas L, Myhr KM, Nagels G, Nicholas R, Nilsson P, Piehl F, Pirinen M, Price SE, Quach H, Reunanen M, Robberecht W, Robertson NP, Rodegher M, Rog D, Salvetti M, Schnetz-Boutaud NC, Sellebjerg F, Selter RC, Schaefer C, Shaunak S, Shen L, Shields S, Siffrin V, Slee M, Sorensen PS, Sorosina M, Sospedra M, Spurkland A, Strange A, Sundqvist E, Thijs V, Thorpe J, Ticca A, Tienari P, van Duijn C, Visser EM, Vucic S, Westerlind H, Wiley JS, Wilkins A, Wilson JF, Winkelmann J, Zajicek J, Zindler E, Haines JL, Pericak-Vance MA, Ivinson AJ, Stewart G, Hafler D, Hauser SL, Compston A, McVean G, De Jager P, Sawcer SJ and McCauley JL

    Nature genetics 2013;45;11;1353-60

  • Imputation-based meta-analysis of severe malaria in three African populations.

    Band G, Le QS, Jostins L, Pirinen M, Kivinen K, Jallow M, Sisay-Joof F, Bojang K, Pinder M, Sirugo G, Conway DJ, Nyirongo V, Kachala D, Molyneux M, Taylor T, Ndila C, Peshu N, Marsh K, Williams TN, Alcock D, Andrews R, Edkins S, Gray E, Hubbart C, Jeffreys A, Rowlands K, Schuldt K, Clark TG, Small KS, Teo YY, Kwiatkowski DP, Rockett KA, Barrett JC, Spencer CC, Malaria Genomic Epidemiology Network and Malaria Genomic Epidemiological Network

    PLoS genetics 2013;9;5;e1003509

  • Olorin: combining gene flow with exome sequencing in large family studies of complex disease.

    Morris JA and Barrett JC

    Bioinformatics (Oxford, England) 2012;28;24;3320-1

  • Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.

    Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, Lee JC, Schumm LP, Sharma Y, Anderson CA, Essers J, Mitrovic M, Ning K, Cleynen I, Theatre E, Spain SL, Raychaudhuri S, Goyette P, Wei Z, Abraham C, Achkar JP, Ahmad T, Amininejad L, Ananthakrishnan AN, Andersen V, Andrews JM, Baidoo L, Balschun T, Bampton PA, Bitton A, Boucher G, Brand S, Büning C, Cohain A, Cichon S, D'Amato M, De Jong D, Devaney KL, Dubinsky M, Edwards C, Ellinghaus D, Ferguson LR, Franchimont D, Fransen K, Gearry R, Georges M, Gieger C, Glas J, Haritunians T, Hart A, Hawkey C, Hedl M, Hu X, Karlsen TH, Kupcinskas L, Kugathasan S, Latiano A, Laukens D, Lawrance IC, Lees CW, Louis E, Mahy G, Mansfield J, Morgan AR, Mowat C, Newman W, Palmieri O, Ponsioen CY, Potocnik U, Prescott NJ, Regueiro M, Rotter JI, Russell RK, Sanderson JD, Sans M, Satsangi J, Schreiber S, Simms LA, Sventoraityte J, Targan SR, Taylor KD, Tremelling M, Verspaget HW, De Vos M, Wijmenga C, Wilson DC, Winkelmann J, Xavier RJ, Zeissig S, Zhang B, Zhang CK, Zhao H, International IBD Genetics Consortium (IIBDGC), Silverberg MS, Annese V, Hakonarson H, Brant SR, Radford-Smith G, Mathew CG, Rioux JD, Schadt EE, Daly MJ, Franke A, Parkes M, Vermeire S, Barrett JC and Cho JH

    Nature 2012;491;7422;119-24

  • Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis.

    Liu JZ, Almarri MA, Gaffney DJ, Mells GF, Jostins L, Cordell HJ, Ducker SJ, Day DB, Heneghan MA, Neuberger JM, Donaldson PT, Bathgate AJ, Burroughs A, Davies MH, Jones DE, Alexander GJ, Barrett JC, Sandford RN, Anderson CA, UK Primary Biliary Cirrhosis (PBC) Consortium and Wellcome Trust Case Control Consortium 3

    Nature genetics 2012;44;10;1137-41

  • optiCall: a robust genotype-calling algorithm for rare, low-frequency and common variants.

    Shah TS, Liu JZ, Floyd JA, Morris JA, Wirth N, Barrett JC and Anderson CA

    Bioinformatics (Oxford, England) 2012;28;12;1598-603

  • Misuse of hierarchical linear models overstates the significance of a reported association between OXTR and prosociality.

    Jostins L, Pickrell JK, MacArthur DG and Barrett JC

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;18;E1048

  • A systematic survey of loss-of-function variants in human protein-coding genes.

    MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, 1000 Genomes Project Consortium, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB and Tyler-Smith C

    Science (New York, N.Y.) 2012;335;6070;823-8

  • Genetic risk prediction in complex disease.

    Jostins L and Barrett JC

    Human molecular genetics 2011;20;R2;R182-8

  • Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

    Jostins L, Morley KI and Barrett JC

    European journal of human genetics : EJHG 2011;19;6;662-6

  • Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis.

    Mells GF, Floyd JA, Morley KI, Cordell HJ, Franklin CS, Shin SY, Heneghan MA, Neuberger JM, Donaldson PT, Day DB, Ducker SJ, Muriithi AW, Wheater EF, Hammond CJ, Dawwas MF, UK PBC Consortium, Wellcome Trust Case Control Consortium 3, Jones DE, Peltonen L, Alexander GJ, Sandford RN and Anderson CA

    Nature genetics 2011;43;4;329-32

  • Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47.

    Anderson CA, Boucher G, Lees CW, Franke A, D'Amato M, Taylor KD, Lee JC, Goyette P, Imielinski M, Latiano A, Lagacé C, Scott R, Amininejad L, Bumpstead S, Baidoo L, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Colombel JF, Denson LA, De Vos M, Dubinsky M, Edwards C, Ellinghaus D, Fehrmann RS, Floyd JA, Florin T, Franchimont D, Franke L, Georges M, Glas J, Glazer NL, Guthery SL, Haritunians T, Hayward NK, Hugot JP, Jobin G, Laukens D, Lawrance I, Lémann M, Levine A, Libioulle C, Louis E, McGovern DP, Milla M, Montgomery GW, Morley KI, Mowat C, Ng A, Newman W, Ophoff RA, Papi L, Palmieri O, Peyrin-Biroulet L, Panés J, Phillips A, Prescott NJ, Proctor DD, Roberts R, Russell R, Rutgeerts P, Sanderson J, Sans M, Schumm P, Seibold F, Sharma Y, Simms LA, Seielstad M, Steinhart AH, Targan SR, van den Berg LH, Vatn M, Verspaget H, Walters T, Wijmenga C, Wilson DC, Westra HJ, Xavier RJ, Zhao ZZ, Ponsioen CY, Andersen V, Torkvist L, Gazouli M, Anagnou NP, Karlsen TH, Kupcinskas L, Sventoraityte J, Mansfield JC, Kugathasan S, Silverberg MS, Halfvarson J, Rotter JI, Mathew CG, Griffiths AM, Gearry R, Ahmad T, Brant SR, Chamaillard M, Satsangi J, Cho JH, Schreiber S, Daly MJ, Barrett JC, Parkes M, Annese V, Hakonarson H, Radford-Smith G, Duerr RH, Vermeire S, Weersma RK and Rioux JD

    Nature genetics 2011;43;3;246-52

  • Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.

    Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Green T, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Raychaudhuri S, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, Wijmenga C, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Cohen A, Colombel JF, Cottone M, Stronati L, Denson T, De Vos M, D'Inca R, Dubinsky M, Edwards C, Florin T, Franchimont D, Gearry R, Glas J, Van Gossum A, Guthery SL, Halfvarson J, Verspaget HW, Hugot JP, Karban A, Laukens D, Lawrance I, Lemann M, Levine A, Libioulle C, Louis E, Mowat C, Newman W, Panés J, Phillips A, Proctor DD, Regueiro M, Russell R, Rutgeerts P, Sanderson J, Sans M, Seibold F, Steinhart AH, Stokkers PC, Torkvist L, Kullak-Ublick G, Wilson D, Walters T, Targan SR, Brant SR, Rioux JD, D'Amato M, Weersma RK, Kugathasan S, Griffiths AM, Mansfield JC, Vermeire S, Duerr RH, Silverberg MS, Satsangi J, Schreiber S, Cho JH, Annese V, Hakonarson H, Daly MJ and Parkes M

    Nature genetics 2010;42;12;1118-25

  • Evoker: a visualization tool for genotype intensity data.

    Morris JA, Randall JC, Maller JB and Barrett JC

    Bioinformatics (Oxford, England) 2010;26;14;1786-7

  • Multiple common variants for celiac disease influencing immune gene expression.

    Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Adány R, Aromaa A, Bardella MT, van den Berg LH, Bockett NA, de la Concha EG, Dema B, Fehrmann RS, Fernández-Arquero M, Fiatal S, Grandone E, Green PM, Groen HJ, Gwilliam R, Houwen RH, Hunt SE, Kaukinen K, Kelleher D, Korponay-Szabo I, Kurppa K, MacMathuna P, Mäki M, Mazzilli MC, McCann OT, Mearin ML, Mein CA, Mirza MM, Mistry V, Mora B, Morley KI, Mulder CJ, Murray JA, Núñez C, Oosterom E, Ophoff RA, Polanco I, Peltonen L, Platteel M, Rybak A, Salomaa V, Schweizer JJ, Sperandeo MP, Tack GJ, Turner G, Veldink JH, Verbeek WH, Weersma RK, Wolters VM, Urcelay E, Cukrowska B, Greco L, Neuhausen SL, McManus R, Barisani D, Deloukas P, Barrett JC, Saavalainen P, Wijmenga C and van Heel DA

    Nature genetics 2010;42;4;295-302

Team

Team members

Luke Jostins
Visiting Scientist
Yang Luo
yl2@sanger.ac.uk Postdoctoral Fellow
Kate Morley
km5@sanger.ac.uk Statistical Genetics Analyst
James Morris
jm20@sanger.ac.uk

Luke Jostins

Visiting Scientist

I am a PhD student in Statistical Genetics at the Sanger Institute and Cambridge University. I'm originally from London, but have been resident in Cambridge for 7 years. Apart from my Sanger Institute research, I also teach on a few courses in Cambridge, and run, edit and write for the blog Genomes Unzipped.

I have an MA (Cantab, i.e. equivalent to a BA) in Natural Sciences from Cambridge, and an MPhil in Computational Biology from the same (a real one). Before coming to the Sanger, I worked on evolutionary optimization of gene-circuit models with Johannes Jaeger in Cambridge.

Research

My research interests are focused on the use of next-generation datasets (mostly from second-generation sequencing) to understand the genetic basis of complex disease. I am interested in genotype imputation, linkage and sequencing in affected families, case-control resequencing, and risk prediction in complex disease. Wearing a technical hat, my interests are in statistical inference and computational analysis of large datasets.

I am involved in a number of consortia, including the UK Inflammatory Bowel Diseases Genetics Consortium, the MalariaGEN Consortium and the 1000 Genomes Project.

References

  • Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

    Jostins L, Morley KI and Barrett JC

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will.

    Funded by: Wellcome Trust: WT089120/Z/09/Z

    European journal of human genetics : EJHG 2011;19;6;662-6

  • Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.

    Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Green T, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Raychaudhuri S, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, Wijmenga C, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Cohen A, Colombel JF, Cottone M, Stronati L, Denson T, De Vos M, D'Inca R, Dubinsky M, Edwards C, Florin T, Franchimont D, Gearry R, Glas J, Van Gossum A, Guthery SL, Halfvarson J, Verspaget HW, Hugot JP, Karban A, Laukens D, Lawrance I, Lemann M, Levine A, Libioulle C, Louis E, Mowat C, Newman W, Panés J, Phillips A, Proctor DD, Regueiro M, Russell R, Rutgeerts P, Sanderson J, Sans M, Seibold F, Steinhart AH, Stokkers PC, Torkvist L, Kullak-Ublick G, Wilson D, Walters T, Targan SR, Brant SR, Rioux JD, D'Amato M, Weersma RK, Kugathasan S, Griffiths AM, Mansfield JC, Vermeire S, Duerr RH, Silverberg MS, Satsangi J, Schreiber S, Cho JH, Annese V, Hakonarson H, Daly MJ and Parkes M

    Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel, Germany.

    We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: M01-RR00425; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-35129, N01-HC-45133, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, R01 HL087652, U01 HL080295; NIAMS NIH HHS: K08 AR055688-01A1S1, K08 AR055688-03, K08 AR055688-04; NIDDK NIH HHS: DK 063491, DK062413, DK062420, DK062422, DK062423, DK062429, DK062431, DK062432, DK064869, DK069513, DK084554, DK76984, P01-DK046763, R01 DK064869-09; Wellcome Trust: 089120, WT089120/Z/09/Z

    Nature genetics 2010;42;12;1118-25

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01 HG004120, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54 HG003273, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861, T32 GM007753; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73

  • Microindel detection in short-read sequence data.

    Krawitz P, Rödelsperger C, Jäger M, Jostins L, Bauer S and Robinson PN

    Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin. peter.krawitz@googlemail.com

    Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.

    Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region eir, which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contains indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platform indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.

    Contact: peter.krawitz@googlemail.com; peter.robinson@charite.de

    Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2010;26;6;722-9

  • Reverse engineering a gene network using an asynchronous parallel evolution strategy.

    Jostins L and Jaeger J

    Laboratory for Development & Evolution, University Museum of Zoology, Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK.

    Background: The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task.

    Results: Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it shows much faster convergence than pLSA across all optimisation conditions tested.

    Conclusions: Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D00513

    BMC systems biology 2010;4;17

Yang Luo

yl2@sanger.ac.uk Postdoctoral Fellow

After completing my first degree in Mathematics at Imperial College London, I moved to Cambridge for my PhD. My research focused on the stochastic modelling of embryonic stem (ES) cell systems. In July 2011, I joined Jeff's team at Sanger as a post-doctoral fellow.

Research

I am interested in general mathematical methods that can help to identify the links between variations in DNA sequences and the risk of certain diseases. My current research projects include carrying out genome-wide association studies to discover genomic regions that are associated with tuberculosis, and using next-generation sequencing data to study complex human diseases.

Kate Morley

km5@sanger.ac.uk Statistical Genetics Analyst

I completed undergraduate degrees in History and Genetics at the University of Queensland (Australia), followed by PhD studies in the Genetic Epidemiology group at the Queensland Institute of Medical Research. After graduating I took up a Post-doctoral position at the Centre for Molecular, Environmental, Genetic and Analytic Epidemiology at the University of Melbourne. I joined the Statistical and Computational Genetics group as a Post-doctoral Fellow in April 2009. In February 2011 I moved to a staff scientist position on the Deciphering Developmental Disorders project (led by Nigel Carter, Jeff Barrett, and Matt Hurles).

Research

As a Post-doc at Sanger I worked on the analysis of genome-wide association studies of autoimmune disorders, particularly inflammatory bowel disease (with the UK Inflammatory Bowel Disease Genetics Consortium) and primary biliary cirrhosis (as part of the Wellcome Trust Case Control Consortium 3). As part of the Deciphering Developmental Disorders project I am involved in the analysis of aCGH, SNP array, and exome data from children with developmental disorders and their parents.

References

  • Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

    Jostins L, Morley KI and Barrett JC

    Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will.

    Funded by: Wellcome Trust: WT089120/Z/09/Z

    European journal of human genetics : EJHG 2011;19;6;662-6

  • Systematic review of early cardiometabolic outcomes of the first treated episode of psychosis.

    Foley DL and Morley KI

    Applied Genetics and Biostatistics, Orygen Youth Health Research Centre, the University of Melbourne, Parkville, Victoria, Australia. dfoley@unimelb.edu.au

    Context: The increased mortality associated with schizophrenia is largely due to cardiovascular disease. Treatment with antipsychotics is associated with weight gain and changes in other cardiovascular risk factors. Early identification of modifiable cardiovascular risk factors is a clinical imperative but prospective longitudinal studies of the early cardiometabolic adverse effects of antipsychotic drug treatment other than weight gain have not been previously reviewed.

    Objectives: To assess the methods and reporting of cardiometabolic outcome studies of the first treated episode of psychosis, review key findings, and suggest directions for future research.

    Data sources: PsycINFO, MEDLINE, and Scopus from January 1990 to June 2010.

    Study selection: Subjects were experiencing their first treated episode of psychosis. Subjects were antipsychotic naive or had been exposed to antipsychotics for a short known period at the beginning of the study. Cardiometabolic indices were assessed. Studies used a longitudinal design.

    Data extraction: Sixty-four articles were identified describing 53 independent studies; 25 studies met inclusion criteria and were retained for detailed review.

    Data synthesis: Consolidated Standards of Reporting Trials and Strengthening the Reporting of Observational Studies in Epidemiology checklists were used to assess the methods and reporting of studies. A qualitative review of findings was conducted.

    Conclusions: Two key hypotheses were identified based on this review: (1) in general, there is no difference in cardiovascular risk assessed by weight or metabolic indices between individuals with an untreated first episode of psychosis and healthy controls and (2) cardiovascular risk increases after first exposure to any antipsychotic drug. A rank order of drugs can be derived but there is no evidence of significant class differences. Recommended directions for future research include assessing the effect on cardiometabolic outcomes of medication adherence and dosage effects, determining the therapeutic window for antipsychotic use in adults and youth, and testing for moderation of outcomes by demographic factors, including sex and age, and clinical and genetic factors.

    Archives of general psychiatry 2011;68;6;609-16

  • Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis.

    Mells GF, Floyd JA, Morley KI, Cordell HJ, Franklin CS, Shin SY, Heneghan MA, Neuberger JM, Donaldson PT, Day DB, Ducker SJ, Muriithi AW, Wheater EF, Hammond CJ, Dawwas MF, UK PBC Consortium, Wellcome Trust Case Control Consortium 3, Jones DE, Peltonen L, Alexander GJ, Sandford RN and Anderson CA

    Academic Department of Medical Genetics, Cambridge University, Cambridge, UK; Department of Hepatology, Cambridge University Hospitals National Health Service (NHS) Foundation Trust, Cambridge, UK.

    In addition to the HLA locus, six genetic risk factors for primary biliary cirrhosis (PBC) have been identified in recent genome-wide association studies (GWAS). To identify additional loci, we carried out a GWAS using 1,840 cases from the UK PBC Consortium and 5,163 UK population controls as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3). We followed up 28 loci in an additional UK cohort of 620 PBC cases and 2,514 population controls. We identified 12 new susceptibility loci (at a genome-wide significance level of P < 5 × 10⁻⁸) and replicated all previously associated loci. We identified three further new loci in a meta-analysis of data from our study and previously published GWAS results. New candidate genes include STAT4, DENND1B, CD80, IL7R, CXCR5, TNFRSF1A, CLEC16A and NFKB1. This study has considerably expanded our knowledge of the genetic architecture of PBC.

    Funded by: Medical Research Council: G0500020, G0800460, G0802068; PHS HHS: 1R01LEY018246; Wellcome Trust: 085925/Z/08/Z, 091745, WT090355/B/09/Z, WT09355A/09/Z, WT91745/Z/10/Z

    Nature genetics 2011;43;4;329-32

  • Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47.

    Anderson CA, Boucher G, Lees CW, Franke A, D'Amato M, Taylor KD, Lee JC, Goyette P, Imielinski M, Latiano A, Lagacé C, Scott R, Amininejad L, Bumpstead S, Baidoo L, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Colombel JF, Denson LA, De Vos M, Dubinsky M, Edwards C, Ellinghaus D, Fehrmann RS, Floyd JA, Florin T, Franchimont D, Franke L, Georges M, Glas J, Glazer NL, Guthery SL, Haritunians T, Hayward NK, Hugot JP, Jobin G, Laukens D, Lawrance I, Lémann M, Levine A, Libioulle C, Louis E, McGovern DP, Milla M, Montgomery GW, Morley KI, Mowat C, Ng A, Newman W, Ophoff RA, Papi L, Palmieri O, Peyrin-Biroulet L, Panés J, Phillips A, Prescott NJ, Proctor DD, Roberts R, Russell R, Rutgeerts P, Sanderson J, Sans M, Schumm P, Seibold F, Sharma Y, Simms LA, Seielstad M, Steinhart AH, Targan SR, van den Berg LH, Vatn M, Verspaget H, Walters T, Wijmenga C, Wilson DC, Westra HJ, Xavier RJ, Zhao ZZ, Ponsioen CY, Andersen V, Torkvist L, Gazouli M, Anagnou NP, Karlsen TH, Kupcinskas L, Sventoraityte J, Mansfield JC, Kugathasan S, Silverberg MS, Halfvarson J, Rotter JI, Mathew CG, Griffiths AM, Gearry R, Ahmad T, Brant SR, Chamaillard M, Satsangi J, Cho JH, Schreiber S, Daly MJ, Barrett JC, Parkes M, Annese V, Hakonarson H, Radford-Smith G, Duerr RH, Vermeire S, Weersma RK and Rioux JD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10⁻⁸), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: M01-RR00425; NIAID NIH HHS: AI062773; NIDDK NIH HHS: DK 063491, DK043351, DK062413, DK062420, DK062422, DK062423, DK062429, DK062431, DK062432, DK064869, DK069513, DK076984, DK084554, DK83756, P01-DK046763, P30 DK040561-15, P30 DK043351, R01 DK060049-10, R01 DK064869-05S1, R01 DK064869-06A1, R01 DK064869-07, R01 DK064869-08, R01 DK064869-09, R01 DK083756-04, U01 DK062432, U01 DK062432-07, U01 DK062432-08, U01 DK062432-09, U01 DK062432-10; Wellcome Trust: 083948/Z/07/Z, WT089120/Z/09/Z, WT091745/Z/10/Z

    Nature genetics 2011;43;3;246-52

  • Being more realistic about the public health impact of genomic medicine.

    Hall WD, Mathews R and Morley KI

    University of Queensland Centre for Clinical Research, The University of Queensland, Herston, Queensland, Australia. w.hall@uq.edu.au

    PLoS medicine 2010;7;10

  • Multiple common variants for celiac disease influencing immune gene expression.

    Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Adány R, Aromaa A, Bardella MT, van den Berg LH, Bockett NA, de la Concha EG, Dema B, Fehrmann RS, Fernández-Arquero M, Fiatal S, Grandone E, Green PM, Groen HJ, Gwilliam R, Houwen RH, Hunt SE, Kaukinen K, Kelleher D, Korponay-Szabo I, Kurppa K, MacMathuna P, Mäki M, Mazzilli MC, McCann OT, Mearin ML, Mein CA, Mirza MM, Mistry V, Mora B, Morley KI, Mulder CJ, Murray JA, Núñez C, Oosterom E, Ophoff RA, Polanco I, Peltonen L, Platteel M, Rybak A, Salomaa V, Schweizer JJ, Sperandeo MP, Tack GJ, Turner G, Veldink JH, Verbeek WH, Weersma RK, Wolters VM, Urcelay E, Cukrowska B, Greco L, Neuhausen SL, McManus R, Barisani D, Deloukas P, Barrett JC, Saavalainen P, Wijmenga C and van Heel DA

    Blizard Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    We performed a second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects. We genotyped 113 selected SNPs with P(GWAS) < 10⁻⁴ and 18 SNPs from 14 known loci in a further 4,918 cases and 5,684 controls. Variants from 13 new regions reached genome-wide significance (P(combined) < 5 × 10⁻⁸); most contain genes with immune functions (BACH2, CCR4, CD80, CIITA-SOCS1-CLEC16A, ICOSLG and ZMIZ1), with ETS1, RUNX3, THEMIS and TNFRSF14 having key roles in thymic T-cell selection. There was evidence to suggest associations for a further 13 regions. In an expression quantitative trait meta-analysis of 1,469 whole blood samples, 20 of 38 (52.6%) tested loci had celiac risk variants correlated (P < 0.0028, FDR 5%) with cis gene expression.

    Funded by: Medical Research Council: G0700545, G0700545(82277); NIDDK NIH HHS: DK050678, DK071003, DK081645, DK57892, R01 DK081645-02; NINDS NIH HHS: NS058980; Wellcome Trust: 084743

    Nature genetics 2010;42;4;295-302

James Morris

jm20@sanger.ac.uk

My first degree was in Molecular Biology and Genetics from the University of East Anglia (2003). I then completed a research Master's in Bioinformatics at the University of York (2004). Before joining the Sanger Institute I undertook a PhD in Bioinformatics at the Elizabeth Garrett Anderson Institute for Women's Health at University College London (2009). The project involved developing bioinformatics solutions for the management and analysis of high-throughput tumour profiling projects. During my PhD I also developed an interest in agile software development processes and best practices in a bioinformatics setting.

Research

My role as scientific programmer is to provide informatics support to all the projects undertaken by the statistical and computational genetics group. My responsibilities include the maintenance and development of Evoker (http://www.sanger.ac.uk/resources/software/evoker/), a graphical tool for visualising genotype intensity data that greatly simplifies the use of intensity plots for quality control of genotype calls in genome-wide association studies. I am also involved in the analysis of RNA-seq data, for which I created and maintain an analysis pipeline that identifies imbalances in RNA transcript counts between cases and controls.

References

  • Evoker: a visualization tool for genotype intensity data.

    Morris JA, Randall JC, Maller JB and Barrett JC

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    Summary: Genome-wide association studies (GWAS), which produce huge volumes of data, are now being carried out by many groups around the world, creating a need for user-friendly tools for data quality control (QC) and analysis. One critical aspect of GWAS QC is evaluating genotype cluster plots to verify sensible genotype calling in putatively associated single nucleotide polymorphisms (SNPs). Evoker is a tool for visualizing genotype cluster plots, and provides a solution to the computational and storage problems related to working with such large datasets.

    Availability: http://www.sanger.ac.uk/resources/software/evoker/

    Funded by: Wellcome Trust: 089120, WT08912/Z/09/Z

    Bioinformatics (Oxford, England) 2010;26;14;1786-7

Group leader

Jeffrey Barrett

Software

  • Evoker - a graphical tool for visualising genotype intensity data in order to assess genotype calls as part of quality control procedures for genome-wide association studies.
  • Olorin - an interactive filtering tool for next-generation sequencing data from studies of large complex disease pedigrees.

Quick link: http://q.sanger.ac.uk/statcomp