Dr Richard Durbin

Richard is Acting Head of Computational Genomics at the Wellcome Trust Sanger Institute and leader of the Genome Informatics group.

Richard has worked on many areas of biological sequence analysis, and currently focuses on studying human genetic variation by genome-wide resequencing using new sequencing technologies.

Apart from human genome resequencing, projects that Richard is connected to include the SGRP yeast sequence variation and population genomics project, the TreeFam database of animal gene families, the Ensembl resource for vertebrate genome annotation, the WormBase model organism database for C. elegans, the MitoCheck study of mitosis regulation in human cells, the Pfam database of protein domain families, and the ACEDB genome database.

Richard currently supervises four postdocs and a research student. He welcomes applications from prospective research students and postdocs, particularly in the analysis of population genome sequence variation. Over the last few years Richard's group have been using evolutionary probabilistic methods based on phylogenetic trees, and two new projects, described below, have grown out of this work.


First, alongside continued method development, Richard's team have started a new comprehensive data resource, TreeFam, much as the Pfam project grew out of this group in the past. The project began in 2004 in collaboration with the Beijing Genomics Institute (BGI). It is developing a high-quality, comprehensive resource that shows how the genes in animal gene families are related through an evolutionary tree, and hence assigns orthology and paralogy relationships between family members. The approach is analogous to that used by Pfam: automated methods generate candidate families, which are then progressively curated; at curation, names and basic references are assigned and any clear errors are fixed. Once a family is curated, new sequences can be assigned to it during regular database rebuilds, so the classification can be maintained as more genomes are finished. An initial paper on TreeFam was published in the Nucleic Acids Research (NAR) database issue in January 2006. TreeFam now contains 289,083 genes from 25 species, in 1,203 curated TreeFam-A families (39,000 genes) and 15,002 automatically generated TreeFam-B families.
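
To make the orthology/paralogy idea concrete, here is a minimal Python sketch of one simple way to read these relationships off a rooted gene tree, the species-overlap rule: an internal node whose two subtrees share a species is treated as a gene duplication, otherwise as a speciation, and two genes are called orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication. This is an illustrative simplification, not TreeFam's own pipeline; the `Node` class, gene names and example tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical gene-tree node: a leaf has gene and species; an internal node has two children."""
    gene: Optional[str] = None
    species: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.gene is not None


def leaves(node: Node) -> List[Node]:
    """All leaf (gene) nodes below this node."""
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)


def species_set(node: Node) -> Set[str]:
    return {leaf.species for leaf in leaves(node)}


def is_duplication(node: Node) -> bool:
    """Species-overlap rule: a species present in both subtrees implies a duplication."""
    return bool(species_set(node.left) & species_set(node.right))


def classify_pairs(root: Node) -> Dict[Tuple[str, str], str]:
    """Label every gene pair 'ortholog' or 'paralog' by the event at its last common ancestor."""
    result: Dict[Tuple[str, str], str] = {}

    def visit(node: Node) -> None:
        if node.is_leaf():
            return
        label = "paralog" if is_duplication(node) else "ortholog"
        # Each pair with one gene in each subtree has this node as its last common ancestor.
        for a in leaves(node.left):
            for b in leaves(node.right):
                result[tuple(sorted((a.gene, b.gene)))] = label
        visit(node.left)
        visit(node.right)

    visit(root)
    return result


def gene(name: str, sp: str) -> Node:
    return Node(gene=name, species=sp)


# Hypothetical example: an ancestral duplication (copies A and B), then a
# human/mouse speciation within each copy.
root = Node(
    left=Node(left=gene("HUMAN_A", "human"), right=gene("MOUSE_A", "mouse")),
    right=Node(left=gene("HUMAN_B", "human"), right=gene("MOUSE_B", "mouse")),
)

for pair, label in sorted(classify_pairs(root).items()):
    print(pair, label)
```

In this toy tree the root joins two copies that are each present in both species, so it is scored as a duplication and every cross-copy pair comes out as paralogous, while the human/mouse split within each copy is a speciation and yields an orthologous pair.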

New methods to handle genetic variation data

Second, Richard's team have developed a new way to analyse genetic variation data within species, based on heuristic reconstruction of Ancestral Recombination Graphs (ARGs; software is available). An ARG describes the tree that relates the individuals at each position in the genome, analogous to a phylogenetic tree, and how that tree changes along the chromosome as a result of ancestral recombination events. It is well established that, in principle, knowing the ARG relating a set of individuals would allow optimal analysis of, for example, genetic disease association. However, inferring the ARG from genotype data is underdetermined, and estimating or sampling it using full likelihood or Bayesian methods is intractable. Rather than work with a simplified model, the team have developed a computationally efficient way to reconstruct plausible ARGs from large-scale data sets, and have shown with both simulated and real data how this can help the fine mapping of associations. They have also shown, in the yeast resequencing project, how ARGs can be used to integrate low-coverage sequence data from many strains (S. cerevisiae and S. paradoxus) to infer the full sequence of each strain with error estimates, and to support population genetic analyses of sequence variation. The team are interested in extending this approach to human and pathogen data.
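
Why recombination forces the local tree to change along the chromosome can be illustrated with a classic test that long predates ARG reconstruction. The minimal Python sketch below assumes phased haplotypes coded 0/1 at biallelic sites and the infinite-sites mutation model. It applies the four-gamete test to find pairs of sites that no single tree can explain, then computes Hudson and Kaplan's R_m lower bound on the number of recombination events. It is not the team's ARG reconstruction method; the function names and example data are hypothetical.

```python
from typing import List, Sequence, Tuple

Haplotype = Sequence[int]  # one phased haplotype: 0/1 alleles at each biallelic site


def four_gamete_incompatible(haplotypes: List[Haplotype], i: int, j: int) -> bool:
    """True if sites i and j carry all four gametes 00, 01, 10 and 11.

    Under the infinite-sites assumption no single tree can explain such a pair,
    so at least one recombination must lie between the two sites.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def incompatible_pairs(haplotypes: List[Haplotype]) -> List[Tuple[int, int]]:
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i in range(n_sites)
        for j in range(i + 1, n_sites)
        if four_gamete_incompatible(haplotypes, i, j)
    ]


def min_recombinations(pairs: List[Tuple[int, int]]) -> int:
    """Hudson-Kaplan R_m: the maximum number of non-overlapping incompatible intervals.

    Greedy interval scheduling by right endpoint; intervals that only share an
    end site count as disjoint, because breakpoints fall between adjacent sites.
    """
    count, last_end = 0, -1
    for i, j in sorted(pairs, key=lambda p: (p[1], p[0])):
        if i >= last_end:
            count += 1
            last_end = j
    return count


# Hypothetical toy data: four haplotypes at three sites.
haplotypes = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
pairs = incompatible_pairs(haplotypes)
print(pairs, min_recombinations(pairs))
```

Here every pair of sites is incompatible, so at least two recombination events are needed (one between sites 0 and 1, one between sites 1 and 2), and the local tree must change at least twice along this short stretch. An ARG reconstruction goes much further, proposing the actual local trees and breakpoints rather than just a lower bound.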

Suggested reading

"Biological Sequence Analysis", Durbin R, Eddy S, Krogh A and Mitchison G (Cambridge: Cambridge University Press, 1998)

Selected Publications

  • The Sequence Alignment/Map format and SAMtools.

    Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R and 1000 Genome Project Data Processing Subgroup

    Bioinformatics (Oxford, England) 2009;25;16;2078-9

  • The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

    Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R and Lipman D

    Genome research 2009;19;7;1316-23

  • Population genomics of domestic and wild yeasts.

    Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, Tsai IJ, Bergman CM, Bensasson D, O'Kelly MJ, van Oudenaarden A, Barton DB, Bailes E, Nguyen AN, Jones M, Quail MA, Goodhead I, Sims S, Smith F, Blomberg A, Durbin R and Louis EJ

    Nature 2009;458;7236;337-41

  • Inferring selection on amino acid preference in protein domains.

    Moses AM and Durbin R

    Molecular biology and evolution 2009;26;3;527-36

  • Accurate whole human genome sequencing using reversible terminator chemistry.

    Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IM, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DM, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O'Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R and Smith AJ

    Nature 2008;456;7218;53-9

  • Mapping short DNA sequencing reads and calling variants using mapping quality scores.

    Li H, Ruan J and Durbin R

    Genome research 2008;18;11;1851-8

  • Mapping trait loci by use of inferred ancestral recombination graphs.

    Minichiello MJ and Durbin R

    American journal of human genetics 2006;79;5;910-22

  • TreeFam: a curated database of phylogenetic trees of animal gene families.

    Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J and Durbin R

    Nucleic acids research 2006;34;Database issue;D572-80

