Apart from human genome resequencing, projects that Richard is connected to include:
- the SGRP yeast sequence variation and population genomics project;
- the TreeFam database of animal gene families;
- the Ensembl resource for vertebrate genome annotation;
- the WormBase model organism database for C. elegans;
- the MitoCheck study of mitosis regulation in human cells;
- the Pfam database of protein domain families; and
- the ACEDB genome database.
- 1000 Genomes Project, a deep catalogue of human genetic variation.
- SGRP, Saccharomyces Genome Resequencing Project.
- WormBase, the repository of mapping, sequencing and phenotypic information for C. elegans and several related nematodes. It also contains large amounts of data manually curated from papers and from genome-wide studies.
- TreeFam, a database of phylogenetic trees of animal gene families.
- Margarita, software that infers genealogies from population genotype data and uses them to map disease loci.
- MAQ, software for mapping short sequencing reads.
The Sequence Alignment/Map format and SAMtools.
Bioinformatics (Oxford, England) 2009;25;16;2078-9
PUBMED: 19505943; PMC: 2723002; DOI: 10.1093/bioinformatics/btp352
The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.
Genome research 2009;19;7;1316-23
PUBMED: 19498102; PMC: 2704439; DOI: 10.1101/gr.080531.108
Population genomics of domestic and wild yeasts.
Nature 2009;458;7236;337-41
PUBMED: 19212322; PMC: 2659681; DOI: 10.1038/nature07743
Inferring selection on amino acid preference in protein domains.
Molecular biology and evolution 2009;26;3;527-36
PUBMED: 19095755; PMC: 2716081; DOI: 10.1093/molbev/msn286
Accurate whole human genome sequencing using reversible terminator chemistry.
Nature 2008;456;7218;53-9
PUBMED: 18987734; PMC: 2581791; DOI: 10.1038/nature07517
Mapping short DNA sequencing reads and calling variants using mapping quality scores.
Genome research 2008;18;11;1851-8
PUBMED: 18714091; PMC: 2577856; DOI: 10.1101/gr.078212.108
Mapping trait loci by use of inferred ancestral recombination graphs.
American journal of human genetics 2006;79;5;910-22
PUBMED: 17033967; PMC: 1698562; DOI: 10.1086/508901
TreeFam: a curated database of phylogenetic trees of animal gene families.
Nucleic acids research 2006;34;Database issue;D572-80
PUBMED: 16381935; PMC: 1347480; DOI: 10.1093/nar/gkj118
Team
Aylwyn Scally
as6@sanger.ac.uk Postdoctoral Fellow
I am a researcher in computational genomics and population genetics, with a particular focus on human and primate evolution. Before working in this field, I trained in theoretical physics at Trinity College Dublin and then completed a Ph.D. in astrophysics at the University of Cambridge. I have been at the Sanger Institute since 2007.
Research
My research at the Sanger Institute has been devoted primarily to the Gorilla Genome Project, an international collaboration to assemble and analyse a whole-genome sequence for the gorilla. As part of this and other projects, I work on various aspects of high-throughput sequencing informatics, including assembly, alignment, and the detection and analysis of genomic variation.
References
- Mapping copy number variation by population-scale genome sequencing.
Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: P01 HG004120, P41 HG004221, P41 HG004221-01, P41 HG004221-02, P41 HG004221-03, P41 HG004221-03S1, P41 HG004221-03S2, P41 HG004221-03S3, R01 HG004719, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, RC2 HG005552, RC2 HG005552-01, RC2 HG005552-02, U01 HG005209, U01 HG005209-01, U01 HG005209-02; NIGMS NIH HHS: R01 GM059290-10, R01 GM081533, R01 GM081533-01A1, R01 GM081533-02, R01 GM081533-03, R01 GM081533-04, R01 GM59290; Wellcome Trust: 062023, 077009, 077014, 077192
Nature 2011;470;7332;59-65
PUBMED: 21293372; PMC: 3077050; DOI: 10.1038/nature09708
- A large genome center's improvements to the Illumina sequencing system.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
The Wellcome Trust Sanger Institute is one of the world's largest genome centers, and a substantial amount of our sequencing is performed with 'next-generation' massively parallel sequencing technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases. Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high-throughput environment, to reduce bias, tighten insert size distribution and reliably obtain high yields of data.
Nature methods 2008;5;12;1005-10
PUBMED: 19034268; PMC: 2610436; DOI: 10.1038/nmeth.1270
- Accurate whole human genome sequencing using reversible terminator chemistry.
Illumina Cambridge Ltd. (Formerly Solexa Ltd), Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex CB10 1XL, UK. dbentley@illumina.com
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
Funded by: Biotechnology and Biological Sciences Research Council: B05823, MOL04534; NHGRI NIH HHS: Z01 HG200330-03; Wellcome Trust
Nature 2008;456;7218;53-9
PUBMED: 18987734; PMC: 2581791; DOI: 10.1038/nature07517

Dr Richard Durbin