Parasite genomics

The Parasite Genomics team is undertaking major projects to sequence the genomes of parasites responsible for diseases prevalent throughout the developing world, such as malaria and neglected tropical diseases.

Sequencing the genomes of these parasites is a first step towards understanding how they live and grow, and could eventually lead to the development of new, specific medicines to help eradicate the parasites and the diseases they cause.

The image on the right shows a mature female whipworm, Trichuris muris, in situ. T. muris is a naturally occurring nematode parasite of mice that resides in the caecum and colon and has a direct faecal-oral life cycle. The slender front end of the whipworm, the stichosome, can be seen in multiple cross-sections just above the intestinal villi of the mammalian host in the centre of the image. In contrast, the larger rear end of the worm, which contains its reproductive organs, has been captured in one longitudinal section in the upper half of the image.

[Neil Humphreys, University of Manchester]

Research

The Parasite Genomics group uses genome sequencing together with comparative and functional genomics to investigate the biology of helminth and protozoan parasites.

Sequencing genomes

Perhaps the single most useful tool for any molecular biologist is a high-quality genome sequence for their organism of interest. We work closely with scientific communities interested in particular organisms, and through this collaborative network we acquire DNA samples. We then use the sequencing facilities at the Sanger Institute to generate the data from which we assemble draft genome sequences. We develop computational tools to improve genome sequences and combine them with manual improvement, which allows us to produce very high-quality genomes that strengthen our collaborators' research. Our gold standard is the malaria reference genome, which we have been carefully curating for ten years.

Understanding genomes

Functional genomics deals with dynamic biological data, such as changes in the transcriptome, proteome and epigenome in the course of a parasite's life cycle. We make and use large-scale data sets to ask questions about the functions of parasite genes, using genome sequences to support our analysis. Many genes are unique to parasites, so we need these new data sets to unveil the functions of uncharacterised parasite genes. We can often infer functional information about a gene by understanding when and where it is expressed in the life cycle of the parasite, or identifying which genes change upon interaction with the parasite's host. High-throughput sequencing is a key tool in functional genomics, underpinning methods such as RNA-seq and ChIP-seq.
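As a rough illustration of inferring function from when a gene is expressed, the sketch below ranks uncharacterised genes by how closely their expression across life-cycle stages tracks a gene of known function. It is a minimal sketch under stated assumptions, not the group's pipeline: the gene names, the expression values and the function names are all hypothetical, and Pearson correlation is only one of several possible similarity measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def rank_by_coexpression(profiles, query):
    """Rank all other genes by correlation with the query gene's profile."""
    reference = profiles[query]
    scores = [
        (gene, pearson(reference, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression levels at four life-cycle stages (made-up values).
profiles = {
    "known_invasion_gene": [1.0, 2.0, 8.0, 9.0],
    "uncharacterised_A": [2.0, 4.0, 16.0, 18.0],
    "uncharacterised_B": [9.0, 8.0, 2.0, 1.0],
}

ranking = rank_by_coexpression(profiles, "known_invasion_gene")
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy data set, uncharacterised_A rises and falls with the known gene and so ranks first, which would make it a candidate for a related role; a real analysis would need normalised counts, many more samples and a statistical test.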

We apply these approaches to:

Helminths

Despite the global medical and economic importance of parasitic helminths (worms), research on them has remained relatively untouched by genomics. Worm infections account for morbidity equivalent to more than 100 million disability-adjusted life years from more than one billion infections globally. With this in mind, we have developed the Sanger Helminth Genomes Initiative. Initially we are using de novo sequencing to produce reference genomes for a cross-phyla list that includes hookworms, whipworms, threadworms, schistosomes, a tapeworm and the filarial parasite responsible for river blindness. We are also producing draft genomes of a broad list of parasitic helminths.

Protozoa

Amongst the protozoan parasites, we focus on two areas:

  • The Apicomplexa, including malaria parasites
  • The Kinetoplastida, which include Trypanosoma and Leishmania parasites.

We have built comparative genomic studies around high-quality reference genomes. While this work continues, we are also embarking on studies to understand host-parasite interactions and population structure.

Data download

Sequence data is available for download.

  • Reference and comparator helminth genomes.
  • Genomes from the Helminth Genomes Initiative are accessible from the FTP site. These are available from the Sanger Institute as part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see the data sharing policy.
  • Complete, ongoing and forthcoming protozoan genomes.

Resources

Databases

  • WormBase. Genetics of C. elegans and related nematodes.
  • GeneDB. A window to our annotation as it is produced.

Tool development

Software that supports our annotation and analysis is under constant development. In particular, we work with the Pathogen Informatics team to develop:

  • ABACAS. Rapidly contiguates (aligns, orders, orientates) and visualizes shotgun-assembled contigs based on a reference sequence, and designs primers to close the gaps between them.
  • Artemis and ACT. Portable and intuitive sequence viewing and browsing tools. Recently a new Chado database version has been launched.
  • iCORN. Corrects reference genome sequences by iteratively mapping reads and finding differences in the sequence.
  • IMAGE. Closes gaps in a draft assembly using Illumina paired-end reads.
  • PAGIT. Generates high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
  • RATT. Transfers annotation from a reference (annotated) genome to an unannotated query genome.
  • REAPR. Evaluates the accuracy of a genome assembly using mapped paired-end reads, without the use of a reference genome for comparison.
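The idea of ordering and orientating contigs against a reference, as described for ABACAS above, can be illustrated with a toy sketch. This is not ABACAS's implementation: it assumes each contig already has a single best alignment to the reference, and the `Hit` record, the contig names and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical best alignment of one contig to the reference."""
    contig: str
    ref_start: int  # leftmost reference coordinate covered by the contig
    strand: str     # "+" if the contig aligns forward, "-" if reversed


COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def order_and_orient(contigs, hits):
    """Return (name, sequence) pairs in reference order, all on the forward strand."""
    placed = []
    for hit in sorted(hits, key=lambda h: h.ref_start):
        seq = contigs[hit.contig]
        if hit.strand == "-":
            seq = reverse_complement(seq)  # flip contigs that align in reverse
        placed.append((hit.contig, seq))
    return placed


# Made-up contigs and alignment positions, for illustration only.
contigs = {"c1": "AACC", "c2": "GGTA", "c3": "TTTG"}
hits = [Hit("c1", 500, "+"), Hit("c2", 10, "-"), Hit("c3", 250, "+")]

layout = order_and_orient(contigs, hits)
for name, seq in layout:
    print(name, seq)
```

Real contiguation also has to handle contigs with several partial or conflicting alignments, overlaps between neighbours and contigs that do not map at all, none of which this sketch attempts.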

Selected Publications

  • Genomic analysis of the causative agents of coccidiosis in domestic chickens.

    Reid AJ, Blake DP, Ansari HR, Billington K, Browne HP, Bryant J, Dunn M, Hung SS, Kawahara F, Miranda-Saavedra D, Malas TB, Mourier T, Naghra H, Nair M, Otto TD, Rawlings ND, Rivailler P, Sanchez-Flores A, Sanders M, Subramaniam C, Tay YL, Woo Y, Wu X, Barrell B, Dear PH, Doerig C, Gruber A, Ivens AC, Parkinson J, Rajandream MA, Shirley MW, Wan KL, Berriman M, Tomley FM and Pain A

    Genome research 2014

  • Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction.

    Foth BJ, Tsai IJ, Reid AJ, Bancroft AJ, Nichol S, Tracey A, Holroyd N, Cotton JA, Stanley EJ, Zarowiecki M, Liu JZ, Huckvale T, Cooper PJ, Grencis RK and Berriman M

    Nature genetics 2014;46;7;693-700

  • Genome sequence of the tsetse fly (Glossina morsitans): vector of African trypanosomiasis.

    International Glossina Genome Initiative

    Science (New York, N.Y.) 2014;344;6182;380-6

  • A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium.

    Sinha A, Hughes KR, Modrzynska KK, Otto TD, Pfander C, Dickens NJ, Religa AA, Bushell E, Graham AL, Cameron R, Kafsack BF, Williams AE, Llinás M, Berriman M, Billker O and Waters AP

    Nature 2014;507;7491;253-7

  • A comprehensive evaluation of assembly scaffolding tools.

    Hunt M, Newbold C, Berriman M and Otto TD

    Genome biology 2014;15;3;R42

  • The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode.

    Cotton JA, Lilley CJ, Jones LM, Kikuchi T, Reid AJ, Thorpe P, Tsai IJ, Beasley H, Blok V, Cock PJ, Eves-van den Akker S, Holroyd N, Hunt M, Mantelin S, Naghra H, Pain A, Palomares-Rius JE, Zarowiecki M, Berriman M, Jones JT and Urwin PE

    Genome biology 2014;15;3;R43

  • The peculiar epidemiology of dracunculiasis in Chad.

    Eberhard ML, Ruiz-Tiben E, Hopkins DR, Farrell C, Toe F, Weiss A, Withers PC, Jenks MH, Thiele EA, Cotton JA, Hance Z, Holroyd N, Cama VA, Tahir MA and Mounda T

    The American journal of tropical medicine and hygiene 2014;90;1;61-70

  • Genomic confirmation of hybridisation and recent inbreeding in a vector-isolated Leishmania population.

    Rogers MB, Downing T, Smith BA, Imamura H, Sanders M, Svobodova M, Volf P, Berriman M, Cotton JA and Smith DF

    PLoS genetics 2014;10;1;e1004092

  • WormBase 2014: new views of curated biology.

    Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L and Sternberg PW

    Nucleic acids research 2014;42;Database issue;D789-93

  • The evolutionary dynamics of variant antigen genes in Babesia reveal a history of genomic innovation underlying host-parasite interaction.

    Jackson AP, Otto TD, Darby A, Ramaprasad A, Xia D, Echaide IE, Farber M, Gahlot S, Gamble J, Gupta D, Gupta Y, Jackson L, Malandrin L, Malas TB, Moussa E, Nair M, Reid AJ, Sanders M, Sharma J, Tracey A, Quail MA, Weir W, Wastling JM, Hall N, Willadsen P, Lingelbach K, Shiels B, Tait A, Berriman M, Allred DR and Pain A

    Nucleic acids research 2014;42;11;7113-31

  • Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.

    Lemieux JE, Kyes SA, Otto TD, Feller AI, Eastman RT, Pinches RA, Berriman M, Su XZ and Newbold CI

    Molecular microbiology 2013;90;3;519-37

  • Vector transmission regulates immune control of Plasmodium virulence.

    Spence PJ, Jarra W, Lévy P, Reid AJ, Chappell L, Brugat T, Sanders M, Berriman M and Langhorne J

    Nature 2013;498;7453;228-31

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Nature 2013;496;7443;57-63

  • Genes involved in host-parasite interactions can be revealed by their correlated expression.

    Reid AJ and Berriman M

    Nucleic acids research 2013;41;3;1508-18

  • The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.

    Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS and Cotton JA

    Genome biology 2013;14;8;R88

  • Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.

    Protasio AV, Dunne DW and Berriman M

    PLoS neglected tropical diseases 2013;7;3;e2091

  • REAPR: a universal tool for genome assembly evaluation.

    Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M and Otto TD

    Genome biology 2013;14;5;R47

  • A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

    Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M and Otto TD

    Nature protocols 2012;7;7;1260-84

  • A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.

    Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW and Berriman M

    PLoS neglected tropical diseases 2012;6;1;e1455

  • Germline transgenesis and insertional mutagenesis in Schistosoma mansoni mediated by murine leukemia virus.

    Rinaldi G, Eckert SE, Tsai IJ, Suttiprapa S, Kines KJ, Tort JF, Mann VH, Turner DJ, Berriman M and Brindley PJ

    PLoS pathogens 2012;8;7;e1002820

Team

Team members

Helen Beasley, Computer Biologist - Senior Genome Analyst
Hayley Bennett (hb6@sanger.ac.uk), Postdoctoral Fellow
Lia Chappell (lc5@sanger.ac.uk), Postdoctoral Fellow
Avril Coghlan (alc@sanger.ac.uk), Senior Bioinformatician
James Cotton (jc17@sanger.ac.uk), Senior Staff Scientist
Bernardo Foth (bf3@sanger.ac.uk), Senior Staff Scientist
Tom Huckvale, Advanced Research Assistant
Sarah Nichol
Thomas Otto, Senior Staff Scientist
Anna Protasio (ap6@sanger.ac.uk), Postdoctoral Fellow
Adam Reid (ar11@sanger.ac.uk), Staff Scientist
Florian Sessler (fs8@sanger.ac.uk), PhD Student
Eleanor Stanley (es9@sanger.ac.uk), Senior Bioinformatician
Sascha Steinbiss (ss34@sanger.ac.uk), Senior Bioinformatician
Alan Tracey, Senior Computer Biologist
Alessandra Traini (at8@sanger.ac.uk), Senior Bioinformatician
Magdalena Zarowiecki (mz3@sanger.ac.uk)

Helen Beasley

Computer Biologist - Senior Genome Analyst

I graduated from the University of Paisley with a BSc (Hons) in Biotechnology, where I developed a keen interest in plant breeding and crop improvement through recombinant DNA techniques. My research project focused on improving somatic hybridisation in Solanaceae species, and this led me to study for an MSc in Plant Genetic Manipulation at the University of Nottingham. I joined the Sanger Institute as a finisher working on the Human Genome Project and then on other large genomes, including zebrafish, mouse, pig and tomato. Latterly I worked on finishing more problematic regions, alongside coordinating some broad-ranging collaborative projects.

Research

I joined the Parasite Genomics group as a Senior Genome Analyst in 2010, working on the manual improvement of helminth genomes. I have worked on a number of helminth genomes, including Schistosoma mansoni, Echinococcus species and Globodera pallida. This work involves improving assemblies at the sequence level, using software tools to close gaps and resolve mis-assemblies, and manually curating genes and the gene training sets used to improve the accuracy of gene prediction software.

References

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

  • The tomato genome sequence provides insights into fleshy fruit evolution.

    Tomato Genome Consortium

    Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C509731/1, BB/G006199/1

    Nature 2012;485;7400;635-41

  • Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry.

    Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, Whibley A, Becuwe M, Baxter SW, Ferguson L, Wilkinson PA, Salazar C, Davidson C, Clark R, Quail MA, Beasley H, Glithero R, Lloyd C, Sims S, Jones MC, Rogers J, Jiggins CD and ffrench-Constant RH

    CNRS UMR 7205, Muséum National d'Histoire Naturelle, CP50, 45 Rue Buffon, 75005 Paris, France. joron@mnhn.fr

    Supergenes are tight clusters of loci that facilitate the co-segregation of adaptive variation, providing integrated control of complex adaptive phenotypes. Polymorphic supergenes, in which specific combinations of traits are maintained within a single population, were first described for 'pin' and 'thrum' floral types in Primula and Fagopyrum, but classic examples are also found in insect mimicry and snail morphology. Understanding the evolutionary mechanisms that generate these co-adapted gene sets, as well as the mode of limiting the production of unfit recombinant forms, remains a substantial challenge. Here we show that individual wing-pattern morphs in the polymorphic mimetic butterfly Heliconius numata are associated with different genomic rearrangements at the supergene locus P. These rearrangements tighten the genetic linkage between at least two colour-pattern loci that are known to recombine in closely related species, with complete suppression of recombination being observed in experimental crosses across a 400-kilobase interval containing at least 18 genes. In natural populations, notable patterns of linkage disequilibrium (LD) are observed across the entire P region. The resulting divergent haplotype clades and inversion breakpoints are found in complete association with wing-pattern morphs. Our results indicate that allelic combinations at known wing-patterning loci have become locked together in a polymorphic rearrangement at the P locus, forming a supergene that acts as a simple switch between complex adaptive phenotypes found in sympatry. These findings highlight how genomic rearrangements can have a central role in the coexistence of adaptive phenotypes involving several genes acting in concert, by locally limiting recombination and gene flow.

    Funded by: Biotechnology and Biological Sciences Research Council: BBE0118451; Medical Research Council: G0900740; Wellcome Trust: 079643, 098051

    Nature 2011;477;7363;203-6

  • Genomic libraries: I. Construction and screening of fosmid genomic libraries.

    Quail MA, Matthews L, Sims S, Lloyd C, Beasley H and Baxter SW

    Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.

    Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.

    Methods in molecular biology (Clifton, N.J.) 2011;772;37-58

  • Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Quail MA, Matthews L, Sims S, Lloyd C, Beasley H and Baxter SW

    Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

    Methods in molecular biology (Clifton, N.J.) 2011;772;59-81

  • Characterization of a hotspot for mimicry: assembly of a butterfly wing transcriptome to genomic sequence at the HmYb/Sb locus.

    Ferguson L, Lee SF, Chamberlain N, Nadeau N, Joron M, Baxter S, Wilkinson P, Papanicolaou A, Kumar S, Kee TJ, Clark R, Davidson C, Glithero R, Beasley H, Vogel H, Ffrench-Constant R and Jiggins C

    Department of Zoology, University of Cambridge, UK.

    The mimetic wing patterns of Heliconius butterflies are an excellent example of both adaptive radiation and convergent evolution. Alleles at the HmYb and HmSb loci control the presence/absence of hindwing bar and hindwing margin phenotypes respectively between divergent races of Heliconius melpomene, and also between sister species. Here, we used fine-scale linkage mapping to identify and sequence a BAC tilepath across the HmYb/Sb loci. We also generated transcriptome sequence data for two wing pattern forms of H. melpomene that differed in HmYb/Sb alleles using 454 sequencing technology. Custom scripts were used to process the sequence traces and generate transcriptome assemblies. Genomic sequence for the HmYb/Sb candidate region was annotated both using the MAKER pipeline and manually using transcriptome sequence reads. In total, 28 genes were identified in the HmYb/Sb candidate region, six of which have alternative splice forms. None of these are orthologues of genes previously identified as being expressed in butterfly wing pattern development, implying previously undescribed molecular mechanisms of pattern determination on Heliconius wings. The use of next-generation sequencing has therefore facilitated DNA annotation of a poorly characterized genome, and generated hypotheses regarding the identity of wing pattern at the HmYb/Sb loci.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0900740

    Molecular ecology 2010;19 Suppl 1;240-54

  • The genomic sequence and analysis of the swine major histocompatibility complex.

    Renard C, Hart E, Sehra H, Beasley H, Coggill P, Howe K, Harrow J, Gilbert J, Sims S, Rogers J, Ando A, Shigenari A, Shiina T, Inoko H, Chardon P and Beck S

    LREG INRA CEA, Jouy en Josas, France.

    We describe the generation and analysis of an integrated sequence map of a 2.4-Mb region of pig chromosome 7, comprising the classical class I region, the extended and classical class II regions, and the class III region of the major histocompatibility complex (MHC), also known as swine leukocyte antigen (SLA) complex. We have identified and manually annotated 151 loci, of which 121 are known genes (predicted to be functional), 18 are pseudogenes, 8 are novel CDS loci, 3 are novel transcripts, and 1 is a putative gene. Nearly all of these loci have homologues in other mammalian genomes but orthologues could be identified with confidence for only 123 genes. The 28 genes (including all the SLA class I genes) for which unambiguous orthology to genes within the human reference MHC could not be established are of particular interest with respect to porcine-specific MHC function and evolution. We have compared the porcine MHC to other mammalian MHC regions and identified the differences between them. In comparison to the human MHC, the main differences include the absence of HLA-A and other class I-like loci, the absence of HLA-DP-like loci, and the separation of the extended and classical class II regions from the rest of the MHC by insertion of the centromere. We show that the centromere insertion has occurred within a cluster of BTNL genes located at the boundary of the class II and III regions, which might have resulted in the loss of an orthologue to human C6orf10 from this region.

    Funded by: Wellcome Trust

    Genomics 2006;88;1;96-110

  • The DNA sequence and biological annotation of human chromosome 1.

    Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CC, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RI, Aubin K, Babbage AK, Bagguley CL, Bailey J, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Buckley D, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dunn M, Earthrowl M, Ellington AG, Errington H, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MR, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ES, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NS, McLaren S, Milne S, Mistry S, Moore MJ, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR, Banerjee R, Bryant SP, Burford DC, Burrill WD, Clegg SM, Dhami P, Dovey O, Faulkner LM, Gribble SM, Langford CF, Pandian RD, Porter KM and Prigmore E

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. sgregory@chg.duhs.duke.edu

    The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.

    Funded by: Medical Research Council: G0000107; Wellcome Trust

    Nature 2006;441;7091;315-21

  • The DNA sequence and analysis of human chromosome 13.

    Dunham A, Matthews LH, Burton J, Ashurst JL, Howe KL, Ashcroft KJ, Beare DM, Burford DC, Hunt SE, Griffiths-Jones S, Jones MC, Keenan SJ, Oliver K, Scott CE, Ainscough R, Almeida JP, Ambrose KD, Andrews DT, Ashwell RI, Babbage AK, Bagguley CL, Bailey J, Bannerjee R, Barlow KF, Bates K, Beasley H, Bird CP, Bray-Allen S, Brown AJ, Brown JY, Burrill W, Carder C, Carter NP, Chapman JC, Clamp ME, Clark SY, Clarke G, Clee CM, Clegg SC, Cobley V, Collins JE, Corby N, Coville GJ, Deloukas P, Dhami P, Dunham I, Dunn M, Earthrowl ME, Ellington AG, Faulkner L, Frankish AG, Frankland J, French L, Garner P, Garnett J, Gilbert JG, Gilson CJ, Ghori J, Grafham DV, Gribble SM, Griffiths C, Hall RE, Hammond S, Harley JL, Hart EA, Heath PD, Howden PJ, Huckle EJ, Hunt PJ, Hunt AR, Johnson C, Johnson D, Kay M, Kimberley AM, King A, Laird GK, Langford CJ, Lawlor S, Leongamornlert DA, Lloyd DM, Lloyd C, Loveland JE, Lovell J, Martin S, Mashreghi-Mohammadi M, McLaren SJ, McMurray A, Milne S, Moore MJ, Nickerson T, Palmer SA, Pearce AV, Peck AI, Pelan S, Phillimore B, Porter KM, Rice CM, Searle S, Sehra HK, Shownkeen R, Skuce CD, Smith M, Steward CA, Sycamore N, Tester J, Thomas DW, Tracey A, Tromans A, Tubby B, Wall M, Wallis JM, West AP, Whitehead SL, Willey DL, Wilming L, Wray PW, Wright MW, Young L, Coulson A, Durbin R, Hubbard T, Sulston JE, Beck S, Bentley DR, Rogers J and Ross MT

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. ad1@sanger.ac.uk

    Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.

    Nature 2004;428;6982;522-8

Hayley Bennett

hb6@sanger.ac.uk Postdoctoral Fellow

My academic studies started with a degree in Neuroscience from Cardiff University. After working in the Biotech industry for two years, I began my PhD at the University of Bath. My PhD project focused on the neurobiology of nematodes; in particular the characterisation of novel drug targets. Halfway through my PhD I moved with my lab to work at the University of Georgia, USA. Here I enjoyed exposure to a rich diversity of parasite research, and decided to continue to work in this field.

I joined the parasite genomics group at the Wellcome Trust Sanger Institute in June 2012.

Research

My role is focused on using cutting-edge sequencing technology to understand parasitic worms.

Current projects include:

  • Sequencing from small input amounts of DNA or RNA

  • Sequencing from unusual, rare or clinical samples

  • Epigenetic control of transcription and expression

References

  • Microbial genomes as cheat sheets.

    Bennett HM

    Nature reviews. Microbiology 2013;11;5;302

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

  • ACR-26: a novel nicotinic receptor subunit of parasitic nematodes.

    Bennett HM, Williamson SM, Walsh TK, Woods DJ and Wolstenholme AJ

    Department of Infectious Diseases and Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA.

    Nematode nicotinic acetylcholine receptors are the targets for many effective anthelmintics, including those recently introduced into the market. We have identified a novel nicotinic receptor subunit sequence, acr-26, that is expressed in all the animal parasitic nematodes we examined from clades III, IV and V, but is not present in the genomes of Trichinella spiralis, Caenorhabditis elegans, Pristionchus pacificus and Meloidogyne spp. In Ascaris suum, ACR-26 is expressed on muscle cells isolated from the head, but not from the mid-body region. Sequence comparisons with other vertebrate and nematode subunits suggested that ACR-26 may be capable of forming a functional homomeric receptor; when acr-26 cRNA was injected into Xenopus oocytes along with Xenopus laevis ric-3 cRNA we occasionally observed the formation of acetylcholine- and nicotine-sensitive channels. The unreliable expression of ACR-26 in vitro may suggest that additional subunits or chaperones may be required for efficient formation of the functional receptors. ACR-26 may represent a novel target for the development of cholinergic anthelmintics specific for animal parasites.

    Funded by: Biotechnology and Biological Sciences Research Council

    Molecular and biochemical parasitology 2012;183;2;151-7

Lia Chappell

lc5@sanger.ac.uk Postdoctoral Fellow

I'm currently pursuing a short postdoctoral project here at the Sanger Institute after completing a four-year Wellcome Trust PhD studentship in September 2013. I'm jointly supervised by Matt Berriman and Julian Rayner, and interact with researchers in the Parasite Genomics and Malaria programmes.

Before coming to the Sanger Institute I completed a Master's degree and a Bachelor's degree at the University of Cambridge, specialising in Biochemistry.

Research

In my current project I'm studying gene expression in malaria parasites with high-throughput DNA sequencing technologies (RNA-seq), applying a novel directional, amplification-free protocol that I developed during my PhD. We hope eventually to produce a near-complete catalogue of parasite gene expression that will be helpful for scientists working in malaria biology.

References

  • Found in translation.

    Chappell L

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Nature reviews. Microbiology 2014;12;4;238

  • Phosphoinositide metabolism links cGMP-dependent protein kinase G to essential Ca²⁺ signals at key decision points in the life cycle of malaria parasites.

    Brochet M, Collins MO, Smith TK, Thompson E, Sebastian S, Volkmann K, Schwach F, Chappell L, Gomes AR, Berriman M, Rayner JC, Baker DA, Choudhary J and Billker O

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Many critical events in the Plasmodium life cycle rely on the controlled release of Ca²⁺ from intracellular stores to activate stage-specific Ca²⁺-dependent protein kinases. Using the motility of Plasmodium berghei ookinetes as a signalling paradigm, we show that the cyclic guanosine monophosphate (cGMP)-dependent protein kinase, PKG, maintains the elevated level of cytosolic Ca²⁺ required for gliding motility. We find that the same PKG-dependent pathway operates upstream of the Ca²⁺ signals that mediate activation of P. berghei gametocytes in the mosquito and egress of Plasmodium falciparum merozoites from infected human erythrocytes. Perturbations of PKG signalling in gliding ookinetes have a marked impact on the phosphoproteome, with a significant enrichment of in vivo regulated sites in multiple pathways including vesicular trafficking and phosphoinositide metabolism. A global analysis of cellular phospholipids demonstrates that in gliding ookinetes PKG controls phosphoinositide biosynthesis, possibly through the subcellular localisation or activity of lipid kinases. Similarly, phosphoinositide metabolism links PKG to egress of P. falciparum merozoites, where inhibition of PKG blocks hydrolysis of phosphatidylinositol (4,5)-bisphosphate. In the face of an increasing complexity of signalling through multiple Ca²⁺ effectors, PKG emerges as a unifying factor to control multiple cellular Ca²⁺ signals essential for malaria parasite development and transmission.

    Funded by: Medical Research Council: G0501670, G10000779; Wellcome Trust: 079643/Z/06/Z, WT093228, WT094752, WT098051

    PLoS biology 2014;12;3;e1001806

  • Vector transmission regulates immune control of Plasmodium virulence.

    Spence PJ, Jarra W, Lévy P, Reid AJ, Chappell L, Brugat T, Sanders M, Berriman M and Langhorne J

    Division of Parasitology, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK.

    Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents, primates or humans increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission, which is assumed to 'reset' Plasmodium to its original character. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we use mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi to interrogate regulation of parasite virulence. Analysis of SBP P. c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which in turn modifies the elicited mammalian immune response, which in turn attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.

    Funded by: Medical Research Council: MC_U117584248, U.1175.02.004.00004(60507), U117584248; Wellcome Trust: 085775, 089553, 098051

    Nature 2013;498;7453;228-31

  • Finding a needle in a haystack. Microbial metatranscriptomes.

    Chappell L

    This month's Genome Watch highlights some of the technical challenges that need to be overcome to gain further insight into microbial metatranscriptomes.

    Nature reviews. Microbiology 2012;10;7;446

  • Expressions of individuality.

    Chappell L

    Nature reviews. Microbiology 2011;9;10;701

Avril Coghlan

alc@sanger.ac.uk Senior Bioinformatician

I studied genetics at Trinity College Dublin, then did a PhD on molecular evolution of nematode genomes with Ken Wolfe at Trinity College Dublin, followed by post-docs with Des Higgins in University College Dublin and with Richard Durbin at the Sanger Institute, Cambridge, working on various topics in phylogenetics, molecular evolution and gene-finding. I was subsequently a lecturer in bioinformatics in University College Cork for four years before joining the parasite genomics group at the Sanger Institute in 2012.

Research

At the Sanger Institute, I'm involved in projects across a range of parasitic species, including parasitic nematodes and schistosomes.

References

  • Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts.

    Forde BM, Neville BA, O'Donnell MM, Riboulet-Bisson E, Claesson MJ, Coghlan A, Ross RP and O'Toole PW

    Department of Microbiology, University College Cork, Ireland. pwotoole@ucc.ie

    Background: The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade.

    Results: The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group.

    Conclusions: The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host interactions of these motile intestinal lactobacilli.

    Microbial cell factories 2011;10 Suppl 1;S13

  • The genome of the blood fluke Schistosoma mansoni.

    Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG and El-Sayed NM

    Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk

    Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

    Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z

    Nature 2009;460;7253;352-8

  • TreeFam: 2008 Update.

    Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, Hériché JK, Hu Y, Kristiansen K, Li R, Liu T, Moses A, Qin J, Vang S, Vilella AJ, Ureta-Vidal A, Bolund L, Wang J and Durbin R

    Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.

    TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14,351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new perl API lets users easily extract data from the TreeFam mysql database.

    Funded by: Wellcome Trust

    Nucleic acids research 2008;36;Database issue;D735-40

  • nGASP--the nematode genome annotation assessment project.

    Coghlan A, Fiedler TJ, McKay SJ, Flicek P, Harris TW, Blasiar D, nGASP Consortium and Stein LD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. alc@sanger.ac.uk

    Background: While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.

    Results: The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.

    Conclusion: This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.

    Funded by: NHGRI NIH HHS: P41 HG02223; Wellcome Trust

    BMC bioinformatics 2008;9;549

  • Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure.

    Coghlan A and Durbin R

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. alc@sanger.ac.uk

    Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.

    Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C. briggsae and C. remanei. On a set of approximately 1500 confirmed C. elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.

    Availability: Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix

    Funded by: Wellcome Trust: 077192

    Bioinformatics (Oxford, England) 2007;23;12;1468-75

  • TreeFam: a curated database of phylogenetic trees of animal gene families.

    Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J and Durbin R

    Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.

    TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins from UniProt; approximately 40-85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available at http://www.treefam.org and http://treefam.genomics.org.cn.

    Funded by: Wellcome Trust

    Nucleic acids research 2006;34;Database issue;D572-80

  • Chromosome evolution in eukaryotes: a multi-kingdom perspective.

    Coghlan A, Eichler EE, Oliver SG, Paterson AH and Stein L

    Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.

    In eukaryotes, chromosomal rearrangements, such as inversions, translocations and duplications, are common and range from part of a gene to hundreds of genes. Lineage-specific patterns are also seen: translocations are rare in dipteran flies, and angiosperm genomes seem prone to polyploidization. In most eukaryotes, there is a strong association between rearrangement breakpoints and repeat sequences. Current data suggest that some repeats promoted rearrangements via non-allelic homologous recombination; for others, the association might not be causal but might instead reflect the instability of particular genomic regions. Rearrangement polymorphisms in eukaryotes are correlated with phenotypic differences, so are thought to confer varying fitness in different habitats. Some seem to be under positive selection because they either trap favorable allele combinations together or alter the expression of nearby genes. There is little evidence that chromosomal rearrangements cause speciation, but they probably intensify reproductive isolation between species that have formed by another route.

    Funded by: NHGRI NIH HHS: HG02639; NIGMS NIH HHS: GM58815; Wellcome Trust

    Trends in genetics : TIG 2005;21;12;673-82

  • Origins of recently gained introns in Caenorhabditis.

    Coghlan A and Wolfe KH

    Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.

    The genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae both contain approximately 100,000 introns, of which >6,000 are unique to one or the other species. To study the origins of new introns, we used a conservative method involving phylogenetic comparisons to animal orthologs and nematode paralogs to identify cases where an intron content difference between C. elegans and C. briggsae was caused by intron insertion rather than deletion. We identified 81 recently gained introns in C. elegans and 41 in C. briggsae. Novel introns have a stronger exon splice site consensus sequence than the general population of introns and show the same preference for phase 0 sites in codons over phases 1 and 2. More of the novel introns are inserted in genes that are expressed in the C. elegans germ line than expected by chance. Thirteen of the 122 gained introns are in genes whose protein products function in pre-mRNA processing, including three gains in the gene for spliceosomal protein SF3B1 and two in the nonsense-mediated decay gene smg-2. Twenty-eight novel introns have significant DNA sequence identity to other introns, including three that are similar to other introns in the same gene. All of these similarities involve minisatellites or palindromes in the intron sequences. Our results suggest that at least some of the intron gains were caused by reverse splicing of a preexisting intron.

    Proceedings of the National Academy of Sciences of the United States of America 2004;101;31;11362-7

  • The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.

    Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D'Eustachio P, Fitch DH, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R and Waterston RH

    Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA. lstein@cshl.org

    The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.

    Funded by: NHGRI NIH HHS: 5P01 HG00956, 5U01 HG02042, P41 HG02223; NIGMS NIH HHS: R01 GM42432, T32 GM07754-22

    PLoS biology 2003;1;2;E45

  • Fourfold faster rate of genome rearrangement in nematodes than in Drosophila.

    Coghlan A and Wolfe KH

    Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.

    We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that the species diverged 50-120 million years ago, and that since then there have been 4030 rearrangements between their whole genomes. Our estimate of the rearrangement rate, 0.4-1.0 chromosomal breakages/Mb per Myr, is at least four times that of Drosophila, which was previously reported to be the fastest rate among eukaryotes. The breakpoints of translocations are strongly associated with dispersed repeats and gene family members in the C. elegans genome.

    Genome research 2002;12;6;857-67

James Cotton

jc17@sanger.ac.uk Senior Staff Scientist

I studied biology at Oxford, and then did a PhD on gene family evolution with Rod Page at the University of Glasgow, followed by post-docs at the Natural History Museum in London and at the National University of Ireland, Maynooth, working on various topics in phylogenetics and molecular evolution. I was subsequently an RCUK Fellow at Queen Mary, University of London for three years before joining the parasite genomics group in 2010.

Research

At the Sanger Institute, I'm involved in projects across a diverse array of parasitic species, including nematodes, schistosomes and kinetoplastids. I play a leading role in a number of de novo genome sequencing projects, and focus particularly on projects with a strong comparative or population genomics component.

References

  • Genome-wide signatures of convergent evolution in echolocating mammals.

    Parker J, Tsagkogeorga G, Cotton JA, Liu Y, Provero P, Stupka E and Rossiter SJ

    School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, UK. j.d.parker@qmul.ac.uk

    Evolution is typically thought to proceed through divergence of genes, proteins and ultimately phenotypes. However, similar traits might also evolve convergently in unrelated taxa owing to similar selection pressures. Adaptive phenotypic convergence is widespread in nature, and recent results from several genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level. Where homoplasious substitutions do occur these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution, although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show that convergence is not a rare process restricted to several loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four newly sequenced bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the bottlenose dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Unexpectedly, we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognized.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/H017178/1

    Nature 2013;502;7470;228-31

  • Characterization and comparative analysis of the complete Haemonchus contortus β-tubulin gene family and implications for benzimidazole resistance in strongylid nematodes.

    Saunders GI, Wasmuth JD, Beech R, Laing R, Hunt M, Naghra H, Cotton JA, Berriman M, Britton C and Gilleard JS

    Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, 464 Bearsden Road, Glasgow, Scotland G61 1QH, UK.

    Parasitic nematode β-tubulin genes are of particular interest because they are the targets of benzimidazole drugs. However, in spite of this, the full β-tubulin gene family has not been characterized for any parasitic nematode to date. Haemonchus contortus is the parasite species for which we understand benzimidazole resistance the best and its close phylogenetic relationship with Caenorhabditis elegans potentially allows inferences of gene function by comparative analysis. Consequently, we have characterized the full β-tubulin gene family in H. contortus. Further to the previously identified Hco-tbb-iso-1 and Hco-tbb-iso-2 genes, we have characterized two additional family members designated Hco-tbb-iso-3 and Hco-tbb-iso-4. We show that Hco-tbb-iso-1 is not a one-to-one orthologue with Cel-ben-1, the only β-tubulin gene in C. elegans that is a benzimidazole drug target. Instead, both Hco-tbb-iso-1 and Hco-tbb-iso-2 have a complex evolutionary relationship with three C. elegans β-tubulin genes: Cel-ben-1, Cel-tbb-1 and Cel-tbb-2. Furthermore, we show that both Hco-tbb-iso-1 and Hco-tbb-iso-2 are highly expressed in adult worms; in contrast, Hco-tbb-iso-3 and Hco-tbb-iso-4 are expressed only at very low levels and are orthologous to the Cel-mec-7 and Cel-tbb-4 genes, respectively, suggesting that they have specialized functional roles. Indeed, we have found that the expression pattern of Hco-tbb-iso-3 in H. contortus is identical to that of Cel-mec-7 in C. elegans, being expressed in just six "touch receptor" mechano-sensory neurons. These results suggest that further investigation is warranted into the potential involvement of strongylid isotype-2 β-tubulin genes in mechanisms of benzimidazole resistance.

    Funded by: Canadian Institutes of Health Research: 230937; Wellcome Trust: WT098051

    International journal for parasitology 2013;43;6;465-75

  • The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.

    Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS and Cotton JA

    Background: The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most economically important parasites of livestock and humans.

    Results: Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug target families, providing the most comprehensive understanding yet of the action of several important anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut that provide a rich source of vaccine and drug target candidates.

    Conclusions: The H. contortus genome and transcriptome provide an essential platform for postgenomic research in this and other important strongylid parasites.

    Funded by: Biotechnology and Biological Sciences Research Council; Canadian Institutes of Health Research: 230927; Wellcome Trust: 067811, 098051

    Genome biology 2013;14;8;R88

  • New approaches for unravelling reassortment pathways.

    Svinti V, Cotton JA and McInerney JO

    Department of Biology, National University of Ireland at Maynooth, Maynooth, Co Kildare, Ireland.

    Background: Every year the human population encounters epidemic outbreaks of influenza, and history reveals recurring pandemics that have had devastating consequences. The current work focuses on the development of a robust algorithm for detecting influenza strains that have a composite genomic architecture. These influenza subtypes can be generated through a reassortment process, whereby a virus can inherit gene segments from two different types of influenza particles during replication. Reassortant strains are often not immediately recognised by the adaptive immune system of the hosts and hence may be the source of pandemic outbreaks. Owing to their importance in public health and their infectious ability, it is essential to identify reassortant influenza strains in order to understand the evolution of this virus and describe reassortment pathways that may be biased towards particular viral segments. Phylogenetic methods have been used traditionally to identify reassortant viruses. In many studies up to now, the assumption has been that if two phylogenetic trees differ, it is because reassortment has caused them to be different. While phylogenetic incongruence may be caused by real differences in evolutionary history, it can also be the result of phylogenetic error. Therefore, we wish to develop a method for distinguishing between topological inconsistency that is due to confounding effects and topological inconsistency that is due to reassortment.

    Results: The current work describes the implementation of two approaches for robustly identifying reassortment events. The algorithms rest on the idea of significance of difference between phylogenetic trees or phylogenetic tree sets, and subtree pruning and regrafting operations, which mimic the effect of reassortment on tree topologies. The first method is based on a maximum likelihood (ML) framework (MLreassort) and the second implements a Bayesian approach (Breassort) for reassortment detection. We focus on reassortment events that are found by both methods. We test both methods on a simulated dataset and on a small collection of real viral data isolated in Hong Kong in 1999.

    Conclusions: The nature of segmented viral genomes presents many challenges with respect to disease. The algorithms developed here can effectively identify reassortment events in small viral datasets and can be applied not only to influenza but also to other segmented viruses. Owing to computational demands of comparing tree topologies, further development in this area is necessary to allow their application to larger datasets.

    BMC evolutionary biology 2013;13;1

  • Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.

    Reid AJ, Vermont SJ, Cotton JA, Harris D, Hill-Cawthorne GA, Könen-Waisman S, Latham SM, Mourier T, Norton R, Quail MA, Sanders M, Shanmugam D, Sohal A, Wasmuth JD, Brunk B, Grigg ME, Howard JC, Parkinson J, Roos DS, Trees AJ, Berriman M, Pain A and Wastling JM

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma, rather it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z

    PLoS pathogens 2012;8;3;e1002567

  • Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.

    Downing T, Imamura H, Decuypere S, Clark TG, Coombs GH, Cotton JA, Hilley JD, de Doncker S, Maes I, Mottram JC, Quail MA, Rijal S, Sanders M, Schönian G, Stark O, Sundar S, Vanaerschot M, Hertz-Fowler C, Dujardin JC and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.

    Funded by: Wellcome Trust: 076355, 085775/Z/08/Z

    Genome research 2011;21;12;2143-56

  • Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.

    Kikuchi T, Cotton JA, Dalzell JJ, Hasegawa K, Kanzaki N, McVeigh P, Takanashi T, Tsai IJ, Assefa SA, Cock PJ, Otto TD, Hunt M, Reid AJ, Sanchez-Flores A, Tsuchihara K, Yokoi T, Larsson MC, Miwa J, Maule AG, Sahashi N, Jones JT and Berriman M

    Forestry and Forest Products Research Institute, Tsukuba, Japan. kikuchit@affrc.go.jp

    Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    PLoS pathogens 2011;7;9;e1002219

  • Cetaceans on a molecular fast track to ultrasonic hearing.

    Liu Y, Rossiter SJ, Han X, Cotton JA and Zhang S

    School of Life Sciences, East China Normal University, Shanghai, China.

    The early radiation of cetaceans coincides with the origin of their defining ecological and sensory differences [1, 2]. Toothed whales (Odontoceti) evolved echolocation for hunting 36-34 million years ago, whereas baleen whales (Mysticeti) evolved filter feeding and do not echolocate [2]. Echolocation in toothed whales demands exceptional high-frequency hearing [3], and both echolocation and ultrasonic hearing have also evolved independently in bats [4, 5]. The motor protein Prestin that drives the electromotility of the outer hair cells (OHCs) is likely to be especially important in ultrasonic hearing, because it is the vibratory response of OHC to incoming sound waves that confers the enhanced sensitivity and selectivity of the mammalian auditory system [6, 7]. Prestin underwent adaptive change early in mammal evolution [8] and also shows sequence convergence between bats and dolphins [9, 10], as well as within bats [11]. Focusing on whales, we show for the first time that the extent of protein evolution in Prestin can be linked directly to the evolution of high-frequency hearing. Moreover, we find that independent cases of sequence convergence in mammals have involved numerous identical amino acid site replacements. Our findings shed new light on the importance of Prestin in the evolution of mammalian hearing.

    Current biology : CB 2010;20;20;1834-9

  • Eukaryotic genes of archaebacterial origin are more important than the more numerous eubacterial genes, irrespective of function.

    Cotton JA and McInerney JO

    Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland.

    The traditional tree of life shows eukaryotes as a distinct lineage of living things, but many studies have suggested that the first eukaryotic cells were chimeric, descended from both Eubacteria (through the mitochondrion) and Archaebacteria. Eukaryote nuclei thus contain genes of both eubacterial and archaebacterial origins, and these genes have different functions within eukaryotic cells. Here we report that archaebacterium-derived genes are significantly more likely to be essential to yeast viability, are more highly expressed, and are significantly more highly connected and more central in the yeast protein interaction network. These findings hold irrespective of whether the genes have an informational or operational function, so that many features of eukaryotic genes with prokaryotic homologs can be explained by their origin, rather than their function. Taken together, our results show that genes of archaebacterial origin are in some senses more important to yeast metabolism than genes of eubacterial origin. This importance reflects these genes' origin as the ancestral nuclear component of the eukaryotic genome.

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;40;17252-5

  • Experimental design in caecilian systematics: phylogenetic information of mitochondrial genomes and nuclear rag1.

    San Mauro D, Gower DJ, Massingham T, Wilkinson M, Zardoya R and Cotton JA

    Department of Zoology, The Natural History Museum, Cromwell Road, London SW7 5BD, UK. d.san-mauro@nhm.ac.uk

    In molecular phylogenetic studies, a major aspect of experimental design concerns the choice of markers and taxa. Although previous studies have investigated the phylogenetic performance of different genes and the effectiveness of increasing taxon sampling, their conclusions are partly contradictory, probably because they are highly context specific and dependent on the group of organisms used in each study. Goldman introduced a method for experimental design in phylogenetics based on the expected information to be gained that has barely been used in practice. Here we use this method to explore the phylogenetic utility of mitochondrial (mt) genes, mt genomes, and nuclear rag1 for studies of the systematics of caecilian amphibians, as well as the effect of taxon addition on the stabilization of a controversial branch of the tree. Overall phylogenetic information estimates per gene, specific estimates per branch of the tree, estimates for combined (mitogenomic) data sets, and estimates as a hypothetical new taxon is added to different parts of the caecilian tree are calculated and compared. In general, the most informative data sets are those for mt transfer and ribosomal RNA genes. Our results also show at which positions in the caecilian tree the addition of taxa has the greatest potential to increase phylogenetic information with respect to the controversial relationships of Scolecomorphus, Boulengerula, and all other teresomatan caecilians. These positions are, as intuitively expected, mostly (but not all) adjacent to the controversial branch. Generating whole mitogenomic and rag1 data for additional taxa joining the Scolecomorphus branch may be a more efficient strategy than sequencing a similar amount of additional nucleotides spread across the current caecilian taxon sampling. The methodology employed in this study allows an a priori evaluation and testable predictions of the appropriateness of particular experimental designs to solve specific questions at different levels of the caecilian phylogeny.

    Systematic biology 2009;58;4;425-38

Bernardo Foth

Senior Staff Scientist

bf3@sanger.ac.uk

I studied biology at the University of Erlangen in Germany, followed by PhD work on the relict plastid of malaria parasites in Melbourne with Geoff McFadden. I then carried out postdoctoral research in the labs of Dominique Soldati (on myosins and Toxoplasma gondii cell biology) and Zbynek Bozdech (on quantitative transcript-protein relationships in malaria parasites). I joined the Parasite Genomics group in December 2010.

Research

I am currently involved in a number of functional genomics-related projects ranging from investigating the genetic basis of drug-resistance in African trypanosomes to differential gene expression in the parasitic nematode Trichuris muris. I am also leading the group's renewed efforts to produce the de novo genome sequence of the avian malaria parasite Plasmodium gallinaceum.

References

  • Quantitative time-course profiling of parasite and host cell proteins in the human malaria parasite Plasmodium falciparum.

    Foth BJ, Zhang N, Chaal BK, Sze SK, Preiser PR and Bozdech Z

    School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551.

    Studies of the Plasmodium falciparum transcriptome have shown that the tightly controlled progression of the parasite through the intra-erythrocytic developmental cycle (IDC) is accompanied by a continuous gene expression cascade in which most expressed genes exhibit a single transcriptional peak. Because the biochemical and cellular functions of most genes are mediated by the encoded proteins, understanding the relationship between mRNA and protein levels is crucial for inferring biological activity from transcriptional gene expression data. Although studies on other organisms show that <50% of protein abundance variation may be attributable to corresponding mRNA levels, the situation in Plasmodium is further complicated by the dynamic nature of the cyclic gene expression cascade. In this study, we simultaneously determined mRNA and protein abundance profiles for P. falciparum parasites during the IDC at 2-hour resolution based on oligonucleotide microarrays and two-dimensional differential gel electrophoresis protein gels. We find that most proteins are represented by more than one isoform, presumably because of post-translational modifications. Like transcripts, most proteins exhibit cyclic abundance profiles with one peak during the IDC, whereas the presence of functionally related proteins is highly correlated. In contrast, the abundance of most parasite proteins peaks significantly later (median 11 h) than the corresponding transcripts and often decreases slowly in the second half of the IDC. Computational modeling indicates that the considerable and varied incongruence between transcript and protein abundance may largely be caused by the dynamics of translation and protein degradation. Furthermore, we present cyclic abundance profiles also for parasite-associated human proteins and confirm the presence of five human proteins with a potential role in antioxidant defense within the parasites. Together, our data provide fundamental insights into transcript-protein relationships in P. falciparum that are important for the correct interpretation of transcriptional data and that may facilitate the improvement and development of malaria diagnostics and drug therapy.

    Molecular & cellular proteomics : MCP 2011;10;8;M110.006411

  • Mitochondrial translation in absence of local tRNA aminoacylation and methionyl tRNA Met formylation in Apicomplexa.

    Pino P, Aeby E, Foth BJ, Sheiner L, Soldati T, Schneider A and Soldati-Favre D

    Department of Microbiology and Molecular Medicine, CMU, University of Geneva, 1 rue Michel-Servet, 1211 Geneva 4, Switzerland.

    Apicomplexans possess three translationally active compartments: the cytosol, a single tubular mitochondrion, and a vestigial plastid organelle called apicoplast. Mitochondrion and apicoplast are of bacterial evolutionary origin and therefore depend on a bacterial-like translation machinery. The minimal mitochondrial genome contains only three ORFs, and in Toxoplasma gondii the absence of mitochondrial tRNA genes is compensated for by the import of cytosolic eukaryotic tRNAs. Although all compartments require a complete set of charged tRNAs, the apicomplexan nuclear genomes do not hold sufficient aminoacyl-tRNA synthetase (aaRSs) genes to be targeted individually to each compartment. This study reveals that aaRSs are either cytosolic, apicoplastic or shared between the two compartments by dual targeting but are absent from the mitochondrion. Consequently, tRNAs are very likely imported in their aminoacylated form. Furthermore, the unexpected absence of tRNA(Met) formyltransferase and peptide deformylase implies that the requirement for a specialized formylmethionyl-tRNA(Met) for translation initiation is bypassed in the mitochondrion of Apicomplexa.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust

    Molecular microbiology 2010;76;3;706-18

  • Evolution of malaria parasite plastid targeting sequences.

    Tonkin CJ, Foth BJ, Ralph SA, Struck N, Cowman AF and McFadden GI

    School of Botany, University of Melbourne, Melbourne, Victoria 3010, Australia.

    The transfer of genes from an endosymbiont to its host typically requires acquisition of targeting signals by the gene product to ensure its return to the endosymbiont for function. Many hundreds of plastid-derived genes must have acquired transit peptides for successful relocation to the nucleus. Here, we explore potential evolutionary origins of plastid transit peptides in the malaria parasite Plasmodium falciparum. We show that exons of the P. falciparum genome could serve as transit peptides after exon shuffling. We further demonstrate that numerous randomized peptides and even whimsical sequences based on English words can also function as transit peptides in vivo. Thus, facile acquisition of transit peptides from existing sequence likely expedited endosymbiont integration through intracellular gene transfer.

    Funded by: Howard Hughes Medical Institute

    Proceedings of the National Academy of Sciences of the United States of America 2008;105;12;4781-5

  • Quantitative protein expression profiling reveals extensive post-transcriptional regulation and post-translational modifications in schizont-stage malaria parasites.

    Foth BJ, Zhang N, Mok S, Preiser PR and Bozdech Z

    School of Biological Sciences, Nanyang Technological University, Nanyang Drive, 637551 Singapore. BFoth@ntu.edu.sg

    Background: Malaria is one of the most important infectious diseases and is caused by parasitic protozoa of the genus Plasmodium. Previously, quantitative characterization of the P. falciparum transcriptome demonstrated that the strictly controlled progression of these parasites through their intra-erythrocytic developmental cycle is accompanied by a continuous cascade of gene expression. Although such analyses have proven immensely useful, the correlations between abundance of transcripts and their cognate proteins remain poorly characterized.

    Results: Here, we present a quantitative time-course analysis of relative protein abundance for schizont-stage parasites (34 to 46 hours after invasion) based on two-dimensional differential gel electrophoresis of protein samples labeled with fluorescent dyes. For this purpose we analyzed parasite samples taken at 4-hour intervals from a tightly synchronized culture and established more than 500 individual protein abundance profiles with high temporal resolution and quantitative reproducibility. Approximately half of all profiles exhibit a significant change in abundance and 12% display an expression peak during the observed 12-hour time interval. Intriguingly, identification of 54 protein spots by mass spectrometry revealed that 58% of the corresponding proteins--including actin-I, enolase, eukaryotic initiation factor (eIF)4A, eIF5A, and several heat shock proteins--are represented by more than one isoform, presumably caused by post-translational modifications, with the various isoforms of a given protein frequently showing different expression patterns. Furthermore, comparisons with transcriptome data generated from the same parasite samples reveal evidence of significant post-transcriptional gene expression regulation.

    Conclusions: Together, our data indicate that both post-transcriptional and post-translational events are widespread and of presumably great biological significance during the intra-erythrocytic development of P. falciparum.

    Genome biology 2008;9;12;R177

  • Dual targeting of antioxidant and metabolic enzymes to the mitochondrion and the apicoplast of Toxoplasma gondii.

    Pino P, Foth BJ, Kwok LY, Sheiner L, Schepers R, Soldati T and Soldati-Favre D

    Department of Microbiology and Molecular Medicine, Centre Médical Universitaire, University of Geneva, Geneva, Switzerland.

    Toxoplasma gondii is an aerobic protozoan parasite that possesses mitochondrial antioxidant enzymes to safely dispose of oxygen radicals generated by cellular respiration and metabolism. As with most Apicomplexans, it also harbors a chloroplast-like organelle, the apicoplast, which hosts various biosynthetic pathways and requires antioxidant protection. Most apicoplast-resident proteins are encoded in the nuclear genome and are targeted to the organelle via a bipartite N-terminal targeting sequence. We show here that two antioxidant enzymes, a superoxide dismutase (TgSOD2) and a thioredoxin-dependent peroxidase (TgTPX1/2), and an aconitase are dually targeted to both the apicoplast and the mitochondrion of T. gondii. In the case of TgSOD2, our results indicate that a single gene product is bimodally targeted due to an inconspicuous variation within the putative signal peptide of the organellar protein, which significantly alters its subcellular localization. Dual organellar targeting of proteins might occur frequently in Apicomplexans to serve important biological functions such as antioxidant protection and carbon metabolism.

    Funded by: Wellcome Trust

    PLoS pathogens 2007;3;8;e115

  • New insights into myosin evolution and classification.

    Foth BJ, Goedecke MC and Soldati D

    Department of Microbiology and Molecular Medicine, Centre Médical Universitaire, University of Geneva, 1 Rue Michel-Servet, 1211 Geneva, Switzerland. bernardo.foth@medecine.unige.ch

    Myosins are eukaryotic actin-dependent molecular motors important for a broad range of functions like muscle contraction, vision, hearing, cell motility, and host cell invasion of apicomplexan parasites. Myosin heavy chains consist of distinct head, neck, and tail domains and have previously been categorized into 18 different classes based on phylogenetic analysis of their conserved heads. Here we describe a comprehensive phylogenetic examination of many previously unclassified myosins, with particular emphasis on sequences from apicomplexan and other chromalveolate protists including the model organism Toxoplasma, the malaria parasite Plasmodium, and the ciliate Tetrahymena. Using different phylogenetic inference methods and taking protein domain architectures, specific amino acid polymorphisms, and organismal distribution into account, we demonstrate a hitherto unrecognized common origin for ciliate and apicomplexan class XIV myosins. Our data also suggest common origins for some apicomplexan myosins and class VI, for classes II and XVIII, for classes XII and XV, and for some microsporidian myosins and class V, thereby reconciling evolutionary history and myosin structure in several cases and corroborating the common coevolution of myosin head, neck, and tail domains. Six novel myosin classes are established to accommodate sequences from chordate metazoans (class XIX), insects (class XX), kinetoplastids (class XXI), and apicomplexans and diatom algae (classes XXII, XXIII, and XXIV). These myosin (sub)classes include sequences with protein domains (FYVE, WW, UBA, ATS1-like, and WD40) previously unknown to be associated with myosin motors. Regarding the apicomplexan "myosome," we significantly update class XIV classification, propose a systematic naming convention, and discuss possible functions in these parasites.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2006;103;10;3681-6

  • The malaria parasite Plasmodium falciparum has only one pyruvate dehydrogenase complex, which is located in the apicoplast.

    Foth BJ, Stimmler LM, Handman E, Crabb BS, Hodder AN and McFadden GI

    Plant Cell Biology Research Centre, School of Botany, University of Melbourne, Parkville, VIC 3010, Australia.

    The relict plastid (apicoplast) of apicomplexan parasites synthesizes fatty acids and is a promising drug target. In plant plastids, a pyruvate dehydrogenase complex (PDH) converts pyruvate into acetyl-CoA, the major fatty acid precursor, whereas a second, distinct PDH fuels the tricarboxylic acid cycle in the mitochondria. In contrast, the presence of genes encoding PDH and related enzyme complexes in the genomes of five Plasmodium species and of Toxoplasma gondii indicate that these parasites contain only one single PDH. PDH complexes are comprised of four subunits (E1alpha, E1beta, E2, E3), and we confirmed four genes encoding a complete PDH in Plasmodium falciparum through sequencing of cDNA clones. In apicomplexan parasites, many nuclear-encoded proteins are targeted to the apicoplast courtesy of two-part N-terminal leader sequences, and the presence of such N-terminal sequences on all four PDH subunits as well as phylogenetic analyses strongly suggest that the P. falciparum PDH is located in the apicoplast. Fusion of the two-part leader sequences from the E1alpha and E2 genes to green fluorescent protein experimentally confirmed apicoplast targeting. Western blot analysis provided evidence for the expression of the E1alpha and E1beta PDH subunits in blood-stage malaria parasites. The recombinantly expressed catalytic domain of the PDH subunit E2 showed high enzymatic activity in vitro indicating that pyruvate is converted to acetyl-CoA in the apicoplast, possibly for use in fatty acid biosynthesis.

    Molecular microbiology 2005;55;1;39-53

  • Tropical infectious diseases: metabolic maps and functions of the Plasmodium falciparum apicoplast.

    Ralph SA, van Dooren GG, Waller RF, Crawford MJ, Fraunholz MJ, Foth BJ, Tonkin CJ, Roos DS and McFadden GI

    Institut Pasteur, Biology of Host-Parasite Interactions, 25 Rue du Docteur Roux, 75724, Paris, Cedex 15, France.

    Nature reviews. Microbiology 2004;2;3;203-16

  • Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum.

    Foth BJ, Ralph SA, Tonkin CJ, Struck NS, Fraunholz M, Roos DS, Cowman AF and McFadden GI

    Plant Cell Biology Research Centre, School of Botany, University of Melbourne, Parkville, VIC 3010, Australia.

    Transit peptides mediate protein targeting into plastids and are only poorly understood. We extracted amino acid features from transit peptides that target proteins to the relict plastid (apicoplast) of malaria parasites. Based on these amino acid characteristics, we identified 466 putative apicoplast proteins in the Plasmodium falciparum genome. Altering the specific charge characteristics in a model transit peptide by site-directed mutagenesis severely disrupted organellar targeting in vivo. Similarly, putative Hsp70 (DnaK) binding sites present in the transit peptide proved to be important for correct targeting.

    Science (New York, N.Y.) 2003;299;5607;705-8

  • Regulated degradation of an endoplasmic reticulum membrane protein in a tubular lysosome in Leishmania mexicana.

    Mullin KA, Foth BJ, Ilgoutz SC, Callaghan JM, Zawadzki JL, McFadden GI and McConville MJ

    Department of Biochemistry and Molecular Biology, The University of Melbourne, Victoria 3010, Australia.

    The cell surface of the human parasite Leishmania mexicana is coated with glycosylphosphatidylinositol (GPI)-anchored macromolecules and free GPI glycolipids. We have investigated the intracellular trafficking of green fluorescent protein- and hemagglutinin-tagged forms of dolichol-phosphate-mannose synthase (DPMS), a key enzyme in GPI biosynthesis in L. mexicana promastigotes. These functionally active chimeras are found in the same subcompartment of the endoplasmic reticulum (ER) as endogenous DPMS but are degraded as logarithmically growing promastigotes reach stationary phase, coincident with the down-regulation of endogenous DPMS activity and GPI biosynthesis in these cells. We provide evidence that these chimeras are constitutively transported to and degraded in a novel multivesicular tubule (MVT) lysosome. This organelle is a terminal lysosome, which is labeled with the endocytic marker FM 4-64, contains lysosomal cysteine and serine proteases and is disrupted by lysomorphotropic agents. Electron microscopy and subcellular fractionation studies suggest that the DPMS chimeras are transported from the ER to the lumen of the MVT via the Golgi apparatus and a population of 200-nm multivesicular bodies. In contrast, soluble ER proteins are not detectably transported to the MVT lysosome in either log or stationary phase promastigotes. Finally, the increased degradation of the DPMS chimeras in stationary phase promastigotes coincides with an increase in the lytic capacity of the MVT lysosome and changes in the morphology of this organelle. We conclude that lysosomal degradation of DPMS may be important in regulating the cellular levels of this enzyme and the stage-dependent biosynthesis of the major surface glycolipids of these parasites.

    Molecular biology of the cell 2001;12;8;2364-77

Tom Huckvale

- Advanced Research Assistant

I completed a BSc in Biological & Medicinal Chemistry at the University of Exeter in 2008, and then finished an MSc whilst working in a veterinary testing laboratory. I subsequently worked for a genomics services company in Berlin, and then in the food chemistry department of an analytical sciences firm in London, before starting at Sanger in April 2011.

Research

At the Institute, I provide laboratory support to the Parasite Genomics group through the introduction of new and established methods of functional genomics across a range of parasitic species.

References

  • Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction.

    Foth BJ, Tsai IJ, Reid AJ, Bancroft AJ, Nichol S, Tracey A, Holroyd N, Cotton JA, Stanley EJ, Zarowiecki M, Liu JZ, Huckvale T, Cooper PJ, Grencis RK and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. We report here the whole-genome sequences of the human-infective Trichuris trichiura and the mouse laboratory model Trichuris muris. On the basis of whole-transcriptome analyses, we identify many genes that are expressed in a sex- or life stage-specific manner and characterize the transcriptional landscape of a morphological region with unique biological adaptations, namely, bacillary band and stichosome, found only in whipworms and related parasites. Using RNA sequencing data from whipworm-infected mice, we describe the regulated T helper 1 (TH1)-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identified numerous new potential drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection.

    Funded by: Wellcome Trust: 088862/Z/09/Z, 098051, WT083620MA, WT100290MA

    Nature genetics 2014;46;7;693-700

  • Summarizing specific profiles in Illumina sequencing from whole-genome amplified DNA.

    Tsai IJ, Hunt M, Holroyd N, Huckvale T, Berriman M and Kikuchi T

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK; Faculty of Medicine, Division of Parasitology, Department of Infectious Disease, University of Miyazaki, Miyazaki 889-1692, Japan.

    Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we have sequenced and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC bias and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content only influences sequencing in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries show comparable results with unamplified libraries. To utilize the full potential of WGA to reveal the real biological interest, this article highlights the importance of recognizing additional sources of errors from amplified sequence reads and discusses the potential implications in downstream analyses.

    Funded by: Wellcome Trust: WT 098051

    DNA research : an international journal for rapid publication of reports on genes and genomes 2014;21;3;243-54

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

Sarah Nichol

- Senior Genome Analyst

I gained a BSc (Hons) in Zoology from the University of Edinburgh in 2005. I had a particular interest in parasites, which led me to write my final-year dissertation on developing a test to detect nematodes in sheep. After a two-year break, during which I worked for a year to fund my solo travels around Australia and New Zealand, I arrived at the Sanger Institute, where I began as a Finisher on the Zebrafish project in 2008. I became a fully fledged member of Parasite Genomics at the end of 2011.

Research

I have contributed to the Parasite Genomics group as a Senior Genome Analyst since 2010. My work involves improving a number of helminth genome assemblies, using bespoke software tools and scripts to take them beyond what is possible with automated assembly alone. I also help with manually annotating and improving gene models.

References

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

  • A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.

    Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.

    Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z

    PLoS neglected tropical diseases 2012;6;1;e1455

Thomas Otto

- Senior Staff Scientist

I studied informatics with bioinformatics as a minor in Lübeck, Germany. After a short project at the Florida State University (analyzing functional magnetic resonance imaging data), I started to work at the Fundação Oswaldo Cruz in Rio de Janeiro, Brazil. My role was to provide bioinformatics support to the group and generate algorithmic solutions to biological problems. In 2008, I finished my PhD, presenting alternative ways to improve the assembly of the Brazilian tuberculosis genome.

Research

In 2008 I joined Matt Berriman’s group. My main role is to provide bioinformatics support to our team, to other groups at Sanger and within the European EviMalaR network of malaria labs. My projects mostly involve developing algorithms to analyze next-generation sequencing data related to malaria.

References

  • Optimal enzymes for amplifying sequencing libraries.

    Quail MA, Otto TD, Gu Y, Harris SR, Skelly TF, McQuillan JA, Swerdlow HP and Oyola SO

    Nature methods 2012;9;1;10-1

  • A scalable pipeline for highly effective genetic modification of a malaria parasite.

    Pfander C, Anar B, Schwach F, Otto TD, Brochet M, Volkmann K, Quail MA, Pain A, Rosen B, Skarnes W, Rayner JC and Billker O

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    In malaria parasites, the systematic experimental validation of drug and vaccine targets by reverse genetics is constrained by the inefficiency of homologous recombination and by the difficulty of manipulating adenine and thymine (A+T)-rich DNA of most Plasmodium species in Escherichia coli. We overcame these roadblocks by creating a high-integrity library of Plasmodium berghei genomic DNA (>77% A+T content) in a bacteriophage N15-based vector that can be modified efficiently using the lambda Red method of recombineering. We built a pipeline for generating P. berghei genetic modification vectors at genome scale in serial liquid cultures on 96-well plates. Vectors have long homology arms, which increase recombination frequency up to tenfold over conventional designs. The feasibility of efficient genetic modification at scale will stimulate collaborative, genome-wide knockout and tagging programs for P. berghei.

    Funded by: Medical Research Council: G0501670, G0501670(76331); Wellcome Trust: 089085, WT089085/Z/09/Z

    Nature methods 2011;8;12;1078-82

  • Genome sequence of Mycobacterium bovis BCG Moreau, the Brazilian vaccine strain against tuberculosis.

    Gomes LH, Otto TD, Vasconcellos EA, Ferrão PM, Maia RM, Moreira AS, Ferreira MA, Castello-Branco LR, Degrave WM and Mendonça-Lima L

    Laboratório de Genômica Funcional e Bioinformática, Pavilhão Leonidas Deane sala 104, Instituto Oswaldo Cruz, Fiocruz Av., Brasil 4365, Manguinhos, 21040-900 Rio de Janeiro, Brazil.

    Mycobacterium bovis bacillus Calmette-Guérin (BCG) is the only vaccine available against tuberculosis, and the strains used worldwide represent a family of daughter strains with distinct genotypic characteristics. Here we report the complete genome sequence of M. bovis BCG Moreau, the strain in continuous use in Brazil for vaccine production since the 1920s.

    Journal of bacteriology 2011;193;19;5600-1

  • Real-time sequencing.

    Otto TD

    Nature reviews. Microbiology 2011;9;9;633

  • RATT: Rapid Annotation Transfer Tool.

    Otto TD, Dillon GP, Degrave WS and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Nucleic acids research 2011;39;9;e57

  • Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally.

    Sutherland CJ, Tanomsing N, Nolder D, Oguike M, Jennison C, Pukrittayakamee S, Dolecek C, Hien TT, do Rosário VE, Arez AP, Pinto J, Michon P, Escalante AA, Nosten F, Burke M, Lee R, Blaze M, Otto TD, Barnwell JW, Pain A, Williams J, White NJ, Day NP, Snounou G, Lockhart PJ, Chiodini PL, Imwong M and Polley SD

    Health Protection Agency Malaria Reference Laboratory, Immunology Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom. colin.sutherland@lshtm.ac.uk

    Background: Malaria in humans is caused by apicomplexan parasites belonging to 5 species of the genus Plasmodium. Infections with Plasmodium ovale are widely distributed but rarely investigated, and the resulting burden of disease is not known. Dimorphism in defined genes has led to P. ovale parasites being divided into classic and variant types. We hypothesized that these dimorphs represent distinct parasite species.

    Methods: Multilocus sequence analysis of 6 genetic characters was carried out among 55 isolates from 12 African and 3 Asia-Pacific countries.

    Results: Each genetic character displayed complete dimorphism and segregated perfectly between the 2 types. Both types were identified in samples from Ghana, Nigeria, São Tomé, Sierra Leone, and Uganda and have been described previously in Myanmar. Splitting of the 2 lineages is estimated to have occurred between 1.0 and 3.5 million years ago in hominid hosts.

    Conclusions: We propose that P. ovale comprises 2 nonrecombining species that are sympatric in Africa and Asia. We speculate on possible scenarios that could have led to this speciation. Furthermore, the relatively high frequency of imported cases of symptomatic P. ovale infection in the United Kingdom suggests that the morbidity caused by ovale malaria has been underestimated.

    Funded by: Wellcome Trust: 093956

    The Journal of infectious diseases 2010;201;10;1544-50

  • New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, Lemieux J, Barrell B, Pain A, Berriman M, Newbold C and Llinás M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

    Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z

    Molecular microbiology 2010;76;1;12-24

  • ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    Otto TD, Catanho M, Tristão C, Bezerra M, Fernandes RM, Elias GS, Scaglia AC, Bovermann B, Berstis V, Lifschitz S, de Miranda AB and Degrave W

    Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil. otto@fiocruz.br

    Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach.

    Availability: The database can be accessed through http://proteinworlddb.org

    Bioinformatics (Oxford, England) 2010;26;5;705-7

  • Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.

    Tsai IJ, Otto TD and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. jit@sanger.ac.uk

    Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Genome biology 2010;11;4;R41

  • ABACAS: algorithm-based automatic contiguation of assembled sequences.

    Assefa S, Keane TM, Otto TD, Newbold C and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK. sa4@sanger.ac.uk

    Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated.

    ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net.

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2009;25;15;1968-9
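
The contiguation step named in the ABACAS summary above (align, order, orientate) can be illustrated with a minimal sketch. This is not the ABACAS implementation, which is written in Perl; it assumes the alignment has already been done and that each contig has at most one best hit on the reference. The names `Hit`, `contiguate` and `reverse_complement` are hypothetical, and gap sizing, visualisation and primer design are left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best alignment of one contig on the reference (hypothetical record)."""
    contig: str     # contig identifier
    ref_start: int  # start coordinate of the alignment on the reference
    strand: str     # "+" if the contig aligns forward, "-" if reversed


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contiguate(contigs, hits):
    """Order contigs by reference position and orient them to the reference.

    contigs: dict mapping contig name to sequence.
    hits:    iterable of Hit, assumed to hold one best hit per contig.
    Returns (ordered, unplaced): a list of (name, oriented sequence) pairs
    in reference order, and a sorted list of contig names with no hit.
    """
    best = {h.contig: h for h in hits if h.contig in contigs}
    ordered = []
    for h in sorted(best.values(), key=lambda h: h.ref_start):
        seq = contigs[h.contig]
        if h.strand == "-":
            # Flip contigs that align to the reverse strand
            seq = reverse_complement(seq)
        ordered.append((h.contig, seq))
    unplaced = sorted(name for name in contigs if name not in best)
    return ordered, unplaced


# Toy input: c2 maps first on the reference but in reverse; c3 has no hit.
contigs = {"c1": "AAAC", "c2": "GGTT", "c3": "TTTT"}
hits = [Hit("c2", 10, "-"), Hit("c1", 500, "+")]
ordered, unplaced = contiguate(contigs, hits)
```

In this toy example `c2` is placed before `c1` and reverse-complemented, while `c3` is reported as unplaced rather than being forced into the ordering.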

Anna Protasio

- Postdoctoral Fellow (ap6@sanger.ac.uk)

I obtained my undergraduate degree in Biochemistry at the University of the Republic in Uruguay (2000-2006). In 2005 I won the "Wellcome Trust Sanger Institute Prize Competition" and was awarded a summer placement at the Institute. During 2007 I undertook an internship at the Schistosomiasis Research Group (University of Cambridge, UK), which turned my interests towards schistosomes. Later that year I started my PhD studies under the supervision of Dr Matt Berriman (Parasite Genomics group), where I focused on gene expression changes in the early stages of host invasion in S. mansoni.

Research

My current research in the Parasite Genomics group is focused on characterising and understanding the mechanisms of gene expression regulation in parasitic worms. Given the good state of its genome assembly and gene annotation, I use S. mansoni as my model organism for these studies. I am mainly interested in the role of microRNAs, promoter activation/repression and the presence of antisense transcription.

References

  • Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.

    Protasio AV, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosome infection begins with the penetration of cercariae through healthy unbroken host skin. This process leads to the transformation of the free-living larvae into obligate parasites called schistosomula. This irreversible transformation, which occurs in as little as two hours, involves casting the cercaria tail and complete remodelling of the surface membrane. At this stage, parasites are vulnerable to host immune attack and oxidative stress. Consequently, the mechanisms by which the parasite recognises and swiftly adapts to the human host are still the subject of many studies, especially in the context of development of intervention strategies against schistosomiasis infection. Because obtaining enough material from in vivo infections is not always feasible for such studies, the transformation process is often mimicked in the laboratory by application of shear pressure to a cercarial sample resulting in mechanically transformed (MT) schistosomula. These parasites share remarkable morphological and biochemical similarity to the naturally transformed counterparts and have been considered a good proxy for parasites undergoing natural infection. Relying on this equivalency, MT schistosomula have been used almost exclusively in high-throughput studies of gene expression, identification of drug targets and identification of effective drugs against schistosomes. However, the transcriptional equivalency between skin-transformed (ST) and MT schistosomula has never been proven. In our approach to compare these two types of schistosomula preparations and to explore differences in gene expression triggered by the presence of a skin barrier, we performed RNA-seq transcriptome profiling of ST and MT schistosomula at 24 hours post transformation. We report that these two very distinct schistosomula preparations differ only in the expression of 38 genes (out of ∼11,000), providing convincing evidence to resolve the skin vs. mechanical long-lasting controversy.

    Funded by: Wellcome Trust: WT 083931/Z/07/Z, WT 098051

    PLoS neglected tropical diseases 2013;7;3;e2091

  • Progressive cross-reactivity in IgE responses: an explanation for the slow development of human immunity to schistosomiasis?

    Fitzsimmons CM, Jones FM, Pinot de Moira A, Protasio AV, Khalife J, Dickinson HA, Tukahebwa EM and Dunne DW

    Department of Pathology, University of Cambridge, Cambridge, United Kingdom. cmf1000@cam.ac.uk

    People in regions of Schistosoma mansoni endemicity slowly acquire immunity, but why this takes years to develop is still not clear. It has been associated with increases in parasite-specific IgE, induced, some investigators propose, to antigens exposed during the death of adult worms. These antigens include members of the tegumental-allergen-like protein family (TAL1 to TAL13). Previously, in a group of S. mansoni-infected Ugandan males, we showed that IgE responses to three TALs expressed in worms (TAL1, -3, and -5) became more prevalent with age. Now, in a subcohort we examined associations of these responses with resistance to reinfection and use the data to propose a mechanism for the slow development of immunity. IgE was measured 9 weeks posttreatment and at reinfection at 2 years (n = 144). An anti-TAL5 IgE (herein referred to as TAL5 IgE) response was associated with reduced reinfection even after adjusting for age using regression analysis (geometric mean odds ratio, 0.24; P = 0.016). TAL5 IgE responders were a subset of TAL3 IgE responders, themselves a subset of TAL1 responders. TAL3 IgE and TAL5 IgE were highly cross-reactive, with TAL3 the immunizing antigen and TAL5 the cross-reactive antigen. Transcriptional and translational studies show that TAL3 is most abundant in adult worms and that TAL5 is most abundant in infectious larvae. We propose that in chronic schistosomiasis, older individuals have repeatedly experienced IgE antigens exposed when adult worms die (e.g., TAL3) and that this leads to increasing cross-reactivity with antigens of invading larvae (e.g., TAL5). Progressive accumulation of worm/larvae cross-reactivity could explain the age-dependent immunity observed in areas of endemicity.

    Funded by: Wellcome Trust: 083931/Z/07/Z

    Infection and immunity 2012;80;12;4264-70

  • A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.

    Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.

    Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z

    PLoS neglected tropical diseases 2012;6;1;e1455

  • Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.

    Laing R, Hunt M, Protasio AV, Saunders G, Mungall K, Laing S, Jackson F, Quail M, Beech R, Berriman M and Gilleard JS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    PLoS one 2011;6;8;e23216

  • Thioredoxin and glutathione systems differ in parasitic and free-living platyhelminths.

    Otero L, Bonilla M, Protasio AV, Fernández C, Gladyshev VN and Salinas G

    Cátedra de Inmunología, Facultad de Química, Instituto de Higiene, Universidad de la República, Avda. A. Navarro 3051, Montevideo, Uruguay.

    Background: The thioredoxin and/or glutathione pathways occur in all organisms. They provide electrons for deoxyribonucleotide synthesis, function in antioxidant defense, detoxification and Fe/S biogenesis, and participate in a variety of cellular processes. In contrast to their mammalian hosts, the platyhelminth (flatworm) parasites studied so far lack conventional thioredoxin and glutathione systems. Instead, they possess a linked thioredoxin-glutathione system with the selenocysteine-containing enzyme thioredoxin glutathione reductase (TGR) as the single redox hub that controls the overall redox homeostasis. TGR has been recently validated as a drug target for schistosomiasis and new drug leads targeting TGR have recently been identified for these platyhelminth infections that affect more than 200 million people and for which a single drug is currently available. Little is known regarding the genomic structure of flatworm TGRs, the expression of TGR variants and whether the absence of conventional thioredoxin and glutathione systems is a signature of the entire platyhelminth phylum.

    Results: We examine platyhelminth genomes and transcriptomes and find that all platyhelminth parasites (from classes Cestoda and Trematoda) conform to a biochemical scenario involving, exclusively, a selenium-dependent linked thioredoxin-glutathione system having TGR as a central redox hub. In contrast, the free-living platyhelminth Schmidtea mediterranea (Class Turbellaria) possesses conventional and linked thioredoxin and glutathione systems. We identify TGR variants in Schistosoma spp. derived from a single gene, and demonstrate their expression. We also provide experimental evidence that alternative initiation of transcription and alternative transcript processing contribute to the generation of TGR variants in platyhelminth parasites.

    Conclusions: Our results indicate that thioredoxin and glutathione pathways differ in parasitic and free-living flatworms and that canonical enzymes were specifically lost in the parasitic lineage. Platyhelminth parasites possess a unique and simplified redox system for diverse essential processes, and thus TGR is an excellent drug target for platyhelminth infections. Inhibition of the central redox wire hub would lead to overall disruption of redox homeostasis and disable DNA synthesis.

    Funded by: FIC NIH HHS: TW006959; NIGMS NIH HHS: GM065204; Wellcome Trust: WT 085775/Z/08/Z

    BMC genomics 2010;11;237

  • The genome of the blood fluke Schistosoma mansoni.

    Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG and El-Sayed NM

    Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk

    Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

    Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z

    Nature 2009;460;7253;352-8

  • Platyhelminth mitochondrial and cytosolic redox homeostasis is controlled by a single thioredoxin glutathione reductase and dependent on selenium and glutathione.

    Bonilla M, Denicola A, Novoselov SV, Turanov AA, Protasio A, Izmendi D, Gladyshev VN and Salinas G

    Cátedra de Inmunología, Facultad de Química-Facultad de Ciencias, Instituto de Higiene, Universidad de la República, Piso 2, Montevideo, Uruguay.

    Platyhelminth parasites are a major health problem in developing countries. In contrast to their mammalian hosts, platyhelminth thiol-disulfide redox homeostasis relies on linked thioredoxin-glutathione systems, which are fully dependent on thioredoxin-glutathione reductase (TGR), a promising drug target. TGR is a homodimeric enzyme comprising a glutaredoxin domain and thioredoxin reductase (TR) domains with a C-terminal redox center containing selenocysteine (Sec). In this study, we demonstrate the existence of functional linked thioredoxin-glutathione systems in the cytosolic and mitochondrial compartments of Echinococcus granulosus, the platyhelminth responsible for hydatid disease. The glutathione reductase (GR) activity of TGR exhibited hysteretic behavior regulated by the [GSSG]/[GSH] ratio. This behavior was associated with glutathionylation by GSSG and abolished by deglutathionylation. The Km and kcat values for mitochondrial and cytosolic thioredoxins (9.5 µM and 131 s⁻¹, 34 µM and 197 s⁻¹, respectively) were higher than those reported for mammalian TRs. Analysis of TGR mutants revealed that the glutaredoxin domain is required for the GR activity but did not affect the TR activity. In contrast, both GR and TR activities were dependent on the Sec-containing redox center. The activity loss caused by the Sec-to-Cys mutation could be partially compensated by a Cys-to-Sec mutation of the neighboring residue, indicating that Sec can support catalysis at this alternative position. Consistent with the essential role of TGR in redox control, 2.5 µM auranofin, a known TGR inhibitor, killed larval worms in vitro. These studies establish the selenium- and glutathione-dependent regulation of cytosolic and mitochondrial redox homeostasis through a single TGR enzyme in platyhelminths.

    Funded by: FIC NIH HHS: TW 006959; NIGMS NIH HHS: GM 065204

    The Journal of biological chemistry 2008;283;26;17898-907

  • Use of genomic DNA as an indirect reference for identifying gender-associated transcripts in morphologically identical, but chromosomally distinct, Schistosoma mansoni cercariae.

    Fitzpatrick JM, Protasio AV, McArdle AJ, Williams GA, Johnston DA and Hoffmann KF

    Department of Pathology, University of Cambridge, Cambridge, United Kingdom.

    Background: The use of DNA microarray technology to study global Schistosoma gene expression has led to the rapid identification of novel biological processes, pathways or associations. Implementation of standardized DNA microarray protocols across laboratories would assist maximal interpretation of generated datasets and extend productive application of this technology.

    Utilizing a new Schistosoma mansoni oligonucleotide DNA microarray composed of 37,632 elements, we show that schistosome genomic DNA (gDNA) hybridizes with less variation compared to complex mixed pools of S. mansoni cDNA material (R = 0.993 for gDNA compared to R = 0.956 for cDNA during 'self versus self' hybridizations). Furthermore, these effects are species-specific, with S. japonicum or Mus musculus gDNA failing to bind significantly to S. mansoni oligonucleotide DNA microarrays (e.g., R = 0.350 when S. mansoni gDNA is co-hybridized with S. japonicum gDNA). Increased median fluorescent intensities (209.9) were also observed for DNA microarray elements hybridized with S. mansoni gDNA compared to complex mixed pools of S. mansoni cDNA (112.2). Exploiting these valuable characteristics, S. mansoni gDNA was used in two-channel DNA microarray hybridization experiments as a common reference for indirect identification of gender-associated transcripts in cercariae, a schistosome life-stage in which there is no overt sexual dimorphism. This led to the identification of 2,648 gender-associated transcripts. When compared to the 780 gender-associated transcripts identified by hybridization experiments utilizing a two-channel direct method (co-hybridization of male and female cercariae cDNA), indirect methods using gDNA were far superior in identifying greater quantities of differentially expressed transcripts. Interestingly, both methods identified a concordant subset of 188 male-associated and 156 female-associated cercarial transcripts. Gene ontology classification of these differentially expressed transcripts revealed a greater diversity of categories in male cercariae. Quantitative real-time PCR analysis confirmed the DNA microarray results and supported the reliability of this platform for identifying gender-associated transcripts.

    Schistosome gDNA displays characteristics highly suitable for the comparison of two-channel DNA microarray results obtained from experiments conducted independently across laboratories. The schistosome transcripts identified here demonstrate, for the first time, that gender-associated patterns of expression are already well established in the morphologically identical, but chromosomally distinct, cercariae stage.

    Funded by: Wellcome Trust: 068501/Z/02/Z, 078317/Z/05/Z

    PLoS neglected tropical diseases 2008;2;10;e323
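
The indirect, common-reference design described in the last entry above can be illustrated with a small sketch. It assumes each cDNA sample is co-hybridized against the same gDNA reference, so that subtracting the two log ratios cancels the reference and leaves a male-versus-female log ratio. The probe names, intensities, function names and two-fold threshold are hypothetical illustrations, not values or code from the paper.

```python
import math

def log2_ratio(sample, reference):
    """Log2 ratio of the sample channel against the reference channel on one array."""
    return math.log2(sample / reference)

def indirect_male_vs_female(male_array, female_array):
    """Male-versus-female log2 ratio obtained through a shared gDNA reference.

    Each argument is a (cDNA intensity, gDNA intensity) pair for one probe.
    log2(male/gDNA) - log2(female/gDNA) approximates log2(male/female)
    when the gDNA reference hybridizes consistently on both arrays.
    """
    return log2_ratio(*male_array) - log2_ratio(*female_array)

def classify(probes, threshold=1.0):
    """Label each probe by its indirect log2 ratio (threshold is an assumption)."""
    calls = {}
    for name, (male_array, female_array) in probes.items():
        ratio = indirect_male_vs_female(male_array, female_array)
        if ratio >= threshold:
            calls[name] = "male-associated"
        elif ratio <= -threshold:
            calls[name] = "female-associated"
        else:
            calls[name] = "unbiased"
    return calls

# Hypothetical intensities: (male cDNA, gDNA), (female cDNA, gDNA)
probes = {
    "probe_a": ((800.0, 200.0), (200.0, 200.0)),
    "probe_b": ((150.0, 300.0), (600.0, 300.0)),
    "probe_c": ((400.0, 200.0), (400.0, 200.0)),
}

if __name__ == "__main__":
    for name, call in classify(probes).items():
        print(name, call)
```

Because every sample is measured against the same reference, arrays run at different times or in different laboratories can in principle be compared through it, which is the property the abstract attributes to gDNA.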

Adam Reid

ar11@sanger.ac.uk Staff scientist

I studied for a Genetics BSc at the University of Sheffield and an MRes in Bioinformatics at the University of York. I subsequently worked for AstraZeneca, providing bioinformatics support to proteomics and genotyping projects. I then did my PhD with Prof. Christine Orengo at University College London looking at the evolution of protein domain families.

I joined the Parasite Genomics group in January 2009.

Research

1. I have led the analysis of the Neospora caninum genome and its comparison with the human pathogen Toxoplasma gondii.

2. I am leading analysis of another apicomplexan genome, the chicken parasite Eimeria tenella (and several related species).

3. I am working on various approaches that use gene expression analysis to investigate host-parasite interactions, principally in malaria, but also in helminths and trypanosomes.

References

  • Vector transmission regulates immune control of Plasmodium virulence.

    Spence PJ, Jarra W, Lévy P, Reid AJ, Chappell L, Brugat T, Sanders M, Berriman M and Langhorne J

    Division of Parasitology, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK.

    Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents, primates or humans increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission, which is assumed to 'reset' Plasmodium to its original character. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we use mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi to interrogate regulation of parasite virulence. Analysis of SBP P. c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which in turn modifies the elicited mammalian immune response, which in turn attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.

    Funded by: Medical Research Council: MC_U117584248, U.1175.02.004.00004(60507), U117584248; Wellcome Trust: 085775, 089553, 098051

    Nature 2013;498;7453;228-31

  • Genes involved in host-parasite interactions can be revealed by their correlated expression.

    Reid AJ and Berriman M

    Parasite genomics group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ar11@sanger.ac.uk

    Molecular interactions between a parasite and its host are key to the ability of the parasite to enter the host and persist. Our understanding of the genes and proteins involved in these interactions is limited. To better understand these processes it would be advantageous to have a range of methods to predict pairs of genes involved in such interactions. Correlated gene expression profiles can be used to identify molecular interactions within a species. Here we have extended the concept to different species, showing that genes with correlated expression are more likely to encode proteins that directly or indirectly participate in host-parasite interaction. We go on to examine our predictions of molecular interactions between the malaria parasite and both its mammalian host and insect vector. Our approach could be applied to study any interaction between species, for example, between a host and its parasites or pathogens, but also symbiotic and commensal pairings.

    Funded by: Wellcome Trust: 098051

    Nucleic acids research 2013;41;3;1508-18

  • Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS).

    Lawton J, Brugat T, Yan YX, Reid AJ, Böhme U, Otto TD, Pain A, Jackson A, Berriman M, Cunningham D, Preiser P and Langhorne J

    Division of Parasitology, MRC National Institute for Medical Research, London, UK.

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required.

    Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages.

    Conclusions: The large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein functions, including their contribution to antigenic variation and immune evasion.

    Funded by: Medical Research Council: MC_EX_G0901345, U117584248

    BMC genomics 2012;13;125

  • Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.

    Reid AJ, Vermont SJ, Cotton JA, Harris D, Hill-Cawthorne GA, Könen-Waisman S, Latham SM, Mourier T, Norton R, Quail MA, Sanders M, Shanmugam D, Sohal A, Wasmuth JD, Brunk B, Grigg ME, Howard JC, Parkinson J, Roos DS, Trees AJ, Berriman M, Pain A and Wastling JM

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma, rather it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z

    PLoS pathogens 2012;8;3;e1002567

  • Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.

    Kikuchi T, Cotton JA, Dalzell JJ, Hasegawa K, Kanzaki N, McVeigh P, Takanashi T, Tsai IJ, Assefa SA, Cock PJ, Otto TD, Hunt M, Reid AJ, Sanchez-Flores A, Tsuchihara K, Yokoi T, Larsson MC, Miwa J, Maule AG, Sahashi N, Jones JT and Berriman M

    Forestry and Forest Products Research Institute, Tsukuba, Japan. kikuchit@affrc.go.jp

    Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite) stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    PLoS pathogens 2011;7;9;e1002219

  • CODA: accurate detection of functional associations between proteins in eukaryotic genomes using domain fusion.

    Reid AJ, Ranea JA, Clegg AB and Orengo CA

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom. ar11@sanger.ac.uk

    Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom.

    We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication.

    The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/.

    Funded by: Biotechnology and Biological Sciences Research Council

    PLoS one 2010;5;6;e10908

  • Comparative evolutionary analysis of protein complexes in E. coli and yeast.

    Reid AJ, Ranea JA and Orengo CA

    Research Department of Structural & Molecular Biology, University College London, London, WC1E 6BT, UK. ar11@sanger.ac.uk

    Background: Proteins do not act in isolation; they frequently act together in protein complexes to carry out concerted cellular functions. The evolution of complexes is poorly understood, especially in organisms other than yeast, where little experimental data has been available.

    Results: We generated accurate, high coverage datasets of protein complexes for E. coli and yeast in order to study differences in the evolution of complexes between these two species. We show that substantial differences exist in how complexes have evolved between these organisms. A previously proposed model of complex evolution identified complexes with cores of interacting homologues. We support findings of the relative importance of this mode of evolution in yeast, but find that it is much less common in E. coli. Additionally it is shown that those homologues which do cluster in complexes are involved in eukaryote-specific functions. Furthermore we identify correlated pairs of non-homologous domains which occur in multiple protein complexes. These were identified in both yeast and E. coli and we present evidence that these too may represent complex cores in yeast but not those of E. coli.

    Conclusions: Our results suggest that there are differences in the way protein complexes have evolved in E. coli and yeast. Whereas some yeast complexes have evolved by recruiting paralogues, this is not apparent in E. coli. Furthermore, such complexes are involved in eukaryotic-specific functions. This implies that the increase in gene family sizes seen in eukaryotes in part reflects multiple family members being used within complexes. However, in general, in both E. coli and yeast, homologous domains are used in different complexes.

    Funded by: Biotechnology and Biological Sciences Research Council

    BMC genomics 2010;11;79

  • Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone.

    Reid AJ, Yeats C and Orengo CA

    Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK. reid@bioichem.ucl.ac.uk

    Motivation: A recent development in sequence-based remote homologue detection is the introduction of profile-profile comparison methods. These are more powerful than previous technologies and can detect potentially homologous relationships missed by structural classifications such as CATH and SCOP. As structural classifications traditionally act as the gold standard of homology this poses a challenge in benchmarking them.

    Results: We present a novel approach which allows an accurate benchmark of these methods against the CATH structural classification. We then apply this approach to assess the accuracy of a range of publicly available methods for remote homology detection including several profile-profile methods (COMPASS, HHSearch, PRC) from two perspectives. First, in distinguishing homologous domains from non-homologues and second, in annotating proteomes with structural domain families. PRC is shown to be the best method for distinguishing homologues. We show that SAM is the best practical method for annotating genomes, whilst using COMPASS for the most remote homologues would increase coverage. Finally, we introduce a simple approach to increase the sensitivity of remote homologue detection by up to 10%. This is achieved by combining multiple methods with a jury vote.

    Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2007;23;18;2353-60
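The last abstract above reports that combining multiple remote homology detection methods with a jury vote raises sensitivity. The sketch below shows one simple way such a vote could work. It is not the published procedure: the voting rule, the `min_votes` threshold, the function name and the family label are all assumptions made for illustration.

```python
from collections import Counter


def jury_vote(predictions, min_votes=2):
    """Return the family backed by at least `min_votes` methods, else None.

    `predictions` maps a method name to the family that method assigned to
    one query domain, or None when the method reported no hit.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    votes = Counter(f for f in predictions.values() if f is not None)
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_votes else None


# Illustrative input: two of three methods agree on the same family label.
example = {"COMPASS": "family_A", "HHSearch": "family_A", "PRC": None}
consensus = jury_vote(example)
```

Requiring agreement between at least two methods trades some coverage for fewer false assignments. Lowering `min_votes` to 1 would accept any single method's hit.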

Florian Sessler

fs8@sanger.ac.uk PhD Student

Having completed my BSc in Biology with Microbiology at Imperial College London, I started a 4-year PhD program at the Wellcome Trust Sanger Institute in 2011. After 6 months of rotations in 3 different pathogen labs, I started my PhD project in Matt Berriman's Parasite Genomics group.

Research

My PhD project focuses on characterizing male and female Schistosoma mansoni, a tropical parasite that currently infects about 200 million people. To better understand sexual development and maturation, I use a range of techniques, but high-throughput sequencing (RNA-seq) and transcriptome analysis currently form the basis of my research.

Eleanor Stanley

es9@sanger.ac.uk Senior Bioinformatician

I studied Biological Sciences, specialising in genetics, at the University of Birmingham. My final-year literature study on the duplication of the Adh region in Drosophila aided my successful application to become a FlyBase curator at the University of Cambridge. After 5 fabulous years I moved to the European Bioinformatics Institute to become a UniProt curator. In addition, I enjoyed roles managing an alternative splicing project and the Complete proteomes team. In my final year as a biocurator, I completed an MSc(Res) in Bioinformatics, and I joined the parasite genomics group at the Sanger Institute in April 2012.

Research

My role within the team is to build a pipeline to generate gene models for the 50 Helminth genome project. To achieve this I am using Ensembl and Maker.

References

  • Toward community standards in the quest for orthologs.

    Dessimoz C, Gabaldón T, Roos DS, Sonnhammer EL, Herrero J and Quest for Orthologs Consortium

    The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

    Funded by: PHS HHS: HHSN266200400037C; Wellcome Trust: 095908

    Bioinformatics (Oxford, England) 2012;28;6;900-4

  • Reorganizing the protein space at the Universal Protein Resource (UniProt).

    UniProt Consortium

    The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    The mission of UniProt is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces. UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. A key development at UniProt is the provision of complete, reference and representative proteomes. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

    Funded by: British Heart Foundation: SP/07/007/23671; NCRR NIH HHS: 3P20RR016472-09S2; NHGRI NIH HHS: 1U41HG006104-02, 2P41HG02273-07; NIGMS NIH HHS: 2R01GM080646-06, 3R01GM080646-04S2, 5R01GM080646-05; NLM NIH HHS: 5G08LM010720-02

    Nucleic acids research 2012;40;Database issue;D71-5

  • ASTD: The Alternative Splicing and Transcript Diversity database.

    Koscielny G, Le Texier V, Gopalakrishnan C, Kumanduri V, Riethoven JJ, Nardone F, Stanley E, Fallsehr C, Hofmann O, Kull M, Harrington E, Boué S, Eyras E, Plass M, Lopez F, Ritchie W, Moucadel V, Ara T, Pospisil H, Herrmann A, G Reich J, Guigó R, Bork P, Doeberitz Mv, Vilo J, Hide W, Apweiler R, Thanaraj TA and Gautheret D

    European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.

    Genomics 2009;93;3;213-20

  • The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

    Genome Information Integration Project And H-Invitational 2, Yamasaki C, Murakami K, Fujii Y, Sato Y, Harada E, Takeda J, Taniya T, Sakate R, Kikugawa S, Shimada M, Tanino M, Koyanagi KO, Barrero RA, Gough C, Chun HW, Habara T, Hanaoka H, Hayakawa Y, Hilton PB, Kaneko Y, Kanno M, Kawahara Y, Kawamura T, Matsuya A, Nagata N, Nishikata K, Noda AO, Nurimoto S, Saichi N, Sakai H, Sanbonmatsu R, Shiba R, Suzuki M, Takabayashi K, Takahashi A, Tamura T, Tanaka M, Tanaka S, Todokoro F, Yamaguchi K, Yamamoto N, Okido T, Mashima J, Hashizume A, Jin L, Lee KB, Lin YC, Nozaki A, Sakai K, Tada M, Miyazaki S, Makino T, Ohyanagi H, Osato N, Tanaka N, Suzuki Y, Ikeo K, Saitou N, Sugawara H, O'Donovan C, Kulikova T, Whitfield E, Halligan B, Shimoyama M, Twigger S, Yura K, Kimura K, Yasuda T, Nishikawa T, Akiyama Y, Motono C, Mukai Y, Nagasaki H, Suwa M, Horton P, Kikuno R, Ohara O, Lancet D, Eveno E, Graudens E, Imbeaud S, Debily MA, Hayashizaki Y, Amid C, Han M, Osanger A, Endo T, Thomas MA, Hirakawa M, Makalowski W, Nakao M, Kim NS, Yoo HS, De Souza SJ, Bonaldo Mde F, Niimura Y, Kuryshev V, Schupp I, Wiemann S, Bellgard M, Shionyu M, Jia L, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Zhang Q, Go M, Minoshima S, Ohtsubo M, Hanada K, Tonellato P, Isogai T, Zhang J, Lenhard B, Kim S, Chen Z, Hinz U, Estreicher A, Nakai K, Makalowska I, Hide W, Tiffin N, Wilming L, Chakraborty R, Soares MB, Chiusano ML, Suzuki Y, Auffray C, Yamaguchi-Kabata Y, Itoh T, Hishiki T, Fukuchi S, Nishikawa K, Sugano S, Nomura N, Tateno Y, Imanishi T and Gojobori T

    Japan Biological Information Research Center, Japan Biological Informatics Consortium, Japan.

    Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted sub cellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views: Transcript view and Locus view and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

    Funded by: NHLBI NIH HHS: R01 HL064541; Wellcome Trust: 077198

    Nucleic acids research 2008;36;Database issue;D793-9

  • The Rice Annotation Project Database (RAP-DB): 2008 update.

    Rice Annotation Project, Tanaka T, Antonio BA, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H, Wu J, Itoh T, Sasaki T, Aono R, Fujii Y, Habara T, Harada E, Kanno M, Kawahara Y, Kawashima H, Kubooka H, Matsuya A, Nakaoka H, Saichi N, Sanbonmatsu R, Sato Y, Shinso Y, Suzuki M, Takeda J, Tanino M, Todokoro F, Yamaguchi K, Yamamoto N, Yamasaki C, Imanishi T, Okido T, Tada M, Ikeo K, Tateno Y, Gojobori T, Lin YC, Wei FJ, Hsing YI, Zhao Q, Han B, Kramer MR, McCombie RW, Lonsdale D, O'Donovan CC, Whitfield EJ, Apweiler R, Koyanagi KO, Khurana JP, Raghuvanshi S, Singh NK, Tyagi AK, Haberer G, Fujisawa M, Hosokawa S, Ito Y, Ikawa H, Shibata M, Yamamoto M, Bruskiewich RM, Hoen DR, Bureau TE, Namiki N, Ohyanagi H, Sakai Y, Nobushima S, Sakata K, Barrero RA, Sato Y, Souvorov A, Smith-White B, Tatusova T, An S, An G, OOta S, Fuks G, Fuks G, Messing J, Christie KR, Lieberherr D, Kim H, Zuccolo A, Wing RA, Nobuta K, Green PJ, Lu C, Meyers BC, Chaparro C, Piegu B, Panaud O and Echeverria M

    National Institute of Agrobiological Sciences, Ibaraki 305-8602, Japan.

    The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for comprehensive understanding of the rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.

    Nucleic acids research 2008;36;Database issue;D1028-33

  • Bioinformatics database infrastructure for biotechnology research.

    Whitfield EJ, Pruess M and Apweiler R

    EMBL-EBI, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambs CB10 1SD, UK. eleanor@ebi.ac.uk

    Many databases are available that provide valuable data resources for the biotechnological researcher. According to their core data, they can be divided into different types. Some databases provide primary data, like all published nucleotide sequences, others deal with protein sequences. In addition to these two basic types of databases, a huge number of more specialized resources are available, like databases about protein structures, protein identification, special features of genes and/or proteins, or certain organisms. Furthermore, some resources offer integrated views on different types of data, allowing the user to do easy customized queries over large datasets and to compare different types of data.

    Journal of biotechnology 2006;124;4;629-39

  • Annotation of the Drosophila melanogaster euchromatic genome: a systematic review.

    Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE, Smith CD, Tupy JL, Whitfied EJ, Bayraktaroglu L, Berman BP, Bettencourt BR, Celniker SE, de Grey AD, Drysdale RA, Harris NL, Richter J, Russo S, Schroeder AJ, Shu SQ, Stapleton M, Yamada C, Ashburner M, Gelbart WM, Rubin GM and Lewis SE

    Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA. sima@fruitfly.org

    Background: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.

    Results: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.

    Conclusions: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.

    Funded by: NHGRI NIH HHS: HG00739, HG00750

    Genome biology 2002;3;12;RESEARCH0083

  • FlyBase: a Drosophila database.

    FlyBase Consortium

    FlyBase, Biological Laboratories, 16 Divinity Avenue, Cambridge, MA 02138, USA.

    FlyBase (http://flybase.bio.indiana.edu/) is a comprehensive database of genetic and molecular data concerning Drosophila. FlyBase is maintained as a relational database (in Sybase) and is made available as html documents and flat files. The scope of FlyBase includes: genes, alleles (with phenotypes), aberrations, transposons, pointers to sequence data, gene products, maps, clones, stock lists, Drosophila workers and bibliographic references.

    Nucleic acids research 1998;26;1;85-8

Sascha Steinbiss

ss34@sanger.ac.uk Senior Bioinformatician

After having worked as a software developer for several years, I studied computer science and bioinformatics at the University of Hamburg, Germany. In Prof. Stefan Kurtz's genome informatics group, I designed and implemented software for de novo identification and annotation of LTR retrotransposons as well as methods and tools for efficient sequence storage and annotation handling. After finishing my Ph.D. work, I joined the parasite genomics group in December 2013.

Research

My work within the team includes development and maintenance of the GeneDB software as well as the implementation of automatic genome annotation pipelines for kinetoplastids.

References

  • GenomeTools: a comprehensive software library for efficient processing of structured genome annotations.

    Gremme G, Steinbiss S and Kurtz S

    University of Hamburg, Hamburg.

    Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables a developer to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a light-weight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of script programming languages (like Python and Ruby) sharing the same interface.

    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 2013;10;3;645-56

  • A new efficient data structure for storage and retrieval of multiple biosequences.

    Steinbiss S and Kurtz S

    University of Hamburg, Hamburg.

    Today's genome analysis applications require sequence representations allowing for fast access to their contents while also being memory-efficient enough to facilitate analyses of large-scale data. While a wide variety of sequence representations exist, lack of a generic implementation of efficient sequence storage has led to a plethora of poorly reusable or programming language-specific implementations. We present a novel, space-efficient data structure (GtEncseq) for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths. For the human genome (3.1 gigabases, including 237 million wildcard characters) our representation requires only 2 + 8 × 10^-6 bits per character. Implemented in C, our portable software implementation provides a variety of methods for random and sequential access to characters and substrings (including different reading directions) using an object-oriented interface. In addition, it includes access to metadata like sequence descriptions or character distributions. The library is extensible to be used from various scripting languages. GtEncseq is much more versatile than previous solutions, adding features that were previously unavailable. Benchmarks show that it is competitive with respect to space and time requirements.

    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 2012;9;2;330-44

  • LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons.

    Steinbiss S, Kastens S and Kurtz S

    Center for Bioinformatics, University of Hamburg, 20146 Hamburg, Bundesstrasse 43, Germany. kurtz@zbh.uni-hamburg.de.

    Background: Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for a comprehensive genome-wide de novo detection of such elements. The obvious next step is the classification of newly detected candidates resulting in (super-)families. Such a de novo classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow.

    Results: We have developed LTRsift, an interactive graphical software tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Export of grouped candidate sets in standard formats is possible. In two case studies, we demonstrate how LTRsift can be employed in the context of a genome-wide LTR retrotransposon survey effort.

    Conclusions: LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries. The LTRsift software is freely available at http://www.zbh.uni-hamburg.de/LTRsift under an open-source license.

    Mobile DNA 2012;3;1;18

  • FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context.

    Mader M, Simon R, Steinbiss S and Kurtz S

    Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany. kurtz@zbh.uni-hamburg.de.

    Background: The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate.

    Results: We have developed a web-based software tool, FISH Oracle, to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology support highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists.

    Conclusions: FISH Oracle offers a fast and easy to use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental to identify candidate onco and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle.

    Journal of clinical bioinformatics 2011;1;1;20

  • Fine-grained annotation and classification of de novo predicted LTR retrotransposons.

    Steinbiss S, Willhoeft U, Gremme G and Kurtz S

    Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany. steinbiss@zbh.uni-hamburg.de

    Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a work flow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.

    Nucleic acids research 2009;37;21;7002-13

  • AnnotationSketch: a genome annotation drawing library.

    Steinbiss S, Gremme G, Schärfer C, Mader M and Kurtz S

    Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany. steinbiss@zbh.uni-hamburg.de

    Summary: To analyse the vast amount of genome annotation data available today, a visual representation of genomic features in a given sequence range is required. We developed a C library which provides layout and drawing capabilities for annotation features. It supports several common input and output formats and can easily be integrated into custom C applications. To exemplify the use of AnnotationSketch in other languages, we provide bindings to the scripting languages Ruby, Python and Lua.

    Availability: The software is available under an open-source license as part of GenomeTools (http://genometools.org/annotationsketch.html).

    Bioinformatics (Oxford, England) 2009;25;4;533-4
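The GenomeTools entry in the list above describes a pull-based approach for processing annotations sequentially with low memory overhead. The sketch below illustrates that general idea with Python generators: each stage pulls one feature at a time from the stage before it, so the full annotation set is never held in memory. It is not the GenomeTools API. The function names, the flat feature dictionaries (rather than an annotation graph) and the sample GFF3-style lines are assumptions made for illustration.

```python
import io


def read_features(handle):
    """Yield one feature dict per tab-separated GFF3-style line, lazily."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqid, _source, ftype, start, end = line.split("\t")[:5]
        yield {"seqid": seqid, "type": ftype,
               "start": int(start), "end": int(end)}


def only_type(features, ftype):
    """Pull features from upstream and pass on those of one type."""
    for feature in features:
        if feature["type"] == ftype:
            yield feature


def total_length(features):
    """Consume the stream, summing inclusive feature lengths."""
    return sum(f["end"] - f["start"] + 1 for f in features)


# Hypothetical input standing in for a large annotation file.
gff = io.StringIO(
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t100\t200\t.\t+\t.\tID=g1\n"
    "chr1\tdemo\texon\t100\t150\t.\t+\t.\tParent=g1\n"
    "chr1\tdemo\tgene\t300\t350\t.\t-\t.\tID=g2\n"
)
gene_bases = total_length(only_type(read_features(gff), "gene"))
```

Because the final consumer drives the pipeline, memory use depends on the size of a single feature rather than the size of the file, which is the property the abstract highlights.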

Alan Tracey

- Senior Computer Biologist

After graduating with a Geography BA (Hons) from Anglia Polytechnic University in 1997, I arrived at the Sanger Centre in 1998 to work as a sequencer on the Human Genome Project. After 1 year, I became a "finisher" and started learning about assembly improvement. I worked on a variety of genome projects including human, mouse, zebrafish, pig, tomato and many besides, notably contributing 1% of the finished human genome. In later projects, I worked on the most intractable repetitive regions learning many valuable problem solving skills.

Research

I joined the parasite genomics group as a Senior Genome Analyst in 2010 and have made significant contributions to a variety of helminth genome assemblies, bringing over a decade of experience as a "finisher" to bear in this group. My work involves iterative assembly improvement using a combination of bespoke software tools and algorithms to surpass what is achievable by automated assembly of de novo sequence. I seek to provide software development ideas and bug reporting to developers. I also work to manually annotate and refine gene models as necessary.

References

  • The zebrafish reference genome sequence and its relationship to the human genome.

    Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, McLaren S, Sealy I, Caccamo M, Churcher C, Scott C, Barrett JC, Koch R, Rauch GJ, White S, Chow W, Kilian B, Quintais LT, Guerra-Assunção JA, Zhou Y, Gu Y, Yen J, Vogel JH, Eyre T, Redmond S, Banerjee R, Chi J, Fu B, Langley E, Maguire SF, Laird GK, Lloyd D, Kenyon E, Donaldson S, Sehra H, Almeida-King J, Loveland J, Trevanion S, Jones M, Quail M, Willey D, Hunt A, Burton J, Sims S, McLay K, Plumb B, Davis J, Clee C, Oliver K, Clark R, Riddle C, Elliot D, Eliott D, Threadgold G, Harden G, Ware D, Begum S, Mortimore B, Mortimer B, Kerry G, Heath P, Phillimore B, Tracey A, Corby N, Dunn M, Johnson C, Wood J, Clark S, Pelan S, Griffiths G, Smith M, Glithero R, Howden P, Barker N, Lloyd C, Stevens C, Harley J, Holt K, Panagiotidis G, Lovell J, Beasley H, Henderson C, Gordon D, Auger K, Wright D, Collins J, Raisen C, Dyer L, Leung K, Robertson L, Ambridge K, Leongamornlert D, McGuire S, Gilderthorp R, Griffiths C, Manthravadi D, Nichol S, Barker G, Whitehead S, Kay M, Brown J, Murnane C, Gray E, Humphries M, Sycamore N, Barker D, Saunders D, Wallis J, Babbage A, Hammond S, Mashreghi-Mohammadi M, Barr L, Martin S, Wray P, Ellington A, Matthews N, Ellwood M, Woodmansey R, Clark G, Cooper J, Cooper J, Tromans A, Grafham D, Skuce C, Pandian R, Andrews R, Harrison E, Kimberley A, Garnett J, Fosker N, Hall R, Garner P, Kelly D, Bird C, Palmer S, Gehring I, Berger A, Dooley CM, Ersan-Ürün Z, Eser C, Geiger H, Geisler M, Karotki L, Kirn A, Konantz J, Konantz M, Oberländer M, Rudolph-Geiger S, Teucke M, Lanz C, Raddatz G, Osoegawa K, Zhu B, Rapp A, Widaa S, Langford C, Yang F, Schuster SC, Carter NP, Harrow J, Ning Z, Herrero J, Searle SM, Enright A, Geisler R, Plasterk RH, Lee C, Westerfield M, de Jong PJ, Zon LI, Postlethwait JH, Nüsslein-Volhard C, Hubbard TJ, Roest Crollius H, Rogers J and Stemple DL

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

    Funded by: NCRR NIH HHS: R01 RR010715, R01 RR020833; NICHD NIH HHS: P01 HD022486, P01 HD22486; NIDDK NIH HHS: 1 R01 DK55377-01A1; NIGMS NIH HHS: R01 GM085318; NIH HHS: R01 OD011116; Wellcome Trust: 095908, 098051

    Nature 2013;496;7446;498-503

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

  • A large palindrome with interchromosomal gene duplications in the pericentromeric region of the D. melanogaster Y chromosome.

    Méndez-Lago M, Bergman CM, de Pablos B, Tracey A, Whitehead SL and Villasante A

    Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain.

    The non-recombining Y chromosome is expected to degenerate over evolutionary time; however, gene gain is a common feature of Y chromosomes of mammals and Drosophila. Here, we report that a large palindrome containing interchromosomal segmental duplications is located in the vicinity of the first amplicon detected in the Y chromosome of D. melanogaster. The recent appearance of such amplicons suggests that duplications to the Y chromosome, followed by the amplification of the segmental duplications, are a mechanism for the continuing evolution of Drosophila Y chromosomes.

    Funded by: Wellcome Trust

    Molecular biology and evolution 2011;28;7;1967-71

  • Novel sequencing strategy for repetitive DNA in a Drosophila BAC clone reveals that the centromeric region of the Y chromosome evolved from a telomere.

    Méndez-Lago M, Wild J, Whitehead SL, Tracey A, de Pablos B, Rogers J, Szybalski W and Villasante A

    Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Madrid, Spain.

    The centromeric and telomeric heterochromatin of eukaryotic chromosomes is mainly composed of middle-repetitive elements, such as transposable elements and tandemly repeated DNA sequences. Because of this repetitive nature, Whole Genome Shotgun Projects have failed in sequencing these regions. We describe a novel kind of transposon-based approach for sequencing highly repetitive DNA sequences in BAC clones. The key to this strategy relies on physical mapping the precise position of the transposon insertion, which enables the correct assembly of the repeated DNA. We have applied this strategy to a clone from the centromeric region of the Y chromosome of Drosophila melanogaster. The analysis of the complete sequence of this clone has allowed us to prove that this centromeric region evolved from a telomere, possibly after a pericentric inversion of an ancestral telocentric chromosome. Our results confirm that the use of transposon-mediated sequencing, including positional mapping information, improves current finishing strategies. The strategy we describe could be a universal approach to resolving the heterochromatic regions of eukaryotic genomes.

    Funded by: Wellcome Trust

    Nucleic acids research 2009;37;7;2264-73

  • The DNA sequence and biological annotation of human chromosome 1.

    Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CC, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RI, Aubin K, Babbage AK, Bagguley CL, Bailey J, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Buckley D, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dunn M, Earthrowl M, Ellington AG, Errington H, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MR, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ES, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NS, McLaren S, Milne S, Mistry S, Moore MJ, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR, Banerjee R, Bryant SP, Burford DC, Burrill WD, Clegg SM, Dhami P, Dovey O, Faulkner LM, Gribble SM, Langford CF, Pandian RD, Porter KM and Prigmore E

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. sgregory@chg.duhs.duke.edu

    The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.

    Funded by: Medical Research Council: G0000107; Wellcome Trust

    Nature 2006;441;7091;315-21

  • The DNA sequence and comparative analysis of human chromosome 10.

    Deloukas P, Earthrowl ME, Grafham DV, Rubenfield M, French L, Steward CA, Sims SK, Jones MC, Searle S, Scott C, Howe K, Hunt SE, Andrews TD, Gilbert JG, Swarbreck D, Ashurst JL, Taylor A, Battles J, Bird CP, Ainscough R, Almeida JP, Ashwell RI, Ambrose KD, Babbage AK, Bagguley CL, Bailey J, Banerjee R, Bates K, Beasley H, Bray-Allen S, Brown AJ, Brown JY, Burford DC, Burrill W, Burton J, Cahill P, Camire D, Carter NP, Chapman JC, Clark SY, Clarke G, Clee CM, Clegg S, Corby N, Coulson A, Dhami P, Dutta I, Dunn M, Faulkner L, Frankish A, Frankland JA, Garner P, Garnett J, Gribble S, Griffiths C, Grocock R, Gustafson E, Hammond S, Harley JL, Hart E, Heath PD, Ho TP, Hopkins B, Horne J, Howden PJ, Huckle E, Hynds C, Johnson C, Johnson D, Kana A, Kay M, Kimberley AM, Kershaw JK, Kokkinaki M, Laird GK, Lawlor S, Lee HM, Leongamornlert DA, Laird G, Lloyd C, Lloyd DM, Loveland J, Lovell J, McLaren S, McLay KE, McMurray A, Mashreghi-Mohammadi M, Matthews L, Milne S, Nickerson T, Nguyen M, Overton-Larty E, Palmer SA, Pearce AV, Peck AI, Pelan S, Phillimore B, Porter K, Rice CM, Rogosin A, Ross MT, Sarafidou T, Sehra HK, Shownkeen R, Skuce CD, Smith M, Standring L, Sycamore N, Tester J, Thorpe A, Torcasso W, Tracey A, Tromans A, Tsolas J, Wall M, Walsh J, Wang H, Weinstock K, West AP, Willey DL, Whitehead SL, Wilming L, Wray PW, Young L, Chen Y, Lovering RC, Moschonas NK, Siebert R, Fechtel K, Bentley D, Durbin R, Hubbard T, Doucette-Stamm L, Beck S, Smith DR and Rogers J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. panos@sanger.ac.uk

    The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.

    Nature 2004;429;6990;375-81

  • DNA sequence and analysis of human chromosome 9.

    Humphray SJ, Oliver K, Hunt AR, Plumb RW, Loveland JE, Howe KL, Andrews TD, Searle S, Hunt SE, Scott CE, Jones MC, Ainscough R, Almeida JP, Ambrose KD, Ashwell RI, Babbage AK, Babbage S, Bagguley CL, Bailey J, Banerjee R, Barker DJ, Barlow KF, Bates K, Beasley H, Beasley O, Bird CP, Bray-Allen S, Brown AJ, Brown JY, Burford D, Burrill W, Burton J, Carder C, Carter NP, Chapman JC, Chen Y, Clarke G, Clark SY, Clee CM, Clegg S, Collier RE, Corby N, Crosier M, Cummings AT, Davies J, Dhami P, Dunn M, Dutta I, Dyer LW, Earthrowl ME, Faulkner L, Fleming CJ, Frankish A, Frankland JA, French L, Fricker DG, Garner P, Garnett J, Ghori J, Gilbert JG, Glison C, Grafham DV, Gribble S, Griffiths C, Griffiths-Jones S, Grocock R, Guy J, Hall RE, Hammond S, Harley JL, Harrison ES, Hart EA, Heath PD, Henderson CD, Hopkins BL, Howard PJ, Howden PJ, Huckle E, Johnson C, Johnson D, Joy AA, Kay M, Keenan S, Kershaw JK, Kimberley AM, King A, Knights A, Laird GK, Langford C, Lawlor S, Leongamornlert DA, Leversha M, Lloyd C, Lloyd DM, Lovell J, Martin S, Mashreghi-Mohammadi M, Matthews L, McLaren S, McLay KE, McMurray A, Milne S, Nickerson T, Nisbett J, Nordsiek G, Pearce AV, Peck AI, Porter KM, Pandian R, Pelan S, Phillimore B, Povey S, Ramsey Y, Rand V, Scharfe M, Sehra HK, Shownkeen R, Sims SK, Skuce CD, Smith M, Steward CA, Swarbreck D, Sycamore N, Tester J, Thorpe A, Tracey A, Tromans A, Thomas DW, Wall M, Wallis JM, West AP, Whitehead SL, Willey DL, Williams SA, Wilming L, Wray PW, Young L, Ashurst JL, Coulson A, Blöcker H, Durbin R, Sulston JE, Hubbard T, Jackson MJ, Bentley DR, Beck S, Rogers J and Dunham I

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. sjh@sanger.ac.uk

    Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection.

    Nature 2004;429;6990;369-74

  • The DNA sequence and analysis of human chromosome 13.

    Dunham A, Matthews LH, Burton J, Ashurst JL, Howe KL, Ashcroft KJ, Beare DM, Burford DC, Hunt SE, Griffiths-Jones S, Jones MC, Keenan SJ, Oliver K, Scott CE, Ainscough R, Almeida JP, Ambrose KD, Andrews DT, Ashwell RI, Babbage AK, Bagguley CL, Bailey J, Bannerjee R, Barlow KF, Bates K, Beasley H, Bird CP, Bray-Allen S, Brown AJ, Brown JY, Burrill W, Carder C, Carter NP, Chapman JC, Clamp ME, Clark SY, Clarke G, Clee CM, Clegg SC, Cobley V, Collins JE, Corby N, Coville GJ, Deloukas P, Dhami P, Dunham I, Dunn M, Earthrowl ME, Ellington AG, Faulkner L, Frankish AG, Frankland J, French L, Garner P, Garnett J, Gilbert JG, Gilson CJ, Ghori J, Grafham DV, Gribble SM, Griffiths C, Hall RE, Hammond S, Harley JL, Hart EA, Heath PD, Howden PJ, Huckle EJ, Hunt PJ, Hunt AR, Johnson C, Johnson D, Kay M, Kimberley AM, King A, Laird GK, Langford CJ, Lawlor S, Leongamornlert DA, Lloyd DM, Lloyd C, Loveland JE, Lovell J, Martin S, Mashreghi-Mohammadi M, McLaren SJ, McMurray A, Milne S, Moore MJ, Nickerson T, Palmer SA, Pearce AV, Peck AI, Pelan S, Phillimore B, Porter KM, Rice CM, Searle S, Sehra HK, Shownkeen R, Skuce CD, Smith M, Steward CA, Sycamore N, Tester J, Thomas DW, Tracey A, Tromans A, Tubby B, Wall M, Wallis JM, West AP, Whitehead SL, Willey DL, Wilming L, Wray PW, Wright MW, Young L, Coulson A, Durbin R, Hubbard T, Sulston JE, Beck S, Bentley DR, Rogers J and Ross MT

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. ad1@sanger.ac.uk

    Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.

    Nature 2004;428;6982;522-8

  • The DNA sequence and analysis of human chromosome 6.

    Mungall AJ, Palmer SA, Sims SK, Edwards CA, Ashurst JL, Wilming L, Jones MC, Horton R, Hunt SE, Scott CE, Gilbert JG, Clamp ME, Bethel G, Milne S, Ainscough R, Almeida JP, Ambrose KD, Andrews TD, Ashwell RI, Babbage AK, Bagguley CL, Bailey J, Banerjee R, Barker DJ, Barlow KF, Bates K, Beare DM, Beasley H, Beasley O, Bird CP, Blakey S, Bray-Allen S, Brook J, Brown AJ, Brown JY, Burford DC, Burrill W, Burton J, Carder C, Carter NP, Chapman JC, Clark SY, Clark G, Clee CM, Clegg S, Cobley V, Collier RE, Collins JE, Colman LK, Corby NR, Coville GJ, Culley KM, Dhami P, Davies J, Dunn M, Earthrowl ME, Ellington AE, Evans KA, Faulkner L, Francis MD, Frankish A, Frankland J, French L, Garner P, Garnett J, Ghori MJ, Gilby LM, Gillson CJ, Glithero RJ, Grafham DV, Grant M, Gribble S, Griffiths C, Griffiths M, Hall R, Halls KS, Hammond S, Harley JL, Hart EA, Heath PD, Heathcott R, Holmes SJ, Howden PJ, Howe KL, Howell GR, Huckle E, Humphray SJ, Humphries MD, Hunt AR, Johnson CM, Joy AA, Kay M, Keenan SJ, Kimberley AM, King A, Laird GK, Langford C, Lawlor S, Leongamornlert DA, Leversha M, Lloyd CR, Lloyd DM, Loveland JE, Lovell J, Martin S, Mashreghi-Mohammadi M, Maslen GL, Matthews L, McCann OT, McLaren SJ, McLay K, McMurray A, Moore MJ, Mullikin JC, Niblett D, Nickerson T, Novik KL, Oliver K, Overton-Larty EK, Parker A, Patel R, Pearce AV, Peck AI, Phillimore B, Phillips S, Plumb RW, Porter KM, Ramsey Y, Ranby SA, Rice CM, Ross MT, Searle SM, Sehra HK, Sheridan E, Skuce CD, Smith S, Smith M, Spraggon L, Squares SL, Steward CA, Sycamore N, Tamlyn-Hall G, Tester J, Theaker AJ, Thomas DW, Thorpe A, Tracey A, Tromans A, Tubby B, Wall M, Wallis JM, West AP, White SS, Whitehead SL, Whittaker H, Wild A, Willey DJ, Wilmer TE, Wood JM, Wray PW, Wyatt JC, Young L, Younger RM, Bentley DR, Coulson A, Durbin R, Hubbard T, Sulston JE, Dunham I, Rogers J and Beck S

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. ajm@sanger.ac.uk

    Chromosome 6 is a metacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs, representing the largest chromosome sequenced so far. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes. Here we report that at least 96% of the protein-coding genes have been identified, as assessed by multi-species comparative sequence analysis, and provide evidence for the presence of further, otherwise unsupported exons/genes. Among these are genes directly implicated in cancer, schizophrenia, autoimmunity and many other diseases. Chromosome 6 harbours the largest transfer RNA gene cluster in the genome; we show that this cluster co-localizes with a region of high transcriptional activity. Within the essential immune loci of the major histocompatibility complex, we find HLA-B to be the most polymorphic gene on chromosome 6 and in the human genome.

    Nature 2003;425;6960;805-11

  • The DNA sequence and comparative analysis of human chromosome 20.

    Deloukas P, Matthews LH, Ashurst J, Burton J, Gilbert JG, Jones M, Stavrides G, Almeida JP, Babbage AK, Bagguley CL, Bailey J, Barlow KF, Bates KN, Beard LM, Beare DM, Beasley OP, Bird CP, Blakey SE, Bridgeman AM, Brown AJ, Buck D, Burrill W, Butler AP, Carder C, Carter NP, Chapman JC, Clamp M, Clark G, Clark LN, Clark SY, Clee CM, Clegg S, Cobley VE, Collier RE, Connor R, Corby NR, Coulson A, Coville GJ, Deadman R, Dhami P, Dunn M, Ellington AG, Frankland JA, Fraser A, French L, Garner P, Grafham DV, Griffiths C, Griffiths MN, Gwilliam R, Hall RE, Hammond S, Harley JL, Heath PD, Ho S, Holden JL, Howden PJ, Huckle E, Hunt AR, Hunt SE, Jekosch K, Johnson CM, Johnson D, Kay MP, Kimberley AM, King A, Knights A, Laird GK, Lawlor S, Lehvaslaiho MH, Leversha M, Lloyd C, Lloyd DM, Lovell JD, Marsh VL, Martin SL, McConnachie LJ, McLay K, McMurray AA, Milne S, Mistry D, Moore MJ, Mullikin JC, Nickerson T, Oliver K, Parker A, Patel R, Pearce TA, Peck AI, Phillimore BJ, Prathalingam SR, Plumb RW, Ramsay H, Rice CM, Ross MT, Scott CE, Sehra HK, Shownkeen R, Sims S, Skuce CD, Smith ML, Soderlund C, Steward CA, Sulston JE, Swann M, Sycamore N, Taylor R, Tee L, Thomas DW, Thorpe A, Tracey A, Tromans AC, Vaudin M, Wall M, Wallis JM, Whitehead SL, Whittaker P, Willey DL, Williams L, Williams SA, Wilming L, Wray PW, Hubbard T, Durbin RM, Bentley DR, Beck S and Rogers J

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. panos@sanger.ac.uk

    The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.

    Nature 2001;414;6866;865-71

Alessandra Traini

at8@sanger.ac.uk

Senior Bioinformatician

I studied Physics at the University of Rome and obtained my PhD in Computational Biology at the Second University of Naples (Italy) in 2009. I was involved in the international Tomato Genome project as part of the bioinformatics annotation group (iTAG).

Until the end of 2012 I was a postdoc at the University of Naples, where I was in charge of setting up and managing the Italian multilevel bioinformatics platform for plant ‘omics data from Solanaceae species.

Throughout 2013 I was a bioinformatician at East Malling Research (EMR), responsible for NGS data analysis and management.

Research

I joined the Parasite Genomics group as a Senior Bioinformatician in 2014.

My role focuses on genome annotation, comparative genomics and data organization, with the ultimate aim of building a biological database resource that provides rapid access to the growing number of available genomes from parasitic helminths.

References

  • Genome Microscale Heterogeneity among Wild Potatoes Revealed by Diversity Arrays Technology Marker Sequences.

    Traini A, Iorizzo M, Mann H, Bradeen JM, Carputo D, Frusciante L and Chiusano ML

    Department of Agricultural Sciences, University of Naples Federico II, Via Università 100, 80055 Portici, Naples, Italy.

    Tuber-bearing potato species possess several genes that can be exploited to improve the genetic background of the cultivated potato Solanum tuberosum. Among them, S. bulbocastanum and S. commersonii are well known for their strong resistance to environmental stresses. However, scant information is available for these species in terms of genome organization, gene function, and regulatory networks. Consequently, genomic tools to assist breeding are meager, and efficient exploitation of these species has been limited so far. In this paper, we employed the reference genome sequences from cultivated potato and tomato and a collection of sequences of 1,423 potato Diversity Arrays Technology (DArT) markers that show polymorphic representation across the genomes of S. bulbocastanum and/or S. commersonii genotypes. Our results highlighted microscale genome sequence heterogeneity that may play a significant role in functional and structural divergence between related species. Our analytical approach provides knowledge of genome structural and sequence variability that could not be detected by transcriptome and proteome approaches.

    International journal of genomics 2013;2013;257218

  • Use of MSAP markers to analyse the effects of salt stress on DNA methylation in rapeseed (Brassica napus var. oleifera).

    Marconi G, Pace R, Traini A, Raggi L, Lutts S, Chiusano M, Guiducci M, Falcinelli M, Benincasa P and Albertini E

    Department of Applied Biology, University of Perugia, Perugia, Italy.

    Excessive soil salinity is a major ecological and agronomical problem, the adverse effects of which are becoming a serious issue in regions where saline water is used for irrigation. Plants can employ regulatory strategies, such as DNA methylation, to enable relatively rapid adaptation to new conditions. In this regard, cytosine methylation might play an integral role in the regulation of gene expression at both the transcriptional and post-transcriptional levels. Rapeseed, which is the most important oilseed crop in Europe, is classified as being tolerant of salinity, although cultivars can vary substantially in their levels of tolerance. In this study, the Methylation Sensitive Amplified Polymorphism (MSAP) approach was used to assess the extent of cytosine methylation under salinity stress in salinity-tolerant (Exagone) and salinity-sensitive (Toccata) rapeseed cultivars. Our data show that salinity affected the level of DNA methylation. In particular methylation decreased in Exagone and increased in Toccata. Nineteen DNA fragments showing polymorphisms related to differences in methylation were sequenced. In particular, two of these were highly similar to genes involved in stress responses (Lacerata and trehalose-6-phosphatase synthase S4) and were chosen to further characterization. Bisulfite sequencing and quantitative RT-PCR analysis of selected MSAP loci showed that cytosine methylation changes under salinity as well as gene expression varied. In particular, our data show that salinity stress influences the expression of the two stress-related genes. Moreover, we quantified the level of trehalose in Exagone shoots and found that it was correlated to TPS4 expression and, therefore, to DNA methylation. In conclusion, we found that salinity could induce genome-wide changes in DNA methylation status, and that these changes, when averaged across different genotypes and developmental stages, accounted for 16.8% of the total site-specific methylation differences in the rapeseed genome, as detected by MSAP analysis.

    PloS one 2013;8;9;e75597

  • The tomato genome sequence provides insights into fleshy fruit evolution.

    Tomato Genome Consortium

    Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C509731/1, BB/G006199/1

    Nature 2012;485;7400;635-41

  • Euchromatic and heterochromatic compositional properties emerging from the analysis of Solanum lycopersicum BAC sequences.

    Di Filippo M, Traini A, D'Agostino N, Frusciante L and Chiusano ML

    University of Naples Federico II, Dept. of Soil, Plant, Environmental and Animal Production Sciences, Via Università 100, 80055 Portici, Italy. miriam.difilippo@gmail.com

    The consortium responsible for the sequencing of the tomato (Solanum lycopersicum) genome initially focused on the sequencing of the euchromatic regions using a BAC-by-BAC strategy. We analyzed the compositional features of the whole collection of BAC sequences publicly available. This analysis highlights specific peculiarities of heterochromatic and euchromatic BACs, in particular: the whole BAC collection has i) a large variability in repeat and gene content, ii) a positive and significant correlation of LTR retrotransposons of the Gypsy class with the repeat content and iii) the preferential location of the SINEs (short interspersed nuclear elements) in BAC sequences showing a low repeat content. Our results point out a typical design of the tomato chromosomes and pave the way for further investigations on the relationship between DNA primary structure and chromatin organization in Solanaceae genomes.

    Gene 2012;499;1;176-81

  • Evolutionary meta-analysis of solanaceous resistance gene and solanum resistance gene analog sequences and a practical framework for cross-species comparisons.

    Quirin EA, Mann H, Meyer RS, Traini A, Chiusano ML, Litt A and Bradeen JM

    University of Minnesota, Department of Plant Pathology, 495 Borlaug Hall/1991 Upper Buford Circle, St. Paul, MN 55108, USA.

    Cross-species comparative genomics approaches have been employed to map and clone many important disease resistance (R) genes from Solanum species, especially wild relatives of potato and tomato. These efforts will increase with the recent release of potato genome sequence and the impending release of tomato genome sequence. Most R genes belong to the prominent nucleotide binding site-leucine rich repeat (NBS-LRR) class and conserved NBS-LRR protein motifs enable survey of the R gene space of a plant genome by generation of resistance gene analogs (RGA), polymerase chain reaction fragments derived from R genes. We generated a collection of 97 RGA from the disease-resistant wild potato S. bulbocastanum, complementing smaller collections from other Solanum species. To further comparative genomics approaches, we combined all known Solanum RGA and cloned solanaceous NBS-LRR gene sequences, nearly 800 sequences in total, into a single meta-analysis. We defined R gene diversity bins that reflect both evolutionary relationships and DNA cross-hybridization results. The resulting framework is amendable and expandable, providing the research community with a common vocabulary for present and future study of R gene lineages. Through a series of sequence and hybridization experiments, we demonstrate that all tested R gene lineages are of ancient origin, are shared between Solanum species, and can be successfully accessed via comparative genomics approaches.

    Molecular plant-microbe interactions : MPMI 2012;25;5;603-12

  • Estrogen receptor alpha controls a gene network in luminal-like breast cancer cells comprising multiple transcription factors and microRNAs.

    Cicatiello L, Mutarelli M, Grober OM, Paris O, Ferraro L, Ravo M, Tarallo R, Luo S, Schroth GP, Seifert M, Zinser C, Chiusano ML, Traini A, De Bortoli M and Weisz A

    Department of General Pathology, Second University of Naples, Napoli, Italy.

    Luminal-like breast tumor cells express estrogen receptor alpha (ERalpha), a member of the nuclear receptor family of ligand-activated transcription factors that controls their proliferation, survival, and functional status. To identify the molecular determinants of this hormone-responsive tumor phenotype, a comprehensive genome-wide analysis was performed in estrogen stimulated MCF-7 and ZR-75.1 cells by integrating time-course mRNA expression profiling with global mapping of genomic ERalpha binding sites by chromatin immunoprecipitation coupled to massively parallel sequencing, microRNA expression profiling, and in silico analysis of transcription units and receptor binding regions identified. All 1270 genes that were found to respond to 17beta-estradiol in both cell lines cluster in 33 highly concordant groups, each of which showed defined kinetics of RNA changes. This hormone-responsive gene set includes several direct targets of ERalpha and is organized in a gene regulation cascade, stemming from ligand-activated receptor and reaching a large number of downstream targets via AP-2gamma, B-cell activating transcription factor, E2F1 and 2, E74-like factor 3, GTF2IRD1, hairy and enhancer of split homologue-1, MYB, SMAD3, RARalpha, and RXRalpha transcription factors. MicroRNAs are also integral components of this gene regulation network because miR-107, miR-424, miR-570, miR-618, and miR-760 are regulated by 17beta-estradiol along with other microRNAs that can target a significant number of transcripts belonging to one or more estrogen-responsive gene clusters.

    The American journal of pathology 2010;176;5;2113-30

  • SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes.

    D'Agostino N, Traini A, Frusciante L and Chiusano ML

    University of Naples 'Federico II', Dept of Soil, Plant, Environmental and Animal Production Sciences, Via Università 100, 80055 Portici, Italy. nunzio.dagostino@gmail.com

    Background: Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences.

    Description: We describe the SolEST database (http://biosrv.cab.unina.it/solestdb), which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups in the attempt to address a common complaint in the SOL community.

    Conclusion: Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing for the research community and facilitates a "one-stop shop" for the study of Solanaceae transcriptomes.

    BMC plant biology 2009;9;142

  • ISOL@: an Italian SOLAnaceae genomics resource.

    Chiusano ML, D'Agostino N, Traini A, Licciardello C, Raimondo E, Aversano M, Frusciante L and Monti L

    Department of Soil, Plant, Environmental and Animal Production Sciences, University Federico II of Naples, Portici (NA), Italy. chiusano@unina.it

    Background: Present-day '-omics' technologies produce overwhelming amounts of data which include genome sequences, information on gene expression (transcripts and proteins) and on cell metabolic status. These data represent multiple aspects of a biological system and need to be investigated as a whole to shed light on the mechanisms which underpin the system functionality. The gathering and convergence of data generated by high-throughput technologies, the effective integration of different data-sources and the analysis of the information content based on comparative approaches are key methods for meaningful biological interpretations. In the frame of the International Solanaceae Genome Project, we propose here ISOLA, an Italian SOLAnaceae genomics resource.

    Results: ISOLA (available at http://biosrv.cab.unina.it/isola) represents a trial platform and it is conceived as a multi-level computational environment. ISOLA currently consists of two main levels: the genome and the expression level. The cornerstone of the genome level is represented by the Solanum lycopersicum genome draft sequences generated by the International Tomato Genome Sequencing Consortium. Instead, the basic element of the expression level is the transcriptome information from different Solanaceae species, mainly in the form of species-specific comprehensive collections of Expressed Sequence Tags (ESTs). The cross-talk between the genome and the expression levels is based on data source sharing and on tools that enhance data quality, that extract information content from the levels' under parts and produce value-added biological knowledge.

    Conclusions: ISOLA is the result of a bioinformatics effort that addresses the challenges of the post-genomics era. It is designed to exploit '-omics' data based on effective integration to acquire biological knowledge and to approach a systems biology view. Beyond providing experimental biologists with a preliminary annotation of the tomato genome, this effort aims to produce a trial computational environment where different aspects and details are maintained as they are relevant for the analysis of the organization, the functionality and the evolution of the Solanaceae family.

    BMC bioinformatics 2008;9 Suppl 2;S7

  • Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome.

    D'Agostino N, Traini A, Frusciante L and Chiusano ML

    Department of Structural and Functional Biology, University Federico II, 80126 Naples, Italy. nunzio.dagostino@gmail.com

    Background: The structure annotation of a genome is based either on ab initio methodologies or on similarity searches versus molecules that have been already annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task the common approach is to align species-specific full length cDNA and EST sequences along the genomic sequences in order to define exon/intron structure of mRNA coding genes.

    Results: GeneModelEST is the software here proposed for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced-alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign to the candidate gene models preliminary functional annotations. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium.

    Conclusion: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models, the identification of potential alternative transcripts and it is useful to filter out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large scale analysis in order to provide species-specific gene models, that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human curated annotation.

    BMC bioinformatics 2007;8 Suppl 1;S9

Magdalena Zarowiecki

mz3@sanger.ac.uk

My research interests are tropical diseases, in particular the evolution of parasitism and host-parasite interactions. I did an M.Sc. in Zoological Systematics at Gothenburg University, Sweden, and an M.Res. in Biosystematics at the Natural History Museum and Imperial College, London. I have worked with many non-model worms: ribbon worms, Oligochaetes, Cestodes and Trematodes. I also have interests in the wider field of tropical diseases from a Ph.D. in population genetics of mosquitoes. I previously held a postdoctoral position funded by the SynTax scheme, working on the assembly and annotation of the Hymenolepis microstoma genome and the comparative phylogeny of flatworms.

Research

My current research focuses on the genomics of parasitic flatworms, including important platyhelminth parasites of humans in the genera Taenia, Hymenolepis, Echinococcus and Schistosoma. These platyhelminths have a severe impact on the health and productivity of the poorest people in developing countries. The aim of the post-doc project is to develop comparative genomics of flatworms within the Parasite Genomics group. We use high-throughput approaches including RNAseq, gene prediction, methylome studies, re-sequencing and microRNA studies to increase the accuracy and biological depth of our platyhelminth genome annotations. Producing good-quality genomes, gene models and annotations is a vital underpinning for future translational research.

References

  • Cestode genomics - progress and prospects for advancing basic and applied aspects of flatworm biology.

    Olson PD, Zarowiecki M, Kiss F and Brehm K

    Department of Zoology, The Natural History Museum, London, UK.

    Characterization of the first tapeworm genome, Echinococcus multilocularis, is now nearly complete, and genome assemblies of E. granulosus, Taenia solium and Hymenolepis microstoma are in advanced draft versions. These initiatives herald the beginning of a genomic era in cestodology and underpin a diverse set of research agendas targeting both basic and applied aspects of tapeworm biology. We discuss the progress in the genomics of these species, provide insights into the presence and composition of immunologically relevant gene families, including the antigen B- and EG95/45W families, and discuss chemogenomic approaches toward the development of novel chemotherapeutics against cestode diseases. In addition, we discuss the evolution of tapeworm parasites and introduce the research programmes linked to genome initiatives that are aimed at understanding signalling systems involved in basic host-parasite interactions and morphogenesis.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151

    Parasite immunology 2012;34;2-3;130-50

  • Animals learn new tricks from microorganisms.

    Zarowiecki M

    Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. microbes@sanger.ac.uk

    Nature reviews. Microbiology 2011;9;12;836

  • Towards a new role for vector systematics in parasite control.

    Zarowiecki M, Loaiza JR and Conn JE

    Dept. of Zoology, Natural History Museum, London SW7 5BD, UK. mz3@sanger.ac.uk

    Vector systematics research is being transformed by the recent development of theoretical, experimental and analytical methods, as well as conceptual insights into speciation and reconstruction of evolutionary history. We review this progress using examples from the mosquito genus Anopheles. The conclusion is that recent progress, particularly in the development of better tools for understanding evolutionary history, makes systematics much more informative for vector control purposes, and has increasing potential to inform and improve targeted vector control programmes.

    Parasitology 2011;138;13;1723-9

  • Rapid evolution of yeast centromeres in the absence of drive.

    Bensasson D, Zarowiecki M, Burt A and Koufopanou V

    Division of Biology, Imperial College London, Ascot SL5 7PY, United Kingdom.

    To find the most rapidly evolving regions in the yeast genome we compared most of chromosome III from three closely related lineages of the wild yeast Saccharomyces paradoxus. Unexpectedly, the centromere appears to be the fastest-evolving part of the chromosome, evolving even faster than DNA sequences unlikely to be under selective constraint (i.e., synonymous sites after correcting for codon usage bias and remnant transposable elements). Centromeres on other chromosomes also show an elevated rate of nucleotide substitution. Rapid centromere evolution has also been reported for some plants and animals and has been attributed to selection for inclusion in the egg or the ovule at female meiosis. But Saccharomyces yeasts have symmetrical meioses with all four products surviving, thus providing no opportunity for meiotic drive. In addition, yeast centromeres show the high levels of polymorphism expected under a neutral model of molecular evolution. We suggest that yeast centromeres suffer an elevated rate of mutation relative to other chromosomal regions and they change through a process of "centromere drift," not drive.

    Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust

    Genetics 2008;178;4;2161-7

  • Making the most of mitochondrial genomes--markers for phylogeny, molecular ecology and barcodes in Schistosoma (Platyhelminthes: Digenea).

    Zarowiecki MZ, Huyse T and Littlewood DT

    Wolfson Wellcome Biomedical Laboratories, Department of Zoology, Natural History Museum, Cromwell Road, London SW7 5BD, UK.

    An increasing number of complete sequences of mitochondrial (mt) genomes provides the opportunity to optimise the choice of molecular markers for phylogenetic and ecological studies. This is particularly the case where mt genomes from closely related taxa have been sequenced; e.g., within Schistosoma. These blood flukes include species that are the causative agents of schistosomiasis, where there has been a need to optimise markers for species and strain recognition. For many phylogenetic and population genetic studies, the choice of nucleotide sequences depends primarily on suitable PCR primers. Complete mt genomes allow individual gene or other mt markers to be assessed relative to one another for potential information content, prior to broad-scale sampling. We assess the phylogenetic utility of individual genes and identify regions that contain the greatest interspecific variation for molecular ecological and diagnostic markers. We show that variable characters are not randomly distributed along the genome and there is a positive correlation between polymorphism and divergence. The mt genomes of African and Asian schistosomes were compared with the available intraspecific dataset of Schistosoma mansoni through sliding window analyses, in order to assess whether the observed polymorphism was at a level predicted from interspecific comparisons. We found a positive correlation except for the two genes (cox1 and nad1) adjoining the putative control region in S. mansoni. The genes nad1, nad4, nad5, cox1 and cox3 resolved phylogenies that were consistent with a benchmark phylogeny and in general, longer genes performed better in phylogenetic reconstruction. Considering the information content of entire mt genome sequences, partial cox1 would not be the ideal marker for either species identification (barcoding) or population studies with Schistosoma species. Instead, we suggest the use of cox3 and nad5 for both phylogenetic and population studies. Five primer pairs designed against Schistosoma mekongi and Schistosoma malayensis were tested successfully against Schistosoma japonicum. In combination, these fragments encompass 20-27% of the variation amongst the genomes (average total length approximately 14,000 bp), thus providing an efficient means of encapsulating the greatest amount of variation within the shortest sequence. Comparative mitogenomics provides the basis of a rational approach to molecular marker selection and optimisation.

    International journal for parasitology 2007;37;12;1401-18

Background

We are interested in studying the diversity of eukaryotic parasites and their complex interactions with their hosts. In particular, we wish to uncover the genomic basis for differences in the biology of parasites causing malaria and Neglected Tropical Diseases. Our approach starts with the establishment of a reference genome, followed by comparative sequencing of related strains or species to find candidate genes (or other sequences) relating to species-specific differences, such as disease tropisms.

* quick link - http://q.sanger.ac.uk/paragen