Research
The Parasite Genomics group uses genome sequencing, comparative and functional genomics to investigate the biology of helminths and protozoan parasites.
Sequencing genomes
Perhaps the single most useful tool for any molecular biologist is a high-quality genome sequence for their organism of interest. We are closely partnered with scientific communities interested in particular organisms, and through this collaborative network we acquire DNA samples. We then use the outstanding sequencing facilities at the Sanger Institute to generate the data from which we assemble draft genome sequences. We develop computational tools to improve genome sequences and also improve them manually; together these allow us to produce very high-quality genomes that strengthen our collaborators' research. Our gold standard is the malaria reference genome, which we have been carefully curating for ten years.
Understanding genomes
Functional genomics deals with dynamic biological data, such as changes in the transcriptome, proteome and epigenome in the course of a parasite's life cycle. We make and use large-scale data sets to ask questions about the functions of parasite genes, using genome sequences to support our analysis. Many genes are unique to parasites, so we need these new data sets to unveil the functions of uncharacterised parasite genes. We can often infer functional information about a gene by understanding when and where it is expressed in the life cycle of the parasite, or identifying which genes change upon interaction with the parasite's host. High-throughput sequencing is a key tool in functional genomics, underpinning methods such as RNA-seq and ChIP-seq.
We apply these approaches to:
Helminths
Despite the global medical and economic importance of parasitic helminths (worms), research on them has remained relatively untouched by genomics. Worm infections account for morbidity equivalent to more than 100 million disability-adjusted life years from more than one billion infections globally. With this in mind, we have developed the Sanger Helminth Genomes Initiative. Initially we are using de novo sequencing to produce reference genomes for a cross-phyla list that includes hookworms, whipworms, threadworms, schistosomes, a tapeworm and the filarial parasite responsible for river blindness. We are also producing draft genomes of a broad list of parasitic helminths.
Protozoa
Amongst the protozoan parasites, we focus on two areas:
- The Apicomplexa, including malaria parasites
- The Kinetoplastida, which include Trypanosoma and Leishmania parasites.
We have built comparative genomic studies around high-quality reference genomes and, while this work continues, we are also embarking on studies to understand host-parasite interactions and population structure.
Data download
Sequence data is available for download.
- Reference and comparator Helminth genomes.
- Genomes from the Helminth Genomes Initiative are accessible from the FTP site. These are available from the Sanger Institute as part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see the data sharing policy.
- Complete, ongoing and forthcoming Protozoan genomes.
Resources
Databases
- WormBase. Genetics of C. elegans and related nematodes.
- GeneDB. A window to our annotation as it is produced.
Tool development
Software that supports our annotation and analysis is under constant development. In particular, we work with the Pathogen Informatics team to develop:
- ABACAS. Rapidly contiguates (aligns, orders, orientates) and visualizes shotgun-assembled contigs based on a reference sequence, and designs primers to close the gaps between them.
- Artemis and ACT. Portable and intuitive sequence viewing and browsing tools. Recently a new Chado database version has been launched.
- iCORN. Corrects reference genome sequences by iteratively mapping reads and finding differences in the sequence.
- IMAGE. Closes gaps in a draft assembly using Illumina paired end reads.
- PAGIT. Generates high quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
- RATT. Transfers annotation from a reference (annotated) genome to an unannotated query genome.
- REAPR. Evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison.
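As a rough illustration of what contiguation against a reference involves, the sketch below orders contigs by where they align on a reference and reverse-complements those that align to the reverse strand. It is a toy example, not ABACAS's implementation: the `Hit` record, the `contiguate` function and the fixed-length run of Ns used as a gap are hypothetical names and simplifications chosen here, and the alignment coordinates are assumed to come from a separate aligner.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record: where one contig aligns on the reference."""
    contig: str      # contig name
    ref_start: int   # start coordinate of the alignment on the reference
    strand: str      # "+" for forward, "-" for reverse


# Translation table for complementing DNA bases (Ns are left unchanged).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contiguate(contigs: dict, hits: list, gap: str = "N" * 100):
    """Order contigs by reference position and orient them by strand.

    Returns the ordered contig names and a single pseudomolecule in
    which neighbouring contigs are separated by `gap`.
    """
    ordered = sorted(hits, key=lambda h: h.ref_start)
    pieces = []
    for hit in ordered:
        seq = contigs[hit.contig]
        # Contigs aligned to the reverse strand are flipped so that the
        # whole pseudomolecule reads in the reference orientation.
        pieces.append(seq if hit.strand == "+" else reverse_complement(seq))
    return [h.contig for h in ordered], gap.join(pieces)


if __name__ == "__main__":
    contigs = {"c1": "AAAC", "c2": "GGTT"}
    hits = [Hit("c1", 500, "+"), Hit("c2", 10, "-")]
    order, pseudomolecule = contiguate(contigs, hits, gap="NN")
    print(order, pseudomolecule)
```

A real tool also has to handle contigs with no alignment, overlapping or conflicting alignments, and gap sizes estimated from the reference; none of that is attempted here.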
Selected Publications
-
The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.
Genome biology 2013;14;8;R88
PUBMED: 23985316; DOI: 10.1186/gb-2013-14-8-r88
-
Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.
Molecular microbiology 2013
PUBMED: 23980881; DOI: 10.1111/mmi.12381
-
Vector transmission regulates immune control of Plasmodium virulence.
Nature 2013;498;7453;228-31
PUBMED: 23719378; DOI: 10.1038/nature12231
-
REAPR: a universal tool for genome assembly evaluation.
Genome biology 2013;14;5;R47
PUBMED: 23710727; DOI: 10.1186/gb-2013-14-5-r47
-
The genomes of four tapeworm species reveal adaptations to parasitism.
Nature 2013;496;7443;57-63
PUBMED: 23485966; DOI: 10.1038/nature12031
-
Efficient depletion of host DNA contamination in malaria clinical sequencing.
Journal of clinical microbiology 2013;51;3;745-51
PUBMED: 23224084; PMC: 3592063; DOI: 10.1128/JCM.02507-12
-
Genes involved in host-parasite interactions can be revealed by their correlated expression.
Nucleic acids research 2013;41;3;1508-18
PUBMED: 23275547; PMC: 3561955; DOI: 10.1093/nar/gks1340
-
Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.
PLoS neglected tropical diseases 2013;7;3;e2091
PUBMED: 23516644; PMC: 3597483; DOI: 10.1371/journal.pntd.0002091
-
Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.
Nature 2012;487;7407;375-9
PUBMED: 22722859; PMC: 3738909; DOI: 10.1038/nature11174
-
A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.
Nature protocols 2012;7;7;1260-84
PUBMED: 22678431; PMC: 3648784; DOI: 10.1038/nprot.2012.068
-
A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.
PLoS neglected tropical diseases 2012;6;1;e1455
PUBMED: 22253936; PMC: 3254664; DOI: 10.1371/journal.pntd.0001455
-
Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.
PLoS pathogens 2012;8;3;e1002567
PUBMED: 22457617; PMC: 3310773; DOI: 10.1371/journal.ppat.1002567
-
Germline transgenesis and insertional mutagenesis in Schistosoma mansoni mediated by murine leukemia virus.
PLoS pathogens 2012;8;7;e1002820
PUBMED: 22911241; PMC: 3406096; DOI: 10.1371/journal.ppat.1002820
-
Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.
Genome research 2011;21;12;2143-56
PUBMED: 22038251; PMC: 3227103; DOI: 10.1101/gr.123430.111
-
RATT: Rapid Annotation Transfer Tool.
Nucleic acids research 2011;39;9;e57
PUBMED: 21306991; PMC: 3089447; DOI: 10.1093/nar/gkq1268
-
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.
Bioinformatics (Oxford, England) 2010;26;14;1704-7
PUBMED: 20562415; PMC: 2894513; DOI: 10.1093/bioinformatics/btq269
-
New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.
Molecular microbiology 2010;76;1;12-24
PUBMED: 20141604; PMC: 2859250; DOI: 10.1111/j.1365-2958.2009.07026.x
-
Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.
Genome biology 2010;11;4;R41
PUBMED: 20388197; PMC: 2884544; DOI: 10.1186/gb-2010-11-4-r41
-
ABACAS: algorithm-based automatic contiguation of assembled sequences.
Bioinformatics (Oxford, England) 2009;25;15;1968-9
PUBMED: 19497936; PMC: 2712343; DOI: 10.1093/bioinformatics/btp347
-
The genome of the blood fluke Schistosoma mansoni.
Nature 2009;460;7253;352-8
PUBMED: 19606141; PMC: 2756445; DOI: 10.1038/nature08160
-
Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.
Genome research 2008;18;2;281-92
PUBMED: 18096748; PMC: 2203626; DOI: 10.1101/gr.6836108
-
Schistosoma mansoni genome: closing in on a final gene set.
Experimental parasitology 2007;117;3;225-8
PUBMED: 17643433; DOI: 10.1016/j.exppara.2007.06.005
-
Comparative genomic analysis of three Leishmania species that cause diverse human disease.
Nature genetics 2007;39;7;839-47
PUBMED: 17572675; PMC: 2592530; DOI: 10.1038/ng2053
-
Genome variation and evolution of the malaria parasite Plasmodium falciparum.
Nature genetics 2007;39;1;120-5
PUBMED: 17159978; PMC: 2663918; DOI: 10.1038/ng1931
-
ACT: the Artemis Comparison Tool.
Bioinformatics (Oxford, England) 2005;21;16;3422-3
PUBMED: 15976072; DOI: 10.1093/bioinformatics/bti553
-
The genome of the African trypanosome Trypanosoma brucei.
Science (New York, N.Y.) 2005;309;5733;416-22
PUBMED: 16020726; DOI: 10.1126/science.1112642
-
The genome of the kinetoplastid parasite, Leishmania major.
Science (New York, N.Y.) 2005;309;5733;436-42
PUBMED: 16020728; PMC: 1470643; DOI: 10.1126/science.1112680
-
Comparative genomics of trypanosomatid parasitic protozoa.
Science (New York, N.Y.) 2005;309;5733;404-9
PUBMED: 16020724; DOI: 10.1126/science.1112181
-
Genome of the host-cell transforming parasite Theileria annulata compared with T. parva.
Science (New York, N.Y.) 2005;309;5731;131-3
PUBMED: 15994557; DOI: 10.1126/science.1110418
-
A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses.
Science (New York, N.Y.) 2005;307;5706;82-6
PUBMED: 15637271; DOI: 10.1126/science.1103717
-
Viewing and annotating sequence data with Artemis.
Briefings in bioinformatics 2003;4;2;124-32
-
Genome sequence of the human malaria parasite Plasmodium falciparum.
Nature 2002;419;6906;498-511
PUBMED: 12368864; DOI: 10.1038/nature01097
-
The architecture of variant surface glycoprotein gene expression sites in Trypanosoma brucei.
Molecular and biochemical parasitology 2002;122;2;131-40
PUBMED: 12106867
Team
Team members
- Helen Beasley
- Computer Biologist - Senior Genome Analyst
- Hayley Bennett
- hb6@sanger.ac.uk Postdoctoral Fellow
- Lia Chappell
- lc5@sanger.ac.uk PhD Student
- Avril Coghlan
- alc@sanger.ac.uk Senior Bioinformatician
- James Cotton
- jc17@sanger.ac.uk Senior Staff Scientist
- Bernardo Foth
- bf3@sanger.ac.uk Senior Staff Scientist
- Tom Huckvale
- Advanced Research Assistant
- Sarah Nichol
- Computer Biologist - Senior Genome Analyst
- Thomas Otto
- Senior Staff Scientist
- Anna Protasio
- ap6@sanger.ac.uk Postdoctoral Fellow
- Adam Reid
- ar11@sanger.ac.uk Postdoctoral Fellow
- Florian Sessler
- fs8@sanger.ac.uk PhD Student
- Eleanor Stanley
- es9@sanger.ac.uk Senior Bioinformatician
- Alan Tracey
- Computer Biologist - Senior Genome Analyst
- Magdalena Zarowiecki
- mz3@sanger.ac.uk Postdoctoral Fellow
Helen Beasley
- Computer Biologist - Senior Genome Analyst
I graduated with a BSc (Hons) in Biotechnology from the University of Paisley, where I developed a keen interest in plant breeding and crop improvement through recombinant DNA techniques. My research project focused on improving somatic hybridisation in Solanaceae species, and this led me to study for an MSc in Plant Genetic Manipulation at the University of Nottingham. I joined the Sanger Institute as a finisher working on the Human Genome Project and then on other large genomes, including zebrafish, mouse, pig and tomato. Latterly I worked on finishing more problematic regions, alongside coordinating some broad-ranging collaborative projects.
Research
I joined the parasite genomics group as a Senior Genome Analyst in 2010, working on the manual improvement of helminth genomes. I have worked on a number of these, including Schistosoma mansoni, Echinococcus species and Globodera pallida, improving assemblies at the sequence level by using software tools to close gaps and resolve mis-assemblies, and manually curating genes and the gene training sets used to improve the accuracy of gene prediction software.
References
-
The genomes of four tapeworm species reveal adaptations to parasitism.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 098051
Nature 2013;496;7443;57-63
PUBMED: 23485966; DOI: 10.1038/nature12031
-
The tomato genome sequence provides insights into fleshy fruit evolution.
Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.
Funded by: Biotechnology and Biological Sciences Research Council: BB/C509731/1, BB/G006199/1
Nature 2012;485;7400;635-41
PUBMED: 22660326; PMC: 3378239; DOI: 10.1038/nature11119
-
Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry.
CNRS UMR 7205, Muséum National d'Histoire Naturelle, CP50, 45 Rue Buffon, 75005 Paris, France. joron@mnhn.fr
Supergenes are tight clusters of loci that facilitate the co-segregation of adaptive variation, providing integrated control of complex adaptive phenotypes. Polymorphic supergenes, in which specific combinations of traits are maintained within a single population, were first described for 'pin' and 'thrum' floral types in Primula and Fagopyrum, but classic examples are also found in insect mimicry and snail morphology. Understanding the evolutionary mechanisms that generate these co-adapted gene sets, as well as the mode of limiting the production of unfit recombinant forms, remains a substantial challenge. Here we show that individual wing-pattern morphs in the polymorphic mimetic butterfly Heliconius numata are associated with different genomic rearrangements at the supergene locus P. These rearrangements tighten the genetic linkage between at least two colour-pattern loci that are known to recombine in closely related species, with complete suppression of recombination being observed in experimental crosses across a 400-kilobase interval containing at least 18 genes. In natural populations, notable patterns of linkage disequilibrium (LD) are observed across the entire P region. The resulting divergent haplotype clades and inversion breakpoints are found in complete association with wing-pattern morphs. Our results indicate that allelic combinations at known wing-patterning loci have become locked together in a polymorphic rearrangement at the P locus, forming a supergene that acts as a simple switch between complex adaptive phenotypes found in sympatry. These findings highlight how genomic rearrangements can have a central role in the coexistence of adaptive phenotypes involving several genes acting in concert, by locally limiting recombination and gene flow.
Funded by: Biotechnology and Biological Sciences Research Council: BBE0118451; Wellcome Trust: 079643, 098051
Nature 2011;477;7363;203-6
PUBMED: 21841803; PMC: 3717454; DOI: 10.1038/nature10341
-
Genomic libraries: I. Construction and screening of fosmid genomic libraries.
Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.
Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.
Methods in molecular biology (Clifton, N.J.) 2011;772;37-58
PUBMED: 22065431; DOI: 10.1007/978-1-61779-228-1_3
-
Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.
Sequencing Research and Development, Wellcome Trust Sanger Institute, Cambridge, UK.
Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.
Methods in molecular biology (Clifton, N.J.) 2011;772;59-81
PUBMED: 22065432; DOI: 10.1007/978-1-61779-228-1_4
-
Characterization of a hotspot for mimicry: assembly of a butterfly wing transcriptome to genomic sequence at the HmYb/Sb locus.
Department of Zoology, University of Cambridge, UK.
The mimetic wing patterns of Heliconius butterflies are an excellent example of both adaptive radiation and convergent evolution. Alleles at the HmYb and HmSb loci control the presence/absence of hindwing bar and hindwing margin phenotypes respectively between divergent races of Heliconius melpomene, and also between sister species. Here, we used fine-scale linkage mapping to identify and sequence a BAC tilepath across the HmYb/Sb loci. We also generated transcriptome sequence data for two wing pattern forms of H. melpomene that differed in HmYb/Sb alleles using 454 sequencing technology. Custom scripts were used to process the sequence traces and generate transcriptome assemblies. Genomic sequence for the HmYb/Sb candidate region was annotated both using the MAKER pipeline and manually using transcriptome sequence reads. In total, 28 genes were identified in the HmYb/Sb candidate region, six of which have alternative splice forms. None of these are orthologues of genes previously identified as being expressed in butterfly wing pattern development, implying previously undescribed molecular mechanisms of pattern determination on Heliconius wings. The use of next-generation sequencing has therefore facilitated DNA annotation of a poorly characterized genome, and generated hypotheses regarding the identity of wing pattern at the HmYb/Sb loci.
Funded by: Biotechnology and Biological Sciences Research Council
Molecular ecology 2010;19 Suppl 1;240-54
PUBMED: 20331783; DOI: 10.1111/j.1365-294X.2009.04475.x
-
The genomic sequence and analysis of the swine major histocompatibility complex.
LREG INRA CEA, Jouy en Josas, France.
We describe the generation and analysis of an integrated sequence map of a 2.4-Mb region of pig chromosome 7, comprising the classical class I region, the extended and classical class II regions, and the class III region of the major histocompatibility complex (MHC), also known as swine leukocyte antigen (SLA) complex. We have identified and manually annotated 151 loci, of which 121 are known genes (predicted to be functional), 18 are pseudogenes, 8 are novel CDS loci, 3 are novel transcripts, and 1 is a putative gene. Nearly all of these loci have homologues in other mammalian genomes but orthologues could be identified with confidence for only 123 genes. The 28 genes (including all the SLA class I genes) for which unambiguous orthology to genes within the human reference MHC could not be established are of particular interest with respect to porcine-specific MHC function and evolution. We have compared the porcine MHC to other mammalian MHC regions and identified the differences between them. In comparison to the human MHC, the main differences include the absence of HLA-A and other class I-like loci, the absence of HLA-DP-like loci, and the separation of the extended and classical class II regions from the rest of the MHC by insertion of the centromere. We show that the centromere insertion has occurred within a cluster of BTNL genes located at the boundary of the class II and III regions, which might have resulted in the loss of an orthologue to human C6orf10 from this region.
Funded by: Wellcome Trust
Genomics 2006;88;1;96-110
PUBMED: 16515853; DOI: 10.1016/j.ygeno.2006.01.004
-
The DNA sequence and biological annotation of human chromosome 1.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. sgregory@chg.duhs.duke.edu
The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.
Funded by: Wellcome Trust
Nature 2006;441;7091;315-21
PUBMED: 16710414; DOI: 10.1038/nature04727
-
The DNA sequence and analysis of human chromosome 13.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. ad1@sanger.ac.uk
Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.
Nature 2004;428;6982;522-8
PUBMED: 15057823; PMC: 2665288; DOI: 10.1038/nature02379
Hayley Bennett
hb6@sanger.ac.uk Postdoctoral Fellow
My academic studies started with a degree in Neuroscience from Cardiff University. After working in the biotech industry for two years, I began my PhD at the University of Bath. My PhD project focused on the neurobiology of nematodes, in particular the characterisation of novel drug targets. Halfway through my PhD I moved with my lab to the University of Georgia, USA. There I enjoyed exposure to a rich diversity of parasite research, and decided to continue to work in this field.
I joined the parasite genomics group at the Wellcome Trust Sanger Institute in June 2012.
Research
My role is focused on using cutting-edge sequencing technology to understand parasitic worms.
Current projects include:
- Sequencing from small input amounts of DNA or RNA
- Sequencing from unusual, rare or clinical samples
- Epigenetic control of transcription and expression
References
-
Microbial genomes as cheat sheets.
Nature reviews. Microbiology 2013;11;5;302
PUBMED: 23563106; DOI: 10.1038/nrmicro3014
-
The genomes of four tapeworm species reveal adaptations to parasitism.
Nature 2013;496;7443;57-63
PUBMED: 23485966; DOI: 10.1038/nature12031
-
ACR-26: a novel nicotinic receptor subunit of parasitic nematodes.
Department of Infectious Diseases and Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA.
Nematode nicotinic acetylcholine receptors are the targets for many effective anthelmintics, including those recently introduced into the market. We have identified a novel nicotinic receptor subunit sequence, acr-26, that is expressed in all the animal parasitic nematodes we examined from clades III, IV and V, but is not present in the genomes of Trichinella spiralis, Caenorhabditis elegans, Pristionchus pacificus and Meloidogyne spp. In Ascaris suum, ACR-26 is expressed on muscle cells isolated from the head, but not from the mid-body region. Sequence comparisons with other vertebrate and nematode subunits suggested that ACR-26 may be capable of forming a functional homomeric receptor; when acr-26 cRNA was injected into Xenopus oocytes along with Xenopus laevis ric-3 cRNA we occasionally observed the formation of acetylcholine- and nicotine-sensitive channels. The unreliable expression of ACR-26 in vitro may suggest that additional subunits or chaperones may be required for efficient formation of the functional receptors. ACR-26 may represent a novel target for the development of cholinergic anthelmintics specific for animal parasites.
Funded by: Biotechnology and Biological Sciences Research Council
Molecular and biochemical parasitology 2012;183;2;151-7
PUBMED: 22387572; DOI: 10.1016/j.molbiopara.2012.02.010
Lia Chappell
lc5@sanger.ac.uk PhD Student
I'm on a four-year Wellcome Trust PhD studentship here at Sanger, and began the first year of the programme in October 2009. After completing three rotation projects related to parasite biology I began my PhD project in May 2010. I'm jointly supervised by Matt Berriman and Julian Rayner, and interact with researchers in the Parasite Genomics and Malaria programmes.
Before coming to Sanger I completed a Master's degree at the University of Cambridge, specialising in Biochemistry. I was at Emmanuel College for my undergraduate degree, and remain a member as a graduate student.
Research
My PhD project aims to develop techniques that will reduce the amount of biological material required to analyse the transcriptome of Plasmodium parasites. The study of the transcriptome using high-throughput sequencing (RNA-seq) is currently limited by the amount of RNA that can be retrieved from samples, so improving on existing methods will allow us to answer a wider range of biological questions.
References
-
Vector transmission regulates immune control of Plasmodium virulence.
Division of Parasitology, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK.
Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents, primates or humans increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission, which is assumed to 'reset' Plasmodium to its original character. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we use mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi to interrogate regulation of parasite virulence. Analysis of SBP P. c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which in turn modifies the elicited mammalian immune response, which in turn attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.
Funded by: Medical Research Council: U117584248; Wellcome Trust: 089553, 098051
Nature 2013;498;7453;228-31
PUBMED: 23719378; PMC: 3784817; DOI: 10.1038/nature12231
-
Finding a needle in a haystack. Microbial metatranscriptomes.
This month's Genome Watch highlights some of the technical challenges that need to be overcome to gain further insight into microbial metatranscriptomes.
Nature reviews. Microbiology 2012;10;7;446
PUBMED: 22699963; DOI: 10.1038/nrmicro2821
-
Expressions of individuality.
Nature reviews. Microbiology 2011;9;10;701
PUBMED: 21921932; DOI: 10.1038/nrmicro2662
Avril Coghlan
alc@sanger.ac.uk Senior Bioinformatician
I studied genetics at Trinity College Dublin and stayed there for a PhD on the molecular evolution of nematode genomes with Ken Wolfe. I then held post-docs with Des Higgins at University College Dublin and with Richard Durbin at the Sanger Institute, Cambridge, working on various topics in phylogenetics, molecular evolution and gene-finding. I was subsequently a lecturer in bioinformatics at University College Cork for four years before joining the parasite genomics group at the Sanger Institute in 2012.
Research
At the Sanger Institute, I'm involved in projects across a range of parasitic species, including parasitic nematodes and schistosomes.
References
-
Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts.
Department Microbiology, University College Cork, Ireland. pwotoole@ucc.ie
Background: The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade.
Results: The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group.
Conclusions: The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host interactions of these motile intestinal lactobacilli.
Microbial cell factories 2011;10 Suppl 1;S13
PUBMED: 21995554; PMC: 3231920; DOI: 10.1186/1475-2859-10-S1-S13
-
The genome of the blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: WT085775/Z/08/Z
Nature 2009;460;7253;352-8
PUBMED: 19606141; PMC: 2756445; DOI: 10.1038/nature08160
-
TreeFam: 2008 Update.
Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14,351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new perl API lets users easily extract data from the TreeFam mysql database.
Funded by: Wellcome Trust
Nucleic acids research 2008;36;Database issue;D735-40
PUBMED: 18056084; PMC: 2238856; DOI: 10.1093/nar/gkm1005
-
nGASP--the nematode genome annotation assessment project.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. alc@sanger.ac.uk
Background: While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.
Results: The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.
Conclusion: This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
Funded by: NHGRI NIH HHS: P41 HG02223; Wellcome Trust
BMC bioinformatics 2008;9;549
PUBMED: 19099578; PMC: 2651883; DOI: 10.1186/1471-2105-9-549
-
Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. alc@sanger.ac.uk
Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.
Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C.briggsae and C.remanei. On a set of approximately 1500 confirmed C.elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.
Availability: Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix
Funded by: Wellcome Trust: 077192
Bioinformatics (Oxford, England) 2007;23;12;1468-75
PUBMED: 17483502; PMC: 2880447; DOI: 10.1093/bioinformatics/btm133
-
TreeFam: a curated database of phylogenetic trees of animal gene families.
Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.
TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins from UniProt; approximately 40-85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available at http://www.treefam.org and http://treefam.genomics.org.cn.
Funded by: Wellcome Trust
Nucleic acids research 2006;34;Database issue;D572-80
PUBMED: 16381935; PMC: 1347480; DOI: 10.1093/nar/gkj118
-
Chromosome evolution in eukaryotes: a multi-kingdom perspective.
Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.
In eukaryotes, chromosomal rearrangements, such as inversions, translocations and duplications, are common and range from part of a gene to hundreds of genes. Lineage-specific patterns are also seen: translocations are rare in dipteran flies, and angiosperm genomes seem prone to polyploidization. In most eukaryotes, there is a strong association between rearrangement breakpoints and repeat sequences. Current data suggest that some repeats promoted rearrangements via non-allelic homologous recombination, for others the association might not be causal but reflects the instability of particular genomic regions. Rearrangement polymorphisms in eukaryotes are correlated with phenotypic differences, so are thought to confer varying fitness in different habitats. Some seem to be under positive selection because they either trap favorable allele combinations together or alter the expression of nearby genes. There is little evidence that chromosomal rearrangements cause speciation, but they probably intensify reproductive isolation between species that have formed by another route.
Funded by: NHGRI NIH HHS: HG02639; NIGMS NIH HHS: GM58815; Wellcome Trust
Trends in genetics : TIG 2005;21;12;673-82
PUBMED: 16242204; DOI: 10.1016/j.tig.2005.09.009
-
Origins of recently gained introns in Caenorhabditis.
Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.
The genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae both contain approximately 100,000 introns, of which >6,000 are unique to one or the other species. To study the origins of new introns, we used a conservative method involving phylogenetic comparisons to animal orthologs and nematode paralogs to identify cases where an intron content difference between C. elegans and C. briggsae was caused by intron insertion rather than deletion. We identified 81 recently gained introns in C. elegans and 41 in C. briggsae. Novel introns have a stronger exon splice site consensus sequence than the general population of introns and show the same preference for phase 0 sites in codons over phases 1 and 2. More of the novel introns are inserted in genes that are expressed in the C. elegans germ line than expected by chance. Thirteen of the 122 gained introns are in genes whose protein products function in pre-mRNA processing, including three gains in the gene for spliceosomal protein SF3B1 and two in the nonsense-mediated decay gene smg-2. Twenty-eight novel introns have significant DNA sequence identity to other introns, including three that are similar to other introns in the same gene. All of these similarities involve minisatellites or palindromes in the intron sequences. Our results suggest that at least some of the intron gains were caused by reverse splicing of a preexisting intron.
Proceedings of the National Academy of Sciences of the United States of America 2004;101;31;11362-7
PUBMED: 15243155; PMC: 509176; DOI: 10.1073/pnas.0308192101
-
The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA. lstein@cshl.org
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.
Funded by: NHGRI NIH HHS: 5P01 HG00956, 5U01 HG02042, P41 HG02223; NIGMS NIH HHS: R01 GM42432, T32 GM07754-22
PLoS biology 2003;1;2;E45
PUBMED: 14624247; PMC: 261899; DOI: 10.1371/journal.pbio.0000045
-
Fourfold faster rate of genome rearrangement in nematodes than in Drosophila.
Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.
We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that the species diverged 50-120 million years ago, and that since then there have been 4030 rearrangements between their whole genomes. Our estimate of the rearrangement rate, 0.4-1.0 chromosomal breakages/Mb per Myr, is at least four times that of Drosophila, which was previously reported to be the fastest rate among eukaryotes. The breakpoints of translocations are strongly associated with dispersed repeats and gene family members in the C. elegans genome.
Genome research 2002;12;6;857-67
PUBMED: 12045140; PMC: 1383740; DOI: 10.1101/gr.172702
James Cotton
jc17@sanger.ac.uk Senior Staff Scientist
I studied biology at Oxford, and then did a PhD on gene family evolution with Rod Page at the University of Glasgow, followed by post-docs at the Natural History Museum in London and at the National University of Ireland, Maynooth, working on various topics in phylogenetics and molecular evolution. I was subsequently an RCUK Fellow at Queen Mary, University of London for three years before joining the parasite genomics group in 2010.
Research
At the Sanger Institute, I'm involved in projects across a diverse array of parasitic species, including nematodes, schistosomes and kinetoplastids. I play a leading role in a number of de novo genome sequencing projects, with a particular focus on those with a strong comparative or population genomics component.
References
-
Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.
Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.
Funded by: Wellcome Trust: 076355, 085775/Z/08/Z
Genome research 2011;21;12;2143-56
PUBMED: 22038251; PMC: 3227103; DOI: 10.1101/gr.123430.111
-
Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.
Forestry and Forest Products Research Institute, Tsukuba, Japan. kikuchit@affrc.go.jp
Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
PLoS pathogens 2011;7;9;e1002219
PUBMED: 21909270; PMC: 3164644; DOI: 10.1371/journal.ppat.1002219
-
Cetaceans on a molecular fast track to ultrasonic hearing.
School of Life Sciences, East China Normal University, Shanghai, China.
The early radiation of cetaceans coincides with the origin of their defining ecological and sensory differences [1, 2]. Toothed whales (Odontoceti) evolved echolocation for hunting 36-34 million years ago, whereas baleen whales (Mysticeti) evolved filter feeding and do not echolocate [2]. Echolocation in toothed whales demands exceptional high-frequency hearing [3], and both echolocation and ultrasonic hearing have also evolved independently in bats [4, 5]. The motor protein Prestin that drives the electromotility of the outer hair cells (OHCs) is likely to be especially important in ultrasonic hearing, because it is the vibratory response of OHC to incoming sound waves that confers the enhanced sensitivity and selectivity of the mammalian auditory system [6, 7]. Prestin underwent adaptive change early in mammal evolution [8] and also shows sequence convergence between bats and dolphins [9, 10], as well as within bats [11]. Focusing on whales, we show for the first time that the extent of protein evolution in Prestin can be linked directly to the evolution of high-frequency hearing. Moreover, we find that independent cases of sequence convergence in mammals have involved numerous identical amino acid site replacements. Our findings shed new light on the importance of Prestin in the evolution of mammalian hearing.
Current biology : CB 2010;20;20;1834-9
PUBMED: 20933423; DOI: 10.1016/j.cub.2010.09.008
-
Eukaryotic genes of archaebacterial origin are more important than the more numerous eubacterial genes, irrespective of function.
Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland.
The traditional tree of life shows eukaryotes as a distinct lineage of living things, but many studies have suggested that the first eukaryotic cells were chimeric, descended from both Eubacteria (through the mitochondrion) and Archaebacteria. Eukaryote nuclei thus contain genes of both eubacterial and archaebacterial origins, and these genes have different functions within eukaryotic cells. Here we report that archaebacterium-derived genes are significantly more likely to be essential to yeast viability, are more highly expressed, and are significantly more highly connected and more central in the yeast protein interaction network. These findings hold irrespective of whether the genes have an informational or operational function, so that many features of eukaryotic genes with prokaryotic homologs can be explained by their origin, rather than their function. Taken together, our results show that genes of archaebacterial origin are in some senses more important to yeast metabolism than genes of eubacterial origin. This importance reflects these genes' origin as the ancestral nuclear component of the eukaryotic genome.
Proceedings of the National Academy of Sciences of the United States of America 2010;107;40;17252-5
PUBMED: 20852068; PMC: 2951413; DOI: 10.1073/pnas.1000265107
-
Convergent sequence evolution between echolocating bats and dolphins.
School of Life Sciences, East China Normal University, Shanghai 200062, China.
Cases of convergent evolution - where different lineages have evolved similar traits independently - are common and have proven central to our understanding of selection. Yet convincing examples of adaptive convergence at the sequence level are exceptionally rare [1]. The motor protein Prestin is expressed in mammalian outer hair cells (OHCs) and is thought to confer high frequency sensitivity and selectivity in the mammalian auditory system [2]. We previously reported that the Prestin gene has undergone sequence convergence among unrelated lineages of echolocating bat [3]. Here we report that this gene has also undergone convergent amino acid substitutions in echolocating dolphins, which group with echolocating bats in a phylogenetic tree of Prestin. Furthermore, we find evidence that these changes were driven by natural selection.
Current biology : CB 2010;20;2;R53-4
PUBMED: 20129036; DOI: 10.1016/j.cub.2009.11.058
-
The evolution of color vision in nocturnal mammals.
Institute of Zoology and Graduate University, Chinese Academy of Sciences, Beijing 100080, China.
Nonfunctional visual genes are usually associated with species that inhabit poor light environments (aquatic/subterranean/nocturnal), and these genes are believed to have lost function through relaxed selection acting on the visual system. Indeed, the visual system is so adaptive that the reconstruction of intact ancestral opsin genes has been used to reject nocturnality in ancestral primates. To test these assertions, we examined the functionality of the short and medium- to long-wavelength opsin genes in a group of mammals that are supremely adapted to a nocturnal niche: the bats. We sequenced the visual cone opsin genes in 33 species of bat with diverse sensory ecologies and reconstructed their evolutionary history spanning 65 million years. We found that, whereas the long-wave opsin gene was conserved in all species, the short-wave opsin gene has undergone dramatic divergence among lineages. The occurrence of gene defects in the short-wave opsin gene leading to loss of function was found to directly coincide with the origin of high-duty-cycle echolocation and changes in roosting ecology in some lineages. Our findings indicate that both opsin genes have been under purifying selection in the majority of bats despite a long history of nocturnality. However, when spectacular losses do occur, these result from an evolutionary sensory modality tradeoff, most likely driven by subtle shifts in ecological specialization rather than a nocturnal lifestyle. Our results suggest that UV color vision plays a considerably more important role in nocturnal mammalian sensory ecology than previously appreciated and highlight the caveat of inferring light environments from visual opsins and vice versa.
Proceedings of the National Academy of Sciences of the United States of America 2009;106;22;8980-5
PUBMED: 19470491; PMC: 2690009; DOI: 10.1073/pnas.0813201106
-
Supertrees join the mainstream of phylogenetics.
School of Biological and Chemical Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK. j.a.cotton@qmul.ac.uk
Supertree methods are fairly widely used to build comprehensive phylogenies for particular groups, but concerns remain over the adequacy of existing approaches. Steel and Rodrigo recently introduced a statistical model of incongruence between trees, allowing maximum-likelihood supertree inference. This approach to supertree construction will enable hypothesis-testing and model-choice methods that are now routine in sequence phylogenetics to be applied in this setting, and might form an important part of future phylogenetic inference from genomic data.
Trends in ecology & evolution 2009;24;1;1-3
PUBMED: 19022523; DOI: 10.1016/j.tree.2008.08.006
-
The hearing gene Prestin reunites echolocating bats.
School of Life Science, East China Normal University, Shanghai 200062, China.
The remarkable high-frequency sensitivity and selectivity of the mammalian auditory system has been attributed to the evolution of mechanical amplification, in which sound waves are amplified by outer hair cells in the cochlea. This process is driven by the recently discovered protein prestin, encoded by the gene Prestin. Echolocating bats use ultrasound for orientation and hunting and possess the highest frequency hearing of all mammals. To test for the involvement of Prestin in the evolution of bat echolocation, we sequenced the coding region in echolocating and nonecholocating species. The resulting putative gene tree showed strong support for a monophyletic assemblage of echolocating species, conflicting with the species phylogeny in which echolocators are paraphyletic. We reject the possibilities that this conflict arises from either gene duplication and loss or relaxed selection in nonecholocating fruit bats. Instead, we hypothesize that the putative gene tree reflects convergence at stretches of functional importance. Convergence is supported by the recovery of the species tree from alignments of hydrophobic transmembrane domains, and the putative gene tree from the intra- and extracellular domains. We also found evidence that Prestin has undergone Darwinian selection associated with the evolution of specialized constant-frequency echolocation, which is characterized by sharp auditory tuning. Our study of a hearing gene in bats strongly implicates Prestin in the evolution of echolocation, and suggests independent evolution of high-frequency hearing in bats. These results highlight the potential problems of extracting phylogenetic signals from functional genes that may be prone to convergence.
Proceedings of the National Academy of Sciences of the United States of America 2008;105;37;13959-64
PUBMED: 18776049; PMC: 2544561; DOI: 10.1073/pnas.0802097105
-
The prokaryotic tree of life: past, present... and future?
Department of Biology, National University of Ireland Maynooth, Maynooth, County Kildare, Ireland. james.o.mcinerney@nuim.ie
No accepted phylogenetic scheme for prokaryotes emerged until the late 1970s. Prior to that, it was assumed that there was a phylogenetic tree uniting all prokaryotes, but no suitable data were available for its construction. For 20 years, through the 1980s and 1990s, rRNA phylogenies were the gold standard. However, beginning in the last decade, findings from genomic data have challenged this new consensus. Gene trees can conflict greatly, and strains of the same species can differ enormously in genome content. Horizontal gene transfer is now known to be a significant influence on genome evolution. The next decade is likely to resolve whether or not we retain the centuries-old metaphor of the tree for all of life.
Trends in ecology & evolution 2008;23;5;276-81
PUBMED: 18367290; DOI: 10.1016/j.tree.2008.01.008
-
The tree of genomes: an empirical comparison of genome-phylogeny reconstruction methods.
Bioinformatics laboratory, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland. angela.mccann@nuim.ie
Background: In the past decade or more, the emphasis for reconstructing species phylogenies has moved from the analysis of a single gene to the analysis of multiple genes and even completed genomes. The simplest method of scaling up is to use familiar analysis methods on a larger scale and this is the most popular approach. However, duplications and losses of genes along with horizontal gene transfer (HGT) can lead to a situation where there is only an indirect relationship between gene and genome phylogenies. In this study we examine five widely-used approaches and their variants to see if indeed they are more-or-less saying the same thing. In particular, we focus on Conditioned Reconstruction as it is a method that is designed to work well even if HGT is present.
Results: We confirm a previous suggestion that this method has a systematic bias. We show that no two methods produce the same results and most current methods of inferring genome phylogenies produce results that are significantly different to other methods.
Conclusion: We conclude that genome phylogenies need to be interpreted differently, depending on the method used to construct them.
BMC evolutionary biology 2008;8;312
PUBMED: 19014489; PMC: 2592249; DOI: 10.1186/1471-2148-8-312
Bernardo Foth
bf3@sanger.ac.uk Senior Staff Scientist
I studied biology at the University of Erlangen in Germany, followed by PhD work on the relict plastid of malaria parasites in Melbourne with Geoff McFadden. I then carried out postdoctoral research in the labs of Dominique Soldati (on myosins and Toxoplasma gondii cell biology) and Zbynek Bozdech (on quantitative transcript-protein relationships in malaria parasites). I joined the Parasite Genomics group in December 2010.
Research
I am currently involved in a number of functional genomics-related projects ranging from investigating the genetic basis of drug-resistance in African trypanosomes to differential gene expression in the parasitic nematode Trichuris muris. I am also leading the group's renewed efforts to produce the de novo genome sequence of the avian malaria parasite Plasmodium gallinaceum.
References
-
Quantitative time-course profiling of parasite and host cell proteins in the human malaria parasite Plasmodium falciparum.
School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551.
Studies of the Plasmodium falciparum transcriptome have shown that the tightly controlled progression of the parasite through the intra-erythrocytic developmental cycle (IDC) is accompanied by a continuous gene expression cascade in which most expressed genes exhibit a single transcriptional peak. Because the biochemical and cellular functions of most genes are mediated by the encoded proteins, understanding the relationship between mRNA and protein levels is crucial for inferring biological activity from transcriptional gene expression data. Although studies on other organisms show that <50% of protein abundance variation may be attributable to corresponding mRNA levels, the situation in Plasmodium is further complicated by the dynamic nature of the cyclic gene expression cascade. In this study, we simultaneously determined mRNA and protein abundance profiles for P. falciparum parasites during the IDC at 2-hour resolution based on oligonucleotide microarrays and two-dimensional differential gel electrophoresis protein gels. We find that most proteins are represented by more than one isoform, presumably because of post-translational modifications. Like transcripts, most proteins exhibit cyclic abundance profiles with one peak during the IDC, whereas the presence of functionally related proteins is highly correlated. In contrast, the abundance of most parasite proteins peaks significantly later (median 11 h) than the corresponding transcripts and often decreases slowly in the second half of the IDC. Computational modeling indicates that the considerable and varied incongruence between transcript and protein abundance may largely be caused by the dynamics of translation and protein degradation. Furthermore, we present cyclic abundance profiles also for parasite-associated human proteins and confirm the presence of five human proteins with a potential role in antioxidant defense within the parasites. 
Together, our data provide fundamental insights into transcript-protein relationships in P. falciparum that are important for the correct interpretation of transcriptional data and that may facilitate the improvement and development of malaria diagnostics and drug therapy.
Molecular & cellular proteomics : MCP 2011;10;8;M110.006411
PUBMED: 21558492; PMC: 3149090; DOI: 10.1074/mcp.M110.006411
-
Mitochondrial translation in absence of local tRNA aminoacylation and methionyl tRNA Met formylation in Apicomplexa.
Department of Microbiology and Molecular Medicine, CMU, University of Geneva, 1 rue Michel-Servet, 1211 Geneva 4, Switzerland.
Apicomplexans possess three translationally active compartments: the cytosol, a single tubular mitochondrion, and a vestigial plastid organelle called apicoplast. Mitochondrion and apicoplast are of bacterial evolutionary origin and therefore depend on a bacterial-like translation machinery. The minimal mitochondrial genome contains only three ORFs, and in Toxoplasma gondii the absence of mitochondrial tRNA genes is compensated for by the import of cytosolic eukaryotic tRNAs. Although all compartments require a complete set of charged tRNAs, the apicomplexan nuclear genomes do not hold sufficient aminoacyl-tRNA synthetase (aaRSs) genes to be targeted individually to each compartment. This study reveals that aaRSs are either cytosolic, apicoplastic or shared between the two compartments by dual targeting but are absent from the mitochondrion. Consequently, tRNAs are very likely imported in their aminoacylated form. Furthermore, the unexpected absence of tRNA(Met) formyltransferase and peptide deformylase implies that the requirement for a specialized formylmethionyl-tRNA(Met) for translation initiation is bypassed in the mitochondrion of Apicomplexa.
Funded by: Howard Hughes Medical Institute; Wellcome Trust
Molecular microbiology 2010;76;3;706-18
PUBMED: 20374492; DOI: 10.1111/j.1365-2958.2010.07128.x
-
Evolution of malaria parasite plastid targeting sequences.
School of Botany, University of Melbourne, Melbourne, Victoria 3010, Australia.
The transfer of genes from an endosymbiont to its host typically requires acquisition of targeting signals by the gene product to ensure its return to the endosymbiont for function. Many hundreds of plastid-derived genes must have acquired transit peptides for successful relocation to the nucleus. Here, we explore potential evolutionary origins of plastid transit peptides in the malaria parasite Plasmodium falciparum. We show that exons of the P. falciparum genome could serve as transit peptides after exon shuffling. We further demonstrate that numerous randomized peptides and even whimsical sequences based on English words can also function as transit peptides in vivo. Thus, facile acquisition of transit peptides from existing sequence likely expedited endosymbiont integration through intracellular gene transfer.
Funded by: Howard Hughes Medical Institute
Proceedings of the National Academy of Sciences of the United States of America 2008;105;12;4781-5
PUBMED: 18353992; PMC: 2290815; DOI: 10.1073/pnas.0707827105
-
Quantitative protein expression profiling reveals extensive post-transcriptional regulation and post-translational modifications in schizont-stage malaria parasites.
School of Biological Sciences, Nanyang Technological University, Nanyang Drive, 637551 Singapore. BFoth@ntu.edu.sg
Background: Malaria is one of the most important infectious diseases and is caused by parasitic protozoa of the genus Plasmodium. Previously, quantitative characterization of the P. falciparum transcriptome demonstrated that the strictly controlled progression of these parasites through their intra-erythrocytic developmental cycle is accompanied by a continuous cascade of gene expression. Although such analyses have proven immensely useful, the correlations between abundance of transcripts and their cognate proteins remain poorly characterized.
Results: Here, we present a quantitative time-course analysis of relative protein abundance for schizont-stage parasites (34 to 46 hours after invasion) based on two-dimensional differential gel electrophoresis of protein samples labeled with fluorescent dyes. For this purpose we analyzed parasite samples taken at 4-hour intervals from a tightly synchronized culture and established more than 500 individual protein abundance profiles with high temporal resolution and quantitative reproducibility. Approximately half of all profiles exhibit a significant change in abundance and 12% display an expression peak during the observed 12-hour time interval. Intriguingly, identification of 54 protein spots by mass spectrometry revealed that 58% of the corresponding proteins--including actin-I, enolase, eukaryotic initiation factor (eIF)4A, eIF5A, and several heat shock proteins--are represented by more than one isoform, presumably caused by post-translational modifications, with the various isoforms of a given protein frequently showing different expression patterns. Furthermore, comparisons with transcriptome data generated from the same parasite samples reveal evidence of significant post-transcriptional gene expression regulation.
Conclusions: Together, our data indicate that both post-transcriptional and post-translational events are widespread and of presumably great biological significance during the intra-erythrocytic development of P. falciparum.
Genome biology 2008;9;12;R177
PUBMED: 19091060; PMC: 2646281; DOI: 10.1186/gb-2008-9-12-r177
-
Dual targeting of antioxidant and metabolic enzymes to the mitochondrion and the apicoplast of Toxoplasma gondii.
Department of Microbiology and Molecular Medicine, Centre Médical Universitaire, University of Geneva, Geneva, Switzerland.
Toxoplasma gondii is an aerobic protozoan parasite that possesses mitochondrial antioxidant enzymes to safely dispose of oxygen radicals generated by cellular respiration and metabolism. As with most Apicomplexans, it also harbors a chloroplast-like organelle, the apicoplast, which hosts various biosynthetic pathways and requires antioxidant protection. Most apicoplast-resident proteins are encoded in the nuclear genome and are targeted to the organelle via a bipartite N-terminal targeting sequence. We show here that two antioxidant enzymes, a superoxide dismutase (TgSOD2) and a thioredoxin-dependent peroxidase (TgTPX1/2), together with an aconitase, are dually targeted to both the apicoplast and the mitochondrion of T. gondii. In the case of TgSOD2, our results indicate that a single gene product is bimodally targeted due to an inconspicuous variation within the putative signal peptide of the organellar protein, which significantly alters its subcellular localization. Dual organellar targeting of proteins might occur frequently in Apicomplexans to serve important biological functions such as antioxidant protection and carbon metabolism.
Funded by: Wellcome Trust
PLoS pathogens 2007;3;8;e115
PUBMED: 17784785; PMC: 1959373; DOI: 10.1371/journal.ppat.0030115
-
New insights into myosin evolution and classification.
Department of Microbiology and Molecular Medicine, Centre Médical Universitaire, University of Geneva, 1 Rue Michel-Servet, 1211 Geneva, Switzerland. bernardo.foth@medecine.unige.ch
Myosins are eukaryotic actin-dependent molecular motors important for a broad range of functions like muscle contraction, vision, hearing, cell motility, and host cell invasion of apicomplexan parasites. Myosin heavy chains consist of distinct head, neck, and tail domains and have previously been categorized into 18 different classes based on phylogenetic analysis of their conserved heads. Here we describe a comprehensive phylogenetic examination of many previously unclassified myosins, with particular emphasis on sequences from apicomplexan and other chromalveolate protists including the model organism Toxoplasma, the malaria parasite Plasmodium, and the ciliate Tetrahymena. Using different phylogenetic inference methods and taking protein domain architectures, specific amino acid polymorphisms, and organismal distribution into account, we demonstrate a hitherto unrecognized common origin for ciliate and apicomplexan class XIV myosins. Our data also suggest common origins for some apicomplexan myosins and class VI, for classes II and XVIII, for classes XII and XV, and for some microsporidian myosins and class V, thereby reconciling evolutionary history and myosin structure in several cases and corroborating the common coevolution of myosin head, neck, and tail domains. Six novel myosin classes are established to accommodate sequences from chordate metazoans (class XIX), insects (class XX), kinetoplastids (class XXI), and apicomplexans and diatom algae (classes XXII, XXIII, and XXIV). These myosin (sub)classes include sequences with protein domains (FYVE, WW, UBA, ATS1-like, and WD40) previously unknown to be associated with myosin motors. Regarding the apicomplexan "myosome," we significantly update class XIV classification, propose a systematic naming convention, and discuss possible functions in these parasites.
Funded by: Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2006;103;10;3681-6
PUBMED: 16505385; PMC: 1533776; DOI: 10.1073/pnas.0506307103
-
The malaria parasite Plasmodium falciparum has only one pyruvate dehydrogenase complex, which is located in the apicoplast.
Plant Cell Biology Research Centre, School of Botany, University of Melbourne, Parkville, VIC 3010, Australia.
The relict plastid (apicoplast) of apicomplexan parasites synthesizes fatty acids and is a promising drug target. In plant plastids, a pyruvate dehydrogenase complex (PDH) converts pyruvate into acetyl-CoA, the major fatty acid precursor, whereas a second, distinct PDH fuels the tricarboxylic acid cycle in the mitochondria. In contrast, the presence of genes encoding PDH and related enzyme complexes in the genomes of five Plasmodium species and of Toxoplasma gondii indicates that these parasites contain only one single PDH. PDH complexes are comprised of four subunits (E1alpha, E1beta, E2, E3), and we confirmed four genes encoding a complete PDH in Plasmodium falciparum through sequencing of cDNA clones. In apicomplexan parasites, many nuclear-encoded proteins are targeted to the apicoplast courtesy of two-part N-terminal leader sequences, and the presence of such N-terminal sequences on all four PDH subunits as well as phylogenetic analyses strongly suggest that the P. falciparum PDH is located in the apicoplast. Fusion of the two-part leader sequences from the E1alpha and E2 genes to green fluorescent protein experimentally confirmed apicoplast targeting. Western blot analysis provided evidence for the expression of the E1alpha and E1beta PDH subunits in blood-stage malaria parasites. The recombinantly expressed catalytic domain of the PDH subunit E2 showed high enzymatic activity in vitro indicating that pyruvate is converted to acetyl-CoA in the apicoplast, possibly for use in fatty acid biosynthesis.
Molecular microbiology 2005;55;1;39-53
PUBMED: 15612915; DOI: 10.1111/j.1365-2958.2004.04407.x
-
Tropical infectious diseases: metabolic maps and functions of the Plasmodium falciparum apicoplast.
Institut Pasteur, Biology of Host-Parasite Interactions, 25 Rue du Docteur Roux, 75724, Paris, Cedex 15, France.
Nature reviews. Microbiology 2004;2;3;203-16
PUBMED: 15083156; DOI: 10.1038/nrmicro843
-
Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum.
Plant Cell Biology Research Centre, School of Botany, University of Melbourne, Parkville, VIC 3010, Australia.
Transit peptides mediate protein targeting into plastids and are only poorly understood. We extracted amino acid features from transit peptides that target proteins to the relict plastid (apicoplast) of malaria parasites. Based on these amino acid characteristics, we identified 466 putative apicoplast proteins in the Plasmodium falciparum genome. Altering the specific charge characteristics in a model transit peptide by site-directed mutagenesis severely disrupted organellar targeting in vivo. Similarly, putative Hsp70 (DnaK) binding sites present in the transit peptide proved to be important for correct targeting.
Science (New York, N.Y.) 2003;299;5607;705-8
PUBMED: 12560551; DOI: 10.1126/science.1078599
-
Regulated degradation of an endoplasmic reticulum membrane protein in a tubular lysosome in Leishmania mexicana.
Department of Biochemistry and Molecular Biology, The University of Melbourne, Victoria 3010, Australia.
The cell surface of the human parasite Leishmania mexicana is coated with glycosylphosphatidylinositol (GPI)-anchored macromolecules and free GPI glycolipids. We have investigated the intracellular trafficking of green fluorescent protein- and hemagglutinin-tagged forms of dolichol-phosphate-mannose synthase (DPMS), a key enzyme in GPI biosynthesis in L. mexicana promastigotes. These functionally active chimeras are found in the same subcompartment of the endoplasmic reticulum (ER) as endogenous DPMS but are degraded as logarithmically growing promastigotes reach stationary phase, coincident with the down-regulation of endogenous DPMS activity and GPI biosynthesis in these cells. We provide evidence that these chimeras are constitutively transported to and degraded in a novel multivesicular tubule (MVT) lysosome. This organelle is a terminal lysosome, which is labeled with the endocytic marker FM 4-64, contains lysosomal cysteine and serine proteases and is disrupted by lysomorphotropic agents. Electron microscopy and subcellular fractionation studies suggest that the DPMS chimeras are transported from the ER to the lumen of the MVT via the Golgi apparatus and a population of 200-nm multivesicular bodies. In contrast, soluble ER proteins are not detectably transported to the MVT lysosome in either log or stationary phase promastigotes. Finally, the increased degradation of the DPMS chimeras in stationary phase promastigotes coincides with an increase in the lytic capacity of the MVT lysosome and changes in the morphology of this organelle. We conclude that lysosomal degradation of DPMS may be important in regulating the cellular levels of this enzyme and the stage-dependent biosynthesis of the major surface glycolipids of these parasites.
Molecular biology of the cell 2001;12;8;2364-77
Tom Huckvale
- Advanced Research Assistant
I completed a BSc in Biological & Medicinal Chemistry at the University of Exeter in 2008, and then finished an MSc whilst working in a veterinary testing laboratory. I subsequently worked for a genomics services company in Berlin, and then in the food chemistry department of an analytical sciences firm in London, before starting at Sanger in April 2011.
Research
At the Institute, I provide laboratory support to the Parasite Genomics group through the introduction of new and established methods of functional genomics across a range of parasitic species.
References
-
The genomes of four tapeworm species reveal adaptations to parasitism.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 098051
Nature 2013;496;7443;57-63
PUBMED: 23485966; DOI: 10.1038/nature12031
Sarah Nichol
- Computer Biologist - Senior Genome Analyst
I gained a BSc (Hons) in Zoology from the University of Edinburgh in 2005. I had a particular interest in parasites, which led me to write my final-year dissertation on developing a test to detect nematodes in sheep. After a two-year gap, in which I worked for a year to fund my solo travels around Australia and New Zealand, I arrived at the Sanger Institute, where I began as a Finisher on the Zebrafish project in 2008. I became a fully fledged member of Parasite Genomics at the end of 2011.
Research
I have contributed to the Parasite Genomics group as a Senior Genome Analyst since 2010. My work involves improving a number of helminth genome assemblies, using bespoke software tools and scripts to take them beyond what is possible with automated assembly alone. I also help with manually annotating and improving gene models.
References
-
The genomes of four tapeworm species reveal adaptations to parasitism.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 098051
Nature 2013;496;7443;57-63
PUBMED: 23485966; DOI: 10.1038/nature12031
-
A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.
Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z
PLoS neglected tropical diseases 2012;6;1;e1455
PUBMED: 22253936; PMC: 3254664; DOI: 10.1371/journal.pntd.0001455
Thomas Otto
- Senior Staff Scientist
I studied informatics with bioinformatics as a minor in Lübeck, Germany. After a short project at the Florida State University (analyzing functional magnetic resonance imaging data), I started to work at the Fundação Oswaldo Cruz in Rio de Janeiro, Brazil. My role was to provide bioinformatics support to the group and generate algorithmic solutions to biological problems. In 2008, I finished my PhD, presenting alternative ways to improve the assembly of the Brazilian tuberculosis genome.
Research
In 2008 I joined Matt Berriman’s group. My main role is to provide bioinformatics support to our team, to other groups at Sanger and to the European EVIMalaR network of malaria labs. My projects mostly involve developing algorithms to analyze next-generation sequencing data related to malaria.
References
-
Optimal enzymes for amplifying sequencing libraries.
Nature methods 2012;9;1;10-1
PUBMED: 22205512; DOI: 10.1038/nmeth.1814
-
A scalable pipeline for highly effective genetic modification of a malaria parasite.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
In malaria parasites, the systematic experimental validation of drug and vaccine targets by reverse genetics is constrained by the inefficiency of homologous recombination and by the difficulty of manipulating adenine and thymine (A+T)-rich DNA of most Plasmodium species in Escherichia coli. We overcame these roadblocks by creating a high-integrity library of Plasmodium berghei genomic DNA (>77% A+T content) in a bacteriophage N15-based vector that can be modified efficiently using the lambda Red method of recombineering. We built a pipeline for generating P. berghei genetic modification vectors at genome scale in serial liquid cultures on 96-well plates. Vectors have long homology arms, which increase recombination frequency up to tenfold over conventional designs. The feasibility of efficient genetic modification at scale will stimulate collaborative, genome-wide knockout and tagging programs for P. berghei.
Funded by: Medical Research Council: G0501670, G0501670(76331); Wellcome Trust: 089085, WT089085/Z/09/Z
Nature methods 2011;8;12;1078-82
PUBMED: 22020067; PMC: 3431185; DOI: 10.1038/nmeth.1742
-
Genome sequence of Mycobacterium bovis BCG Moreau, the Brazilian vaccine strain against tuberculosis.
Laboratório de Genômica Funcional e Bioinformática, Pavilhão Leonidas Deane sala 104, Instituto Oswaldo Cruz, Fiocruz, Av. Brasil 4365, Manguinhos, 21040-900 Rio de Janeiro, Brazil.
Mycobacterium bovis bacillus Calmette-Guérin (BCG) is the only vaccine available against tuberculosis, and the strains used worldwide represent a family of daughter strains with distinct genotypic characteristics. Here we report the complete genome sequence of M. bovis BCG Moreau, the strain in continuous use in Brazil for vaccine production since the 1920s.
Journal of bacteriology 2011;193;19;5600-1
PUBMED: 21914899; PMC: 3187452; DOI: 10.1128/JB.05827-11
-
Real-time sequencing.
Nature reviews. Microbiology 2011;9;9;633
PUBMED: 21836624; DOI: 10.1038/nrmicro2638
-
RATT: Rapid Annotation Transfer Tool.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk
Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
Nucleic acids research 2011;39;9;e57
PUBMED: 21306991; PMC: 3089447; DOI: 10.1093/nar/gkq1268
-
Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally.
Health Protection Agency Malaria Reference Laboratory, Immunology Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom. colin.sutherland@lshtm.ac.uk
Background: Malaria in humans is caused by apicomplexan parasites belonging to 5 species of the genus Plasmodium. Infections with Plasmodium ovale are widely distributed but rarely investigated, and the resulting burden of disease is not known. Dimorphism in defined genes has led to P. ovale parasites being divided into classic and variant types. We hypothesized that these dimorphs represent distinct parasite species.
Methods: Multilocus sequence analysis of 6 genetic characters was carried out among 55 isolates from 12 African and 3 Asia-Pacific countries.
Results: Each genetic character displayed complete dimorphism and segregated perfectly between the 2 types. Both types were identified in samples from Ghana, Nigeria, São Tomé, Sierra Leone, and Uganda and have been described previously in Myanmar. Splitting of the 2 lineages is estimated to have occurred between 1.0 and 3.5 million years ago in hominid hosts.
Conclusions: We propose that P. ovale comprises 2 nonrecombining species that are sympatric in Africa and Asia. We speculate on possible scenarios that could have led to this speciation. Furthermore, the relatively high frequency of imported cases of symptomatic P. ovale infection in the United Kingdom suggests that the morbidity caused by ovale malaria has been underestimated.
Funded by: Wellcome Trust
The Journal of infectious diseases 2010;201;10;1544-50
PUBMED: 20380562; DOI: 10.1086/652240
-
New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.
Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z
Molecular microbiology 2010;76;1;12-24
PUBMED: 20141604; PMC: 2859250; DOI: 10.1111/j.1365-2958.2009.07026.x
-
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.
Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil. otto@fiocruz.br
Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach.
Availability: The database can be accessed through http://proteinworlddb.org
Bioinformatics (Oxford, England) 2010;26;5;705-7
PUBMED: 20089515; PMC: 2828119; DOI: 10.1093/bioinformatics/btq011
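The Smith-Waterman algorithm mentioned in the abstract above is a dynamic programming method for local alignment. A minimal scoring-only sketch follows; it assumes a simple match/mismatch score and a linear gap penalty, whereas protein comparisons normally use a substitution matrix and affine gaps, so the function name and parameter values here are illustrative and not those of the ProteinWorldDB project.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("TTACGTTT", "GGACGGG")  # shared local region "ACG"
```

Only two rows of the matrix are kept, which is enough to obtain the score; recovering the alignment coordinates would require storing the full matrix or a traceback.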
-
Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. jit@sanger.ac.uk
Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
Genome biology 2010;11;4;R41
PUBMED: 20388197; PMC: 2884544; DOI: 10.1186/gb-2010-11-4-r41
-
ABACAS: algorithm-based automatic contiguation of assembled sequences.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK. sa4@sanger.ac.uk
Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net.
Funded by: Wellcome Trust: WT085775/Z/08/Z
Bioinformatics (Oxford, England) 2009;25;15;1968-9
PUBMED: 19497936; PMC: 2712343; DOI: 10.1093/bioinformatics/btp347
Anna Protasio
ap6@sanger.ac.uk Postdoctoral Fellow
I obtained my undergraduate degree in Biochemistry at the University of the Republic in Uruguay (2000-2006). In 2005 I won the "Wellcome Trust Sanger Institute Prize Competition" and was awarded a summer placement at the Institute. During 2007 I undertook an internship at the Schistosomiasis Research Group (University of Cambridge, UK), which turned my interests towards schistosomes. Later that year I started my PhD studies under the supervision of Dr Matt Berriman (Parasite Genomics group), where I focused on gene expression changes in the early stages of host invasion in S. mansoni.
Research
My current research in the Parasite Genomics group is focused on characterising and understanding the mechanisms of gene expression regulation in parasitic worms. Given the good state of its genome assembly and gene annotation, I use S. mansoni as my model organism for these studies. I am mainly interested in the role of microRNAs, promoter activation/repression and the presence of antisense transcription.
References
-
Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Schistosome infection begins with the penetration of cercariae through healthy unbroken host skin. This process leads to the transformation of the free-living larvae into obligate parasites called schistosomula. This irreversible transformation, which occurs in as little as two hours, involves casting the cercaria tail and complete remodelling of the surface membrane. At this stage, parasites are vulnerable to host immune attack and oxidative stress. Consequently, the mechanisms by which the parasite recognises and swiftly adapts to the human host are still the subject of many studies, especially in the context of development of intervention strategies against schistosomiasis infection. Because obtaining enough material from in vivo infections is not always feasible for such studies, the transformation process is often mimicked in the laboratory by application of shear pressure to a cercarial sample resulting in mechanically transformed (MT) schistosomula. These parasites share remarkable morphological and biochemical similarity to the naturally transformed counterparts and have been considered a good proxy for parasites undergoing natural infection. Relying on this equivalency, MT schistosomula have been used almost exclusively in high-throughput studies of gene expression, identification of drug targets and identification of effective drugs against schistosomes. However, the transcriptional equivalency between skin-transformed (ST) and MT schistosomula has never been proven. In our approach to compare these two types of schistosomula preparations and to explore differences in gene expression triggered by the presence of a skin barrier, we performed RNA-seq transcriptome profiling of ST and MT schistosomula at 24 hours post transformation. We report that these two very distinct schistosomula preparations differ only in the expression of 38 genes (out of ∼11,000), providing convincing evidence to resolve the skin vs. mechanical long-lasting controversy.
Funded by: Wellcome Trust: WT 083931/Z/07/Z, WT 098051
PLoS neglected tropical diseases 2013;7;3;e2091
PUBMED: 23516644; PMC: 3597483; DOI: 10.1371/journal.pntd.0002091
-
Progressive cross-reactivity in IgE responses: an explanation for the slow development of human immunity to schistosomiasis?
Department of Pathology, University of Cambridge, Cambridge, United Kingdom. cmf1000@cam.ac.uk
People in regions of Schistosoma mansoni endemicity slowly acquire immunity, but why this takes years to develop is still not clear. It has been associated with increases in parasite-specific IgE, induced, some investigators propose, to antigens exposed during the death of adult worms. These antigens include members of the tegumental-allergen-like protein family (TAL1 to TAL13). Previously, in a group of S. mansoni-infected Ugandan males, we showed that IgE responses to three TALs expressed in worms (TAL1, -3, and -5) became more prevalent with age. Now, in a subcohort we examined associations of these responses with resistance to reinfection and use the data to propose a mechanism for the slow development of immunity. IgE was measured 9 weeks posttreatment and at reinfection at 2 years (n = 144). An anti-TAL5 IgE (herein referred to as TAL5 IgE) response was associated with reduced reinfection even after adjusting for age using regression analysis (geometric mean odds ratio, 0.24; P = 0.016). TAL5 IgE responders were a subset of TAL3 IgE responders, themselves a subset of TAL1 responders. TAL3 IgE and TAL5 IgE were highly cross-reactive, with TAL3 the immunizing antigen and TAL5 the cross-reactive antigen. Transcriptional and translational studies show that TAL3 is most abundant in adult worms and that TAL5 is most abundant in infectious larvae. We propose that in chronic schistosomiasis, older individuals have repeatedly experienced IgE antigens exposed when adult worms die (e.g., TAL3) and that this leads to increasing cross-reactivity with antigens of invading larvae (e.g., TAL5). Progressive accumulation of worm/larvae cross-reactivity could explain the age-dependent immunity observed in areas of endemicity.
Funded by: Wellcome Trust: 083931/Z/07/Z
Infection and immunity 2012;80;12;4264-70
PUBMED: 23006852; PMC: 3497412; DOI: 10.1128/IAI.00641-12
-
A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.
Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z
PLoS neglected tropical diseases 2012;6;1;e1455
PUBMED: 22253936; PMC: 3254664; DOI: 10.1371/journal.pntd.0001455
-
Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, makes their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
PloS one 2011;6;8;e23216
PUBMED: 21858033; PMC: 3156134; DOI: 10.1371/journal.pone.0023216
-
Thioredoxin and glutathione systems differ in parasitic and free-living platyhelminths.
Cátedra de Inmunología, Facultad de Química, Instituto de Higiene, Universidad de la República, Avda. A. Navarro 3051, Montevideo, Uruguay.
Background: The thioredoxin and/or glutathione pathways occur in all organisms. They provide electrons for deoxyribonucleotide synthesis, function as antioxidant defenses, in detoxification, Fe/S biogenesis and participate in a variety of cellular processes. In contrast to their mammalian hosts, platyhelminth (flatworm) parasites studied so far, lack conventional thioredoxin and glutathione systems. Instead, they possess a linked thioredoxin-glutathione system with the selenocysteine-containing enzyme thioredoxin glutathione reductase (TGR) as the single redox hub that controls the overall redox homeostasis. TGR has been recently validated as a drug target for schistosomiasis and new drug leads targeting TGR have recently been identified for these platyhelminth infections that affect more than 200 million people and for which a single drug is currently available. Little is known regarding the genomic structure of flatworm TGRs, the expression of TGR variants and whether the absence of conventional thioredoxin and glutathione systems is a signature of the entire platyhelminth phylum.
Results: We examine platyhelminth genomes and transcriptomes and find that all platyhelminth parasites (from classes Cestoda and Trematoda) conform to a biochemical scenario involving, exclusively, a selenium-dependent linked thioredoxin-glutathione system having TGR as a central redox hub. In contrast, the free-living platyhelminth Schmidtea mediterranea (Class Turbellaria) possesses conventional and linked thioredoxin and glutathione systems. We identify TGR variants in Schistosoma spp. derived from a single gene, and demonstrate their expression. We also provide experimental evidence that alternative initiation of transcription and alternative transcript processing contribute to the generation of TGR variants in platyhelminth parasites.
Conclusions: Our results indicate that thioredoxin and glutathione pathways differ in parasitic and free-living flatworms and that canonical enzymes were specifically lost in the parasitic lineage. Platyhelminth parasites possess a unique and simplified redox system for diverse essential processes, and thus TGR is an excellent drug target for platyhelminth infections. Inhibition of the central redox wire hub would lead to overall disruption of redox homeostasis and disable DNA synthesis.
Funded by: FIC NIH HHS: TW006959; NIGMS NIH HHS: GM065204; Wellcome Trust: WT 085775/Z/08/Z
BMC genomics 2010;11;237
PUBMED: 20385027; PMC: 2873472; DOI: 10.1186/1471-2164-11-237
-
The genome of the blood fluke Schistosoma mansoni.
Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: WT085775/Z/08/Z
Nature 2009;460;7253;352-8
PUBMED: 19606141; PMC: 2756445; DOI: 10.1038/nature08160
-
Platyhelminth mitochondrial and cytosolic redox homeostasis is controlled by a single thioredoxin glutathione reductase and dependent on selenium and glutathione.
Cátedra de Inmunología, Facultad de Química-Facultad de Ciencias, Instituto de Higiene, Universidad de la República, Piso 2, Montevideo, Uruguay.
Platyhelminth parasites are a major health problem in developing countries. In contrast to their mammalian hosts, platyhelminth thiol-disulfide redox homeostasis relies on linked thioredoxin-glutathione systems, which are fully dependent on thioredoxin-glutathione reductase (TGR), a promising drug target. TGR is a homodimeric enzyme comprising a glutaredoxin domain and thioredoxin reductase (TR) domains with a C-terminal redox center containing selenocysteine (Sec). In this study, we demonstrate the existence of functional linked thioredoxin-glutathione systems in the cytosolic and mitochondrial compartments of Echinococcus granulosus, the platyhelminth responsible for hydatid disease. The glutathione reductase (GR) activity of TGR exhibited hysteretic behavior regulated by the [GSSG]/[GSH] ratio. This behavior was associated with glutathionylation by GSSG and abolished by deglutathionylation. The Km and kcat values for mitochondrial and cytosolic thioredoxins (9.5 µM and 131 s⁻¹, 34 µM and 197 s⁻¹, respectively) were higher than those reported for mammalian TRs. Analysis of TGR mutants revealed that the glutaredoxin domain is required for the GR activity but did not affect the TR activity. In contrast, both GR and TR activities were dependent on the Sec-containing redox center. The activity loss caused by the Sec-to-Cys mutation could be partially compensated by a Cys-to-Sec mutation of the neighboring residue, indicating that Sec can support catalysis at this alternative position. Consistent with the essential role of TGR in redox control, 2.5 µM auranofin, a known TGR inhibitor, killed larval worms in vitro. These studies establish the selenium- and glutathione-dependent regulation of cytosolic and mitochondrial redox homeostasis through a single TGR enzyme in platyhelminths.
Funded by: FIC NIH HHS: TW 006959; NIGMS NIH HHS: GM 065204
The Journal of biological chemistry 2008;283;26;17898-907
PUBMED: 18408002; PMC: 2440607; DOI: 10.1074/jbc.M710609200
-
Use of genomic DNA as an indirect reference for identifying gender-associated transcripts in morphologically identical, but chromosomally distinct, Schistosoma mansoni cercariae.
Department of Pathology, University of Cambridge, Cambridge, United Kingdom.
Background: The use of DNA microarray technology to study global Schistosoma gene expression has led to the rapid identification of novel biological processes, pathways or associations. Implementation of standardized DNA microarray protocols across laboratories would assist maximal interpretation of generated datasets and extend productive application of this technology.
Utilizing a new Schistosoma mansoni oligonucleotide DNA microarray composed of 37,632 elements, we show that schistosome genomic DNA (gDNA) hybridizes with less variation compared to complex mixed pools of S. mansoni cDNA material (R = 0.993 for gDNA compared to R = 0.956 for cDNA during 'self versus self' hybridizations). Furthermore, these effects are species-specific, with S. japonicum or Mus musculus gDNA failing to bind significantly to S. mansoni oligonucleotide DNA microarrays (e.g. R = 0.350 when S. mansoni gDNA is co-hybridized with S. japonicum gDNA). Increased median fluorescent intensities (209.9) were also observed for DNA microarray elements hybridized with S. mansoni gDNA compared to complex mixed pools of S. mansoni cDNA (112.2). Exploiting these valuable characteristics, S. mansoni gDNA was used in two-channel DNA microarray hybridization experiments as a common reference for indirect identification of gender-associated transcripts in cercariae, a schistosome life-stage in which there is no overt sexual dimorphism. This led to the identification of 2,648 gender-associated transcripts. When compared to the 780 gender-associated transcripts identified by hybridization experiments utilizing a two-channel direct method (co-hybridization of male and female cercariae cDNA), indirect methods using gDNA were far superior in identifying greater quantities of differentially expressed transcripts. Interestingly, both methods identified a concordant subset of 188 male-associated and 156 female-associated cercarial transcripts, respectively. Gene ontology classification of these differentially expressed transcripts revealed a greater diversity of categories in male cercariae. Quantitative real-time PCR analysis confirmed the DNA microarray results and supported the reliability of this platform for identifying gender-associated transcripts.
Schistosome gDNA displays characteristics highly suitable for the comparison of two-channel DNA microarray results obtained from experiments conducted independently across laboratories. The schistosome transcripts identified here demonstrate, for the first time, that gender-associated patterns of expression are already well established in the morphologically identical, but chromosomally distinct, cercariae stage.
Funded by: Wellcome Trust: 068501/Z/02/Z, 078317/Z/05/Z
PLoS neglected tropical diseases 2008;2;10;e323
PUBMED: 18941520; PMC: 2565838; DOI: 10.1371/journal.pntd.0000323
Adam Reid
ar11@sanger.ac.uk Postdoctoral Fellow
I studied for a Genetics BSc at the University of Sheffield and an MRes in Bioinformatics at the University of York. I subsequently worked for AstraZeneca, providing bioinformatics support to proteomics and genotyping projects. I then did my PhD with Prof. Christine Orengo at University College London looking at the evolution of protein domain families.
I joined the Parasite Genomics group in January 2009.
Research
1. I have led the analysis of the Neospora caninum genome and its comparison with the human pathogen Toxoplasma gondii.
2. I am leading analysis of another apicomplexan genome, the chicken parasite Eimeria tenella (and several related species).
3. I am working on various approaches that use gene expression analysis to investigate host-parasite interactions, principally in malaria, but also in helminths and trypanosomes.
References
-
Vector transmission regulates immune control of Plasmodium virulence.
Division of Parasitology, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK.
Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents, primates or humans increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission, which is assumed to 'reset' Plasmodium to its original character. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we use mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi to interrogate regulation of parasite virulence. Analysis of SBP P. c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which in turn modifies the elicited mammalian immune response, which in turn attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.
Funded by: Medical Research Council: U117584248; Wellcome Trust: 089553, 098051
Nature 2013;498;7453;228-31
PUBMED: 23719378; PMC: 3784817; DOI: 10.1038/nature12231
-
Genes involved in host-parasite interactions can be revealed by their correlated expression.
Parasite genomics group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ar11@sanger.ac.uk
Molecular interactions between a parasite and its host are key to the ability of the parasite to enter the host and persist. Our understanding of the genes and proteins involved in these interactions is limited. To better understand these processes it would be advantageous to have a range of methods to predict pairs of genes involved in such interactions. Correlated gene expression profiles can be used to identify molecular interactions within a species. Here we have extended the concept to different species, showing that genes with correlated expression are more likely to encode proteins, which directly or indirectly participate in host-parasite interaction. We go on to examine our predictions of molecular interactions between the malaria parasite and both its mammalian host and insect vector. Our approach could be applied to study any interaction between species, for example, between a host and its parasites or pathogens, but also symbiotic and commensal pairings.
Funded by: Wellcome Trust: 098051
Nucleic acids research 2013;41;3;1508-18
PUBMED: 23275547; PMC: 3561955; DOI: 10.1093/nar/gks1340
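The notion of correlated expression across species, described in the abstract above, can be sketched as a Pearson correlation between host and parasite expression profiles measured over the same samples. The abstract does not state which correlation measure or threshold the authors used, so the measure, the function names, the threshold of 0.9 and the toy data below are assumptions made purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(host, parasite, threshold=0.9):
    """List (host_gene, parasite_gene, r) with r >= threshold, best first."""
    pairs = [
        (h, p, pearson(h_profile, p_profile))
        for h, h_profile in host.items()
        for p, p_profile in parasite.items()
    ]
    return sorted((t for t in pairs if t[2] >= threshold), key=lambda t: -t[2])


# Toy expression values over four shared samples (hypothetical gene names).
host = {"hostA": [1, 2, 3, 4], "hostB": [4, 3, 2, 1]}
parasite = {"parX": [2, 4, 6, 8], "parY": [1, 1, 2, 1]}
candidates = correlated_pairs(host, parasite)
```

A high correlation only nominates a gene pair as a candidate; it does not by itself show that the encoded proteins interact.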
-
Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS).
Division of Parasitology, MRC National Institute for Medical Research, London, UK.
Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required.
Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages.
Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein functions, including their contribution to antigenic variation and immune evasion.
Funded by: Medical Research Council: U117584248
BMC genomics 2012;13;125
PUBMED: 22458863; PMC: 3384456; DOI: 10.1186/1471-2164-13-125
-
Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.
Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma, rather it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z
PLoS pathogens 2012;8;3;e1002567
PUBMED: 22457617; PMC: 3310773; DOI: 10.1371/journal.ppat.1002567
-
Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus.
Forestry and Forest Products Research Institute, Tsukuba, Japan. kikuchit@affrc.go.jp
Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
PLoS pathogens 2011;7;9;e1002219
PUBMED: 21909270; PMC: 3164644; DOI: 10.1371/journal.ppat.1002219
-
CODA: accurate detection of functional associations between proteins in eukaryotic genomes using domain fusion.
Wellcome Trust Sanger Institute, Cambridge, United Kingdom. ar11@sanger.ac.uk
Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom.
We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication.
The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/.
Funded by: Biotechnology and Biological Sciences Research Council
PloS one 2010;5;6;e10908
PUBMED: 20532224; PMC: 2879367; DOI: 10.1371/journal.pone.0010908
-
Comparative evolutionary analysis of protein complexes in E. coli and yeast.
Research Department of Structural & Molecular Biology, University College London, London, WC1E 6BT, UK. ar11@sanger.ac.uk
Background: Proteins do not act in isolation; they frequently act together in protein complexes to carry out concerted cellular functions. The evolution of complexes is poorly understood, especially in organisms other than yeast, where little experimental data has been available.
Results: We generated accurate, high coverage datasets of protein complexes for E. coli and yeast in order to study differences in the evolution of complexes between these two species. We show that substantial differences exist in how complexes have evolved between these organisms. A previously proposed model of complex evolution identified complexes with cores of interacting homologues. We support findings of the relative importance of this mode of evolution in yeast, but find that it is much less common in E. coli. Additionally it is shown that those homologues which do cluster in complexes are involved in eukaryote-specific functions. Furthermore we identify correlated pairs of non-homologous domains which occur in multiple protein complexes. These were identified in both yeast and E. coli and we present evidence that these too may represent complex cores in yeast but not those of E. coli.
Conclusions: Our results suggest that there are differences in the way protein complexes have evolved in E. coli and yeast. Whereas some yeast complexes have evolved by recruiting paralogues, this is not apparent in E. coli. Furthermore, such complexes are involved in eukaryotic-specific functions. This implies that the increase in gene family sizes seen in eukaryotes in part reflects multiple family members being used within complexes. However, in general, in both E. coli and yeast, homologous domains are used in different complexes.
Funded by: Biotechnology and Biological Sciences Research Council
BMC genomics 2010;11;79
PUBMED: 20122144; PMC: 2837643; DOI: 10.1186/1471-2164-11-79
-
Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone.
Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK. reid@bioichem.ucl.ac.uk
Motivation: A recent development in sequence-based remote homologue detection is the introduction of profile-profile comparison methods. These are more powerful than previous technologies and can detect potentially homologous relationships missed by structural classifications such as CATH and SCOP. As structural classifications traditionally act as the gold standard of homology this poses a challenge in benchmarking them.
Results: We present a novel approach which allows an accurate benchmark of these methods against the CATH structural classification. We then apply this approach to assess the accuracy of a range of publicly available methods for remote homology detection including several profile-profile methods (COMPASS, HHSearch, PRC) from two perspectives. First, in distinguishing homologous domains from non-homologues and second, in annotating proteomes with structural domain families. PRC is shown to be the best method for distinguishing homologues. We show that SAM is the best practical method for annotating genomes, whilst using COMPASS for the most remote homologues would increase coverage. Finally, we introduce a simple approach to increase the sensitivity of remote homologue detection by up to 10%. This is achieved by combining multiple methods with a jury vote.
Supplementary data are available at Bioinformatics online.
Bioinformatics (Oxford, England) 2007;23;18;2353-60
PUBMED: 17709341; DOI: 10.1093/bioinformatics/btm355
Florian Sessler
fs8@sanger.ac.uk PhD Student
Having completed my BSc in Biology with Microbiology at Imperial College London, I started a 4-year PhD program at the Wellcome Trust Sanger Institute in 2011. After 6 months of rotations in 3 different pathogen labs, I started my PhD project in Matt Berriman's Parasite Genomics group.
Research
My PhD project focuses on characterizing male and female Schistosoma mansoni, a tropical parasite that currently infects about 200 million people. To better understand sexual development and maturation, I use a range of techniques, but high-throughput sequencing (RNA-seq) and transcriptome analysis currently form the basis of my research.
Eleanor Stanley
es9@sanger.ac.uk Senior Bioinformatician
I studied Biological Sciences, specialising in genetics, at the University of Birmingham. My final-year literature study on the duplication of the Adh region in Drosophila aided my successful application to become a FlyBase curator at the University of Cambridge. After 5 fabulous years I moved to the European Bioinformatics Institute to become a UniProt curator. In addition, I enjoyed roles managing an alternative splicing project and the Complete proteomes team. In my final year as a biocurator, I completed an MSc(Res) in Bioinformatics, and I joined the parasite genomics group at the Sanger Institute in April 2012.
Research
My role within the team is to build a pipeline to generate gene models for the 50 Helminth genome project. To achieve this I am using Ensembl and Maker.
References
-
Toward community standards in the quest for orthologs.
The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
Funded by: PHS HHS: HHSN266200400037C
Bioinformatics (Oxford, England) 2012;28;6;900-4
PUBMED: 22332236; PMC: 3307119; DOI: 10.1093/bioinformatics/bts050
-
Reorganizing the protein space at the Universal Protein Resource (UniProt).
The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
The mission of UniProt is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces. UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. A key development at UniProt is the provision of complete, reference and representative proteomes. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Funded by: British Heart Foundation: SP/07/007/23671; NCRR NIH HHS: 3P20RR016472-09S2; NHGRI NIH HHS: 1U41HG006104-02, 2P41HG02273-07; NIGMS NIH HHS: 2R01GM080646-06, 3R01GM080646-04S2, 5R01GM080646-05; NLM NIH HHS: 5G08LM010720-02
Nucleic acids research 2012;40;Database issue;D71-5
PUBMED: 22102590; PMC: 3245120; DOI: 10.1093/nar/gkr981
-
ASTD: The Alternative Splicing and Transcript Diversity database.
European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.
Genomics 2009;93;3;213-20
PUBMED: 19059335; DOI: 10.1016/j.ygeno.2008.11.003
-
The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.
Japan Biological Information Research Center, Japan Biological Informatics Consortium, Japan.
Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views: Transcript view and Locus view and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.
Funded by: Wellcome Trust: 077198
Nucleic acids research 2008;36;Database issue;D793-9
PUBMED: 18089548; PMC: 2238988; DOI: 10.1093/nar/gkm999
-
The Rice Annotation Project Database (RAP-DB): 2008 update.
National Institute of Agrobiological Sciences, Ibaraki 305-8602, Japan.
The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.
Nucleic acids research 2008;36;Database issue;D1028-33
PUBMED: 18089549; PMC: 2238920; DOI: 10.1093/nar/gkm978
-
Bioinformatics database infrastructure for biotechnology research.
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambs CB10 1SD, UK. eleanor@ebi.ac.uk
Many databases are available that provide valuable data resources for the biotechnological researcher. According to their core data, they can be divided into different types. Some databases provide primary data, like all published nucleotide sequences, others deal with protein sequences. In addition to these two basic types of databases, a huge number of more specialized resources are available, like databases about protein structures, protein identification, special features of genes and/or proteins, or certain organisms. Furthermore, some resources offer integrated views on different types of data, allowing the user to do easy customized queries over large datasets and to compare different types of data.
Journal of biotechnology 2006;124;4;629-39
PUBMED: 16757051; DOI: 10.1016/j.jbiotec.2006.04.006
-
Annotation of the Drosophila melanogaster euchromatic genome: a systematic review.
Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA. sima@fruitfly.org
Background: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.
Results: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.
Conclusions: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
Funded by: NHGRI NIH HHS: HG00739, HG00750
Genome biology 2002;3;12;RESEARCH0083
-
FlyBase: a Drosophila database. Flybase Consortium.
FlyBase, Biological Laboratories, 16 Divinity Avenue, Cambridge, MA 02138, USA.
FlyBase (http://flybase.bio.indiana.edu/) is a comprehensive database of genetic and molecular data concerning Drosophila. FlyBase is maintained as a relational database (in Sybase) and is made available as html documents and flat files. The scope of FlyBase includes: genes, alleles (with phenotypes), aberrations, transposons, pointers to sequence data, gene products, maps, clones, stock lists, Drosophila workers and bibliographic references.
Nucleic acids research 1998;26;1;85-8
Alan Tracey
- Computer Biologist - Senior Genome Analyst
After graduating with a Geography BA (Hons) from Anglia Polytechnic University in 1997, I arrived at the Sanger Centre in 1998 to work as a sequencer on the Human Genome Project. After 1 year, I became a "finisher" and started learning about assembly improvement. I worked on a variety of genome projects including human, mouse, zebrafish, pig, tomato and many others, notably contributing 1% of the finished human genome. In later projects, I worked on the most intractable repetitive regions, learning many valuable problem-solving skills.
Research
I joined the parasite genomics group as a Senior Genome Analyst in 2010 and have made significant contributions to a variety of helminth genome assemblies, bringing over a decade of experience as a "finisher" to bear in this group. My work involves iterative assembly improvement using a combination of bespoke software tools and algorithms to surpass what is achievable by automated assembly of de novo sequence. I seek to provide software development ideas and bug reporting to developers. I also work to manually annotate and refine gene models as necessary.
References
-
The zebrafish reference genome sequence and its relationship to the human genome.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Funded by: NCRR NIH HHS: R01 RR010715, R01 RR020833; NICHD NIH HHS: P01 HD022486, P01 HD22486; NIDDK NIH HHS: 1 R01 DK55377-01A1; NIGMS NIH HHS: R01 GM085318; NIH HHS: R01 OD011116; Wellcome Trust: 098051
Nature 2013;496;7446;498-503
PUBMED: 23594743; PMC: 3703927; DOI: 10.1038/nature12111
-
The genomes of four tapeworm species reveal adaptations to parasitism.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 098051
Nature 2013;496;7443;57-63
PUBMED: 23485966; DOI: 10.1038/nature12031
-
A large palindrome with interchromosomal gene duplications in the pericentromeric region of the D. melanogaster Y chromosome.
Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain.
The non-recombining Y chromosome is expected to degenerate over evolutionary time; however, gene gain is a common feature of Y chromosomes of mammals and Drosophila. Here, we report that a large palindrome containing interchromosomal segmental duplications is located in the vicinity of the first amplicon detected in the Y chromosome of D. melanogaster. The recent appearance of such amplicons suggests that duplications to the Y chromosome, followed by the amplification of the segmental duplications, are a mechanism for the continuing evolution of Drosophila Y chromosomes.
Funded by: Wellcome Trust
Molecular biology and evolution 2011;28;7;1967-71
PUBMED: 21297157; DOI: 10.1093/molbev/msr034
-
Novel sequencing strategy for repetitive DNA in a Drosophila BAC clone reveals that the centromeric region of the Y chromosome evolved from a telomere.
Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Madrid, Spain.
The centromeric and telomeric heterochromatin of eukaryotic chromosomes is mainly composed of middle-repetitive elements, such as transposable elements and tandemly repeated DNA sequences. Because of this repetitive nature, Whole Genome Shotgun Projects have failed in sequencing these regions. We describe a novel kind of transposon-based approach for sequencing highly repetitive DNA sequences in BAC clones. The key to this strategy is physically mapping the precise position of the transposon insertion, which enables the correct assembly of the repeated DNA. We have applied this strategy to a clone from the centromeric region of the Y chromosome of Drosophila melanogaster. The analysis of the complete sequence of this clone has allowed us to prove that this centromeric region evolved from a telomere, possibly after a pericentric inversion of an ancestral telocentric chromosome. Our results confirm that the use of transposon-mediated sequencing, including positional mapping information, improves current finishing strategies. The strategy we describe could be a universal approach to resolving the heterochromatic regions of eukaryotic genomes.
Funded by: Wellcome Trust
Nucleic acids research 2009;37;7;2264-73
PUBMED: 19237394; PMC: 2673431; DOI: 10.1093/nar/gkp085
-
The DNA sequence and biological annotation of human chromosome 1.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. sgregory@chg.duhs.duke.edu
The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.
Funded by: Wellcome Trust
Nature 2006;441;7091;315-21
PUBMED: 16710414; DOI: 10.1038/nature04727
-
The DNA sequence and comparative analysis of human chromosome 10.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. panos@sanger.ac.uk
The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.
Nature 2004;429;6990;375-81
PUBMED: 15164054; DOI: 10.1038/nature02462
-
DNA sequence and analysis of human chromosome 9.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. sjh@sanger.ac.uk
Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection.
Nature 2004;429;6990;369-74
PUBMED: 15164053; PMC: 2734081; DOI: 10.1038/nature02465
-
The DNA sequence and analysis of human chromosome 13.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. ad1@sanger.ac.uk
Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.
Nature 2004;428;6982;522-8
PUBMED: 15057823; PMC: 2665288; DOI: 10.1038/nature02379
-
The DNA sequence and analysis of human chromosome 6.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. ajm@sanger.ac.uk
Chromosome 6 is a metacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs, representing the largest chromosome sequenced so far. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes. Here we report that at least 96% of the protein-coding genes have been identified, as assessed by multi-species comparative sequence analysis, and provide evidence for the presence of further, otherwise unsupported exons/genes. Among these are genes directly implicated in cancer, schizophrenia, autoimmunity and many other diseases. Chromosome 6 harbours the largest transfer RNA gene cluster in the genome; we show that this cluster co-localizes with a region of high transcriptional activity. Within the essential immune loci of the major histocompatibility complex, we find HLA-B to be the most polymorphic gene on chromosome 6 and in the human genome.
Nature 2003;425;6960;805-11
PUBMED: 14574404; DOI: 10.1038/nature02055
-
The DNA sequence and comparative analysis of human chromosome 20.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. panos@sanger.ac.uk
The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.
Nature 2001;414;6866;865-71
PUBMED: 11780052; DOI: 10.1038/414865a
Magdalena Zarowiecki
mz3@sanger.ac.uk Postdoctoral Fellow
My research interests are tropical diseases, in particular the evolution of parasitism and host-parasite interactions. I did an M.Sc. in Zoological Systematics at Gothenburg University, Sweden, and an M.Res. in Biosystematics at the Natural History Museum and Imperial College, London. I have worked with many non-model worms: ribbon worms, Oligochaetes, Cestodes and Trematodes. I also have interests in the wider field of tropical diseases, stemming from a Ph.D. in the population genetics of mosquitoes. I previously held a postdoctoral position funded by the SynTax scheme, working on the assembly and annotation of the Hymenolepis microstoma genome and on the comparative phylogeny of flatworms.
Research
My current research focuses on the genomics of parasitic flatworms, including important platyhelminth parasites of humans in the genera Taenia, Hymenolepis, Echinococcus and Schistosoma. These platyhelminths have a severe impact on the health and productivity of the poorest people in developing countries. The aim of the postdoctoral project is to develop comparative genomics of flatworms within the Parasite Genomics group. We use high-throughput approaches including RNA-seq, gene prediction, methylome studies, re-sequencing and microRNA studies to increase the accuracy and biological depth of our platyhelminth genome annotations. Producing good-quality genomes, gene models and annotations is a vital underpinning for future translational research.
References
-
Cestode genomics - progress and prospects for advancing basic and applied aspects of flatworm biology.
Department of Zoology, The Natural History Museum, London, UK.
Characterization of the first tapeworm genome, Echinococcus multilocularis, is now nearly complete, and genome assemblies of E. granulosus, Taenia solium and Hymenolepis microstoma are in advanced draft versions. These initiatives herald the beginning of a genomic era in cestodology and underpin a diverse set of research agendas targeting both basic and applied aspects of tapeworm biology. We discuss the progress in the genomics of these species, provide insights into the presence and composition of immunologically relevant gene families, including the antigen B- and EG95/45W families, and discuss chemogenomic approaches toward the development of novel chemotherapeutics against cestode diseases. In addition, we discuss the evolution of tapeworm parasites and introduce the research programmes linked to genome initiatives that are aimed at understanding signalling systems involved in basic host-parasite interactions and morphogenesis.
Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151
Parasite immunology 2012;34;2-3;130-50
PUBMED: 21793855; DOI: 10.1111/j.1365-3024.2011.01319.x
-
Animals learn new tricks from microorganisms.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. microbes@sanger.ac.uk
Nature reviews. Microbiology 2011;9;12;836
PUBMED: 22085859; DOI: 10.1038/nrmicro2694
-
Towards a new role for vector systematics in parasite control.
Dept. of Zoology, Natural History Museum, London SW7 5BD, UK. mz3@sanger.ac.uk
Vector systematics research is being transformed by the recent development of theoretical, experimental and analytical methods, as well as conceptual insights into speciation and reconstruction of evolutionary history. We review this progress using examples from the mosquito genus Anopheles. The conclusion is that recent progress, particularly in the development of better tools for understanding evolutionary history, makes systematics much more informative for vector control purposes, and has increasing potential to inform and improve targeted vector control programmes.
Parasitology 2011;138;13;1723-9
PUBMED: 21679487; DOI: 10.1017/S003118201100062X
-
Rapid evolution of yeast centromeres in the absence of drive.
Division of Biology, Imperial College London, Ascot SL5 7PY, United Kingdom.
To find the most rapidly evolving regions in the yeast genome we compared most of chromosome III from three closely related lineages of the wild yeast Saccharomyces paradoxus. Unexpectedly, the centromere appears to be the fastest-evolving part of the chromosome, evolving even faster than DNA sequences unlikely to be under selective constraint (i.e., synonymous sites after correcting for codon usage bias and remnant transposable elements). Centromeres on other chromosomes also show an elevated rate of nucleotide substitution. Rapid centromere evolution has also been reported for some plants and animals and has been attributed to selection for inclusion in the egg or the ovule at female meiosis. But Saccharomyces yeasts have symmetrical meioses with all four products surviving, thus providing no opportunity for meiotic drive. In addition, yeast centromeres show the high levels of polymorphism expected under a neutral model of molecular evolution. We suggest that yeast centromeres suffer an elevated rate of mutation relative to other chromosomal regions and they change through a process of "centromere drift," not drive.
Funded by: Biotechnology and Biological Sciences Research Council; Wellcome Trust
Genetics 2008;178;4;2161-7
PUBMED: 18430941; PMC: 2323805; DOI: 10.1534/genetics.107.083980
-
Making the most of mitochondrial genomes--markers for phylogeny, molecular ecology and barcodes in Schistosoma (Platyhelminthes: Digenea).
Wolfson Wellcome Biomedical Laboratories, Department of Zoology, Natural History Museum, Cromwell Road, London SW7 5BD, UK.
An increasing number of complete sequences of mitochondrial (mt) genomes provides the opportunity to optimise the choice of molecular markers for phylogenetic and ecological studies. This is particularly the case where mt genomes from closely related taxa have been sequenced; e.g., within Schistosoma. These blood flukes include species that are the causative agents of schistosomiasis, where there has been a need to optimise markers for species and strain recognition. For many phylogenetic and population genetic studies, the choice of nucleotide sequences depends primarily on suitable PCR primers. Complete mt genomes allow individual gene or other mt markers to be assessed relative to one another for potential information content, prior to broad-scale sampling. We assess the phylogenetic utility of individual genes and identify regions that contain the greatest interspecific variation for molecular ecological and diagnostic markers. We show that variable characters are not randomly distributed along the genome and there is a positive correlation between polymorphism and divergence. The mt genomes of African and Asian schistosomes were compared with the available intraspecific dataset of Schistosoma mansoni through sliding window analyses, in order to assess whether the observed polymorphism was at a level predicted from interspecific comparisons. We found a positive correlation except for the two genes (cox1 and nad1) adjoining the putative control region in S. mansoni. The genes nad1, nad4, nad5, cox1 and cox3 resolved phylogenies that were consistent with a benchmark phylogeny and in general, longer genes performed better in phylogenetic reconstruction. Considering the information content of entire mt genome sequences, partial cox1 would not be the ideal marker for either species identification (barcoding) or population studies with Schistosoma species. Instead, we suggest the use of cox3 and nad5 for both phylogenetic and population studies. 
Five primer pairs designed against Schistosoma mekongi and Schistosoma malayensis were tested successfully against Schistosoma japonicum. In combination, these fragments encompass 20-27% of the variation amongst the genomes (average total length approximately 14,000bp), thus providing an efficient means of encapsulating the greatest amount of variation within the shortest sequence. Comparative mitogenomics provides the basis of a rational approach to molecular marker selection and optimisation.
International journal for parasitology 2007;37;12;1401-18
PUBMED: 17570370; DOI: 10.1016/j.ijpara.2007.04.014
Background
We are interested in studying the diversity of eukaryotic parasites and their complex interactions with their hosts. In particular, we wish to uncover the genomic basis for differences in the biology of parasites causing malaria and Neglected Tropical Diseases. Our approach starts with the establishment of a reference genome, followed by comparative sequencing of related strains or species to find candidate genes (or other sequences) relating to species-specific differences, such as disease tropisms.

Dr Matt Berriman
