Dr Matt Berriman

Matt leads a programme in the genomics of Neglected Tropical Disease parasites, including helminths such as schistosomes, tapeworms, roundworms, hookworms, threadworms and whipworms.

Matt graduated in 1994 from the University of Manchester with a degree in biochemistry.

He moved to the London School of Hygiene and Tropical Medicine and then the University of Dundee to undertake a PhD with Professor Alan Fairlamb, biochemically characterising a putative drug target from the malaria parasite Plasmodium falciparum. After being awarded a Wellcome Trust Travelling Prize Fellowship, Matt moved to the laboratory of Professor George Cross at Rockefeller University in New York to study trypanosome telomeres. At the end of 2000, he joined the Wellcome Trust Sanger Institute as a Senior Computer Biologist in the Pathogen Sequencing Unit. There, he analysed and annotated the genomes of Plasmodium falciparum and Trypanosoma brucei. From 2003, Matt took over the leadership of more than 20 eukaryotic pathogen sequencing projects, primarily focused on the medically important Apicomplexan and Kinetoplastid protozoa, which include malaria parasites and trypanosomes, respectively. Matt joined the faculty in 2008 and is leading a programme in the genomics of Neglected Tropical Disease parasites, which include parasitic helminths such as schistosomes, tapeworms, roundworms, hookworms, threadworms and whipworms.

Selected Publications

  • A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium.

    Sinha A, Hughes KR, Modrzynska KK, Otto TD, Pfander C, Dickens NJ, Religa AA, Bushell E, Graham AL, Cameron R, Kafsack BF, Williams AE, Llinás M, Berriman M, Billker O and Waters AP

    Wellcome Trust Centre for Molecular Parasitology, University of Glasgow, Glasgow G12 8QQ, UK.

    Commitment to and completion of sexual development are essential for malaria parasites (protists of the genus Plasmodium) to be transmitted through mosquitoes. The molecular mechanism(s) responsible for commitment have been hitherto unknown. Here we show that PbAP2-G, a conserved member of the apicomplexan AP2 (ApiAP2) family of DNA-binding proteins, is essential for the commitment of asexually replicating forms to sexual development in Plasmodium berghei, a malaria parasite of rodents. PbAP2-G was identified from mutations in its encoding gene, PBANKA_143750, which account for the loss of sexual development frequently observed in parasites transmitted artificially by blood passage. Systematic gene deletion of conserved ApiAP2 genes in Plasmodium confirmed the role of PbAP2-G and revealed a second ApiAP2 member (PBANKA_103430, here termed PbAP2-G2) that significantly modulates but does not abolish gametocytogenesis, indicating that a cascade of ApiAP2 proteins are involved in commitment to the production and maturation of gametocytes. The data suggest a mechanism of commitment to gametocytogenesis in Plasmodium consistent with a positive feedback loop involving PbAP2-G that could be exploited to prevent the transmission of this pernicious parasite.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0501670; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50GM071508; Wellcome Trust: 083811/Z/07/Z, 085349, 098051

    Nature 2014;507;7491;253-7

  • A comprehensive evaluation of assembly scaffolding tools.

    Hunt M, Newbold C, Berriman M and Otto TD

    Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.

    Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behavior of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data.

    Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.

    Genome biology 2014;15;3;R42

  • The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode.

    Cotton JA, Lilley CJ, Jones LM, Kikuchi T, Reid AJ, Thorpe P, Tsai IJ, Beasley H, Blok V, Cock PJ, Eves-van den Akker S, Holroyd N, Hunt M, Mantelin S, Naghra H, Pain A, Palomares-Rius JE, Zarowiecki M, Berriman M, Jones JT and Urwin PE

    Background: Globodera pallida is a devastating pathogen of potato crops, making it one of the most economically important plant parasitic nematodes. It is also an important model for the biology of cyst nematodes. Cyst nematodes and root-knot nematodes are the two most important plant parasitic nematode groups and together represent a global threat to food security.

    Results: We present the complete genome sequence of G. pallida, together with transcriptomic data from most of the nematode life cycle, particularly focusing on the life cycle stages involved in root invasion and establishment of the biotrophic feeding site. Despite the relatively close phylogenetic relationship with root-knot nematodes, we describe a very different gene family content between the two groups and in particular extensive differences in the repertoire of effectors, including an enormous expansion of the SPRY domain protein family in G. pallida, which includes the SPRYSEC family of effectors. This highlights the distinct biology of cyst nematodes compared to the root-knot nematodes that were, until now, the only sedentary plant parasitic nematodes for which genome information was available. We also present in-depth descriptions of the repertoires of other genes likely to be important in understanding the unique biology of cyst nematodes and of potential drug targets and other targets for their control.

    Conclusions: The data and analyses we present will be central in exploiting post-genomic approaches in the development of much-needed novel strategies for the control of G. pallida and related pathogens.

    Genome biology 2014;15;3;R43

  • The peculiar epidemiology of dracunculiasis in Chad.

    Eberhard ML, Ruiz-Tiben E, Hopkins DR, Farrell C, Toe F, Weiss A, Withers PC, Jenks MH, Thiele EA, Cotton JA, Hance Z, Holroyd N, Cama VA, Tahir MA and Mounda T

    Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, Georgia; The Carter Center, Atlanta, Georgia; The Carter Center, N'Djamena, Chad; LifeSource Biomedical, Centreville, Virginia; The Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Ministry of Public Health, N'Djamena, Chad.

    Dracunculiasis was rediscovered in Chad in 2010 after an apparent absence of 10 years. In April 2012 active village-based surveillance was initiated to determine where, when, and how transmission of the disease was occurring, and to implement interventions to interrupt it. The current epidemiologic pattern of the disease in Chad is unlike that seen previously in Chad or other endemic countries, i.e., no clustering of cases by village or association with a common water source, the average number of worms per person was small, and a large number of dogs were found to be infected. Molecular sequencing suggests these infections were all caused by Dracunculus medinensis. It appears that the infection in dogs is serving as the major driving force sustaining transmission in Chad, that an aberrant life cycle involving a paratenic host common to people and dogs is occurring, and that the cases in humans are sporadic and incidental.

    Funded by: Wellcome Trust: 098051

    The American journal of tropical medicine and hygiene 2014;90;1;61-70

  • Genomic confirmation of hybridisation and recent inbreeding in a vector-isolated Leishmania population.

    Rogers MB, Downing T, Smith BA, Imamura H, Sanders M, Svobodova M, Volf P, Berriman M, Cotton JA and Smith DF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom; Centre for Immunology and Infection, Department of Biology, University of York, York, United Kingdom.

    Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild still remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate in a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain as well as genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle.

    Funded by: Wellcome Trust: 076355, 085822, 098051

    PLoS genetics 2014;10;1;e1004092

  • WormBase 2014: new views of curated biology.

    Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L and Sternberg PW

    Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada, Genome Sequencing Center, Washington University, School of Medicine, St Louis, MO 63108, USA, Division of Biology and Biological Engineering 156-29, California Institute of Technology, Pasadena, CA 91125, USA, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Department of Genetics Campus, Washington University School of Medicine, St. Louis, MO 63110, USA, Genetics Unit, Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA 91125, USA.

    WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119; NHGRI NIH HHS: P41 HG002223, U41-HG002223

    Nucleic acids research 2014;42;Database issue;D789-93

  • Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.

    Lemieux JE, Kyes SA, Otto TD, Feller AI, Eastman RT, Pinches RA, Berriman M, Su XZ and Newbold CI

    Weatherall Institute of Molecular Medicine, Headington, Oxford, OX3 9DS, UK; National Institute of Allergy and Infectious Disease, NIH, Rockville, MD, 20892, USA.

    Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 Kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.

    Funded by: Wellcome Trust: 082130, 082130/Z/07/Z, 098051

    Molecular microbiology 2013;90;3;519-37

  • Vector transmission regulates immune control of Plasmodium virulence.

    Spence PJ, Jarra W, Lévy P, Reid AJ, Chappell L, Brugat T, Sanders M, Berriman M and Langhorne J

    Division of Parasitology, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK.

    Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents, primates or humans increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission, which is assumed to 'reset' Plasmodium to its original character. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we use mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi to interrogate regulation of parasite virulence. Analysis of SBP P. c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which in turn modifies the elicited mammalian immune response, which in turn attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.

    Funded by: Medical Research Council: MC_U117584248, U.1175.02.004.00004(60507), U117584248; Wellcome Trust: 085775, 089553, 098051

    Nature 2013;498;7453;228-31

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

  • Genes involved in host-parasite interactions can be revealed by their correlated expression.

    Reid AJ and Berriman M

    Parasite genomics group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ar11@sanger.ac.uk

    Molecular interactions between a parasite and its host are key to the ability of the parasite to enter the host and persist. Our understanding of the genes and proteins involved in these interactions is limited. To better understand these processes it would be advantageous to have a range of methods to predict pairs of genes involved in such interactions. Correlated gene expression profiles can be used to identify molecular interactions within a species. Here we have extended the concept to different species, showing that genes with correlated expression are more likely to encode proteins that directly or indirectly participate in host-parasite interaction. We go on to examine our predictions of molecular interactions between the malaria parasite and both its mammalian host and insect vector. Our approach could be applied to study any interaction between species, for example, between a host and its parasites or pathogens, but also symbiotic and commensal pairings.

    Funded by: Wellcome Trust: 098051

    Nucleic acids research 2013;41;3;1508-18

  • The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.

    Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS and Cotton JA

    Background: The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most economically important parasites of livestock and humans.

    Results: Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug target families, providing the most comprehensive understanding yet of the action of several important anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut that provide a rich source of vaccine and drug target candidates.

    Conclusions: The H. contortus genome and transcriptome provide an essential platform for postgenomic research in this and other important strongylid parasites.

    Genome biology 2013;14;8;R88

  • Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.

    Protasio AV, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosome infection begins with the penetration of cercariae through healthy unbroken host skin. This process leads to the transformation of the free-living larvae into obligate parasites called schistosomula. This irreversible transformation, which occurs in as little as two hours, involves casting the cercaria tail and complete remodelling of the surface membrane. At this stage, parasites are vulnerable to host immune attack and oxidative stress. Consequently, the mechanisms by which the parasite recognises and swiftly adapts to the human host are still the subject of many studies, especially in the context of development of intervention strategies against schistosomiasis infection. Because obtaining enough material from in vivo infections is not always feasible for such studies, the transformation process is often mimicked in the laboratory by application of shear pressure to a cercarial sample resulting in mechanically transformed (MT) schistosomula. These parasites share remarkable morphological and biochemical similarity to the naturally transformed counterparts and have been considered a good proxy for parasites undergoing natural infection. Relying on this equivalency, MT schistosomula have been used almost exclusively in high-throughput studies of gene expression, identification of drug targets and identification of effective drugs against schistosomes. However, the transcriptional equivalency between skin-transformed (ST) and MT schistosomula has never been proven. In our approach to compare these two types of schistosomula preparations and to explore differences in gene expression triggered by the presence of a skin barrier, we performed RNA-seq transcriptome profiling of ST and MT schistosomula at 24 hours post transformation. We report that these two very distinct schistosomula preparations differ only in the expression of 38 genes (out of ∼11,000), providing convincing evidence to resolve the skin vs. mechanical long-lasting controversy.

    Funded by: Wellcome Trust: WT 083931/Z/07/Z, WT 098051

    PLoS neglected tropical diseases 2013;7;3;e2091

  • REAPR: a universal tool for genome assembly evaluation.

    Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M and Otto TD

    Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.

    Genome biology 2013;14;5;R47

  • Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

    Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, O'Brien J, Djimde A, Doumbo O, Zongo I, Ouedraogo JB, Michon P, Mueller I, Siba P, Nzila A, Borrmann S, Kiara SM, Marsh K, Jiang H, Su XZ, Amaratunga C, Fairhurst R, Socheat D, Nosten F, Imwong M, White NJ, Sanders M, Anastasi E, Alcock D, Drury E, Oyola S, Quail MA, Turner DJ, Ruano-Rubio V, Jyothi D, Amenga-Etego L, Hubbart C, Jeffreys A, Rowlands K, Sutherland C, Roper C, Mangano V, Modiano D, Tan JC, Ferdig MT, Amambua-Ngwa A, Conway DJ, Takala-Harrison S, Plowe CV, Rayner JC, Rockett KA, Clark TG, Newbold CI, Berriman M, MacInnis B and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 082370, 089275, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 092654, 098051

    Nature 2012;487;7407;375-9

  • A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

    Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M and Otto TD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.

    Funded by: Wellcome Trust: 098051

    Nature protocols 2012;7;7;1260-84

  • A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.

    Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.

    Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z

    PLoS neglected tropical diseases 2012;6;1;e1455

  • Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.

    Reid AJ, Vermont SJ, Cotton JA, Harris D, Hill-Cawthorne GA, Könen-Waisman S, Latham SM, Mourier T, Norton R, Quail MA, Sanders M, Shanmugam D, Sohal A, Wasmuth JD, Brunk B, Grigg ME, Howard JC, Parkinson J, Roos DS, Trees AJ, Berriman M, Pain A and Wastling JM

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma, rather it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z

    PLoS pathogens 2012;8;3;e1002567

  • Germline transgenesis and insertional mutagenesis in Schistosoma mansoni mediated by murine leukemia virus.

    Rinaldi G, Eckert SE, Tsai IJ, Suttiprapa S, Kines KJ, Tort JF, Mann VH, Turner DJ, Berriman M and Brindley PJ

    Department of Microbiology, Immunology & Tropical Medicine, School of Medicine & Health Sciences, The George Washington University, Washington, DC, United States of America.

    Functional studies will facilitate characterization of role and essentiality of newly available genome sequences of the human schistosomes, Schistosoma mansoni, S. japonicum and S. haematobium. To develop transgenesis as a functional approach for these pathogens, we previously demonstrated that pseudotyped murine leukemia virus (MLV) can transduce schistosomes leading to chromosomal integration of reporter transgenes and short hairpin RNA cassettes. Here we investigated vertical transmission of transgenes through the developmental cycle of S. mansoni after introducing transgenes into eggs. Although MLV infection of schistosome eggs from mouse livers was efficient in terms of snail infectivity, >10-fold higher transgene copy numbers were detected in cercariae derived from in vitro laid eggs (IVLE). After infecting snails with miracidia from eggs transduced by MLV, sequencing of genomic DNA from cercariae released from the snails also revealed the presence of transgenes, demonstrating that transgenes had been transmitted through the asexual developmental cycle, and thereby confirming germline transgenesis. High-throughput sequencing of genomic DNA from schistosome populations exposed to MLV mapped widespread and random insertion of transgenes throughout the genome, along each of the autosomes and sex chromosomes, validating the utility of this approach for insertional mutagenesis. In addition, the germline-transmitted transgene encoding neomycin phosphotransferase rescued cultured schistosomules from toxicity of the antibiotic G418, and PCR analysis of eggs resulting from sexual reproduction of the transgenic worms in mice confirmed that retroviral transgenes were transmitted to the next (F1) generation. These findings provide the first description of wide-scale, random insertional mutagenesis of chromosomes and of germline transmission of a transgene in schistosomes. Transgenic lines of schistosomes expressing antibiotic resistance could advance functional genomics for these significant human pathogens. DATABASE ACCESSION: Sequence data from this study have been submitted to the European Nucleotide Archive (http://www.ebi.ac.uk/embl) under accession number ERP000379.

    Funded by: NIAID NIH HHS: R01AI072773; PHS HHS: HHSN272201000005I; Wellcome Trust: 098051

    PLoS pathogens 2012;8;7;e1002820

  • Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.

    Downing T, Imamura H, Decuypere S, Clark TG, Coombs GH, Cotton JA, Hilley JD, de Doncker S, Maes I, Mottram JC, Quail MA, Rijal S, Sanders M, Schönian G, Stark O, Sundar S, Vanaerschot M, Hertz-Fowler C, Dujardin JC and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.

    Funded by: Wellcome Trust: 076355, 085775/Z/08/Z

    Genome research 2011;21;12;2143-56

  • RATT: Rapid Annotation Transfer Tool.

    Otto TD, Dillon GP, Degrave WS and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Nucleic acids research 2011;39;9;e57

  • Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    Otto TD, Sanders M, Berriman M and Newbold C

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.

    Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.

    Availability: The software is available at http://icorn.sourceforge.net

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2010;26;14;1704-7

  • New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, Lemieux J, Barrell B, Pain A, Berriman M, Newbold C and Llinás M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

    Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z

    Molecular microbiology 2010;76;1;12-24

  • Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.

    Tsai IJ, Otto TD and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. jit@sanger.ac.uk

    Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Genome biology 2010;11;4;R41

  • Plasmodium falciparum var gene expression is modified by host immunity.

    Warimwe GM, Keane TM, Fegan G, Musyoki JN, Newton CR, Pain A, Berriman M, Marsh K and Bull PC

    Kenya Medical Research Institute-Wellcome Trust Research Programme, 80108 Kilifi, Kenya.

    Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) is a potentially important family of immune targets, which play a central role in the host-parasite interaction by binding to various host molecules. They are encoded by a diverse family of genes called var, of which there are approximately 60 copies in each parasite genome. In sub-Saharan Africa, although P. falciparum infection occurs throughout life, severe malarial disease tends to occur only in childhood. This could potentially be explained if (i) PfEMP1 variants differ in their capacity to support pathogenesis of severe malaria and (ii) this capacity is linked to the likelihood of each molecule being recognized and cleared by naturally acquired antibodies. Here, in a study of 217 Kenyan children with malaria, we show that expression of a group of var genes "cys2," containing a distinct pattern of cysteine residues, is associated with low host immunity. Expression of cys2 genes was associated with parasites from young children, those with severe malaria, and those with a poorly developed antibody response to parasite-infected erythrocyte surface antigens. Cys-2 var genes form a minor component of all genomic var repertoires analyzed to date. Therefore, the results are compatible with the hypothesis that the genomic var gene repertoire is organized such that PfEMP1 molecules that confer the most virulence to the parasite tend also to be those that are most susceptible to the development of host immunity. This may help the parasite to adapt effectively to the development of host antibodies through modification of the host-parasite relationship.

    Funded by: Wellcome Trust: 076030, 077092, 084535

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;51;21801-6

  • Comparative genomics of the fungal pathogens Candida dubliniensis and Candida albicans.

    Jackson AP, Gamble JA, Yeomans T, Moran GP, Saunders D, Harris D, Aslett M, Barrell JF, Butler G, Citiulo F, Coleman DC, de Groot PW, Goodwin TJ, Quail MA, McQuillan J, Munro CA, Pain A, Poulter RT, Rajandream MA, Renauld H, Spiering MJ, Tivey A, Gow NA, Barrell B, Sullivan DJ and Berriman M

    Pathogen Genomics Group, Wellcome Trust Sanger Institute, Cambridge, United Kingdom. aj4@sanger.ac.uk

    Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.

    Funded by: Medical Research Council: G0400284; Wellcome Trust: WT085775/Z/08/Z

    Genome research 2009;19;12;2231-44

  • The genome of the blood fluke Schistosoma mansoni.

    Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG and El-Sayed NM

    Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk

    Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

    Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z

    Nature 2009;460;7253;352-8

  • Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    Carver T, Berriman M, Tivey A, Patel C, Böhme U, Barrell BG, Parkhill J and Rajandream MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences.

    Results: Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text.

    Availability: Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/

    Funded by: Wellcome Trust: 082372

    Bioinformatics (Oxford, England) 2008;24;23;2672-6

  • Genetics of mating and sex determination in the parasitic nematode Haemonchus contortus.

    Redman E, Grillo V, Saunders G, Packard E, Jackson F, Berriman M and Gilleard JS

    The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Genetic analysis of parasitic nematodes has been a neglected area of research and the basic genetics of this important group of pathogens are poorly understood. Haemonchus contortus is one of the most economically significant livestock parasites worldwide and is a key experimental model for the strongylid nematode group that includes many important human and animal pathogens. We have undertaken a study of the genetics and the mode of mating of this parasite using microsatellite markers. Inheritance studies with autosomal markers demonstrated obligate dioecious sexual reproduction and polyandrous mating that are reported here for the first time in a parasitic helminth and provide the parasite with a mechanism of increasing genetic diversity. The karyotype of the H. contortus, MHco3(ISE) isolate was determined as 2n = 11 or 12. We have developed a panel of microsatellite markers that are tightly linked on the X chromosome and have used them to determine the sex chromosomal karyotype as XO male and XX female. Haplotype analysis using the X-chromosomal markers also demonstrated polyandry, independent of the autosomal marker analysis, and enabled a more direct estimate of the number of male parental genotypes contributing to each brood. This work provides a basis for future forward genetic analysis on H. contortus and related parasitic nematodes.

    Funded by: Wellcome Trust: 067811/Z/02/Z

    Genetics 2008;180;4;1877-87

  • Genomic-scale prioritization of drug targets: the TDR Targets database.

    Agüero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, Carmona S, Carruthers IM, Chan AW, Chen F, Crowther GJ, Doyle MA, Hertz-Fowler C, Hopkins AL, McAllister G, Nwaka S, Overington JP, Pain A, Paolini GV, Pieper U, Ralph SA, Riechers A, Roos DS, Sali A, Shanmugam D, Suzuki T, Van Voorhis WC and Verlinde CL

    Instituto de Investigaciones Biotecnológicas, Universidad Nacional de General San Martín, San Martín 1650, Buenos Aires, Argentina. fernan@unsam.edu.ar

    The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.

    Funded by: NIGMS NIH HHS: R01 GM054762-14

    Nature reviews. Drug discovery 2008;7;11;900-7

  • The genome of the simian and human malaria parasite Plasmodium knowlesi.

    Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, Mourier T, Mistry J, Pasini EM, Aslett MA, Balasubrammaniam S, Borgwardt K, Brooks K, Carret C, Carver TJ, Cherevach I, Chillingworth T, Clark TG, Galinski MR, Hall N, Harper D, Harris D, Hauser H, Ivens A, Janssen CS, Keane T, Larke N, Lapp S, Marti M, Moule S, Meyer IM, Ormond D, Peters N, Sanders M, Sanders S, Sargeant TJ, Simmonds M, Smith F, Squares R, Thurston S, Tivey AR, Walker D, White B, Zuiderwijk E, Churcher C, Quail MA, Cowman AF, Turner CM, Rajandream MA, Kocken CH, Thomas AW, Newbold CI, Barrell BG and Berriman M

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ap2@sanger.ac.uk

    Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.

    Funded by: Wellcome Trust: 085775

    Nature 2008;455;7214;799-803

  • Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.

    Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, Ivens A, Newbold C and Pain A

    Ancient DNA and Evolution Group, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark.

    We undertook a genome-wide search for novel noncoding RNAs (ncRNA) in the malaria parasite Plasmodium falciparum. We used the RNAz program to predict structures in the noncoding regions of the P. falciparum 3D7 genome that were conserved with at least one of seven other Plasmodium spp. genome sequences. By using Northern blot analysis for 76 high-scoring predictions and microarray analysis for the majority of candidates, we have verified the expression of 33 novel ncRNA transcripts including four members of a ncRNA family in the asexual blood stage. These transcripts represent novel structured ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related chimp malaria parasite P. reichenowi. The high confirmation rate within a single parasite life cycle stage suggests that many more of the predictions may be expressed in other stages of the organism's life cycle.

    Funded by: Wellcome Trust

    Genome research 2008;18;2;281-92

  • Insights into the genome sequence of a free-living Kinetoplastid: Bodo saltans (Kinetoplastida: Euglenozoa).

    Jackson AP, Quail MA and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK. aj4@sanger.ac.uk

    Background: Bodo saltans is a free-living kinetoplastid and among the closest relatives of the trypanosomatid parasites, which cause such human diseases as African sleeping sickness, leishmaniasis and Chagas disease. A B. saltans genome sequence will provide a free-living comparison with parasitic genomes necessary for comparative analyses of existing and future trypanosomatid genomic resources. Various coding regions were sequenced to provide a preliminary insight into the bodonid genome sequence, relative to trypanosomatid sequences.

    Results: 0.4 Mbp of B. saltans genome was sequenced from 12 distinct regions and contained 178 coding sequences. As in trypanosomatids, introns were absent and %GC was elevated in coding regions, greatly assisting in gene finding. In the regions studied, roughly 60% of all genes had homologs in trypanosomatids, while 28% were Bodo-specific. Intergenic sequences were typically short, resulting in higher gene density than in trypanosomatids. Although synteny was typically conserved for those genes with trypanosomatid homologs, strict colinearity was rarely observed because gene order was regularly disrupted by Bodo-specific genes.

    Conclusion: The B. saltans genome contains both sequences homologous to trypanosomatids and sequences never seen before. Structural similarities suggest that its assembly should be solvable, and, although de novo assembly will be necessary, existing trypanosomatid projects will provide some guide to annotation. A complete genome sequence will provide an effective ancestral model for understanding the shared and derived features of known trypanosomatid genomes, but it will also identify those kinetoplastid genome features lost during the evolution of parasitism.

    BMC genomics 2008;9;594

  • Telomeric expression sites are highly conserved in Trypanosoma brucei.

    Hertz-Fowler C, Figueiredo LM, Quail MA, Becker M, Jackson A, Bason N, Brooks K, Churcher C, Fahkro S, Goodhead I, Heath P, Kartvelishvili M, Mungall K, Harris D, Hauser H, Sanders M, Saunders D, Seeger K, Sharp S, Taylor JE, Walker D, White B, Young R, Cross GA, Rudenko G, Barry JD, Louis EJ and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. chf@sanger.ac.uk

    Subtelomeric regions are often under-represented in genome sequences of eukaryotes. One of the best known examples of the use of telomere proximity for adaptive purposes are the bloodstream expression sites (BESs) of the African trypanosome Trypanosoma brucei. To enhance our understanding of BES structure and function in host adaptation and immune evasion, the BES repertoire from the Lister 427 strain of T. brucei were independently tagged and sequenced. BESs are polymorphic in size and structure but reveal a surprisingly conserved architecture in the context of extensive recombination. Very small BESs do exist and many functioning BESs do not contain the full complement of expression site associated genes (ESAGs). The consequences of duplicated or missing ESAGs, including ESAG9, a newly named ESAG12, and additional variant surface glycoprotein genes (VSGs) were evaluated by functional assays after BESs were tagged with a drug-resistance gene. Phylogenetic analysis of constituent ESAG families suggests that BESs are sequence mosaics and that extensive recombination has shaped the evolution of the BES repertoire. This work opens important perspectives in understanding the molecular mechanisms of antigenic variation, a widely used strategy for immune evasion in pathogens, and telomere biology.

    Funded by: NIAID NIH HHS: R01AI021729; Wellcome Trust: 095161

    PloS one 2008;3;10;e3527

  • Comparative genomic analysis of three Leishmania species that cause diverse human disease.

    Peacock CS, Seeger K, Harris D, Murphy L, Ruiz JC, Quail MA, Peters N, Adlem E, Tivey A, Aslett M, Kerhornou A, Ivens A, Fraser A, Rajandream MA, Carver T, Norbertczak H, Chillingworth T, Hance Z, Jagels K, Moule S, Ormond D, Rutter S, Squares R, Whitehead S, Rabbinowitsch E, Arrowsmith C, White B, Thurston S, Bringaud F, Baldauf SL, Faulconbridge A, Jeffares D, Depledge DP, Oyola SO, Hilley JD, Brito LO, Tosi LR, Barrell B, Cruz AK, Mottram JC, Smith DF and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. csp@sanger.ac.uk

    Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.

    Funded by: Medical Research Council: G0000508; Wellcome Trust: 076355, 085775

    Nature genetics 2007;39;7;839-47

  • Genome variation and evolution of the malaria parasite Plasmodium falciparum.

    Jeffares DC, Pain A, Berry A, Cox AV, Stalker J, Ingle CE, Thomas A, Quail MA, Siebenthall K, Uhlemann AC, Kyes S, Krishna S, Newbold C, Dermitzakis ET and Berriman M

    Informatics Division, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Hinxton, UK.

    Infections with the malaria parasite Plasmodium falciparum result in more than 1 million deaths each year worldwide. Deciphering the evolutionary history and genetic variation of P. falciparum is critical for understanding the evolution of drug resistance, identifying potential vaccine candidates and appreciating the effect of parasite variation on prevalence and severity of malaria in humans. Most studies of natural variation in P. falciparum have been either in depth over small genomic regions (up to the size of a small chromosome) or genome wide but only at low resolution. In an effort to complement these studies with genome-wide data, we undertook shotgun sequencing of a Ghanaian clinical isolate (with fivefold coverage), the IT laboratory isolate (with onefold coverage) and the chimpanzee parasite P. reichenowi (with twofold coverage). We compared these sequences with the fully sequenced P. falciparum 3D7 isolate genome. We describe the most salient features of P. falciparum polymorphism and adaptive evolution with relation to gene function, transcript and protein expression and cellular localization. This analysis uncovers the primary evolutionary changes that have occurred since the P. falciparum-P. reichenowi speciation and changes that are occurring within P. falciparum.

    Funded by: Wellcome Trust: 077046, 079643

    Nature genetics 2007;39;1;120-5

  • Common inheritance of chromosome Ia associated with clonal expansion of Toxoplasma gondii.

    Khan A, Böhme U, Kelly KA, Adlem E, Brooks K, Simmonds M, Mungall K, Quail MA, Arrowsmith C, Chillingworth T, Churcher C, Harris D, Collins M, Fosker N, Fraser A, Hance Z, Jagels K, Moule S, Murphy L, O'Neil S, Rajandream MA, Saunders D, Seeger K, Whitehead S, Mayr T, Xuan X, Watanabe J, Suzuki Y, Wakaguri H, Sugano S, Sugimoto C, Paulsen I, Mackey AJ, Roos DS, Hall N, Berriman M, Barrell B, Sibley LD and Ajioka JW

    Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri 63110, USA.

    Toxoplasma gondii is a globally distributed protozoan parasite that can infect virtually all warm-blooded animals and humans. Despite the existence of a sexual phase in the life cycle, T. gondii has an unusual population structure dominated by three clonal lineages that predominate in North America and Europe (Types I, II, and III). These lineages were founded by common ancestors approximately 10,000 yr ago. The recent origin and widespread distribution of the clonal lineages is attributed to the circumvention of the sexual cycle by a new mode of transmission: asexual transmission between intermediate hosts. Asexual transmission appears to be multigenic and although the specific genes mediating this trait are unknown, it is predicted that all members of the clonal lineages should share the same alleles. Genetic mapping studies suggested that chromosome Ia was unusually monomorphic compared with the rest of the genome. To investigate this further, we sequenced chromosome Ia and chromosome Ib in the Type I strain, RH, and the Type II strain, ME49. Comparative genome analyses of the two chromosomal sequences revealed that the same copy of chromosome Ia was inherited in each lineage, whereas chromosome Ib maintained the same high frequency of between-strain polymorphism as the rest of the genome. Sampling of chromosome Ia sequence in seven additional representative strains from the three clonal lineages supports a monomorphic inheritance, which is unique within the genome. Taken together, our observations implicate a specific combination of alleles on chromosome Ia in the recent origin and widespread success of the clonal lineages of T. gondii.

    Funded by: NIAID NIH HHS: R01 AI059176; Wellcome Trust

    Genome research 2006;16;9;1119-25

  • Just one cross appears capable of dramatically altering the population biology of a eukaryotic pathogen like Toxoplasma gondii.

    Boyle JP, Rajasekar B, Saeij JP, Ajioka JW, Berriman M, Paulsen I, Roos DS, Sibley LD, White MW and Boothroyd JC

    Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.

    Toxoplasma gondii, an obligate intracellular protozoan of the phylum Apicomplexa, is estimated to infect over a billion people worldwide as well as a great many other mammalian and avian hosts. Despite this ubiquity, the vast majority of human infections in Europe and North America are thought to be due to only three genotypes. Using a genome-wide analysis of single-nucleotide polymorphisms, we have constructed a genealogy for these three lines. The data indicate that types I and III are second- and first-generation offspring, respectively, of a cross between a type II strain and one of two ancestral strains. An extant T. gondii strain (P89) appears to be the modern descendant of the non-type II parent of type III, making the full genealogy of the type III clonotype known. The simplicity of this family tree demonstrates that even a single cross can lead to the emergence and dominance of a new clonal genotype that completely alters the population biology of a sexual pathogen.

    Funded by: NIAID NIH HHS: AI045806, AI05093, AI21423, AI41014, F32AI60306, R01 AI036629

    Proceedings of the National Academy of Sciences of the United States of America 2006;103;27;10514-9

  • The genome of the African trypanosome Trypanosoma brucei.

    Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, Lennard NJ, Caler E, Hamlin NE, Haas B, Böhme U, Hannick L, Aslett MA, Shallom J, Marcello L, Hou L, Wickstead B, Alsmark UC, Arrowsmith C, Atkin RJ, Barron AJ, Bringaud F, Brooks K, Carrington M, Cherevach I, Chillingworth TJ, Churcher C, Clark LN, Corton CH, Cronin A, Davies RM, Doggett J, Djikeng A, Feldblyum T, Field MC, Fraser A, Goodhead I, Hance Z, Harper D, Harris BR, Hauser H, Hostetler J, Ivens A, Jagels K, Johnson D, Johnson J, Jones K, Kerhornou AX, Koo H, Larke N, Landfear S, Larkin C, Leech V, Line A, Lord A, Macleod A, Mooney PJ, Moule S, Martin DM, Morgan GW, Mungall K, Norbertczak H, Ormond D, Pai G, Peacock CS, Peterson J, Quail MA, Rabbinowitsch E, Rajandream MA, Reitter C, Salzberg SL, Sanders M, Schobel S, Sharp S, Simmonds M, Simpson AJ, Tallon L, Turner CM, Tait A, Tivey AR, Van Aken S, Walker D, Wanless D, Wang S, White B, White O, Whitehead S, Woodward J, Wortman J, Adams MD, Embley TM, Gull K, Ullu E, Barry JD, Fairlamb AH, Opperdoes F, Barrell BG, Donelson JE, Hall N, Fraser CM, Melville SE and El-Sayed NM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. mb4@sanger.ac.uk

    African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

    Funded by: NIAID NIH HHS: AI43062; Wellcome Trust

    Science (New York, N.Y.) 2005;309;5733;416-22

  • Genome sequence of the human malaria parasite Plasmodium falciparum.

    Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM and Barrell B

    The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. gardner@tigr.org

    The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

    Funded by: NIAID NIH HHS: R01 AI028398-23; Wellcome Trust: 061524

    Nature 2002;419;6906;498-511

  • The architecture of variant surface glycoprotein gene expression sites in Trypanosoma brucei.

    Berriman M, Hall N, Sheader K, Bringaud F, Tiwari B, Isobe T, Bowman S, Corton C, Clark L, Cross GA, Hoek M, Zanders T, Berberof M, Borst P and Rudenko G

    The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Trypanosoma brucei evades the immune system by switching between Variant Surface Glycoprotein (VSG) genes. The active VSG gene is transcribed in one of approximately 20 telomeric expression sites (ESs). It has been postulated that ES polymorphism plays a role in host adaptation. To gain more insight into ES architecture, we have determined the complete sequence of Bacterial Artificial Chromosomes (BACs) containing DNA from three ESs and their flanking regions. There was variation in the order and number of ES-associated genes (ESAGs). ESAGs 6 and 7, encoding transferrin receptor subunits, are the only ESAGs with functional copies in every ES that has been sequenced until now. A BAC clone containing the VO2 ES sequences comprised approximately half of a 330 kb 'intermediate' chromosome. The extensive similarity between this intermediate chromosome and the left telomere of T. brucei 927 chromosome I, suggests that this previously uncharacterised intermediate size class of chromosomes could have arisen from breakage of megabase chromosomes. Unexpected conservation of sequences, including pseudogenes, indicates that the multiple ESs could have arisen through a relatively recent amplification of a single ES.

    Funded by: NIAID NIH HHS: AI21729; Wellcome Trust: 095161

    Molecular and biochemical parasitology 2002;122;2;131-40

2014 Publications

  • A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium.

    Sinha A, Hughes KR, Modrzynska KK, Otto TD, Pfander C, Dickens NJ, Religa AA, Bushell E, Graham AL, Cameron R, Kafsack BF, Williams AE, Llinás M, Berriman M, Billker O and Waters AP

    Wellcome Trust Centre for Molecular Parasitology, University of Glasgow, Glasgow G12 8QQ, UK.

    Commitment to and completion of sexual development are essential for malaria parasites (protists of the genus Plasmodium) to be transmitted through mosquitoes. The molecular mechanism(s) responsible for commitment have been hitherto unknown. Here we show that PbAP2-G, a conserved member of the apicomplexan AP2 (ApiAP2) family of DNA-binding proteins, is essential for the commitment of asexually replicating forms to sexual development in Plasmodium berghei, a malaria parasite of rodents. PbAP2-G was identified from mutations in its encoding gene, PBANKA_143750, which account for the loss of sexual development frequently observed in parasites transmitted artificially by blood passage. Systematic gene deletion of conserved ApiAP2 genes in Plasmodium confirmed the role of PbAP2-G and revealed a second ApiAP2 member (PBANKA_103430, here termed PbAP2-G2) that significantly modulates but does not abolish gametocytogenesis, indicating that a cascade of ApiAP2 proteins are involved in commitment to the production and maturation of gametocytes. The data suggest a mechanism of commitment to gametocytogenesis in Plasmodium consistent with a positive feedback loop involving PbAP2-G that could be exploited to prevent the transmission of this pernicious parasite.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0501670; NIAID NIH HHS: R01 AI076276; NIGMS NIH HHS: P50GM071508; Wellcome Trust: 083811/Z/07/Z, 085349, 098051

    Nature 2014;507;7491;253-7

  • A comprehensive evaluation of assembly scaffolding tools.

    Hunt M, Newbold C, Berriman M and Otto TD

    Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.

    Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behavior of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data.

    Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.

    Genome biology 2014;15;3;R42

  • The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode.

    Cotton JA, Lilley CJ, Jones LM, Kikuchi T, Reid AJ, Thorpe P, Tsai IJ, Beasley H, Blok V, Cock PJ, Eves-van den Akker S, Holroyd N, Hunt M, Mantelin S, Naghra H, Pain A, Palomares-Rius JE, Zarowiecki M, Berriman M, Jones JT and Urwin PE

    Background: Globodera pallida is a devastating pathogen of potato crops, making it one of the most economically important plant parasitic nematodes. It is also an important model for the biology of cyst nematodes. Cyst nematodes and root-knot nematodes are the two most important plant parasitic nematode groups and together represent a global threat to food security.

    Results: We present the complete genome sequence of G. pallida, together with transcriptomic data from most of the nematode life cycle, particularly focusing on the life cycle stages involved in root invasion and establishment of the biotrophic feeding site. Despite the relatively close phylogenetic relationship with root-knot nematodes, we describe a very different gene family content between the two groups and in particular extensive differences in the repertoire of effectors, including an enormous expansion of the SPRY domain protein family in G. pallida, which includes the SPRYSEC family of effectors. This highlights the distinct biology of cyst nematodes compared to the root-knot nematodes that were, until now, the only sedentary plant parasitic nematodes for which genome information was available. We also present in-depth descriptions of the repertoires of other genes likely to be important in understanding the unique biology of cyst nematodes and of potential drug targets and other targets for their control.

    Conclusions: The data and analyses we present will be central in exploiting post-genomic approaches in the development of much-needed novel strategies for the control of G. pallida and related pathogens.

    Genome biology 2014;15;3;R43

  • Genomic confirmation of hybridisation and recent inbreeding in a vector-isolated Leishmania population.

    Rogers MB, Downing T, Smith BA, Imamura H, Sanders M, Svobodova M, Volf P, Berriman M, Cotton JA and Smith DF

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom; Centre for Immunology and Infection, Department of Biology, University of York, York, United Kingdom.

    Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild still remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate in a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain as well as genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle.

    Funded by: Wellcome Trust: 076355, 085822, 098051

    PLoS genetics 2014;10;1;e1004092

  • WormBase 2014: new views of curated biology.

    Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Done J, Grove C, Howe K, Kishore R, Lee R, Li Y, Muller HM, Nakamura C, Ozersky P, Paulini M, Raciti D, Schindelman G, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Wong JD, Yook K, Schedl T, Hodgkin J, Berriman M, Kersey P, Spieth J, Stein L and Sternberg PW

    Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada, Genome Sequencing Center, Washington University, School of Medicine, St Louis, MO 63108, USA, Division of Biology and Biological Engineering 156-29, California Institute of Technology, Pasadena, CA 91125, USA, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Department of Genetics Campus, Washington University School of Medicine, St. Louis, MO 63110, USA, Genetics Unit, Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA 91125, USA.

    WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G070119; NHGRI NIH HHS: P41 HG002223, U41-HG002223

    Nucleic acids research 2014;42;Database issue;D789-93

2013 Publications

  • The peculiar epidemiology of dracunculiasis in Chad.

    Eberhard ML, Ruiz-Tiben E, Hopkins DR, Farrell C, Toe F, Weiss A, Withers PC, Jenks MH, Thiele EA, Cotton JA, Hance Z, Holroyd N, Cama VA, Tahir MA and Mounda T

    Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, Georgia; The Carter Center, Atlanta, Georgia; The Carter Center, N'Djamena, Chad; LifeSource Biomedical, Centreville, Virginia; The Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Ministry of Public Health, N'Djamena, Chad.

    Dracunculiasis was rediscovered in Chad in 2010 after an apparent absence of 10 years. In April 2012 active village-based surveillance was initiated to determine where, when, and how transmission of the disease was occurring, and to implement interventions to interrupt it. The current epidemiologic pattern of the disease in Chad is unlike that seen previously in Chad or other endemic countries, i.e., no clustering of cases by village or association with a common water source, the average number of worms per person was small, and a large number of dogs were found to be infected. Molecular sequencing suggests these infections were all caused by Dracunculus medinensis. It appears that the infection in dogs is serving as the major driving force sustaining transmission in Chad, that an aberrant life cycle involving a paratenic host common to people and dogs is occurring, and that the cases in humans are sporadic and incidental.

    Funded by: Wellcome Trust: 098051

    The American journal of tropical medicine and hygiene 2014;90;1;61-70

  • Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.

    Lemieux JE, Kyes SA, Otto TD, Feller AI, Eastman RT, Pinches RA, Berriman M, Su XZ and Newbold CI

    Weatherall Institute of Molecular Medicine, Headington, Oxford, OX3 9DS, UK; National Institute of Allergy and Infectious Disease, NIH, Rockville, MD, 20892, USA.

    Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 Kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.

    Funded by: Wellcome Trust: 082130, 082130/Z/07/Z, 098051

    Molecular microbiology 2013;90;3;519-37

  • Vector transmission regulates immune control of Plasmodium virulence.

    Spence PJ, Jarra W, Lévy P, Reid AJ, Chappell L, Brugat T, Sanders M, Berriman M and Langhorne J

    Division of Parasitology, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK.

    Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents, primates or humans increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission, which is assumed to 'reset' Plasmodium to its original character. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we use mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi to interrogate regulation of parasite virulence. Analysis of SBP P. c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which in turn modifies the elicited mammalian immune response, which in turn attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.

    Funded by: Medical Research Council: MC_U117584248, U.1175.02.004.00004(60507), U117584248; Wellcome Trust: 085775, 089553, 098051

    Nature 2013;498;7453;228-31

  • The genomes of four tapeworm species reveal adaptations to parasitism.

    Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PW, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Taenia solium Genome Consortium, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Tapeworms (Cestoda) cause neglected diseases that can be fatal and are difficult to treat, owing to inefficient drugs. Here we present an analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115- to 141-megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have specialized detoxification pathways, metabolism that is finely tuned to rely on nutrients scavenged from their hosts, and species-specific expansions of non-canonical heat shock proteins and families of known antigens. We identify new potential drug targets, including some on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.

    Funded by: Biotechnology and Biological Sciences Research Council: BBG0038151; Canadian Institutes of Health Research: MOP#84556; FIC NIH HHS: TW008588; Wellcome Trust: 085775, 098051

    Nature 2013;496;7443;57-63

  • Genes involved in host-parasite interactions can be revealed by their correlated expression.

    Reid AJ and Berriman M

    Parasite genomics group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ar11@sanger.ac.uk

    Molecular interactions between a parasite and its host are key to the ability of the parasite to enter the host and persist. Our understanding of the genes and proteins involved in these interactions is limited. To better understand these processes it would be advantageous to have a range of methods to predict pairs of genes involved in such interactions. Correlated gene expression profiles can be used to identify molecular interactions within a species. Here we have extended the concept to different species, showing that genes with correlated expression are more likely to encode proteins that directly or indirectly participate in host-parasite interaction. We go on to examine our predictions of molecular interactions between the malaria parasite and both its mammalian host and insect vector. Our approach could be applied to study any interaction between species, for example, between a host and its parasites or pathogens, but also symbiotic and commensal pairings.

    Funded by: Wellcome Trust: 098051

    Nucleic acids research 2013;41;3;1508-18

  • The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery.

    Laing R, Kikuchi T, Martinelli A, Tsai IJ, Beech RN, Redman E, Holroyd N, Bartley DJ, Beasley H, Britton C, Curran D, Devaney E, Gilabert A, Hunt M, Jackson F, Johnston SL, Kryukov I, Li K, Morrison AA, Reid AJ, Sargison N, Saunders GI, Wasmuth JD, Wolstenholme A, Berriman M, Gilleard JS and Cotton JA

    Background: The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most economically important parasites of livestock and humans.

    Results: Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug target families, providing the most comprehensive understanding yet of the action of several important anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut that provide a rich source of vaccine and drug target candidates.

    Conclusions: The H. contortus genome and transcriptome provide an essential platform for postgenomic research in this and other important strongylid parasites.

    Genome biology 2013;14;8;R88

  • Comparative study of transcriptome profiles of mechanical- and skin-transformed Schistosoma mansoni schistosomula.

    Protasio AV, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosome infection begins with the penetration of cercariae through healthy unbroken host skin. This process leads to the transformation of the free-living larvae into obligate parasites called schistosomula. This irreversible transformation, which occurs in as little as two hours, involves casting the cercaria tail and complete remodelling of the surface membrane. At this stage, parasites are vulnerable to host immune attack and oxidative stress. Consequently, the mechanisms by which the parasite recognises and swiftly adapts to the human host are still the subject of many studies, especially in the context of development of intervention strategies against schistosomiasis infection. Because obtaining enough material from in vivo infections is not always feasible for such studies, the transformation process is often mimicked in the laboratory by application of shear pressure to a cercarial sample resulting in mechanically transformed (MT) schistosomula. These parasites share remarkable morphological and biochemical similarity to the naturally transformed counterparts and have been considered a good proxy for parasites undergoing natural infection. Relying on this equivalency, MT schistosomula have been used almost exclusively in high-throughput studies of gene expression, identification of drug targets and identification of effective drugs against schistosomes. However, the transcriptional equivalency between skin-transformed (ST) and MT schistosomula has never been proven. In our approach to compare these two types of schistosomula preparations and to explore differences in gene expression triggered by the presence of a skin barrier, we performed RNA-seq transcriptome profiling of ST and MT schistosomula at 24 hours post transformation. We report that these two very distinct schistosomula preparations differ only in the expression of 38 genes (out of ∼11,000), providing convincing evidence to resolve the skin vs. mechanical long-lasting controversy.

    Funded by: Wellcome Trust: WT 083931/Z/07/Z, WT 098051

    PLoS neglected tropical diseases 2013;7;3;e2091

  • REAPR: a universal tool for genome assembly evaluation.

    Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M and Otto TD

    Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.

    Genome biology 2013;14;5;R47

2012 Publications

  • Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

    Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, O'Brien J, Djimde A, Doumbo O, Zongo I, Ouedraogo JB, Michon P, Mueller I, Siba P, Nzila A, Borrmann S, Kiara SM, Marsh K, Jiang H, Su XZ, Amaratunga C, Fairhurst R, Socheat D, Nosten F, Imwong M, White NJ, Sanders M, Anastasi E, Alcock D, Drury E, Oyola S, Quail MA, Turner DJ, Ruano-Rubio V, Jyothi D, Amenga-Etego L, Hubbart C, Jeffreys A, Rowlands K, Sutherland C, Roper C, Mangano V, Modiano D, Tan JC, Ferdig MT, Amambua-Ngwa A, Conway DJ, Takala-Harrison S, Plowe CV, Rayner JC, Rockett KA, Clark TG, Newbold CI, Berriman M, MacInnis B and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 075491/Z/04, 077012/Z/05/Z, 082370, 089275, 090532, 090532/Z/09/Z, 090770, 090770/Z/09/Z, 092654, 098051

    Nature 2012;487;7407;375-9

  • A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

    Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M and Otto TD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generating annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.

    Funded by: Wellcome Trust: 098051

    Nature protocols 2012;7;7;1260-84

  • A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni.

    Protasio AV, Tsai IJ, Babbage A, Nichol S, Hunt M, Aslett MA, De Silva N, Velarde GS, Anderson TJ, Clark RC, Davidson C, Dillon GP, Holroyd NE, LoVerde PT, Lloyd C, McQuillan J, Oliveira G, Otto TD, Parker-Manuel SJ, Quail MA, Wilson RA, Zerlotini A, Dunne DW and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

    Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.

    Funded by: FIC NIH HHS: TW007012; PHS HHS: HHSN272201000009I; Wellcome Trust: 085775/Z/08/Z

    PLoS neglected tropical diseases 2012;6;1;e1455

  • Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy.

    Reid AJ, Vermont SJ, Cotton JA, Harris D, Hill-Cawthorne GA, Könen-Waisman S, Latham SM, Mourier T, Norton R, Quail MA, Sanders M, Shanmugam D, Sohal A, Wasmuth JD, Brunk B, Grigg ME, Howard JC, Parkinson J, Roos DS, Trees AJ, Berriman M, Pain A and Wastling JM

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom.

    Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma, rather it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/B/08493; Canadian Institutes of Health Research; Wellcome Trust: 085775/Z/08/Z

    PLoS pathogens 2012;8;3;e1002567

  • Germline transgenesis and insertional mutagenesis in Schistosoma mansoni mediated by murine leukemia virus.

    Rinaldi G, Eckert SE, Tsai IJ, Suttiprapa S, Kines KJ, Tort JF, Mann VH, Turner DJ, Berriman M and Brindley PJ

    Department of Microbiology, Immunology & Tropical Medicine, School of Medicine & Health Sciences, The George Washington University, Washington, DC, United States of America.

    Functional studies will facilitate characterization of role and essentiality of newly available genome sequences of the human schistosomes, Schistosoma mansoni, S. japonicum and S. haematobium. To develop transgenesis as a functional approach for these pathogens, we previously demonstrated that pseudotyped murine leukemia virus (MLV) can transduce schistosomes leading to chromosomal integration of reporter transgenes and short hairpin RNA cassettes. Here we investigated vertical transmission of transgenes through the developmental cycle of S. mansoni after introducing transgenes into eggs. Although MLV infection of schistosome eggs from mouse livers was efficient in terms of snail infectivity, >10-fold higher transgene copy numbers were detected in cercariae derived from in vitro laid eggs (IVLE). After infecting snails with miracidia from eggs transduced by MLV, sequencing of genomic DNA from cercariae released from the snails also revealed the presence of transgenes, demonstrating that transgenes had been transmitted through the asexual developmental cycle, and thereby confirming germline transgenesis. High-throughput sequencing of genomic DNA from schistosome populations exposed to MLV mapped widespread and random insertion of transgenes throughout the genome, along each of the autosomes and sex chromosomes, validating the utility of this approach for insertional mutagenesis. In addition, the germline-transmitted transgene encoding neomycin phosphotransferase rescued cultured schistosomules from toxicity of the antibiotic G418, and PCR analysis of eggs resulting from sexual reproduction of the transgenic worms in mice confirmed that retroviral transgenes were transmitted to the next (F1) generation. These findings provide the first description of wide-scale, random insertional mutagenesis of chromosomes and of germline transmission of a transgene in schistosomes. Transgenic lines of schistosomes expressing antibiotic resistance could advance functional genomics for these significant human pathogens. DATABASE ACCESSION: Sequence data from this study have been submitted to the European Nucleotide Archive (http://www.ebi.ac.uk/embl) under accession number ERP000379.

    Funded by: NIAID NIH HHS: R01AI072773; PHS HHS: HHSN272201000005I; Wellcome Trust: 098051

    PLoS pathogens 2012;8;7;e1002820

2011 Publications

  • Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.

    Downing T, Imamura H, Decuypere S, Clark TG, Coombs GH, Cotton JA, Hilley JD, de Doncker S, Maes I, Mottram JC, Quail MA, Rijal S, Sanders M, Schönian G, Stark O, Sundar S, Vanaerschot M, Hertz-Fowler C, Dujardin JC and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.

    Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.

    Funded by: Wellcome Trust: 076355, 085775/Z/08/Z

    Genome research 2011;21;12;2143-56

  • RATT: Rapid Annotation Transfer Tool.

    Otto TD, Dillon GP, Degrave WS and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Nucleic acids research 2011;39;9;e57

2010 Publications

  • Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    Otto TD, Sanders M, Berriman M and Newbold C

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.

    Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.

    Availability: The software is available at http://icorn.sourceforge.net

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2010;26;14;1704-7

  • New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, Lemieux J, Barrell B, Pain A, Berriman M, Newbold C and Llinás M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

    Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z

    Molecular microbiology 2010;76;1;12-24

  • Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.

    Tsai IJ, Otto TD and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. jit@sanger.ac.uk

    Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Genome biology 2010;11;4;R41

2009 Publications

  • Plasmodium falciparum var gene expression is modified by host immunity.

    Warimwe GM, Keane TM, Fegan G, Musyoki JN, Newton CR, Pain A, Berriman M, Marsh K and Bull PC

    Kenya Medical Research Institute-Wellcome Trust Research Programme, 80108 Kilifi, Kenya.

    Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) is a potentially important family of immune targets, which play a central role in the host-parasite interaction by binding to various host molecules. They are encoded by a diverse family of genes called var, of which there are approximately 60 copies in each parasite genome. In sub-Saharan Africa, although P. falciparum infection occurs throughout life, severe malarial disease tends to occur only in childhood. This could potentially be explained if (i) PfEMP1 variants differ in their capacity to support pathogenesis of severe malaria and (ii) this capacity is linked to the likelihood of each molecule being recognized and cleared by naturally acquired antibodies. Here, in a study of 217 Kenyan children with malaria, we show that expression of a group of var genes "cys2," containing a distinct pattern of cysteine residues, is associated with low host immunity. Expression of cys2 genes was associated with parasites from young children, those with severe malaria, and those with a poorly developed antibody response to parasite-infected erythrocyte surface antigens. Cys-2 var genes form a minor component of all genomic var repertoires analyzed to date. Therefore, the results are compatible with the hypothesis that the genomic var gene repertoire is organized such that PfEMP1 molecules that confer the most virulence to the parasite tend also to be those that are most susceptible to the development of host immunity. This may help the parasite to adapt effectively to the development of host antibodies through modification of the host-parasite relationship.

    Funded by: Wellcome Trust: 076030, 077092, 084535

    Proceedings of the National Academy of Sciences of the United States of America 2009;106;51;21801-6

  • Comparative genomics of the fungal pathogens Candida dubliniensis and Candida albicans.

    Jackson AP, Gamble JA, Yeomans T, Moran GP, Saunders D, Harris D, Aslett M, Barrell JF, Butler G, Citiulo F, Coleman DC, de Groot PW, Goodwin TJ, Quail MA, McQuillan J, Munro CA, Pain A, Poulter RT, Rajandream MA, Renauld H, Spiering MJ, Tivey A, Gow NA, Barrell B, Sullivan DJ and Berriman M

    Pathogen Genomics Group, Wellcome Trust Sanger Institute, Cambridge, United Kingdom. aj4@sanger.ac.uk

    Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.

    Funded by: Medical Research Council: G0400284; Wellcome Trust: WT085775/Z/08/Z

    Genome research 2009;19;12;2231-44

  • The genome of the blood fluke Schistosoma mansoni.

    Berriman M, Haas BJ, LoVerde PT, Wilson RA, Dillon GP, Cerqueira GC, Mashiyama ST, Al-Lazikani B, Andrade LF, Ashton PD, Aslett MA, Bartholomeu DC, Blandin G, Caffrey CR, Coghlan A, Coulson R, Day TA, Delcher A, DeMarco R, Djikeng A, Eyre T, Gamble JA, Ghedin E, Gu Y, Hertz-Fowler C, Hirai H, Hirai Y, Houston R, Ivens A, Johnston DA, Lacerda D, Macedo CD, McVeigh P, Ning Z, Oliveira G, Overington JP, Parkhill J, Pertea M, Pierce RJ, Protasio AV, Quail MA, Rajandream MA, Rogers J, Sajid M, Salzberg SL, Stanke M, Tivey AR, White O, Williams DL, Wortman J, Wu W, Zamanian M, Zerlotini A, Fraser-Liggett CM, Barrell BG and El-Sayed NM

    Wellcome Trust Sanger Institute, Cambridge CB10 1SD, UK. mb4@sanger.ac.uk

    Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

    Funded by: FIC NIH HHS: 5D43TW006580, 5D43TW007012-03; NIAID NIH HHS: AI054711-01A2, AI48828, U01 AI048828-01, U01 AI048828-02; NIGMS NIH HHS: R01 GM083873-07, R01 GM083873-08; NLM NIH HHS: R01 LM006845-08, R01 LM006845-09; Wellcome Trust: 086151, WT085775/Z/08/Z

    Nature 2009;460;7253;352-8

2008 Publications

  • Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

    Carver T, Berriman M, Tivey A, Patel C, Böhme U, Barrell BG, Parkhill J and Rajandream MA

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Motivation: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences.

    Results: Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text.

    Availability: Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/

    Funded by: Wellcome Trust: 082372

    Bioinformatics (Oxford, England) 2008;24;23;2672-6

  • Genetics of mating and sex determination in the parasitic nematode Haemonchus contortus.

    Redman E, Grillo V, Saunders G, Packard E, Jackson F, Berriman M and Gilleard JS

    The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Genetic analysis of parasitic nematodes has been a neglected area of research and the basic genetics of this important group of pathogens are poorly understood. Haemonchus contortus is one of the most economically significant livestock parasites worldwide and is a key experimental model for the strongylid nematode group that includes many important human and animal pathogens. We have undertaken a study of the genetics and the mode of mating of this parasite using microsatellite markers. Inheritance studies with autosomal markers demonstrated obligate dioecious sexual reproduction and polyandrous mating that are reported here for the first time in a parasitic helminth and provide the parasite with a mechanism of increasing genetic diversity. The karyotype of the H. contortus MHco3(ISE) isolate was determined as 2n = 11 or 12. We have developed a panel of microsatellite markers that are tightly linked on the X chromosome and have used them to determine the sex chromosomal karyotype as XO male and XX female. Haplotype analysis using the X-chromosomal markers also demonstrated polyandry, independent of the autosomal marker analysis, and enabled a more direct estimate of the number of male parental genotypes contributing to each brood. This work provides a basis for future forward genetic analysis on H. contortus and related parasitic nematodes.

    Funded by: Wellcome Trust: 067811/Z/02/Z

    Genetics 2008;180;4;1877-87

  • Genomic-scale prioritization of drug targets: the TDR Targets database.

    Agüero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, Carmona S, Carruthers IM, Chan AW, Chen F, Crowther GJ, Doyle MA, Hertz-Fowler C, Hopkins AL, McAllister G, Nwaka S, Overington JP, Pain A, Paolini GV, Pieper U, Ralph SA, Riechers A, Roos DS, Sali A, Shanmugam D, Suzuki T, Van Voorhis WC and Verlinde CL

    Instituto de Investigaciones Biotecnológicas, Universidad Nacional de General San Martín, San Martín 1650, Buenos Aires, Argentina. fernan@unsam.edu.ar

    The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.

    Funded by: NIGMS NIH HHS: R01 GM054762-14

    Nature reviews. Drug discovery 2008;7;11;900-7

  • The genome of the simian and human malaria parasite Plasmodium knowlesi.

    Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, Mourier T, Mistry J, Pasini EM, Aslett MA, Balasubrammaniam S, Borgwardt K, Brooks K, Carret C, Carver TJ, Cherevach I, Chillingworth T, Clark TG, Galinski MR, Hall N, Harper D, Harris D, Hauser H, Ivens A, Janssen CS, Keane T, Larke N, Lapp S, Marti M, Moule S, Meyer IM, Ormond D, Peters N, Sanders M, Sanders S, Sargeant TJ, Simmonds M, Smith F, Squares R, Thurston S, Tivey AR, Walker D, White B, Zuiderwijk E, Churcher C, Quail MA, Cowman AF, Turner CM, Rajandream MA, Kocken CH, Thomas AW, Newbold CI, Barrell BG and Berriman M

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ap2@sanger.ac.uk

    Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.

    Funded by: Wellcome Trust: 085775

    Nature 2008;455;7214;799-803

  • Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.

    Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, Ivens A, Newbold C and Pain A

    Ancient DNA and Evolution Group, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark.

    We undertook a genome-wide search for novel noncoding RNAs (ncRNA) in the malaria parasite Plasmodium falciparum. We used the RNAz program to predict structures in the noncoding regions of the P. falciparum 3D7 genome that were conserved with at least one of seven other Plasmodium spp. genome sequences. By using Northern blot analysis for 76 high-scoring predictions and microarray analysis for the majority of candidates, we have verified the expression of 33 novel ncRNA transcripts including four members of a ncRNA family in the asexual blood stage. These transcripts represent novel structured ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related chimp malaria parasite P. reichenowi. The high confirmation rate within a single parasite life cycle stage suggests that many more of the predictions may be expressed in other stages of the organism's life cycle.

    Funded by: Wellcome Trust

    Genome research 2008;18;2;281-92

  • Insights into the genome sequence of a free-living Kinetoplastid: Bodo saltans (Kinetoplastida: Euglenozoa).

    Jackson AP, Quail MA and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK. aj4@sanger.ac.uk

    Background: Bodo saltans is a free-living kinetoplastid and among the closest relatives of the trypanosomatid parasites, which cause such human diseases as African sleeping sickness, leishmaniasis and Chagas disease. A B. saltans genome sequence will provide a free-living comparison with parasitic genomes necessary for comparative analyses of existing and future trypanosomatid genomic resources. Various coding regions were sequenced to provide a preliminary insight into the bodonid genome sequence, relative to trypanosomatid sequences.

    Results: 0.4 Mbp of B. saltans genome was sequenced from 12 distinct regions and contained 178 coding sequences. As in trypanosomatids, introns were absent and %GC was elevated in coding regions, greatly assisting in gene finding. In the regions studied, roughly 60% of all genes had homologs in trypanosomatids, while 28% were Bodo-specific. Intergenic sequences were typically short, resulting in higher gene density than in trypanosomatids. Although synteny was typically conserved for those genes with trypanosomatid homologs, strict colinearity was rarely observed because gene order was regularly disrupted by Bodo-specific genes.

    Conclusion: The B. saltans genome contains both sequences homologous to trypanosomatids and sequences never seen before. Structural similarities suggest that its assembly should be solvable, and, although de novo assembly will be necessary, existing trypanosomatid projects will provide some guide to annotation. A complete genome sequence will provide an effective ancestral model for understanding the shared and derived features of known trypanosomatid genomes, but it will also identify those kinetoplastid genome features lost during the evolution of parasitism.

    BMC genomics 2008;9;594

  • Telomeric expression sites are highly conserved in Trypanosoma brucei.

    Hertz-Fowler C, Figueiredo LM, Quail MA, Becker M, Jackson A, Bason N, Brooks K, Churcher C, Fahkro S, Goodhead I, Heath P, Kartvelishvili M, Mungall K, Harris D, Hauser H, Sanders M, Saunders D, Seeger K, Sharp S, Taylor JE, Walker D, White B, Young R, Cross GA, Rudenko G, Barry JD, Louis EJ and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. chf@sanger.ac.uk

    Subtelomeric regions are often under-represented in genome sequences of eukaryotes. One of the best known examples of the use of telomere proximity for adaptive purposes is the bloodstream expression sites (BESs) of the African trypanosome Trypanosoma brucei. To enhance our understanding of BES structure and function in host adaptation and immune evasion, the BES repertoire from the Lister 427 strain of T. brucei was independently tagged and sequenced. BESs are polymorphic in size and structure but reveal a surprisingly conserved architecture in the context of extensive recombination. Very small BESs do exist and many functioning BESs do not contain the full complement of expression site associated genes (ESAGs). The consequences of duplicated or missing ESAGs, including ESAG9, a newly named ESAG12, and additional variant surface glycoprotein genes (VSGs) were evaluated by functional assays after BESs were tagged with a drug-resistance gene. Phylogenetic analysis of constituent ESAG families suggests that BESs are sequence mosaics and that extensive recombination has shaped the evolution of the BES repertoire. This work opens important perspectives in understanding the molecular mechanisms of antigenic variation, a widely used strategy for immune evasion in pathogens, and telomere biology.

    Funded by: NIAID NIH HHS: R01AI021729; Wellcome Trust: 095161

    PloS one 2008;3;10;e3527

2007 Publications

  • Comparative genomic analysis of three Leishmania species that cause diverse human disease.

    Peacock CS, Seeger K, Harris D, Murphy L, Ruiz JC, Quail MA, Peters N, Adlem E, Tivey A, Aslett M, Kerhornou A, Ivens A, Fraser A, Rajandream MA, Carver T, Norbertczak H, Chillingworth T, Hance Z, Jagels K, Moule S, Ormond D, Rutter S, Squares R, Whitehead S, Rabbinowitsch E, Arrowsmith C, White B, Thurston S, Bringaud F, Baldauf SL, Faulconbridge A, Jeffares D, Depledge DP, Oyola SO, Hilley JD, Brito LO, Tosi LR, Barrell B, Cruz AK, Mottram JC, Smith DF and Berriman M

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. csp@sanger.ac.uk

    Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.

    Funded by: Medical Research Council: G0000508; Wellcome Trust: 076355, 085775

    Nature genetics 2007;39;7;839-47

  • Genome variation and evolution of the malaria parasite Plasmodium falciparum.

    Jeffares DC, Pain A, Berry A, Cox AV, Stalker J, Ingle CE, Thomas A, Quail MA, Siebenthall K, Uhlemann AC, Kyes S, Krishna S, Newbold C, Dermitzakis ET and Berriman M

    Informatics Division, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Hinxton, UK.

    Infections with the malaria parasite Plasmodium falciparum result in more than 1 million deaths each year worldwide. Deciphering the evolutionary history and genetic variation of P. falciparum is critical for understanding the evolution of drug resistance, identifying potential vaccine candidates and appreciating the effect of parasite variation on prevalence and severity of malaria in humans. Most studies of natural variation in P. falciparum have been either in depth over small genomic regions (up to the size of a small chromosome) or genome wide but only at low resolution. In an effort to complement these studies with genome-wide data, we undertook shotgun sequencing of a Ghanaian clinical isolate (with fivefold coverage), the IT laboratory isolate (with onefold coverage) and the chimpanzee parasite P. reichenowi (with twofold coverage). We compared these sequences with the fully sequenced P. falciparum 3D7 isolate genome. We describe the most salient features of P. falciparum polymorphism and adaptive evolution with relation to gene function, transcript and protein expression and cellular localization. This analysis uncovers the primary evolutionary changes that have occurred since the P. falciparum-P. reichenowi speciation and changes that are occurring within P. falciparum.

    Funded by: Wellcome Trust: 077046, 079643

    Nature genetics 2007;39;1;120-5

2006 Publications

  • Common inheritance of chromosome Ia associated with clonal expansion of Toxoplasma gondii.

    Khan A, Böhme U, Kelly KA, Adlem E, Brooks K, Simmonds M, Mungall K, Quail MA, Arrowsmith C, Chillingworth T, Churcher C, Harris D, Collins M, Fosker N, Fraser A, Hance Z, Jagels K, Moule S, Murphy L, O'Neil S, Rajandream MA, Saunders D, Seeger K, Whitehead S, Mayr T, Xuan X, Watanabe J, Suzuki Y, Wakaguri H, Sugano S, Sugimoto C, Paulsen I, Mackey AJ, Roos DS, Hall N, Berriman M, Barrell B, Sibley LD and Ajioka JW

    Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri 63110, USA.

    Toxoplasma gondii is a globally distributed protozoan parasite that can infect virtually all warm-blooded animals and humans. Despite the existence of a sexual phase in the life cycle, T. gondii has an unusual population structure dominated by three clonal lineages that predominate in North America and Europe (Types I, II, and III). These lineages were founded by common ancestors approximately 10,000 yr ago. The recent origin and widespread distribution of the clonal lineages is attributed to the circumvention of the sexual cycle by a new mode of transmission: asexual transmission between intermediate hosts. Asexual transmission appears to be multigenic and although the specific genes mediating this trait are unknown, it is predicted that all members of the clonal lineages should share the same alleles. Genetic mapping studies suggested that chromosome Ia was unusually monomorphic compared with the rest of the genome. To investigate this further, we sequenced chromosome Ia and chromosome Ib in the Type I strain, RH, and the Type II strain, ME49. Comparative genome analyses of the two chromosomal sequences revealed that the same copy of chromosome Ia was inherited in each lineage, whereas chromosome Ib maintained the same high frequency of between-strain polymorphism as the rest of the genome. Sampling of chromosome Ia sequence in seven additional representative strains from the three clonal lineages supports a monomorphic inheritance, which is unique within the genome. Taken together, our observations implicate a specific combination of alleles on chromosome Ia in the recent origin and widespread success of the clonal lineages of T. gondii.

    Funded by: NIAID NIH HHS: R01 AI059176; Wellcome Trust

    Genome research 2006;16;9;1119-25

  • Just one cross appears capable of dramatically altering the population biology of a eukaryotic pathogen like Toxoplasma gondii.

    Boyle JP, Rajasekar B, Saeij JP, Ajioka JW, Berriman M, Paulsen I, Roos DS, Sibley LD, White MW and Boothroyd JC

    Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.

    Toxoplasma gondii, an obligate intracellular protozoan of the phylum Apicomplexa, is estimated to infect over a billion people worldwide as well as a great many other mammalian and avian hosts. Despite this ubiquity, the vast majority of human infections in Europe and North America are thought to be due to only three genotypes. Using a genome-wide analysis of single-nucleotide polymorphisms, we have constructed a genealogy for these three lines. The data indicate that types I and III are second- and first-generation offspring, respectively, of a cross between a type II strain and one of two ancestral strains. An extant T. gondii strain (P89) appears to be the modern descendant of the non-type II parent of type III, making the full genealogy of the type III clonotype known. The simplicity of this family tree demonstrates that even a single cross can lead to the emergence and dominance of a new clonal genotype that completely alters the population biology of a sexual pathogen.

    Funded by: NIAID NIH HHS: AI045806, AI05093, AI21423, AI41014, F32AI60306, R01 AI036629

    Proceedings of the National Academy of Sciences of the United States of America 2006;103;27;10514-9

2005 Publications

  • The genome of the African trypanosome Trypanosoma brucei.

    Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, Lennard NJ, Caler E, Hamlin NE, Haas B, Böhme U, Hannick L, Aslett MA, Shallom J, Marcello L, Hou L, Wickstead B, Alsmark UC, Arrowsmith C, Atkin RJ, Barron AJ, Bringaud F, Brooks K, Carrington M, Cherevach I, Chillingworth TJ, Churcher C, Clark LN, Corton CH, Cronin A, Davies RM, Doggett J, Djikeng A, Feldblyum T, Field MC, Fraser A, Goodhead I, Hance Z, Harper D, Harris BR, Hauser H, Hostetler J, Ivens A, Jagels K, Johnson D, Johnson J, Jones K, Kerhornou AX, Koo H, Larke N, Landfear S, Larkin C, Leech V, Line A, Lord A, Macleod A, Mooney PJ, Moule S, Martin DM, Morgan GW, Mungall K, Norbertczak H, Ormond D, Pai G, Peacock CS, Peterson J, Quail MA, Rabbinowitsch E, Rajandream MA, Reitter C, Salzberg SL, Sanders M, Schobel S, Sharp S, Simmonds M, Simpson AJ, Tallon L, Turner CM, Tait A, Tivey AR, Van Aken S, Walker D, Wanless D, Wang S, White B, White O, Whitehead S, Woodward J, Wortman J, Adams MD, Embley TM, Gull K, Ullu E, Barry JD, Fairlamb AH, Opperdoes F, Barrell BG, Donelson JE, Hall N, Fraser CM, Melville SE and El-Sayed NM

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. mb4@sanger.ac.uk

    African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

    Funded by: NIAID NIH HHS: AI43062; Wellcome Trust

    Science (New York, N.Y.) 2005;309;5733;416-22

Publications before 2005

  • Genome sequence of the human malaria parasite Plasmodium falciparum.

    Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM and Barrell B

    The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. gardner@tigr.org

    The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

    Funded by: NIAID NIH HHS: R01 AI028398-23; Wellcome Trust: 061524

    Nature 2002;419;6906;498-511

  • The architecture of variant surface glycoprotein gene expression sites in Trypanosoma brucei.

    Berriman M, Hall N, Sheader K, Bringaud F, Tiwari B, Isobe T, Bowman S, Corton C, Clark L, Cross GA, Hoek M, Zanders T, Berberof M, Borst P and Rudenko G

    The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Trypanosoma brucei evades the immune system by switching between Variant Surface Glycoprotein (VSG) genes. The active VSG gene is transcribed in one of approximately 20 telomeric expression sites (ESs). It has been postulated that ES polymorphism plays a role in host adaptation. To gain more insight into ES architecture, we have determined the complete sequence of Bacterial Artificial Chromosomes (BACs) containing DNA from three ESs and their flanking regions. There was variation in the order and number of ES-associated genes (ESAGs). ESAGs 6 and 7, encoding transferrin receptor subunits, are the only ESAGs with functional copies in every ES that has been sequenced until now. A BAC clone containing the VO2 ES sequences comprised approximately half of a 330 kb 'intermediate' chromosome. The extensive similarity between this intermediate chromosome and the left telomere of T. brucei 927 chromosome I, suggests that this previously uncharacterised intermediate size class of chromosomes could have arisen from breakage of megabase chromosomes. Unexpected conservation of sequences, including pseudogenes, indicates that the multiple ESs could have arisen through a relatively recent amplification of a single ES.

    Funded by: NIAID NIH HHS: AI21729; Wellcome Trust: 095161

    Molecular and biochemical parasitology 2002;122;2;131-40
