Malaria programme: Kwiatkowski group

The Malaria programme uses genomic and genetic approaches to discover molecular mechanisms of host-parasite interactions that may lead to new biological insights and improved strategies for disease prevention.

Within this programme, the Kwiatkowski group is investigating the biological consequences of natural variation in the human and Plasmodium genomes.

[Susana Campino, Genome Research Limited]

Background

At the core of our research is the question: why, in areas where people are repeatedly infected with malaria, do some become gravely ill, while others show no signs of disease at all?

Underpinning this paradox are several important and fascinating issues in biology, evolution, and medicine: what differences in individuals' innate immune system confer resistance or susceptibility to malaria? Do genetic differences make some Plasmodium populations more virulent in nature? And can we detect changes in the parasite genome that confer resistance to anti-malarial drugs?

The Kwiatkowski group is combining large-scale epidemiological studies with high-throughput analysis of genome variation to systematically search the human and Plasmodium genomes for novel alleles that affect disease progression. Our goal is to use natural genomic diversity to discover molecular mechanisms of phenotypes such as protective immunity in the host or drug resistance in the pathogen. These insights will be critical for controlling antimalarial drug resistance, developing new drugs and, ideally, producing an effective vaccine against this disease.

Our multidisciplinary team is divided between the Sanger Institute and the Wellcome Trust Centre for Human Genetics at the University of Oxford, and includes expertise in malaria biology, epidemiology, statistics, informatics, ethics, and programme management. At the Sanger Institute we are leveraging the Institute's cutting-edge genome technologies and expertise, while building unique informatics and experimental tools to analyse and share the vast data generated, in order to understand how natural genetic variation influences malaria.

Research

Child with severe cerebral malaria.

Human genetic resistance to malaria

Currently we have two main approaches to understanding the impact of genetic variation on host susceptibility to malaria. The first is large-scale genome-wide association studies (GWAS), which look for association of known genetic markers across the human genome with resistance or susceptibility to malaria. We are further examining how signals of genetic association are affected by diversity in African population structure, and identifying regions of the genome under recent positive selection in malaria-endemic populations by accurate haplotype construction using family-based GWA data. The second approach involves deep resequencing of candidate resistance genes to identify new genetic variants that correlate with innate immunity to malaria within and between African populations.

This work involves close collaboration with the Sanger Institute's genotyping teams led by Panos Deloukas and the Medical Re-sequencing team led by Aarno Palotie.

These studies are being performed as part of the MalariaGEN Consortium, a network of scientists in more than 20 countries, many in the most affected regions of the world, who share their clinical samples as well as their expertise about malaria. MalariaGEN is funded by the Grand Challenges in Global Health programme of the Bill and Melinda Gates Foundation. MalariaGEN projects include multi-centre case-control and family-based association studies of severe malaria, as well as large cohort studies examining the natural evolution of infection and immunity. The consortium also has ongoing genetic linkage studies in a number of populations, as well as investigations of ethnic groups that naturally have a high level of resistance to malaria.

Malaria parasite invading a red blood cell.

Biological consequences of natural variation in the Plasmodium falciparum genome

Knowledge of the natural genomic diversity and population genetics of a single species of Plasmodium is crucial for understanding the parasite's extraordinary ability to evade the immune system and to develop resistance to anti-malarial drugs.

To date, Plasmodium genome sequencing at the Sanger Institute and elsewhere has focused on laboratory-adapted parasites. We are now developing the experimental, epidemiological, and analytical tools to characterise natural genome diversity in Plasmodium falciparum isolates from multiple malaria-endemic regions in Africa and Southeast Asia. Leveraging Solexa/Illumina high-throughput sequencing technology, we hope to develop shotgun genotyping as a cost-effective method for genome-wide analysis of natural variation in Plasmodium falciparum.

Understanding the complex population genetic structures that arise under different conditions of malaria transmission will revolutionise malaria biology. It will serve as the foundation for large-scale epidemiological studies of genotype-phenotype correlation (for example, for drug resistance, immune evasion, and other parasite phenotypes) and will inform malaria monitoring and control strategies in the field. Accordingly, we will use natural P. falciparum variation data to inform functional analysis of parasite biology in the laboratory.

This work is being done in close collaboration with Chris Newbold (of Oxford University, and honorary Sanger Faculty) and the Parasite Genomics group led by Matt Berriman, who head the resequencing and reannotation of the reference Plasmodium falciparum genome, 3D7.

Principal Components Analysis of Affymetrix 500K SNP chip data reveals genetic signatures of Gambian ethnic sub-populations (as indicated by colour).

Statistical analysis of genome-wide association and short-read sequence data

The biological framework of our research programme is underpinned by cutting-edge statistical and informatics solutions for the analysis, handling, and sharing of large-scale sequence and genotype data.

New high-throughput sequencing and large-scale genotyping technologies drive the need for novel statistical methods for genetic data analysis. This need is further underscored by the complexities of the population genetic structures we study: for example, the rich haplotypic diversity and low linkage disequilibrium (LD) of African populations pose unique challenges for GWA study design and analysis. Likewise, new statistical tools will be required to use short-read sequence data to confidently identify polymorphisms and structural variants, patterns of LD, and differences between Plasmodium populations. This is particularly challenging because of the low LD and AT-rich nature of the parasite genome, as well as the presence of multiple parasite genomes in clinical samples.
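
To make the notion of linkage disequilibrium concrete, the sketch below computes the standard r² statistic between two biallelic loci from phased haplotypes. It is an illustration only, not the group's software: the function name `ld_r_squared` and the toy haplotypes are hypothetical, and alleles are assumed to be coded 0/1.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (allele_a, allele_b) pairs, one per phased
    haplotype, with alleles coded 0 or 1 (an assumed encoding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    # Frequencies of allele 1 at each locus and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D measures the departure from independent assortment of the alleles.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic locus carries no LD information.
        return 0.0
    return d * d / denom


# Toy data: two loci that always travel together, and two that are independent.
linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ld_r_squared(linked), ld_r_squared(unlinked))
```

When r² between a genotyped marker and a causal variant is low, as is common in African populations, the association signal at the marker is correspondingly weakened, which is why low LD complicates tag-SNP selection and GWA design.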

Our epidemiological studies of human resistance and susceptibility to malaria include case-control and family trio designs, with sample sizes currently exceeding 12,000 and 2,000, respectively. For GWA analysis we have implemented pipelines that run from converting chip intensities into genotype calls through to testing for association and positive selection, correcting for population artifacts along the way, in order to discover putative variants for follow-up in the laboratory. This process has involved developing and applying methods to call genotypes (Illuminus), to understand the relationship of population structure and ethnicity within and across study sites, to determine strategies for the selection of tagging SNPs, and to determine genotypes in populations with low LD.
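
As an illustration of the association-testing step in a case-control design, the sketch below runs a basic allelic chi-square test (1 degree of freedom) on a 2x2 table of allele counts. This is a textbook test, not the group's pipeline, and it applies none of the population-structure corrections described above. The function name `allelic_association` and the example counts are hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Allelic chi-square test on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio). The p-value uses the identity that
    a 1-d.f. chi-square tail probability equals erfc(sqrt(chi2 / 2)).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")

    chi2 = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Odds of carrying the alternative allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical counts: the alternative allele is rarer in cases than controls.
chi2, p_value, odds_ratio = allelic_association(30, 70, 50, 50)
print(f"chi2={chi2:.2f} p={p_value:.4f} OR={odds_ratio:.2f}")
```

An odds ratio below 1 here would suggest a protective allele. In a genome-wide scan the same test is repeated across hundreds of thousands of markers, so the significance threshold has to be far stricter than for a single test.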

Screenshot of LookSeq, a browser-based read alignment viewer. LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data.

Informatics solutions for analysis and sharing of large-scale sequence and genotype data

A major activity of our team is the development of informatics technologies to manage and analyse the remarkable volume of data generated by resequencing and genotyping projects, and to establish effective ways to share these complex epidemiological and genetic datasets across the malaria research community.

We have produced an improved algorithm for genotype calling from the Illumina Bead Array platform (Illuminus), as well as methods for detecting positive selection from haplotype information. We are also developing browser-based software packages for simplified presentation and browsing of linkage disequilibrium along chromosomes (Marker3) and for SNP discovery and analysis in short-read sequence data (LookSeq).

Our team is a partner in the WorldWide Antimalarial Resistance Network (WWARN), a global network of malaria researchers aiming to build a web-based global antimalarial efficacy and resistance database to track resistance to malaria drugs. The proposed database will provide free access to web-based, linked sets of data, as well as tools to help analyse and publish the data.

Understanding the diversity and dynamics of Anopheles populations

A keystone of malaria control is to prevent transmission by the Anopheles vector. Hopes of eventually eliminating malaria rely greatly on this, but the failure of previous efforts to eradicate malaria has taught us that it is not easily accomplished, particularly because of the ability of Anopheles populations to develop resistance to insecticides as their usage increases. New technologies for large-scale sequencing provide unprecedented opportunities to overcome this problem by real-time monitoring of genome variation in Anopheles populations, and using this information to develop early warning systems for the emergence of insecticide resistance, and for other practical applications in vector control.

The Sanger Malaria Programme, working closely with MalariaGEN, has established a community project on Anopheles gambiae Genome Variation, which is addressing a range of questions regarding the Anopheles gambiae species complex. At the same time, this multi-centre collaboration is building a catalogue of A. gambiae genome variation as a scientific resource for the malaria research community as a whole. A first step in the process has been to sequence samples from the major colonies used by the community to study parasite refractoriness and insecticide resistance.

Martin Donnelly, who works between the Liverpool School of Tropical Medicine and the Sanger Institute, is driving forward many of our Anopheles activities.

Resources

LookSeq

LookSeq is a web-based application for alignment visualisation, browsing and analysis of genome sequence data.

LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualisation of putative single nucleotide and structural variation. The visible range, from whole chromosome to single base resolution, can be set manually or by scrolling or zooming the display with fast, on-the-fly rendering from the server-side alignment database. LookSeq uses a universal database for alignments of different sequencing technologies and algorithms. Sequence data from multiple sources can be viewed separately or aligned in a single display, facilitating direct comparison between datasets. LookSeq can also link to relevant external sites such as PubMed and other online analysis tools, via buttons or double-clicking on the displayed sequence annotation.

LookSeq requires no setup or installation, and is very intuitive to use.

  • LookSeq: a browser-based viewer for deep sequencing data.

    Manske HM and Kwiatkowski DP

    Genome research 2009;19;11;2125-32

Collaborations

MalariaGEN Genomic Epidemiology Network

MalariaGEN brings together research groups with different projects and scientific objectives to work together on large-scale investigations that depend on samples, data and expertise from multiple investigators.

Selected Publications

  • Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data.

    Auburn S, Campino S, Miotto O, Djimde AA, Zongo I, Manske M, Maslen G, Mangano V, Alcock D, MacInnis B, Rockett KA, Clark TG, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    PloS one 2012;7;2;e32891

  • Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    Oyola SO, Otto TD, Gu Y, Maslen G, Manske M, Campino S, Turner DJ, Macinnis B, Kwiatkowski DP, Swerdlow HP and Quail MA

    BMC genomics 2012;13;1

  • Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

    Manske M, Miotto O et al

    Nature 2012

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, Macinnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    PloS one 2011;6;7;e22213

  • Ethical issues in human genomics research in developing countries.

    de Vries J, Bull SJ, Doumbo O, Ibrahim M, Mercereau-Puijalon O, Kwiatkowski D and Parker M

    BMC medical ethics 2011;12;5

  • Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.

    Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG and Kwiatkowski DP

    PloS one 2011;6;6;e20251

  • Methodological challenges of genome-wide association analysis in Africa.

    Teo YY, Small KS and Kwiatkowski DP

    Nature reviews. Genetics 2010;11;2;149-60

  • Ethical data release in genome-wide association studies in developing countries.

    Parker M, Bull SJ, de Vries J, Agbenyega T, Doumbo OK and Kwiatkowski DP

    PLoS medicine 2009;6;11;e1000143

  • LookSeq: a browser-based viewer for deep sequencing data.

    Manske HM and Kwiatkowski DP

    Genome research 2009;19;11;2125-32

  • SNP-o-matic.

    Manske HM and Kwiatkowski DP

    Bioinformatics (Oxford, England) 2009;25;18;2434-5

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    Nature genetics 2009;41;6;657-65

  • Tumor necrosis factor and lymphotoxin-alpha polymorphisms and severe malaria in African populations.

    Clark TG, Diakite M, Auburn S, Campino S, Fry AE, Green A, Richardson A, Small K, Teo YY, Wilson J, Jallow M, Sisay-Joof F, Pinder M, Griffiths MJ, Peshu N, Williams TN, Marsh K, Molyneux ME, Taylor TE, Rockett KA and Kwiatkowski DP

    The Journal of infectious diseases 2009;199;4;569-75

  • TLR9 polymorphisms in African populations: no association with severe malaria, but evidence of cis-variants acting on gene expression.

    Campino S, Forton J, Auburn S, Fry A, Diakite M, Richardson A, Hull J, Jallow M, Sisay-Joof F, Pinder M, Molyneux ME, Taylor TE, Rockett K, Clark TG and Kwiatkowski DP

    Malaria journal 2009;8;44

  • A global network for investigating the genomic epidemiology of malaria.

    Malaria Genomic Epidemiology Network

    Nature 2008;456;7223;732-7

  • Host genetic factors in resistance and susceptibility to malaria.

    Kwiatkowski DP and Luoni G

    Parassitologia 2006;48;4;450-67

  • Data sharing and intellectual property in a genomic epidemiology network: policies for large-scale research collaboration.

    Chokshi DA, Parker M and Kwiatkowski DP

    Bulletin of the World Health Organization 2006;84;5;382-7

  • How malaria has affected the human genome and what human genetics can teach us about malaria.

    Kwiatkowski DP

    American journal of human genetics 2005;77;2;171-92

Team

Team members

  • Susana Campino, Senior Postdoctoral Scientist
  • Antoine Claessens
  • Olivia Cook
  • Eleanor Drury, Advanced Research Assistant
  • Jacob Almagro Garcia
  • Will Hamilton, Research Associate
  • Katja Kivinen
  • Bronwyn MacInnis, Senior Scientific Programme Manager
  • Cinzia Malangone, Senior Software Developer
  • Magnus Manske, Head of Informatics
  • Daniel Mead, Advanced Research Assistant

Susana Campino

- Senior Postdoctoral Scientist

Susana completed her PhD in Medical Biosciences, "Genetic Analysis of Murine Malaria", at the University of Umeå, Sweden, in 2003, after graduating with a first-class BSc degree in Biology, Genetics and Microbiology from the University of Lisbon. She has been awarded several fellowships, including post-doctoral fellowships from Marie Curie and the FCT-Science and Technology Foundation, Portugal. She has worked on malaria research since 2000 and has been part of key groups such as the Institut Pasteur in Paris and the Department of Immunogenetics at the Gulbenkian Institute of Science, Lisbon.

Susana joined the Sanger Institute Malaria Programme in 2007.

Research

Susana played a key role in initiating the Plasmodium Genome Variation project, which aims to describe global genetic diversity in malaria. She is also involved in MalariaGEN and WWARN, developing laboratory protocols as well as coordinating and participating in fieldwork in malaria-endemic countries with the aim of improving sample collection.

Susana is currently combining high-throughput phenotyping methods and whole-genome sequencing data to perform linkage analysis to identify the genes underlying complex P. falciparum traits such as erythrocyte invasion. The identification of molecular steps involved in these traits could lead to targets for new drugs or vaccines.

References

  • Using CF11 cellulose columns to inexpensively and effectively remove human DNA from Plasmodium falciparum-infected whole blood samples.

    Venkatesan M, Amaratunga C, Campino S, Auburn S, Koch O, Lim P, Uk S, Socheat D, Kwiatkowski DP, Fairhurst RM and Plowe CV

    Howard Hughes Medical Institute, University of Maryland School of Medicine, Baltimore, MD, USA.

    Background: Genome and transcriptome studies of Plasmodium nucleic acids obtained from parasitized whole blood are greatly improved by depletion of human DNA or enrichment of parasite DNA prior to next-generation sequencing and microarray hybridization. The most effective method currently used is a two-step procedure to deplete leukocytes: centrifugation using density gradient media followed by filtration through expensive, commercially available columns. This method is not easily implemented in field studies that collect hundreds of samples and simultaneously process samples for multiple laboratory analyses. Inexpensive syringes, hand-packed with CF11 cellulose powder, were recently shown to improve ex vivo cultivation of Plasmodium vivax obtained from parasitized whole blood. This study was undertaken to determine whether CF11 columns could be adapted to isolate Plasmodium falciparum DNA from parasitized whole blood and achieve current quantity and purity requirements for Illumina sequencing.

    Methods: The CF11 procedure was compared with the current two-step standard of leukocyte depletion using parasitized red blood cells cultured in vitro and parasitized blood obtained ex vivo from Cambodian patients with malaria. Procedural variations in centrifugation and column size were tested, along with a range of blood volumes and parasite densities.

    Results: CF11 filtration reliably produces 500 nanograms of DNA with less than 50% human DNA contamination, which is comparable to that obtained by the two-step method and falls within the current quality control requirements for Illumina sequencing. In addition, a centrifuge-free version of the CF11 filtration method to isolate P. falciparum DNA at remote and minimally equipped field sites in malaria-endemic areas was validated.

    Conclusions: CF11 filtration is a cost-effective, scalable, one-step approach to remove human DNA from P. falciparum-infected whole blood samples.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0600718, G19/9; Wellcome Trust: 089275, 090532, 098051

    Malaria journal 2012;11;41

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, Macinnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770

    PloS one 2011;6;7;e22213

  • Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.

    Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom. sc11@sanger.ac.uk

    The diversity in the Plasmodium falciparum genome can be used to explore parasite population dynamics, with practical applications to malaria control. The ability to identify the geographic origin and trace the migratory patterns of parasites with clinically important phenotypes such as drug resistance is particularly relevant. With increasing single-nucleotide polymorphism (SNP) discovery from ongoing Plasmodium genome sequencing projects, a demand for high SNP and sample throughput genotyping platforms for large-scale population genetic studies is required. Low parasitaemias and multiple clone infections present a number of challenges to genotyping P. falciparum. We addressed some of these issues using a custom 384-SNP Illumina GoldenGate assay on P. falciparum DNA from laboratory clones (long-term cultured adapted parasite clones), short-term cultured parasite isolates and clinical (non-cultured isolates) samples from East and West Africa, Southeast Asia and Oceania. Eighty percent of the SNPs (n = 306) produced reliable genotype calls on samples containing as little as 2 ng of total genomic DNA and on whole genome amplified DNA. Analysis of artificial mixtures of laboratory clones demonstrated high genotype calling specificity and moderate sensitivity to call minor frequency alleles. Clear resolution of geographically distinct populations was demonstrated using Principal Components Analysis (PCA), and global patterns of population genetic diversity were consistent with previous reports. These results validate the utility of the platform in performing population genetic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0600718, G19/9; NIAID NIH HHS: R37 AI048071; Wellcome Trust: 090532, 093956

    PloS one 2011;6;6;e20251

  • Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

    Robinson T, Campino SG, Auburn S, Assefa SA, Polley SD, Manske M, MacInnis B, Rockett KA, Maslen GL, Sanders M, Quail MA, Chiodini PL, Kwiatkowski DP, Clark TG and Sutherland CJ

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data was extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these are compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 077012/Z/05/Z, 090532

    PloS one 2011;6;8;e23204

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10(-7) to P = 4 × 10(-14), with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062

    Nature genetics 2009;41;6;657-65

  • TLR9 polymorphisms in African populations: no association with severe malaria, but evidence of cis-variants acting on gene expression.

    Campino S, Forton J, Auburn S, Fry A, Diakite M, Richardson A, Hull J, Jallow M, Sisay-Joof F, Pinder M, Molyneux ME, Taylor TE, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. sc11@sanger.ac.uk

    Background: During malaria infection the Toll-like receptor 9 (TLR9) is activated through induction with plasmodium DNA or another malaria motif not yet identified. Although TLR9 activation by malaria parasites is well reported, the implication to the susceptibility to severe malaria is not clear. The aim of this study was to assess the contribution of genetic variation at TLR9 to severe malaria.

    Methods: This study explores the contribution of TLR9 genetic variants to severe malaria using two approaches. First, an association study of four common single nucleotide polymorphisms was performed on both family- and population-based studies from Malawian and Gambian populations (n>6000 individuals). Subsequently, it was assessed whether TLR9 expression is affected by cis-acting variants and if these variants could be mapped. For this work, an allele specific expression (ASE) assay on a panel of HapMap cell lines was carried out.

    Results: No convincing association was found with polymorphisms in TLR9 for malaria severity, in either Gambian or Malawian populations, using both case-control and family-based study designs. Using an allele specific expression assay it was observed that TLR9 expression is affected by cis-acting variants; these results were replicated in a second experiment using biological replicates.

    Conclusion: By using the largest cohorts analysed to date, as well as a standardized phenotype definition and study design, no association of TLR9 genetic variants with severe malaria was found. This analysis considered all common variants in the region, but it remains possible that there are rare variants with association signals. This report also shows that TLR9 expression is potentially modulated through cis-regulatory variants, which may lead to differential inflammatory responses to infection between individuals.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust

    Malaria journal 2009;8;44

  • A global network for investigating the genomic epidemiology of malaria.

    Malaria Genomic Epidemiology Network

    The University of Buea, PO Box 63, Buea, South West Province, Cameroon.

    Large-scale studies of genomic variation could assist efforts to eliminate malaria. But there are scientific, ethical and practical challenges to carrying out such studies in developing countries, where the burden of disease is greatest. The Malaria Genomic Epidemiology Network (MalariaGEN) is now working to overcome these obstacles, using a consortial approach that brings together researchers from 21 countries.

    Funded by: Medical Research Council: G0200454, G0200454(62635), G0600230, G0600230(77610), G0600718, G19/9; Wellcome Trust: 076934, 077383, 077383/Z/05/Z

    Nature 2008;456;7223;732-7

  • Validating discovered Cis-acting regulatory genetic variants: application of an allele specific expression approach to HapMap populations.

    Campino S, Forton J, Raj S, Mohr B, Auburn S, Fry A, Mangano VD, Vandiedonck C, Richardson A, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. sc11@sanger.ac.uk

    Background: Localising regulatory variants that control gene expression is a challenge for genome research. Several studies have recently identified non-coding polymorphisms associated with inter-individual differences in gene expression. These approaches rely on the identification of signals of association against a background of variation due to other genetic and environmental factors. A complementary approach is to use an Allele-Specific Expression (ASE) assay, which is more robust to the effects of environmental variation and trans-acting genetic factors.

    Here we apply an ASE method which utilises heterozygosity within an individual to compare expression of the two alleles of a gene in a single cell. We used individuals from three HapMap population groups and analysed the allelic expression of genes with cis-regulatory regions previously identified using total gene expression studies. We were able to replicate the results in five of the six genes tested, and refined the cis-associated regions to a small number of variants. We also showed that by using multiple populations it is possible to refine the associated cis-effect DNA regions.

    We discuss the efficacy and drawbacks of both total gene expression and ASE approaches in the discovery of cis-acting variants. We show that the ASE approach has significant advantages as it is a cleaner representation of cis-acting effects. We also discuss the implication of using different populations to map cis-acting regions and the importance of finding regulatory variants which contribute to human phenotypic variation.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust

    PloS one 2008;3;12;e4105

  • Identification of common genetic variation that modulates alternative splicing.

    Hull J, Campino S, Rowlands K, Chan MS, Copley RR, Taylor MS, Rockett K, Elvidge G, Keating B, Knight J and Kwiatkowski D

    University Department of Paediatrics, John Radcliffe Hospital, Oxford, United Kingdom. jeremy.hull@paediatrics.ox.ac.uk

    Alternative splicing of genes is an efficient means of generating variation in protein function. Several disease states have been associated with rare genetic variants that affect splicing patterns. Conversely, splicing efficiency of some genes is known to vary between individuals without apparent ill effects. What is not clear is whether commonly observed phenotypic variation in splicing patterns, and hence potential variation in protein function, is to a significant extent determined by naturally occurring DNA sequence variation and in particular by single nucleotide polymorphisms (SNPs). In this study, we surveyed the splicing patterns of 250 exons in 22 individuals who had been previously genotyped by the International HapMap Project. We identified 70 simple cassette exon alternative splicing events in our experimental system; for six of these, we detected consistent differences in splicing pattern between individuals, with a highly significant association between splice phenotype and neighbouring SNPs. Remarkably, for five out of six of these events, the strongest correlation was found with the SNP closest to the intron-exon boundary, although the distance between these SNPs and the intron-exon boundary ranged from 2 bp to greater than 1,000 bp. Two of these SNPs were further investigated using a minigene splicing system, and in each case the SNPs were found to exert cis-acting effects on exon splicing efficiency in vitro. The functional consequences of these SNPs could not be predicted using bioinformatic algorithms. Our findings suggest that phenotypic variation in splicing patterns is determined by the presence of SNPs within flanking introns or exons. Effects on splicing may represent an important mechanism by which SNPs influence gene function.

    Funded by: Medical Research Council: G0600230, G19/9; Wellcome Trust: 074318

    PLoS genetics 2007;3;6;e99

  • Unique genetic variation revealed by a microsatellite polymorphism survey in ten wild-derived inbred strains.

    Campino S, Behrschmidt C, Bagot S, Guénet JL, Cazenave PA, Holmberg D and Penha-Gonçalves C

    Instituto Gulbenkian de Ciência, Rua da Quinta Grande, 6, Oeiras, Portugal.

    Here we report on a genome polymorphism survey using 254 microsatellite markers in ten recently wild-derived inbred strains. Allele size analysis showed that the rate of polymorphism of these wild-derived mouse strains when compared with any of the common laboratory strains is on average 79.8%. We found 632 wild-derived alleles that were not present in the common laboratory strains, representing a 61% increase over the genetic variation observed in the laboratory strains. We also found that on average 14.5% of the microsatellite alleles of any given wild-derived inbred strain were unique. Our results indicate that the recently wild-derived mouse strains represent repositories of unique naturally occurring genetic variability and may prove invaluable for the study of complex phenotypes and in the construction of new mouse models of human disease.

    Genomics 2002;79;5;618-20

Antoine Claessens

Antoine graduated as a biochemist from the University of Liège (Belgium) in 2004. He then completed an MRes and a PhD at Edinburgh University in the group of Prof. Alex Rowe. His research there focused on Plasmodium falciparum cytoadherence and var genes, in collaboration with Dr Zbynek Bozdech in Singapore. After a short contract with Dr Pierre Buffet in Paris to gain experience with clinical malaria, he recently joined the Kwiatkowski group.

Research

Antoine started to work with Jason Wendler in Oxford on population genetics of P. falciparum field isolates, with particular focus on drug resistance and vaccine target gene candidates. He is now also involved in the analysis of the malaria parasite mutation rate with Will Hamilton. The starting point is to define the mutation rate in various lab strains using next-generation sequencing.

References

  • A subset of group A-like var genes encodes the malaria parasite ligands for binding to human brain endothelial cells.

    Claessens A, Adams Y, Ghumra A, Lindergard G, Buchan CC, Andisi C, Bull PC, Mok S, Gupta AP, Wang CW, Turner L, Arman M, Raza A, Bozdech Z and Rowe JA

    Centre for Immunity, Infection and Evolution, Institute of Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.

    Cerebral malaria is the most deadly manifestation of infection with Plasmodium falciparum. The pathology of cerebral malaria is characterized by the accumulation of infected erythrocytes (IEs) in the microvasculature of the brain caused by parasite adhesins on the surface of IEs binding to human receptors on microvascular endothelial cells. The parasite and host molecules involved in this interaction are unknown. We selected three P. falciparum strains (HB3, 3D7, and IT/FCR3) for binding to a human brain endothelial cell line (HBEC-5i). The whole transcriptome of isogenic pairs of selected and unselected parasites was analyzed using a variant surface antigen-supplemented microarray chip. After selection, the most highly and consistently up-regulated genes were a subset of group A-like var genes (HB3var3, 3D7_PFD0020c, ITvar7, and ITvar19) that showed 11- to >100-fold increased transcription levels. These var genes encode P. falciparum erythrocyte membrane protein (PfEMP)1 variants with distinct N-terminal domain types (domain cassette 8 or domain cassette 13). Antibodies to HB3var3 and PFD0020c recognized the surface of live IEs and blocked binding to HBEC-5i, thereby confirming the adhesive function of these variants. The clinical in vivo relevance of the HBEC-selected parasites was supported by significantly higher surface recognition of HBEC-selected parasites compared with unselected parasites by antibodies from young African children suffering cerebral malaria (Mann-Whitney test, P = 0.029) but not by antibodies from controls with uncomplicated malaria (Mann-Whitney test, P = 0.58). This work describes a binding phenotype for virulence-associated group A P. falciparum erythrocyte membrane protein 1 variants and identifies targets for interventions to treat or prevent cerebral malaria.

    Funded by: Wellcome Trust: 077092, 084226, 084535, 084538, 095831

    Proceedings of the National Academy of Sciences of the United States of America 2012;109;26;E1772-81

  • Selection of Plasmodium falciparum parasites for cytoadhesion to human brain endothelial cells.

    Claessens A and Rowe JA

    Centre for Immunity, Infection and Evolution, University of Edinburgh.

    Most human malaria deaths are caused by blood-stage Plasmodium falciparum parasites. Cerebral malaria, the most life-threatening complication of the disease, is characterised by an accumulation of Plasmodium falciparum infected red blood cells (iRBC) at pigmented trophozoite stage in the microvasculature of the brain(2-4). This microvessel obstruction (sequestration) leads to acidosis, hypoxia and harmful inflammatory cytokines (reviewed in (5)). Sequestration is also found in most microvascular tissues of the human body(2, 3). The mechanism by which iRBC attach to the blood vessel walls is still poorly understood. The immortalized Human Brain microvascular Endothelial Cell line (HBEC-5i) has been used as an in vitro model of the blood-brain barrier(6). However, Plasmodium falciparum iRBC attach only poorly to HBEC-5i in vitro, unlike the dense sequestration that occurs in cerebral malaria cases. We therefore developed a panning assay to select (enrich) various P. falciparum strains for adhesion to HBEC-5i in order to obtain populations of high-binding parasites, more representative of what occurs in vivo. A sample of a parasite culture (mixture of iRBC and uninfected RBC) at the pigmented trophozoite stage is washed and incubated on a layer of HBEC-5i grown on a Petri dish. After incubation, the dish is gently washed free from uRBC and unbound iRBC. Fresh uRBC are added to the few iRBC attached to HBEC-5i and incubated overnight. As schizont stage parasites burst, merozoites reinvade RBC and these ring stage parasites are harvested the following day. Parasites are cultured until enough material is obtained (typically 2 to 4 weeks) and a new round of selection can be performed. Depending on the P. falciparum strain, 4 to 7 rounds of selection are needed in order to get a population where most parasites bind to HBEC-5i. The binding phenotype is progressively lost after a few weeks, indicating a switch in variant surface antigen gene expression, thus regular selection on HBEC-5i is required to maintain the phenotype. In summary, we developed a selection assay rendering P. falciparum parasites a more "cerebral malaria adhesive" phenotype. We were able to select 3 out of 4 P. falciparum strains on HBEC-5i. This assay has also successfully been used to select parasites for binding to human dermal and pulmonary endothelial cells. Importantly, this method can be used to select tissue-specific parasite populations in order to identify candidate parasite ligands for binding to brain endothelium. Moreover, this assay can be used to screen for putative anti-sequestration drugs(7).

    Funded by: Wellcome Trust: 084226, 095831

    Journal of visualized experiments : JoVE 2012;59;e3122

  • Design of a variant surface antigen-supplemented microarray chip for whole transcriptome analysis of multiple Plasmodium falciparum cytoadherent strains, and identification of strain-transcendent rif and stevor genes.

    Claessens A, Ghumra A, Gupta AP, Mok S, Bozdech Z and Rowe JA

    Centre for Immunity, Infection and Evolution, Institute of Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, West Mains Rd, Edinburgh, EH9 3JT, UK.

    Background: The cytoadherence of Plasmodium falciparum is thought to be mediated by variant surface antigens (VSA), encoded by var, rif, stevor and pfmc-2tm genes. The last three families have rarely been studied in the context of cytoadherence. As most VSA genes are unique, the variability among sequences has impeded the functional study of VSA across different P. falciparum strains. However, many P. falciparum genomes have recently been sequenced, allowing the development of specific microarray probes for each VSA gene.

    Methods: All VSA sequences from the HB3, Dd2 and IT/FCR3 genomes were extracted using HMMer software. Oligonucleotide probes were designed with OligoRankPick and added to the 3D7-based microarray chip. As a proof of concept, IT/R29 parasites were selected for and against rosette formation and the transcriptomes of isogenic rosetting and non-rosetting parasites were compared by microarray.

    Results: From each parasite strain 50-56 var genes, 125-132 rif genes, 26-33 stevor genes and 3-8 pfmc-2tm genes were identified. Bioinformatic analysis of the new VSA sequences showed that 13 rif genes and five stevor genes were well-conserved across at least three strains (83-100% amino acid identity). The ability of the VSA-supplemented microarray chip to detect cytoadherence-related genes was assessed using P. falciparum clone IT/R29, in which rosetting is known to be mediated by PfEMP1 encoded by ITvar9. Whole transcriptome analysis showed that the most highly up-regulated gene in rosetting parasites was ITvar9 (19 to 429-fold up-regulated over six time points). Only one rif gene (IT4rifA_042) was up-regulated by more than four fold (five fold at 12 hours post-invasion), and no stevor or pfmc-2tm genes were up-regulated by more than two fold. 377 non-VSA genes were differentially expressed by three fold or more in rosetting parasites, although none was as markedly or consistently up-regulated as ITvar9.

    Conclusions: Probes for the VSA of newly sequenced P. falciparum strains can be added to the 3D7-based microarray chip, allowing the analysis of the entire transcriptome of multiple strains. For the rosetting clone IT/R29, the striking transcriptional upregulation of ITvar9 was confirmed, and the data did not support the involvement of other VSA families in rosette formation.

    Funded by: Wellcome Trust: 084226, 095831

    Malaria journal 2011;10;180

  • Putative DNA G-quadruplex formation within the promoters of Plasmodium falciparum var genes.

    Smargiasso N, Gabelica V, Damblon C, Rosu F, De Pauw E, Teulade-Fichou MP, Rowe JA and Claessens A

    Mass Spectrometry Laboratory, GIGA-Research, University of Liege, Liege, Belgium. nsmargiasso@ulg.ac.be

    Background: Guanine-rich nucleic acid sequences are capable of folding into an intramolecular four-stranded structure called a G-quadruplex. When found in gene promoter regions, G-quadruplexes can downregulate gene expression, possibly by blocking the transcriptional machinery. Here we have used a genome-wide bioinformatic approach to identify Putative G-Quadruplex Sequences (PQS) in the Plasmodium falciparum genome, along with biophysical techniques to examine the physiological stability of P. falciparum PQS in vitro.

    Results: We identified 63 PQS in the non-telomeric regions of the P. falciparum clone 3D7. Interestingly, 16 of these PQS occurred in the upstream region of a subset of the P. falciparum var genes (group B var genes). The var gene family encodes PfEMP1, the parasite's major variant antigen and adhesin expressed at the surface of infected erythrocytes, that plays a key role in malaria pathogenesis and immune evasion. The ability of the PQS found in the upstream regions of group B var genes (UpsB-Q) to form stable G-quadruplex structures in vitro was confirmed using 1H NMR, circular dichroism, UV spectroscopy, and thermal denaturation experiments. Moreover, the synthetic compound BOQ1 that shows a higher affinity for DNA forming quadruplex rather than duplex structures was found to bind with high affinity to the UpsB-Q.

    Conclusion: This is the first demonstration of non-telomeric PQS in the genome of P. falciparum that form stable G-quadruplexes under physiological conditions in vitro. These results allow the generation of a novel hypothesis that the G-quadruplex sequences in the upstream regions of var genes have the potential to play a role in the transcriptional control of this major virulence-associated multi-gene family.

    Funded by: Wellcome Trust: 067431, 084226

    BMC genomics 2009;10;362

Olivia Cook

Olivia graduated from Cardiff University School of Biosciences with a First in Applied Biology in 2007. During her studies, she spent a year working in industry conducting a DNA profiling research project at the Forensic Science Service, Birmingham, in association with the Sexual Offences team of the UK Police Force. She began her career working for an Asset Management firm in the City of London where she developed a wide range of transferable business skills.

Research

Olivia joined the Wellcome Trust Sanger Institute as Research Administrator in 2011 and provides high-level administrative and organisational support to the Malaria Programme as well as being a part of the Resource Centre for the MalariaGEN Genomic Epidemiology Network.

References

  • The prevalence of mixed DNA profiles in fingernail samples taken from individuals in the general population.

    Cook O and Dixon L

    Research & Development, The Forensic Science Service, Birmingham Business Park, Solihull Parkway, Birmingham, United Kingdom.

    The fingernail hyponychium is an isolated area where biological material may accumulate and can provide a valuable source of evidential material in police investigations. DNA transfer between the victim and suspect frequently occurs during violent crimes and in court there is often reasonable doubt that a mixed DNA profile in a fingernail sample has originated from the assault as the profile may be attributed to previous contact between the two individuals. The purpose of this study was to assess background levels of foreign DNA under the fingernails of individuals from the general population in order to provide data that may help to determine whether DNA transfer occurred during or prior to the assault. Fingernail swabs sampled from 100 volunteers were processed by Qiagen extraction and amplified using AMPFlSTR SGM Plus to obtain DNA profiles. Foreign DNA was detected in 13% of samples, with only 6% of these giving reportable mixed DNA profiles, suggesting the incidence of foreign DNA under the fingernails was low. A significant proportion of the mixed DNA profiles came from male donors; the majority had experienced physical contact within the 24h time period prior to sampling.

    Forensic science international. Genetics 2007;1;1;62-8

Eleanor Drury

- Advanced Research Assistant

Eleanor graduated from Newcastle University in 2000 with a degree in Applied Biology. She began her scientific career working for Cambridge University in association with the Juvenile Diabetes Foundation and the Wellcome Trust, before moving into industry and running the RNA team at Pharmagene. Eleanor joined the Sanger Institute in 2004 and worked with the Medical Sequencing team, before joining the Malaria Programme in 2010.

Research

As an Advanced Research Assistant, Eleanor works on the Plasmodium Genome Variation project, which aims to describe the genetic diversity between malaria parasites from across the globe using high throughput genome sequencing. Her work involves sample reception, quantification and quality control, as well as laboratory support including parasite culture, protocol development and troubleshooting.

References

  • The GENCODE exome: sequencing the complete human exome.

    Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S, Lehesjoki AE, Turner DJ, Hubbard TJ and Palotie A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing.

    Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT062023, WT077198, WT089062

    European journal of human genetics : EJHG 2011;19;7;827-31

  • Clustered coding variants in the glutamate receptor complexes of individuals with schizophrenia and bipolar disorder.

    Frank RA, McRae AF, Pocklington AJ, van de Lagemaat LN, Navarro P, Croning MD, Komiyama NH, Bradley SJ, Challiss RA, Armstrong JD, Finn RD, Malloy MP, MacLean AW, Harris SE, Starr JM, Bhaskar SS, Howard EK, Hunt SE, Coffey AJ, Ranganath V, Deloukas P, Rogers J, Muir WJ, Deary IJ, Blackwood DH, Visscher PM and Grant SG

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Current models of schizophrenia and bipolar disorder implicate multiple genes; however, their biological relationships remain elusive. To test the genetic role of glutamate receptors and their interacting scaffold proteins, the exons of ten glutamatergic 'hub' genes in 1304 individuals were re-sequenced in case and control samples. No significant difference in the overall number of non-synonymous single nucleotide polymorphisms (nsSNPs) was observed between cases and controls. However, cluster analysis of nsSNPs identified two exons encoding the cysteine-rich domain and first transmembrane helix of GRM1 as a risk locus with five mutations highly enriched within these domains. A new splice variant lacking the transmembrane GPCR domain of GRM1 was discovered in the human brain and the GRM1 mutation cluster could perturb the regulation of this variant. The predicted effect on individuals harbouring multiple mutations distributed in their ten hub genes was also examined. Diseased individuals possessed an increased load of deleteriousness from multiple concurrent rare and common coding variants. Together, these data suggest a disease model in which the interplay of compound genetic coding variants, distributed among glutamate receptors and their interacting proteins, contributes to the pathogenesis of schizophrenia and bipolar disorders.

    Funded by: Chief Scientist Office: CZB/4/505, ETM/55; Medical Research Council: MC_U127592696; Wellcome Trust

    PloS one 2011;6;4;e19011

  • An evaluation of different target enrichment methods in pooled sequencing designs for complex disease association studies.

    Day-Williams AG, McLay K, Drury E, Edkins S, Coffey AJ, Palotie A and Zeggini E

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

    Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.

    Funded by: Wellcome Trust: WT088885/Z/09/Z

    PloS one 2011;6;11;e26279

  • Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.

    Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, Robson S, Vukcevic D, Barnes C, Conrad DF, Giannoulatou E, Holmes C, Marchini JL, Stirrups K, Tobin MD, Wain LV, Yau C, Aerts J, Ahmad T, Andrews TD, Arbury H, Attwood A, Auton A, Ball SG, Balmforth AJ, Barrett JC, Barroso I, Barton A, Bennett AJ, Bhaskar S, Blaszczyk K, Bowes J, Brand OJ, Braund PS, Bredin F, Breen G, Brown MJ, Bruce IN, Bull J, Burren OS, Burton J, Byrnes J, Caesar S, Clee CM, Coffey AJ, Connell JM, Cooper JD, Dominiczak AF, Downes K, Drummond HE, Dudakia D, Dunham A, Ebbs B, Eccles D, Edkins S, Edwards C, Elliot A, Emery P, Evans DM, Evans G, Eyre S, Farmer A, Ferrier IN, Feuk L, Fitzgerald T, Flynn E, Forbes A, Forty L, Franklyn JA, Freathy RM, Gibbs P, Gilbert P, Gokumen O, Gordon-Smith K, Gray E, Green E, Groves CJ, Grozeva D, Gwilliam R, Hall A, Hammond N, Hardy M, Harrison P, Hassanali N, Hebaishi H, Hines S, Hinks A, Hitman GA, Hocking L, Howard E, Howard P, Howson JM, Hughes D, Hunt S, Isaacs JD, Jain M, Jewell DP, Johnson T, Jolley JD, Jones IR, Jones LA, Kirov G, Langford CF, Lango-Allen H, Lathrop GM, Lee J, Lee KL, Lees C, Lewis K, Lindgren CM, Maisuria-Armer M, Maller J, Mansfield J, Martin P, Massey DC, McArdle WL, McGuffin P, McLay KE, Mentzer A, Mimmack ML, Morgan AE, Morris AP, Mowat C, Myers S, Newman W, Nimmo ER, O'Donovan MC, Onipinla A, Onyiah I, Ovington NR, Owen MJ, Palin K, Parnell K, Pernet D, Perry JR, Phillips A, Pinto D, Prescott NJ, Prokopenko I, Quail MA, Rafelt S, Rayner NW, Redon R, Reid DM, Renwick, Ring SM, Robertson N, Russell E, St Clair D, Sambrook JG, Sanderson JD, Schuilenburg H, Scott CE, Scott R, Seal S, Shaw-Hawkins S, Shields BM, Simmonds MJ, Smyth DJ, Somaskantharajah E, Spanova K, Steer S, Stephens J, Stevens HE, Stone MA, Su Z, Symmons DP, Thompson JR, Thomson W, Travers ME, Turnbull C, Valsesia A, Walker M, Walker NM, Wallace C, Warren-Perry M, Watkins NA, Webster J, Weedon MN, Wilson AG, Woodburn M, Wordsworth BP, Young AH, Zeggini E, Carter NP, Frayling TM, Lee C, McVean G, Munroe PB, Palotie A, Sawcer SJ, Scherer SW, Strachan DP, Tyler-Smith C, Brown MA, Burton PR, Caulfield MJ, Compston A, Farrall M, Gough SC, Hall AS, Hattersley AT, Hill AV, Mathew CG, Pembrey M, Satsangi J, Stratton MR, Worthington J, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand W, Parkes M, Rahman N, Todd JA, Samani NJ and Donnelly P

    Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.

    Funded by: Arthritis Research UK: 17552; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0400874, G0500115, G0501942, G0600329, G0600705, G0700491, G0701003, G0701420, G0701810, G0701810(85517), G0800383, G0800759, G19/9, G90/106, G9521010, MC_UP_A390_1107; Wellcome Trust: 061858, 083948, 089989

    Nature 2010;464;7289;713-20

  • Target-enrichment strategies for next-generation sequencing.

    Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J and Turner DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have not yet reached a point at which routine sequencing of large numbers of whole eukaryotic genomes is feasible, and so it is often necessary to select genomic regions of interest and to enrich these regions before sequencing. There are several enrichment approaches, each with unique advantages and disadvantages. Here we describe our experiences with the leading target-enrichment technologies, the optimizations that we have performed and typical results that can be obtained using each. We also provide detailed protocols for each technology so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.

    Funded by: NHGRI NIH HHS: 5R21HG004749, R21 HG004749; NHLBI NIH HHS: 5R01HL094976, R01 HL094976; NIGMS NIH HHS: T32 GM007266; Wellcome Trust: WT079643

    Nature methods 2010;7;2;111-8

  • Esophageal atresia, hypoplasia of zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation--new MCA/MR syndrome in two affected sibs and a mildly affected mother?

    Wieczorek D, Shaw-Smith C, Kohlhase J, Schmitt W, Buiting K, Coffey A, Howard E, Hehr U and Gillessen-Kaesbach G

    Institut für Humangenetik, Universitätsklinikum Essen, Germany, and Department of Medical Genetics, Addenbrooke's Hospital, Cambridge, UK. dagmar.wieczorek@uni-due.de

    The previously undescribed combination of esophageal atresia, hypoplasia of the zygomatic complex, microcephaly, cup-shaped ears, congenital heart defect, and mental retardation was diagnosed in two siblings of different sexes, with the brother being more severely affected. The mother presented with zygomatic arch hypoplasia of the right side only. We discuss major differential diagnoses: Goldenhar, Feingold, CHARGE, and Treacher Collins syndromes show a few overlapping clinical features, but these diagnoses are unlikely as the clinical findings are unusual for Goldenhar syndrome and mutational screening of the MYCN, the CHD7, and the TCOF1 genes did not reveal any abnormalities. Autosomal recessive oto-facial syndrome, hypomandibular faciocranial dysostosis, and Ozkan syndromes were clinically excluded. A microdeletion 22q11.2 was excluded by FISH analysis, a microdeletion 2p23-p24 by microsatellite analyses, a subtelomeric chromosomal aberration by MLPA, and a small genomic deletion/duplication by CGH array. As X-inactivation studies did not show skewed X-inactivation in the mother, we consider X-chromosomal recessive inheritance of this condition less likely. We discuss autosomal dominant inheritance with variable expressivity or mosaicism in the mother as the likely genetic mechanism in this new multiple congenital anomaly/mental retardation (MCA/MR) syndrome.

    American journal of medical genetics. Part A 2007;143A;11;1135-42

Jacob Almagro Garcia

Jacob started his DPhil at Oxford University in Autumn 2011. He has a background in Computer Science and completed his Masters in Grid Computing and e-Engineering at Cranfield University. Jacob previously worked for the Malaria Programme Kwiatkowski Group at the Wellcome Trust Sanger Institute, but now works with the Kwiatkowski Group at the Wellcome Trust Centre for Human Genetics at Oxford University.

Research

Jacob's principal role at the Sanger Institute was to develop and optimize software to process, analyse, and share massive DNA sequence datasets. He is interested in population genetics, statistics, algorithm design, bio-inspired algorithms, grid computing, bioinformatics and software development.

Will Hamilton

- Research Associate

Will Hamilton is part of the University of Cambridge MB-PhD programme, combining a research PhD with training in clinical medicine. He started his PhD in 2011 at the Mahidol Oxford Tropical Medicine Research Unit (MORU) in Bangkok, Thailand, before moving to Sanger.

Will's research background is mainly in laboratory work, investigating innate defence against retroviruses such as HIV in the Mothes Laboratory at Yale, and protein expression in Trypanosoma brucei, the parasite responsible for African sleeping sickness, in the Field Laboratory in Cambridge. His clinical interests include infectious disease, tropical medicine and global public health.

Research

Will investigates how the Plasmodium falciparum genome changes (mutates) over time. Genetic mutations are the driving force behind evolutionary change, and this diversity is an important factor in how P. falciparum develops antimalarial drug resistance and evades the human immune system. Whole genome sequencing provides a uniquely overarching view of parasite evolution and identifies highly dynamic regions of the genome that are of evolutionary and pathophysiological interest. Will is also investigating how DNA repair processes impact on Plasmodium genome evolution, using experimental genetic approaches.

Katja Kivinen

Katja has over 16 years' experience in Life Sciences with a BSc and MSc in Genetics from the University of Helsinki, and a joint PhD in Bioinformatics from the University of Cambridge and European Molecular Biology Laboratory. Katja's PhD work consisted of genome-wide analysis of gene expression and regulation in yeasts carried out at the European Bioinformatics Institute (EBI).

In her free time, Katja is also involved with pre-eclampsia research through the "FINNPEC" and "InterPregGen" consortia and various collaboration agreements that aim to identify gene variants and biomarkers which increase the risk of pre-eclampsia.

Research

Katja joined the Sanger Institute Malaria Programme in 2007 and currently works on the Human Genome Variation project. She is involved in all stages of human malaria projects, from strategic planning to data analysis, but her main focus is on managing 35,000 irreplaceable human DNA samples, overseeing their progress through genomic pipelines and acting as an interface between the pipeline teams and the MalariaGEN coordination centre.

References

  • Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.

    Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom. sc11@sanger.ac.uk

    The diversity in the Plasmodium falciparum genome can be used to explore parasite population dynamics, with practical applications to malaria control. The ability to identify the geographic origin and trace the migratory patterns of parasites with clinically important phenotypes such as drug resistance is particularly relevant. With increasing single-nucleotide polymorphism (SNP) discovery from ongoing Plasmodium genome sequencing projects, there is a demand for genotyping platforms with high SNP and sample throughput for large-scale population genetic studies. Low parasitaemias and multiple clone infections present a number of challenges to genotyping P. falciparum. We addressed some of these issues using a custom 384-SNP Illumina GoldenGate assay on P. falciparum DNA from laboratory clones (long-term cultured adapted parasite clones), short-term cultured parasite isolates and clinical (non-cultured isolates) samples from East and West Africa, Southeast Asia and Oceania. Eighty percent of the SNPs (n = 306) produced reliable genotype calls on samples containing as little as 2 ng of total genomic DNA and on whole genome amplified DNA. Analysis of artificial mixtures of laboratory clones demonstrated high genotype calling specificity and moderate sensitivity to call minor frequency alleles. Clear resolution of geographically distinct populations was demonstrated using Principal Components Analysis (PCA), and global patterns of population genetic diversity were consistent with previous reports. These results validate the utility of the platform in performing population genetic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0600718, G19/9; NIAID NIH HHS: R37 AI048071; Wellcome Trust: 090532, 093956

    PloS one 2011;6;6;e20251

  • Genetic evidence of multiple loci in dystocia--difficult labour.

    Algovik M, Kivinen K, Peterson H, Westgren M and Kere J

    Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden. michael.algovik@ltkalmar.se

    Background: Dystocia, difficult labour, is a common but also complex problem during childbirth. It can be attributed to either weak contractions of the uterus, a large infant, reduced capacity of the pelvis or combinations of these. Previous studies have indicated that there is a genetic component in the susceptibility to dystocia. The purpose of this study was to identify susceptibility genes in dystocia.

    Methods: A total of 104 women in 47 families were included where at least two sisters had undergone caesarean section at a gestational length of 286 days or more at their first delivery. Study of medical records and a telephone interview was performed to identify subjects with dystocia. Whole-genome scanning using Affymetrix genotyping-arrays and non-parametric linkage (NPL) analysis was made in 39 women exhibiting the phenotype of dystocia from 19 families. In 68 women re-sequencing was performed of candidate genes showing suggestive linkage: oxytocin (OXT) on chromosome 20 and oxytocin-receptor (OXTR) on chromosome 3.

    Results: We found a trend towards linkage with suggestive NPL-score (3.15) on chromosome 12p12. Suggestive linkage peaks were observed on chromosomes 3, 4, 6, 10, 20. Re-sequencing of OXT and OXTR did not reveal any causal variants.

    Conclusions: Dystocia is likely to have a genetic component with variations in multiple genes affecting the patient outcome. We found 6 loci that could be re-evaluated in larger patient cohorts.

    BMC medical genetics 2010;11;105

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10(-7) to P = 4 × 10(-14), with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062

    Nature genetics 2009;41;6;657-65

  • Association of psoriasis to PGLYRP and SPRR genes at PSORS4 locus on 1q shows heterogeneity between Finnish, Swedish and Irish families.

    Kainu K, Kivinen K, Zucchelli M, Suomela S, Kere J, Inerot A, Baker BS, Powles AV, Fry L, Samuelsson L and Saarialho-Kere U

    Department of Medical Genetics, University of Helsinki, Helsinki, Finland.

    A susceptibility locus for psoriasis, PSORS4, has been mapped to chromosome 1q21 in the region of the epidermal differentiation complex. The region has been refined to a 115 kb interval around the loricrin (LOR) gene. However, no evidence of association between polymorphisms in the LOR gene and psoriasis has been found. Therefore, we have analysed association to three candidate gene clusters of the region, the S100, small proline-rich protein (SPRR) and PGLYRP (peptidoglycan recognition protein) genes, which all contain functionally interesting psoriasis candidate genes. In previous studies, the SPRR and S100 genes have shown altered expression in psoriasis. Polymorphisms in the PGLYRP genes have also been shown to be associated with psoriasis. We genotyped altogether 29 single nucleotide polymorphisms (SNPs) in 255 Finnish psoriasis families and analysed association with psoriasis using the transmission disequilibrium test. A five-SNP haplotype of PGLYRP SNPs associated significantly with psoriasis. There was also suggestive evidence of association to the SPRR gene locus in Finnish families. To confirm the putative associations, selected SNPs were also genotyped in a family collection of Swedish and Irish patients. The families supported association to the two gene regions, but there was also evidence of allelic heterogeneity.

    Experimental dermatology 2009;18;2;109-15

  • Identification of MAMDC1 as a candidate susceptibility gene for systemic lupus erythematosus (SLE).

    Hellquist A, Zucchelli M, Lindgren CM, Saarialho-Kere U, Järvinen TM, Koskenmies S, Julkunen H, Onkamo P, Skoog T, Panelius J, Räisänen-Sokolowski A, Hasan T, Widen E, Gunnarson I, Svenungsson E, Padyukov L, Assadi G, Berglind L, Mäkelä VV, Kivinen K, Wong A, Cunningham Graham DS, Vyse TJ, D'Amato M and Kere J

    Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden.

    Background: Systemic lupus erythematosus (SLE) is a complex autoimmune disorder with multiple susceptibility genes. We have previously reported suggestive linkage to the chromosomal region 14q21-q23 in Finnish SLE families.

    Methodology/Principal Findings: Genetic fine mapping of this region in the same family material, together with a large collection of parent affected trios from UK and two independent case-control cohorts from Finland and Sweden, indicated that a novel uncharacterized gene, MAMDC1 (MAM domain containing glycosylphosphatidylinositol anchor 2, also known as MDGA2, MIM 611128), represents a putative susceptibility gene for SLE. In a combined analysis of the whole dataset, significant evidence of association was detected for the MAMDC1 intronic single nucleotide polymorphisms (SNP) rs961616 (P-value = 0.001, Odds Ratio (OR) = 1.292, 95% CI 1.103-1.513) and rs2297926 (P-value = 0.003, OR = 1.349, 95% CI 1.109-1.640). By Northern blot, real-time PCR (qRT-PCR) and immunohistochemical (IHC) analyses, we show that MAMDC1 is expressed in several tissues and cell types, and that the corresponding mRNA is up-regulated by the pro-inflammatory cytokines tumour necrosis factor alpha (TNF-alpha) and interferon gamma (IFN-gamma) in THP-1 monocytes. Based on its homology to known proteins with similar structure, MAMDC1 appears to be a novel member of the adhesion molecules of the immunoglobulin superfamily (IgCAM), which is involved in cell adhesion, migration, and recruitment to inflammatory sites. Remarkably, some IgCAMs have been shown to interact with ITGAM, the product of another SLE susceptibility gene recently discovered in two independent genome wide association (GWA) scans.

    Conclusions/Significance: Further studies focused on MAMDC1 and other molecules involved in these pathways might thus provide new insight into the pathogenesis of SLE.

    PloS one 2009;4;12;e8037

  • A global network for investigating the genomic epidemiology of malaria.

    Malaria Genomic Epidemiology Network

    The University of Buea, PO Box 63, Buea, South West Province, Cameroon.

    Large-scale studies of genomic variation could assist efforts to eliminate malaria. But there are scientific, ethical and practical challenges to carrying out such studies in developing countries, where the burden of disease is greatest. The Malaria Genomic Epidemiology Network (MalariaGEN) is now working to overcome these obstacles, using a consortial approach that brings together researchers from 21 countries.

    Funded by: Medical Research Council: G0200454, G0200454(62635), G0600230, G0600230(77610), G0600718, G19/9; Wellcome Trust: 076934, 077383, 077383/Z/05/Z

    Nature 2008;456;7223;732-7

  • The human GIMAP5 gene has a common polyadenylation polymorphism increasing risk to systemic lupus erythematosus.

    Hellquist A, Zucchelli M, Kivinen K, Saarialho-Kere U, Koskenmies S, Widen E, Julkunen H, Wong A, Karjalainen-Lindsberg ML, Skoog T, Vendelin J, Cunninghame-Graham DS, Vyse TJ, Kere J and Lindgren CM

    Department of Biosciences at Novum, Karolinska Institute, Stockholm, Sweden.

    Background: Several members of the GIMAP gene family have been suggested as being involved in different aspects of the immune system in different species. Recently, a mutation in the GIMAP5 gene was shown to cause lymphopenia in a rat model of autoimmune insulin-dependent diabetes. Thus it was hypothesised that genetic variation in GIMAP5 may be involved in susceptibility to other autoimmune disorders where lymphopenia is a key feature, such as systemic lupus erythematosus (SLE).

    Methods: To investigate this, seven single nucleotide polymorphisms in GIMAP5 were analysed in five independent sets of family-based SLE collections, containing more than 2000 samples.

    Result: A significant increase in SLE risk associated with the most common GIMAP5 haplotype was found (OR 1.26, 95% CI 1.02 to 1.54, p = 0.0033). In families with probands diagnosed with thrombocytopenia, the risk was increased (OR 2.11, 95% CI 1.09 to 4.09, p = 0.0153). The risk haplotype bears a polymorphic polyadenylation signal which alters the 3' part of GIMAP5 mRNA by producing an inefficient polyadenylation signal. This results in higher proportion of non-terminated mRNA for homozygous individuals (p<0.005), a mechanism shown to be causal in thalassaemias. To further assess the functional effect of the polymorphic polyadenylation signal in the risk haplotype, monocytes were treated with several cytokines affecting apoptosis. All the apoptotic cytokines induced GIMAP5 expression in two monocyte cell lines (1.5-6 times, p<0.0001 for all tests).

    Conclusion: Taken together, the data suggest the role of GIMAP5 in the pathogenesis of SLE.

    Journal of medical genetics 2007;44;5;314-21

  • Evaluation of STOX1 as a preeclampsia candidate gene in a population-wide sample.

    Kivinen K, Peterson H, Hiltunen L, Laivuori H, Heino S, Tiala I, Knuutila S, Rasi V and Kere J

    Department of Biosciences at Novum, Karolinska Institutet, Stockholm, Sweden.

    Preeclampsia is a common, pregnancy-specific vascular disorder characterised by hypertension and proteinuria. A recent report suggested association of the STOX1 gene on chromosome 10q22.1 with preeclampsia in the Dutch population. Here, we present a comprehensive assessment of STOX1 as a candidate gene for preeclampsia in the Finnish population by re-examining our previous genetic linkage analysis results for both chromosome 10 and paralogous loci, by genotyping representative markers in a nationwide data set, and by studying STOX1 expression in placentas from preeclamptic and uncomplicated pregnancies. In conclusion, we are unable to validate STOX1 as a common preeclampsia susceptibility gene.

    European journal of human genetics : EJHG 2007;15;4;494-7

  • Distinct sets of developmentally regulated genes that are expressed by human oocytes and human embryonic stem cells.

    Zhang P, Kerkelä E, Skottman H, Levkov L, Kivinen K, Lahesmaa R, Hovatta O and Kere J

    Department of Biosciences and Nutrition at NOVUM, Karolinska Institutet, Huddinge, Sweden.

    Objective: To identify genes that are expressed differently during final oocyte maturation and early embryonic development in humans.

    Design: Comparison of gene expression profiles of human germinal vesicle oocytes (hGVO), human embryonic stem cells (hESC) and human foreskin fibroblasts.

    Setting: Research centers and a fertility unit in a university hospital.

    Patient(s): Fifty-five healthy women donated 76 hGVO.

    Intervention(s): None.

    Main outcome measure(s): Gene expression profiles were analyzed and compared with the use of microarray and reverse-transcription polymerase chain reaction.

    Result(s): Altogether, 10,183 genes were expressed in hGVO, and 45% of these genes were unclassified by biologic function. Four oocyte-specific genes (MATER, ZAR1, NPM2 and FIGLA) were detected in hGVO for the first time. Known components of 4 signaling pathways (MOS-MPF, transforming growth factor-beta, WNT, and NOTCH) were also found expressed in hGVO, with some components detected in hGVO for the first time. Distinct sets of genes that were revealed by comparison of expression profiles between hGVO, hESC, and human foreskin fibroblasts appear to be involved in oocyte maturation and early embryonic development.

    Conclusion(s): We obtained, for the first time, a large amount of information on gene expression of hGVO as compared with hESC. These data, from a unique research material (human oocytes), can now be used to understand the molecular mechanisms of early human development.

    Fertility and sterility 2007;87;3;677-90

  • Heterogeneity-based genome search meta-analysis for preeclampsia.

    Zintzaras E, Kitsios G, Harrison GA, Laivuori H, Kivinen K, Kere J, Messinis I, Stefanidis I and Ioannidis JP

    Department of Biomathematics, University of Thessaly School of Medicine, Papakyriazi 22, Larissa, 41222, Greece. zintza@med.uth.gr

    Preeclampsia is a pregnancy-related disorder that causes maternal and fetal morbidity and mortality. Its exact inheritance pattern is still unknown, and genome searches for identifying susceptibility loci for preeclampsia have thus far produced inconclusive or inconsistent results. We performed a heterogeneity-based genome search meta-analysis (HEGESMA) that synthesized the available genome scan data on preeclampsia. HEGESMA identifies genetic regions (bins) that rank highly on average in terms of linkage statistics across genome scans (searches). The significance of each bin's average rank and heterogeneity across scans was calculated using Monte Carlo tests. The meta-analysis involved four genome-scans on general preeclampsia and five scans on severe preeclampsia. In general preeclampsia, 13 bins had significantly high average rank (P(rank) < 0.05) by either unweighted or weighted analyses, while four of them (2p11.2-2q21.1, 9q21.32-9q31.2, 2p15-2p11.2, 2q32.1-2q35) were formally significant by both analyses. Heterogeneity of bin 2.8 (2q32.1-2q35) was significantly low in both unweighted and weighted analysis (P(Q) < 0.01). In severe preeclampsia, 10 bins had significantly high average rank by either unweighted or weighted analyses and five of them (3q11.1-3q21.2, 2q37.1-2q37.3, 18p11.32-18p11.22, 2p15-2p11.2, 7q34-7q36.3) were significant by both analyses. Bin 2q37.1-2q37.3 showed marginal low heterogeneity in unweighted and weighted analysis (P(Q) = 0.06). Results should be interpreted with caution as the p values were modest. Further investigation of these regions by genotyping with additional markers and families may help to direct the identification of candidate genes for preeclampsia.

    Human genetics 2006;120;3;360-70

Bronwyn MacInnis

- Senior Scientific Programme Manager

Bronwyn completed her PhD under Bob Campenot at the University of Alberta and continued her training as a postdoc with Miriam Goodman at Stanford University. Her research focus was in the genetics of behaviour, learning and memory. She joined the Sanger Institute in 2007, and is now a key member of management for the Sanger Malaria Programme and MalariaGEN.

Research

Bronwyn is a Senior Scientific Manager for the Sanger Malaria Programme. Her role is to provide strategic and operational management to help ensure the successful delivery of programme goals. She is also responsible for building and maintaining external scientific partnerships, as well as overseeing internal sequencing operations.

She is also a key member of the MalariaGEN Resource Centre and works to establish MalariaGEN's role within the scientific community, through the strategic development of key MalariaGEN projects and synergies with other research initiatives and programmes, as well as providing a point-of-contact for partners.

References

  • Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data.

    Auburn S, Campino S, Miotto O, Djimde AA, Zongo I, Manske M, Maslen G, Mangano V, Alcock D, MacInnis B, Rockett KA, Clark TG, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    The composition of multi-clonal malarial infections and the epidemiological factors which shape their diversity remain poorly understood. Traditionally within-host diversity has been defined in terms of the multiplicity of infection (MOI) derived by PCR-based genotyping. Massively parallel, single molecule sequencing technologies now enable individual read counts to be derived on genome-wide datasets facilitating the development of new statistical approaches to describe within-host diversity. In this class of measures the F(WS) metric characterizes within-host diversity and its relationship to population level diversity. Utilizing P. falciparum field isolates from patients in West Africa we here explore the relationship between the traditional MOI and F(WS) approaches. F(WS) statistics were derived from read count data at 86,158 SNPs in 64 samples sequenced on the Illumina GA platform. MOI estimates were derived by PCR at the msp-1 and -2 loci. Significant correlations were observed between the two measures, particularly with the msp-1 locus (P = 5.92×10(-5)). The F(WS) metric should be more robust than the PCR-based approach owing to reduced sensitivity to potential locus-specific artifacts. Furthermore the F(WS) metric captures information on a range of parameters which influence out-crossing risk including the number of clones (MOI), their relative proportions and genetic divergence. This approach should provide novel insights into the factors which correlate with, and shape within-host diversity.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 089275, 090532, 090770

    PloS one 2012;7;2;e32891

  • Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    Oyola SO, Otto TD, Gu Y, Maslen G, Manske M, Campino S, Turner DJ, MacInnis B, Kwiatkowski DP, Swerdlow HP and Quail MA

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. so1@sanger.ac.uk

    Background: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences.

    Results: We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates.

    Conclusion: We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 079355/Z/06/Z, 090532

    BMC genomics 2012;13;1

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, MacInnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770

    PloS one 2011;6;7;e22213

  • Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

    Robinson T, Campino SG, Auburn S, Assefa SA, Polley SD, Manske M, MacInnis B, Rockett KA, Maslen GL, Sanders M, Quail MA, Chiodini PL, Kwiatkowski DP, Clark TG and Sutherland CJ

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data was extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these are compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 077012/Z/05/Z, 090532

    PloS one 2011;6;8;e23204

  • Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay.

    Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Hinxton, United Kingdom. sc11@sanger.ac.uk

    The diversity in the Plasmodium falciparum genome can be used to explore parasite population dynamics, with practical applications to malaria control. The ability to identify the geographic origin and trace the migratory patterns of parasites with clinically important phenotypes such as drug resistance is particularly relevant. With increasing single-nucleotide polymorphism (SNP) discovery from ongoing Plasmodium genome sequencing projects, a demand for high SNP and sample throughput genotyping platforms for large-scale population genetic studies is required. Low parasitaemias and multiple clone infections present a number of challenges to genotyping P. falciparum. We addressed some of these issues using a custom 384-SNP Illumina GoldenGate assay on P. falciparum DNA from laboratory clones (long-term cultured adapted parasite clones), short-term cultured parasite isolates and clinical (non-cultured isolates) samples from East and West Africa, Southeast Asia and Oceania. Eighty percent of the SNPs (n = 306) produced reliable genotype calls on samples containing as little as 2 ng of total genomic DNA and on whole genome amplified DNA. Analysis of artificial mixtures of laboratory clones demonstrated high genotype calling specificity and moderate sensitivity to call minor frequency alleles. Clear resolution of geographically distinct populations was demonstrated using Principal Components Analysis (PCA), and global patterns of population genetic diversity were consistent with previous reports. These results validate the utility of the platform in performing population genetic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute; Medical Research Council: G0600718, G19/9; NIAID NIH HHS: R37 AI048071; Wellcome Trust: 090532, 093956

    PloS one 2011;6;6;e20251

  • SnoopCGH: software for visualizing comparative genomic hybridization data.

    Almagro-Garcia J, Manske M, Carret C, Campino S, Auburn S, Macinnis BL, Maslen G, Pain A, Newbold CI, Kwiatkowski DP and Clark TG

    Wellcome Trust Sanger Institute, Hinxton, The Weatherall Institute of Molecular Medicine and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. jg10@sanger.ac.uk

    Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of the array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data.

    SnoopCGH is written in Java and is available from http://snoopcgh.sourceforge.net/

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;20;2732-3

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^-7 to P = 4 × 10^-14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062

    Nature genetics 2009;41;6;657-65

  • Malaria genomics meets drug-resistance phenotyping in the field.

    Hunt P, Macinnis B and Roper C

    Institute for Immunology and Infection Research, University of Edinburgh, Ashworth Laboratory, Kings Buildings, Edinburgh EH9 3JT, UK. Paul.Hunt@ed.ac.uk

    A report of the 2nd Wellcome Trust Conference on Genomic Epidemiology of Malaria, Hinxton, UK, 14-17 June 2009.

    Funded by: Medical Research Council: G0400476

    Genome biology 2009;10;8;314

  • A global network for investigating the genomic epidemiology of malaria.

    Malaria Genomic Epidemiology Network

    The University of Buea, PO Box 63, Buea, South West Province, Cameroon.

    Large-scale studies of genomic variation could assist efforts to eliminate malaria. But there are scientific, ethical and practical challenges to carrying out such studies in developing countries, where the burden of disease is greatest. The Malaria Genomic Epidemiology Network (MalariaGEN) is now working to overcome these obstacles, using a consortial approach that brings together researchers from 21 countries.

    Funded by: Medical Research Council: G0200454, G0200454(62635), G0600230, G0600230(77610), G0600718, G19/9; Wellcome Trust: 076934, 077383, 077383/Z/05/Z

    Nature 2008;456;7223;732-7

Cinzia Malangone

- Senior Software Developer

Cinzia graduated with a Master's degree in Computer Science from the University of Turin in 2003. She worked for several years in the insurance and retail sectors before starting her career in biology on a cardiovascular applications project at the University of Turin.

Upon joining the Wellcome Trust Sanger Institute in 2010, Cinzia worked with the Production Software Development team on the development of Sequencescape, a LIMS-based sample management system for tracking samples through the Illumina sequencing pipeline.

Research

Cinzia joined the Malaria Programme in April 2011 and works on the design and development of web applications to explore catalogues of genetic variation, the sequencing analysis pipeline, and on ExplorerCat, a web-based application for publishing genotyping data.

Magnus Manske

- Head of Informatics

After graduating from the University of Cologne in 2006 with a PhD in Biochemistry (magna cum laude), Magnus joined the Kwiatkowski Group at the Sanger Institute in 2007 as a Senior Computer Programmer and currently holds the position of Head of Informatics. Magnus has also worked voluntarily for many years as an author and programmer for Wikipedia. He is the original author of MediaWiki, the software that powers Wikipedia and many other wiki-based sites.

Research

Magnus works on the analysis of sequencing and genotyping data for MalariaGEN projects as well as the group's Plasmodium Genome Variation project, and also writes interactive data visualization applications for the web.

http://www.linkedin.com/profile/view?id=58317594

References

  • Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    Oyola SO, Otto TD, Gu Y, Maslen G, Manske M, Campino S, Turner DJ, Macinnis B, Kwiatkowski DP, Swerdlow HP and Quail MA

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. so1@sanger.ac.uk

    Background: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences.

    Results: We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions that involve a PCR additive (TMAC), produces amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates.

    Conclusion: We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extremes of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 079355/Z/06/Z, 090532

    BMC genomics 2012;13;1

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, Macinnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770

    PloS one 2011;6;7;e22213

  • An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations.

    Tan JC, Miller BA, Tan A, Patel JJ, Cheeseman IH, Anderson TJ, Manske M, Maslen G, Kwiatkowski DP and Ferdig MT

    The Eck Institute for Global Health, University of Notre Dame, 100 Galvin Life Sciences, Notre Dame, IN 46556, USA.

    We present an optimized probe design for copy number variation (CNV) and SNP genotyping in the Plasmodium falciparum genome. We demonstrate that variable length and isothermal probes are superior to static length probes. We show that sample preparation and hybridization conditions mitigate the effects of host DNA contamination in field samples. The microarray and workflow presented can be used to identify CNVs and SNPs with 95% accuracy in a single hybridization, in field samples containing up to 92% human DNA contamination.

    Funded by: Medical Research Council: G19/9; NCRR NIH HHS: RR013556; NIAID NIH HHS: AI072517, AI075145; Wellcome Trust: 090532

    Genome biology 2011;12;4;R35

  • Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

    Robinson T, Campino SG, Auburn S, Assefa SA, Polley SD, Manske M, MacInnis B, Rockett KA, Maslen GL, Sanders M, Quail MA, Chiodini PL, Kwiatkowski DP, Clark TG and Sutherland CJ

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data was extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these are compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust: 077012/Z/05/Z, 090532

    PloS one 2011;6;8;e23204

  • Ten simple rules for editing Wikipedia.

    Logan DW, Sandal M, Gardner PP, Manske M and Bateman A

    PLoS computational biology 2010;6;9

  • SnoopCGH: software for visualizing comparative genomic hybridization data.

    Almagro-Garcia J, Manske M, Carret C, Campino S, Auburn S, Macinnis BL, Maslen G, Pain A, Newbold CI, Kwiatkowski DP and Clark TG

    Wellcome Trust Sanger Institute, Hinxton, The Weatherall Institute of Molecular Medicine and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. jg10@sanger.ac.uk

    Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of the array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data.

    SnoopCGH is written in Java and is available from http://snoopcgh.sourceforge.net/

    Funded by: Medical Research Council: G0600718, G19/9; Wellcome Trust

    Bioinformatics (Oxford, England) 2009;25;20;2732-3

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^-7 to P = 4 × 10^-14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062

    Nature genetics 2009;41;6;657-65

  • A global network for investigating the genomic epidemiology of malaria.

    Malaria Genomic Epidemiology Network

    The University of Buea, PO Box 63, Buea, South West Province, Cameroon.

    Large-scale studies of genomic variation could assist efforts to eliminate malaria. But there are scientific, ethical and practical challenges to carrying out such studies in developing countries, where the burden of disease is greatest. The Malaria Genomic Epidemiology Network (MalariaGEN) is now working to overcome these obstacles, using a consortial approach that brings together researchers from 21 countries.

    Funded by: Medical Research Council: G0200454, G0200454(62635), G0600230, G0600230(77610), G0600718, G19/9; Wellcome Trust: 076934, 077383, 077383/Z/05/Z

    Nature 2008;456;7223;732-7

  • The RNA WikiProject: community annotation of RNA families.

    Daub J, Gardner PP, Tate J, Ramsköld D, Manske M, Scott WG, Weinberg Z, Griffiths-Jones S and Bateman A

    The online encyclopedia Wikipedia has become one of the most important online references in the world and has a substantial and growing scientific content. A search of Google with many RNA-related keywords identifies a Wikipedia article as the top hit. We believe that the RNA community has an important and timely opportunity to maximize the content and quality of RNA information in Wikipedia. To this end, we have formed the RNA WikiProject (http://en.wikipedia.org/wiki/Wikipedia:WikiProject_RNA) as part of the larger Molecular and Cellular Biology WikiProject. We have created over 600 new Wikipedia articles describing families of noncoding RNAs based on the Rfam database, and invite the community to update, edit, and correct these articles. The Rfam database now redistributes this Wikipedia content as the primary textual annotation of its RNA families. Users can, therefore, for the first time, directly edit the content of one of the major RNA databases. We believe that this Wikipedia/Rfam link acts as a functioning model for incorporating community annotation into molecular biology databases.

    Funded by: Howard Hughes Medical Institute; Wellcome Trust: 077044

    RNA (New York, N.Y.) 2008;14;12;2462-4

Daniel Mead

- Advanced Research Assistant

Daniel graduated from the University of Aberdeen with a BSc in Genetics and Immunology in 2004, before taking a year out to help with the family pub business in Yorkshire. He began his scientific career as a Forensic DNA Analysis and Robot Support Technician at the Forensic Science Service in Huntingdon, where he worked for 3 years before joining the Kwiatkowski Group in 2008.

Research

As an Advanced Research Assistant working on the Plasmodium and Anopheles sequencing pipeline at Sanger, Daniel troubleshoots instrument protocols, supports Senior Scientists, and optimises DNA handling protocols, sample collection and extraction techniques.

Daniel is also involved with the MalariaGEN Consortial Projects 1 and 3.

References

  • An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    Auburn S, Campino S, Clark TG, Djimde AA, Zongo I, Pinches R, Manske M, Mangano V, Alcock D, Anastasi E, Maslen G, Macinnis B, Rockett K, Modiano D, Newbold CI, Doumbo OK, Ouédraogo JB and Kwiatkowski DP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. sa3@sanger.ac.uk

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.

    Funded by: Howard Hughes Medical Institute: 55005502; Medical Research Council: G0600718, G19/9; Wellcome Trust: 090532, 090770

    PloS one 2011;6;7;e22213

  • Genome-wide and fine-resolution association analysis of malaria in West Africa.

    Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, Clark TG, Kivinen K, Bojang KA, Conway DJ, Pinder M, Sirugo G, Sisay-Joof F, Usen S, Auburn S, Bumpstead SJ, Campino S, Coffey A, Dunham A, Fry AE, Green A, Gwilliam R, Hunt SE, Inouye M, Jeffreys AE, Mendy A, Palotie A, Potter S, Ragoussis J, Rogers J, Rowlands K, Somaskantharajah E, Whittaker P, Widden C, Donnelly P, Howie B, Marchini J, Morris A, SanJoaquin M, Achidi EA, Agbenyega T, Allen A, Amodu O, Corran P, Djimde A, Dolo A, Doumbo OK, Drakeley C, Dunstan S, Evans J, Farrar J, Fernando D, Hien TT, Horstmann RD, Ibrahim M, Karunaweera N, Kokwaro G, Koram KA, Lemnge M, Makani J, Marsh K, Michon P, Modiano D, Molyneux ME, Mueller I, Parker M, Peshu N, Plowe CV, Puijalon O, Reeder J, Reyburn H, Riley EM, Sakuntabhai A, Singhasivanon P, Sirima S, Tall A, Taylor TE, Thera M, Troye-Blomberg M, Williams TN, Wilson M, Kwiatkowski DP, Wellcome Trust Case Control Consortium and Malaria Genomic Epidemiology Network

    MRC Laboratories, Fajara, Banjul, Gambia.

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^-7 to P = 4 × 10^-14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Howard Hughes Medical Institute; Medical Research Council: G0600230, G0600230(77610), G0600329, G0600718, G0800759, G19/9, G9828345, MC_U190081977, MC_U190081993; NIAID NIH HHS: U19 AI065683, U19 AI065683-04; Wellcome Trust: 061858, 064890, 076113, 076934, 077011, 077383, 077383/Z/05/Z, 081682, 089062

    Nature genetics 2009;41;6;657-65

  • A global network for investigating the genomic epidemiology of malaria.

    Malaria Genomic Epidemiology Network

    The University of Buea, PO Box 63, Buea, South West Province, Cameroon.

    Large-scale studies of genomic variation could assist efforts to eliminate malaria. But there are scientific, ethical and practical challenges to carrying out such studies in developing countries, where the burden of disease is greatest. The Malaria Genomic Epidemiology Network (MalariaGEN) is now working to overcome these obstacles, using a consortial approach that brings together researchers from 21 countries.

    Funded by: Medical Research Council: G0200454, G0200454(62635), G0600230, G0600230(77610), G0600718, G19/9; Wellcome Trust: 076934, 077383, 077383/Z/05/Z

    Nature 2008;456;7223;732-7
