Roberto Amato, Wellcome Sanger Institute

Open data on malaria genomes will help combat drug resistance

The release represents the world’s largest resource of genomic data on malaria parasite evolution and drug resistance

Genome variation data on more than 7,000 malaria parasites from 28 endemic countries are released today (24 February) in Wellcome Open Research. The data have been produced by MalariaGEN, a data-sharing network of groups around the world who are working together to build high-quality data resources for malaria research and disease control.

This open data release represents the world’s largest resource of genomic data on malaria parasite evolution and drug resistance. It provides benchmark data on parasite genome variation that is needed in the search for new drugs and vaccines, and in the development of surveillance tools for malaria control and elimination.

Malaria is a major global health problem, causing an estimated 409,000 deaths in 2019, with 67 per cent of deaths occurring in children under five years of age*. This data resource focuses on Plasmodium falciparum, the species of malaria parasite that is responsible for the most common and deadliest form of the disease.

The Malaria Genomic Epidemiology Network (MalariaGEN) provides researchers and control programmes in malaria-endemic countries with access to DNA sequencing technologies and tools for genomic analysis. Founded in 2005, MalariaGEN now has partners in 39 countries, each leading their own studies into different aspects of malaria biology and epidemiology, with the common goal of finding ways to improve malaria control.

This latest publication, which is awaiting peer review, represents the work of 49 partner studies at 73 locations in Africa, Asia, South America and Oceania, which together contributed 7,113 samples of P. falciparum for genome sequencing. At the Wellcome Sanger Institute, each sample was analysed for over 3 million genetic variants, and the data were carefully curated before being returned to partners for use in their own research. This paper brings together the data from all the partner studies to provide an open data resource for the wider scientific community.

“We have created a data resource that is ‘analysis ready’ for anyone to use, including those without specialist genetics training. Each annotated dataset sample includes key features that are relevant to malaria control, such as resistance to six major antimalarial drugs, and whether it carries particular structural changes that cause diagnostic malaria tests to fail. Like the Human Genome Project was a resource for the analyses of human genome sequence data, we hope this will be one of the main resources for malaria research.”

Dr Richard Pearson, co-author from the Wellcome Sanger Institute

One of MalariaGEN’s core principles is to provide clear attribution and recognition of all the groups that have contributed to a data resource. In this dataset, each sample is listed against the partner study that it belongs to, with a description of the scientific aims of the study and the local investigators that led the work.

“It has been a huge privilege to collaborate with our MalariaGEN partners around the world to build this data resource. We are proud to see these genomic data being used in publications by our colleagues in malaria-endemic studies and others in the malaria research community. We hope that the new features in this data release will make it accessible to an even wider audience, and our team is now hard at work to produce the next version.”

Professor Dominic Kwiatkowski, co-author from the Wellcome Sanger Institute and the Big Data Institute at the University of Oxford

“A quantitative assessment of how malaria parasites respond to public health interventions is key for a successful and sustainable elimination campaign. Over time, this openly available resource will facilitate research into the malaria parasite’s evolutionary processes, which will ultimately inform effective and sustainable malaria control and elimination strategies that will be key in ending this devastating disease.”

Professor Abdoulaye Djimde, co-author from the University of Science, Techniques and Technologies of Bamako, Mali

More information

*For more information on these figures and the state of malaria in general, see the World Health Organisation website:

For more information on MalariaGEN, visit:


MalariaGEN and multiple co-authors. (2021) An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples [version 1; peer review: awaiting peer review]. Wellcome Open Research. DOI: 10.12688/wellcomeopenres.16168.1


This work was supported by Wellcome and the MRC Centre for Genomics and Global Health, which is jointly funded by the Medical Research Council and the Department for International Development. For full funding information, please see the publication.

Selected websites

  • The Big Data Institute, University of Oxford

    The Big Data Institute is located in the Li Ka Shing Centre for Health Informatics and Discovery at the University of Oxford. It is an interdisciplinary research centre that focuses on the analysis of large, complex data sets for research into the causes, consequences, prevention and treatment of disease. Research is conducted in areas such as genomics, population health, infectious disease surveillance and the development of new analytic methods. The Big Data Institute is supported by funding from the Medical Research Council, the UK Research Partnership Investment Fund, the National Institute for Health Research Oxford Biomedical Research Centre, and philanthropic donations from the Li Ka Shing and Robertson Foundations. Further details are available at

  • University of Science, Techniques and Technologies of Bamako, Mali 

    The Malaria Research and Training Center (MRTC), within the University of Science, Techniques and Technologies of Bamako, is a renowned African-led research institution divided into six research units, including the Genomics and Molecular Epidemiology Unit, the B-cell Laboratory and the Cellular Immunology Laboratory within the Immunology Group, the Molecular Epidemiology and Drug Resistance Unit, the Clinical Laboratory, the Data Management and Analysis Group, and the Diagnostic Laboratory. Over the past 20 years, MRTC, in collaboration with NIH, the University of Maryland, EDCTP, Wellcome, the African Academy of Sciences, WHO and others, has built a state-of-the-art facility that includes parasite culture facilities, insectaries, genomic data storage and bioinformatics facilities. It has five established clinical trial sites for vaccine trials, seven for drug trials and epidemiological studies, and numerous satellite field research sites.

  • The Wellcome Sanger Institute

    The Wellcome Sanger Institute is a world-leading genomics research centre. We undertake large-scale research that forms the foundations of knowledge in biology and medicine. We are open and collaborative; our data, results, tools and technologies are shared across the globe to advance science. Our ambition is vast – we take on projects that are not possible anywhere else. We use the power of genome sequencing to understand and harness the information in DNA. Funded by Wellcome, we have the freedom and support to push the boundaries of genomics. Our findings are used to improve health and to understand life on Earth. Find out more at or follow us on Twitter, Facebook, LinkedIn and on our Blog.

  • About Wellcome

    Wellcome exists to improve health by helping great ideas to thrive. We support researchers, we take on big health challenges, we campaign for better science, and we help everyone get involved with science and health research. We are a politically and financially independent foundation.