Wellcome Sanger Institute, Genome Research Limited

New computational method unravels single-cell data from multiple people

Souporcell could assist personalised medicine and malaria research

A new computational method for assigning each cell to its donor in single-cell RNA sequencing experiments provides an accurate way to unravel data from a mixture of people. The Souporcell method, created by Wellcome Sanger Institute researchers and their collaborators, could help study how genetic variants in different people affect which genes are expressed during infection or in response to drugs.

Published this week in Nature Methods, the software could increase the efficiency of single-cell experiments, assisting research into transplants, personalised medicine and malaria.

Single-cell RNA sequencing (RNAseq) can reveal exactly which genes are switched on in each individual cell, showing which cell types are present and what they do. Pooling multiple people’s cells into one single-cell RNAseq experiment helps to identify how different genomes affect this gene expression. However, it is essential to be able to separate the resulting data by individual, which can be very difficult.

The authors tested Souporcell* against three other computational methods using placental cells, pluripotent stem cell lines** and malaria parasites.

“Our method, called Souporcell, is able to separate mixtures of individuals’ cells in scRNAseq experiments without knowing each individual’s full genome sequence beforehand, unlike previous methods. One of the key features of the method is that it estimates the amount of background RNA from dead cells, which is often referred to as the soup. This then allows the removal of that source of noise and hence the name souporcell.”

Haynes Heaton, the first author from the Wellcome Sanger Institute

Combining cells from several individuals into a single experiment increases accuracy, allowing more information to be recovered, and also reduces the cost of these experiments.

“The exact genetic sequence of each person can affect their response to infections, or to drug treatments. The new method enables single cell expression data from multiple people to be analysed, to show links between genotype and phenotype, in diseases and in the presence of drugs. This will have implications for personalised medicine.”

Dr Martin Hemberg, a senior author from the Wellcome Sanger Institute

In addition, some samples inherently contain a mix of cells with different genomes. These include samples from transplant patients, who have both their own cells and cells from the donor, and populations of parasites, such as malaria parasites, from an infected individual.

“This method is helping us understand malaria. People get infected with multiple strains of malaria at once, but we don’t know how these strains are competing with each other to reproduce. To even ask the question we have to be able to split out cells of different malaria strains, and Souporcell is enabling this.”

Dr Mara Lawniczak, a senior author from the Wellcome Sanger Institute

More information

*Souporcell is freely available under an MIT open-source license at https://github.com/wheaton5/souporcell.

**HiPSci cell lines are from the Human Induced Pluripotent Stem Cell Initiative: https://www.sanger.ac.uk/collaboration/hipsci


Haynes Heaton et al. (2020) Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes. Nature Methods. DOI: 10.1038/s41592-020-0820-1


This work was supported by Wellcome, the Medical Research Council and other funders. Please see the paper for the full list of funders.