Largest study of CRISPR-Cas9 mutations creates prediction tool for gene editing

Prediction resource could make CRISPR-Cas9 editing more reliable

The largest study of CRISPR-Cas9 action to date has produced a method to predict the exact mutations that CRISPR-Cas9 gene editing can introduce into a cell. Researchers at the Wellcome Sanger Institute edited 40,000 different pieces of DNA and analysed a thousand million resulting DNA sequences to reveal the effects of the gene editing and to develop a machine learning tool that predicts the outcomes. This will assist researchers who are using CRISPR-Cas9 to study disease mechanisms and drug targets.

Reported today in Nature Biotechnology (27th November), the new resource will enable scientists to predict the best sequences to target to make CRISPR-Cas9 gene editing more reliable, and therefore cheaper and more efficient.

CRISPR-Cas9* is a gene editing technology that enables researchers to cut DNA at any position in the genome, to create mutations and switch off specific genes. This vital technology is used by scientists worldwide to study which genes are important for various conditions, from cancer to rare diseases. It is also now being trialled therapeutically to correct harmful mutations in people’s genes.

FORECasT (Favoured Outcomes of Repair Events at Cas9 Targets) is a computational predictor of editing outcomes resulting from Cas9-induced double-strand breaks, designed by Sanger researchers and freely available online.

A specific guide RNA binds to an exact sequence of target DNA, guiding the Cas9 ‘scissors’ to cut the DNA at the right place. However, it is difficult to predict exactly what the final mutations will be, as further changes often take place when the cell repairs the break, rejoining the two cut ends of the DNA.

To study this, the researchers created over 40,000 pairs of different target DNA and guide RNA, and carried out CRISPR-Cas9 gene editing. By deep sequencing each pair in different cells, they were able to analyse in detail how the DNA was cut and rejoined. They found that the repair depended on the exact sequence of the DNA and guide, and that the outcome was reproducible for any given sequence.
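As a rough illustration of what "reproducible" can mean in practice, the sketch below turns sequencing reads for one target into a profile of repair-outcome frequencies and compares two replicates. It is not the method used in the study: the outcome labels, the example counts and the overlap measure are all assumptions chosen for illustration.

```python
from collections import Counter


def outcome_profile(reads):
    """Convert a list of observed repair-outcome labels into relative frequencies."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


def profile_overlap(p, q):
    """Shared probability mass of two profiles: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical labels (assumption): D<n> = deletion of n bases, I<n> = insertion of n bases.
# Hypothetical read counts for the same target edited in two separate experiments.
replicate_a = ["D1"] * 50 + ["I1"] * 30 + ["D7"] * 20
replicate_b = ["D1"] * 48 + ["I1"] * 32 + ["D7"] * 20

overlap = profile_overlap(outcome_profile(replicate_a), outcome_profile(replicate_b))
print(f"overlap = {overlap:.2f}")
```

An overlap close to 1 would indicate that the same target sequence yields nearly the same mix of mutations each time it is edited, which is the property that makes prediction from sequence feasible.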

The researchers then used the huge quantity of sequence data to build a machine learning tool, which derived general rules that determine the outcome of the repair. This programme, called FORECasT, enabled them to predict the repaired sequence using the targeted DNA sequence alone.

“We have carried out the largest, most comprehensive study on CRISPR-Cas9 action to date, and analysed more than a thousand million DNA sequences to allow us to study the process. We demonstrated that specific target sequences were repaired by the cell in the same way, proving that the action of the cell mechanisms is reproducible.”

Dr Luca Crepaldi, joint first author on the study, from the Wellcome Sanger Institute

“The discovery of reproducible DNA repair after CRISPR-Cas9 editing, combined with the vast amount of sequence data we generated, meant that we could create a predictive tool using machine learning methods. Our resource can predict the exact mutations resulting from CRISPR-Cas9 gene editing, just from the sequence of the target DNA. It will save time and resources for future CRISPR-Cas9 applications, and is openly available for use by all researchers using gene editing to study health and disease.”

Dr Felicity Allen, joint first author, from the Wellcome Sanger Institute

“CRISPR-Cas9 is an extremely important system for introducing mutations into DNA for research, and prospective therapeutic purposes. Our research allows scientists to understand its workings much better, and our transformational method enables people to predict the effects of each CRISPR-Cas9 edit in a cell. This allows better design of editing experiments, and may lead to future therapeutic applications.”

Dr Leopold Parts, senior author on the paper, from the Wellcome Sanger Institute

More information


Felicity Allen & Luca Crepaldi et al. (2018) Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nature Biotechnology. DOI: 10.1038/nbt.4317

*What is CRISPR-Cas9?

CRISPR-Cas9 is a unique technology that enables geneticists and medical researchers to edit parts of the genome by removing, adding or altering sections of the DNA sequence. It is currently the simplest, most versatile and precise method of genetic manipulation.

Prediction tool:

FORECasT – Favoured Outcomes of Repair Events at Cas9 Targets.


This work was supported by Wellcome, the Estonian Research Council, a Royal Commission, Cancer Research UK, Marie Curie funding and AstraZeneca.

Selected websites

  • Wellcome Sanger Institute

    The Wellcome Sanger Institute is one of the world’s leading genome centres. Through its ability to conduct research at scale, it is able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally. Institute research findings, generated through its own research programmes and through its leading role in international consortia, are being used to develop new diagnostics and treatments for human disease. To celebrate its 25th year in 2018, the Institute is sequencing 25 new genomes of species in the UK. Follow @sangerinstitute on Twitter, Facebook and LinkedIn.

  • Wellcome

    Wellcome exists to improve health for everyone by helping great ideas to thrive. We’re a global charitable foundation, both politically and financially independent. We support scientists and researchers, take on big problems, fuel imaginations and spark debate.