Wellcome Sanger Institute

Predicting and engineering biology in new research programme

The Wellcome Sanger Institute launches a new research programme that will combine large-scale genomic data generation with machine learning to predict the impacts of mutations and engineer biological systems.

The Wellcome Sanger Institute today (5 October) launches the Generative and Synthetic Genomics Programme – an additional, sixth programme of research at the Institute.

In a world first, Generative and Synthetic Genomics will bring together computational and experimental scientists to understand and predict the effects of editing each and every one of the building blocks of DNA, and engineer biological systems.

The teams will generate genomic data on huge scales and design computational models that use machine learning and artificial intelligence (AI) to make predictions in molecular biology, such as predicting the impacts of mutations on disease.

Plus, researchers will develop the technologies to write and edit genomes at scale and speed.

They aim to lay the foundations for predictive and programmable molecular biology, and the routine synthesis and engineering of genomes.

Supported by additional funding from Wellcome, Generative and Synthetic Genomics will have a transformative impact on medicine, agriculture and biotechnology.

Genomics and molecular biology have enabled researchers to extensively describe biological systems and models, and generate an understanding of how they function and what happens to cause disease.

However, scientists still struggle to predict how biological systems respond to perturbations – mutations – and engineering biology remains difficult. It is not as simple to engineer cells or organisms as it is to engineer software or machinery.

This is because the fundamental question of how DNA sequences determine the properties and regulation of proteins and RNAs* remains practically unsolved.

As the Sanger Institute enters its 30th year, researchers are presented with an opportunity to solve these questions.

Revolutions in DNA sequencing, synthesis and editing technologies now enable millions of experiments to be performed in parallel. Plus, the revolution in machine learning and artificial intelligence (AI) allows highly predictive and generative models to be developed for complex tasks – with enough data.

Enabled by the Sanger Institute’s world-leading capabilities in large-scale genomic data generation and analysis, and its bioinformatics expertise, scientists can now shift from using genomics to sequence and describe their subject of study to predictable and programmable biology – using genomics, big data and AI to predict how individual mutations, and combinations of mutations, affect functional biology.

Using these accurate predictions, researchers aim to design the properties, activities, regulation and expression of proteins and RNAs from scratch. The goal of Generative and Synthetic Genomics is not only to accurately predict and engineer but also to understand how, mechanistically, this prediction and engineering works.

The initial focus of the programme will be the individual components of biological systems – the protein machines that build our bodies and how they are controlled. The goal is to achieve a level of understanding to make it much easier to engineer proteins as therapeutics and for clean biotechnology. This will also lay the foundations for the longer-term vision of generating models for engineering gene pathways, and entire cells and tissues for medical and biotechnological applications.

To achieve this, the teams aim to understand, predict and engineer the effects of editing every nucleotide – the building blocks of DNA – in every genome.

The researchers will use the Sanger Institute’s ability to generate genomic data on huge scales to produce the datasets and design the computational models needed to make predictions in molecular biology, such as the effects of mutations in disease and understanding different responses to treatments.

Finally, the teams will develop the technologies to write and edit genomes at scale and speed. This will make it much easier for scientists to understand how the genomes work and also allow them to engineer, in a considered and responsible manner, the genomes of simpler organisms such as yeast and bacteria as non-polluting factories to make useful products.

Generative and Synthetic Genomics will have a transformative impact on medicine, agriculture and biotechnology. It will empower the interpretation of human genome sequences and will accelerate the development of therapeutics. It will underpin the engineering of clean biological solutions to replace polluting industry, and it will facilitate the rapid engineering of cells to adapt agriculture to a rapidly changing climate.

The new programme, Generative and Synthetic Genomics, will build upon the Sanger Institute’s unparalleled 30-year track record of ensuring that genomic science is open, shared and democratised. These new capabilities for engineering biology will also come with important responsibilities to consider and explore the ethical, legal and social implications. The Institute’s Policy Team has already carried out initial research with international stakeholders to consider the ethical implications of creating synthetic genomes. Sanger Institute researchers working with the Policy Team will build on this work to proactively consider the implications of this new programme of work and develop processes for responsible governance and wider engagement.

“I am incredibly excited to see the launch of Generative and Synthetic Genomics. Biology has accelerated to a point where a PhD student today can perform more experiments on genes and proteins than the entire global research effort could a decade ago. Plus we can develop highly predictive models that use artificial intelligence. It will be the combination of these technologies that will enable us to solve the fundamental question of how genetic sequence determines the properties and regulation of proteins. To do this we require huge amounts of data, and the Sanger Institute’s capabilities of large-scale data generation and genomics expertise make it the natural place for us to undertake this ambitious research.

“We believe that the transformation of biology into a programmable engineering science will be the most important technological revolution of this century, and that Generative and Synthetic Genomics will open up unprecedented possibilities for industry, agriculture, the environment, and medicine.”

Professor Ben Lehner, Head of Generative and Synthetic Genomics at the Wellcome Sanger Institute

“As the Sanger Institute enters its 30th year, we are launching a tremendously exciting endeavour to bring together computational and experimental research groups to form the world’s first programme for Generative and Synthetic Genomics. We aim to attract talent from across the globe and build teams with diverse expertise to solve fundamental questions in biology. In addition to catalysing our understanding of how biology works at a molecular level, Generative and Synthetic Genomics has huge translational potential and will propel a quantum leap in our ability to use genomic information in medical care. This new programme will also train and inspire a new generation of life science innovators and entrepreneurs to build new companies to take forward, in a responsible manner, the boundless possibilities afforded by new capabilities in engineering biology.”

Professor Matthew Hurles, Director of the Wellcome Sanger Institute

“We have made huge progress on research into proteins, genes, and the science of life itself. Yet we still know so little about how our genetics and variations in our DNA shape our fundamental biological systems. However, with the unprecedented acceleration of new technologies such as machine learning and AI, we are at an exciting crossroads, with the ambitious task of predicting – and even engineering – gene function now a genuine reality.

“We are delighted to support the Wellcome Sanger Institute’s bold Generative and Synthetic Genomics programme. Harnessing these new technologies and integrating them with world-leading genomics research matches Wellcome’s vision for discovery research, taking on the big questions that will advance our understanding of life, health, and wellbeing.”

Michael Dunn, Director of Discovery Research at Wellcome

More information

*Ribonucleic acid (RNA) is a molecule that is present in the majority of living organisms and viruses.


The establishment of Generative and Synthetic Genomics is supported by supplemental funding from Wellcome.