500,000 whole human genomes will be a game-changer for research into human diseases

Following on from a successful pilot at the Sanger Institute, we are leading a project to sequence the genomes of all UK Biobank volunteers to power the next wave of genetic and health research

In a major advance for public health and for the UK’s global leadership in genomics, a £200m project involving the government, charity, researchers and four leading pharmaceutical companies was announced today (11 September). The Whole Genome Sequencing (WGS) project will become a game-changing resource accessible to the global scientific community to understand, diagnose, treat and prevent life-changing diseases such as cancer and dementia.

The genetic code of all 500,000 UK Biobank* volunteer participants will be sequenced by researchers at the Wellcome Sanger Institute in the UK and deCODE genetics in Iceland, using the Illumina sequencing platform.

This project is the single most ambitious sequencing programme in the world undertaken by a public-private partnership. Supported for over 16 years with public funding and charity investment, UK Biobank has already created a uniquely rich data resource that has dramatically increased the understanding of the factors that contribute to the development of disease.

Funding for the project comes from the government’s research and innovation agency, UK Research and Innovation (UKRI) with £50m through the Industrial Strategy Challenge Fund, £50m from Wellcome and a further £100m in total from Amgen, AstraZeneca, GlaxoSmithKline (GSK) and Johnson & Johnson**.

The total amount of genetic data generated will be vast, roughly equivalent to 5000 billion pages of text, and will require unique technical expertise to store and analyse. Data will be linked to the other detailed clinical and lifestyle data for each volunteer in the UK Biobank programme. The end result will be an encyclopaedia of genetic information, linked with comprehensive clinical characterisation, appropriately de-identified and protected, that will help to provide a unique insight into why some people develop particular diseases and others do not.

This project follows the successful initiation of a pilot programme at the Sanger Institute, known as the Vanguard Project, which involves sequencing the genomes of 10 per cent – 50,000 individuals – of the UK Biobank participants. Funding for this pilot programme was led by the Medical Research Council (MRC), through the Industrial Strategy Challenge Fund.

Building on the work of the pilot programme, the plan is to complete sequencing of the remaining 450,000 participants in two tranches. After both phases, industry partners will have preferential access to the data for nine months. At the end of this period, requests to access the whole genome sequence data will be managed in the same way as all other requests to work with datasets held by UK Biobank, and will be subject to a Material Transfer Agreement (MTA) with the approval of the UK Biobank Access Sub-committee.

The first tranche of data is expected to comprise up to 125,000 whole genome sequences, anticipated to be accessible to all in Spring 2021, and at the same time the 50,000 Vanguard sequences will be available. The expectation is that sequence data for the entire cohort of UK Biobank participants will become generally accessible by early 2023.

“We are thrilled to be contributing to the UK Biobank project by sequencing 225,000 whole human genomes. Together with deCODE in Iceland, we will read and assemble the whole genome sequences of 500,000 volunteers, and this data will transform the way we carry out research into human health and disease. A dataset of this magnitude will be incredibly powerful for understanding the genetic architecture that contributes to disease and we are one of only a few institutes in the world with the technical and scientific expertise to undertake a project of this scale.”

Dr Cordelia Langford, Director of Scientific Operations at the Wellcome Sanger Institute 

“Genomics is transforming our understanding of human health and disease. The UK Biobank Whole Genome Sequencing project is an exemplar of science at scale and we are proud to be a part of this initiative. The rich encyclopaedia of genomic data that will become available as a result of this ambitious effort, combined with the incredibly detailed information already collated in UK Biobank, will accelerate discoveries in diagnosing and ultimately treating diseases such as cardiovascular disease and cancer.”

Professor Sir Mike Stratton, Director of the Wellcome Sanger Institute

“This exciting new project will help scientists and doctors develop new ways of preventing, diagnosing and treating a range of life-changing diseases such as cancer and dementia. By sequencing the genomes of the UK Biobank participants, the research community will have an unprecedented resource to gain new insights into human disease. This work would not be possible without the generous support of the 500,000 participants of the UK Biobank who, without any direct benefit to themselves, have allowed their lives to be studied through blood tests, body scans and information from their medical records, all in the hope that it will benefit others.”

Sara Marshall, Head of Clinical Research and Physiological Sciences at Wellcome

Notes to Editors

* The 500,000 participants in the UK Biobank project were recruited between 2006 and 2010 and have consented to their medical records being linked to a range of physical measurements and biological samples collected at recruitment. Participants were recruited between the ages of 40 and 69 and the focus of the project is to investigate the factors that lead to a range of late-onset conditions. The scale of the project allows the interplay of genetic and environmental factors to be evaluated, and the prospective nature of the study means that it will allow the identification of early indicators of disease prior to clinical diagnosis.

** Contract entered by Janssen Biotech Inc., one of the Pharmaceutical Companies of Johnson & Johnson; collaboration facilitated by the Johnson & Johnson EMEA Innovation center in London, UK

Further reading:

World's largest genetics project to tackle deadly diseases launches - UK Government Department for Business, Energy and Industrial Strategy press release

World-leading genomics project to give insights into health and disease - UKRI media narrative

Selected Websites
Powering discovery in blood disorders and depression - Sanger Science
The wealth of information available to researchers through UK Biobank is powering studies into human health and disease

Sanger’s super-sized sequencing scales new heights - Sanger Science
We’re celebrating: we’ve just read the same amount of DNA in one year as we achieved in the previous 25 years combined. This dizzying speed offers unprecedented possibilities to unlock …

What if medicine could be tailored ‘just for you’? - Sanger Science
The information in our genomes is helping doctors diagnose and treat disease. But how far can personalised medicine go?

Contact the Press Office

Emily Mobley, Media Manager

Tel +44 (0)1223 496 851

Dr Samantha Wynne, Media Officer

Tel +44 (0)1223 492 368

Dr Matthew Midgley, Media Officer

Tel +44 (0)1223 494 856

Wellcome Sanger Institute,
CB10 1SA,

Mobile +44 (0) 7748 379849
