Comprehensive study finds mutations in non-coding genome are infrequent drivers of cancer

Findings suggest efforts to develop new cancer treatments should primarily focus on protein-coding genes


Lung cancer cells (image: Anne Weston, Francis Crick Institute)

A clearer picture of how DNA changes lead to cancer has emerged, following the most comprehensive evaluation of non-coding driver mutations to date by researchers at the Wellcome Sanger Institute, the Broad Institute of MIT and Harvard, Massachusetts General Hospital (MGH), Aarhus University Hospital and their collaborators.

The study, published today (5 February) in Nature as part of a global Pan-Cancer Project*, identified several new cancer drivers in non-coding regions of the genome. Its overall conclusion, however, reaffirms that the vast majority of cancer drivers occur in protein-coding regions of the human genome. This knowledge will help to focus efforts on discovering new causes of and treatments for cancer.

Also published today in Nature and related journals are 22 further studies from the Pan-Cancer Project. The project is an unprecedented international exploration of more than 2,600 cancer genomes, which significantly improves our fundamental understanding of cancer and zeroes in on the mechanisms of cancer development.

Driver mutations are DNA changes that ‘drive’ cells down the path towards cancer. Depending on the type of cancer, anywhere from one to ten driver mutations are required for cancer to develop**.

Most large-scale genomic studies of cancer to date have focused on detecting driver mutations in protein-coding genes. Because these coding sequences make up less than two per cent of the human genome, recent years have seen growing investigation of the remaining 98 per cent, the ‘non-coding’ genome***. In 2013, driver mutations were discovered in the promoter of the TERT gene, a non-coding regulatory region, and were later found across many cancer types. This raised the possibility that numerous non-coding driver mutations might lie hidden in the ‘dark matter’ of the genome.

This study is the most comprehensive evaluation of the extent of non-coding driver mutations in cancer to date, in terms of the number of methods employed, number of samples analysed, and the number of cancer, genome region and mutation types studied. Overall, 2,600 genomes of 38 different tumour types were analysed.

The team identified a number of new non-coding driver mutations, such as mutations in the 5’ untranslated region of the TP53 gene, which are associated with the gene being less strongly expressed, or ‘turned off’.

The results showed, however, that driver mutations in the regulatory sequences surrounding cancer genes are relatively rare. Excluding mutations in the TERT gene, the non-coding driver mutations identified amounted to around one (or fewer) in every 100 tumours. In comparison, protein-coding regions often harbour several driver mutations per tumour. Some non-coding drivers reported in previous studies were found to be the result of less accurate methodologies or of previously uncharacterised hyper-mutation processes.

“The fact that our results contrast so strongly with other studies is largely down to how rigorous our analysis has been. Despite using numerous methods, the largest dataset currently available and surveying a wide range of non-coding regions of the genome, we found very few genuine driver mutations outside protein-coding genes.”

Dr Federico Abascal, of the Wellcome Sanger Institute

“The non-coding driver mutations we identified, such as in the TP53 gene, add to the short list of non-coding driver mutations that already includes TERT, FOXA1 and a few other genes. By rigorously analysing the mechanisms that contribute to increased mutation rates, we were not only able to find new drivers but also raise doubts about previously reported ones that are affected by local mutational processes and artefacts uncovered in our study. We hope that our analysis will serve as the basis for future cancer genome studies.”

Dr Gad Getz, of the Broad Institute and MGH

This unexpected result has important implications for the treatment of cancer. While technological advancements and larger cohorts will undoubtedly lead to the discovery of more non-coding driver mutations, it is unlikely that the ratio of coding to non-coding drivers will change significantly. This implies that efforts to develop new cancer treatments should primarily focus on protein-coding genes.

“Overall, our study suggests that while increasingly large datasets will continue to yield new coding and non-coding driver mutations, the vast majority of cancer drivers occur in the two per cent of the genome that codes for proteins. To us, this was an unexpected and important result. For cancer patients, this means that the vast majority of clinically relevant mutations in a cancer are likely to be found in protein-coding sequences, which will simplify efforts for the clinical use of genome sequencing in cancer.”

Dr Inigo Martincorena, of the Wellcome Sanger Institute

Notes to Editors

*The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG), known as the Pan-Cancer Project, is the largest and most comprehensive study of whole cancer genomes yet. The collaboration, involving more than 1,300 scientists and clinicians from 37 countries, analysed more than 2,600 genomes of 38 different tumour types and has created a huge resource of primary cancer genomes, available to researchers worldwide to advance cancer research. https://dcc.icgc.org/pcawg

Main findings from the Pan-Cancer project:

  • The cancer genome is finite and knowable, but enormously complicated. By combining sequencing of the whole cancer genome with a suite of analysis tools, we can characterise every genetic change found in a cancer, all the processes that have generated those mutations, and even the order of key events during a cancer’s life history.
  • We are close to cataloguing all of the biological pathways involved in cancer and having a fuller picture of their actions in the genome. At least one causal mutation was found in virtually all of the cancers analysed, and the processes that generate mutations were found to be hugely diverse, ranging from changes in single DNA letters to the reorganisation of whole chromosomes. Multiple novel regions of the genome controlling how genes switch on and off were identified as targets of cancer-causing mutations.
  • Through a new method of “carbon dating”, the Pan-Cancer Project discovered that we can identify mutations which occurred years, sometimes even decades, before the tumour appears. In theory, this opens a window of opportunity for early cancer detection.
  • Tumour types can be identified accurately according to the patterns of genetic changes seen throughout the genome, potentially aiding the diagnosis of a patient’s cancer where conventional clinical tests could not identify its type. Knowledge of the exact tumour type could also help tailor treatments.

For access to all the open tier data in the Pan-Cancer project, go to https://dcc.icgc.org/

**For more information on driver mutations in different types of cancer, see the Sanger Institute website https://www.sanger.ac.uk/news/view/1-10-mutations-are-needed-drive-cancer-scientists-find

***More information on protein-coding and non-coding genes is available at: https://www.yourgenome.org/facts/what-does-dna-do

Publication:

Esther Rheinbay, Morten Muhlig Nielsen and Federico Abascal et al. (2020). Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. DOI: https://doi.org/10.1038/s41586-020-1965-x

The Nature collection landing page with all Pan-Cancer publications will go live when the papers publish: https://www.nature.com/collections/pcawg/

Funding:

This research was funded by GDAC, the Broad Institute of MIT and Harvard, Independent Research Fund Denmark, The Danish Cancer Society, National Institutes of Health and Wellcome.


Contact the Press Office

Emily Mobley, Media Manager

Tel +44 (0)1223 496 851

Dr Samantha Wynne, Media Officer

Tel +44 (0)1223 492 368

Dr Matthew Midgley, Media Officer

Tel +44 (0)1223 494 856

Wellcome Sanger Institute,
Hinxton,
Cambridgeshire,
CB10 1SA,
UK

Mobile +44 (0) 7748 379849
