What is COSMIC?
All cancers arise through the acquisition of a series of fixed DNA sequence abnormalities (mutations), many of which ultimately confer a growth advantage upon the cells in which they have occurred. A vast amount of information about these changes is available in the published scientific literature. COSMIC is designed to store and display somatic mutation information and related details for human cancers.
Some key features of COSMIC are:
- Contains information on publications, samples and mutations. It includes samples that were found to be negative for mutations during screening, which enables mutation frequencies to be calculated for different genes in different cancer types.
- Samples entered include benign neoplasms and other benign proliferations, in situ and invasive tumours, recurrences, metastases and cancer cell lines.
The mutation data and associated information are extracted from the primary literature and entered into the COSMIC database. To provide a consistent view of the data, a histology and tissue ontology has been created and all mutations are mapped to a single version of each gene. The data can be queried by tissue, histology or gene and displayed as a graph, as a table or exported in various formats.
How does COSMIC work?
We have assembled a list of genes that are somatically mutated in human cancer (Futreal et al, 2004). From this list we are selecting genes for entry into COSMIC, with an emphasis on genes for which there are no existing databases.
All of the mutations in COSMIC are mapped to a single version of each gene sequence. The gene sequences are held in COSMIC and available in the Download section below.
Selecting papers from the literature
To identify papers reporting somatic mutations, PubMed is searched broadly for papers containing relevant mutation data (example search: (ras OR genes, ras) AND human AND mutation). Papers whose abstracts indicate that they include somatic mutation information relating to cancer or pre-cancerous conditions are then selected for curation. After the full text of the paper has been examined, the sample and mutation data are extracted. Papers containing incomplete data (e.g. mutations that are reported but not fully described) or data of insufficient quality (e.g. errors identified in the data) are not fully curated but are added to a list of "additional references containing somatic mutation information".
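As a rough illustration of the search step, the sketch below builds a query string in the same shape as the example search and turns it into a request URL. It assumes the NCBI E-utilities ESearch endpoint is used; the text above does not say how the searches are actually run, and the function names are hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint. The page itself does not
# state which interface is used for the PubMed searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene_terms):
    """Combine gene terms in the same shape as the example search."""
    gene_clause = " OR ".join(gene_terms)
    return f"({gene_clause}) AND human AND mutation"


def build_esearch_url(query, retmax=20):
    """Return an ESearch URL for the query (hypothetical helper)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_pubmed_query(["ras", "genes, ras"])
url = build_esearch_url(query)
print(query)
print(url)
```

The abstracts returned by such a search would still need to be read by a curator; the query only narrows the candidate list.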
A central aim of COSMIC is to provide somatic mutation frequencies. These are available in the main display windows. However, it is important to understand how they are calculated and possible limitations of the data.
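A minimal sketch of such a calculation follows, assuming a frequency is simply the number of mutated samples divided by the number of screened samples for a given gene and tissue. The records, gene name and mutation label are invented for illustration, and the exact calculation used in the COSMIC displays may differ.

```python
from collections import Counter

# Hypothetical screening records: (gene, tissue, mutation or None).
# None marks a sample that was screened and found negative.
samples = [
    ("GENE_A", "skin", "p.X1Y"),
    ("GENE_A", "skin", None),
    ("GENE_A", "skin", "p.X1Y"),
    ("GENE_A", "skin", None),
    ("GENE_A", "lung", None),
]


def mutation_frequency(records):
    """Return {(gene, tissue): mutated samples / screened samples}."""
    screened = Counter()
    mutated = Counter()
    for gene, tissue, mutation in records:
        key = (gene, tissue)
        screened[key] += 1          # every record counts as a screened sample
        if mutation is not None:
            mutated[key] += 1       # only positive samples count as mutated
    return {key: mutated[key] / screened[key] for key in screened}


freq = mutation_frequency(samples)
for key, value in sorted(freq.items()):
    print(key, value)
```

Without the negative samples the denominator would be unknown, which is why they are recorded alongside the mutations.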
Has the sample been screened before?
There are examples where the same data are reported twice, perhaps in a follow-up study with reference to further data, or as a positive control, for example using cell lines with known mutations. Where possible we have noted sample names and have removed any redundancy within papers. Between papers, however, it is not possible to confirm that two samples with the same name are indeed the same sample. We have therefore included both samples and both results in COSMIC. If you want to review this information, the sample name, mutation and paper reference are displayed in the Mutation Details view.
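The policy described above can be sketched as follows: repeats are collapsed within a paper, but records that share a sample name across different papers are all kept. The record layout and names are hypothetical and are not the COSMIC schema.

```python
def remove_within_paper_redundancy(records):
    """Drop repeated (paper, sample, mutation) records; keep cross-paper ones."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["paper"], rec["sample"], rec["mutation"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Hypothetical records for illustration only.
records = [
    {"paper": "paper_1", "sample": "S1", "mutation": "p.X1Y"},
    {"paper": "paper_1", "sample": "S1", "mutation": "p.X1Y"},  # repeat within a paper
    {"paper": "paper_2", "sample": "S1", "mutation": "p.X1Y"},  # same name, other paper
]

kept = remove_within_paper_redundancy(records)
for rec in kept:
    print(rec["paper"], rec["sample"], rec["mutation"])
```

Because the second paper's sample is retained, a sample that was genuinely reported twice is counted twice in any frequency derived from these records.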
What mutation detection method was employed?
Mutation screening methods differ in their sensitivity and the sensitivity of a particular method can vary from laboratory to laboratory. Most methods identify all classes of small intragenic mutation (base substitutions and small insertions/deletions). However, the protein truncation test will not detect mutations that cause missense amino acid substitutions.
Was the whole gene screened?
Some genes are characterised by mutation hot spots, for example BRAF, RAS and TP53. These genes are often screened for somatic mutations only in the region most likely to contain mutations. This strategy will obviously miss mutations located elsewhere in the gene and hence will provide a distorted view of the distribution of mutations in the gene and perhaps underestimate the frequency of mutations.
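The effect of partial screening can be illustrated with a small sketch. The samples, codon positions and screened region below are invented, and the helper name is hypothetical; the point is only that restricting the screen to a hot spot hides mutations that lie elsewhere.

```python
# Hypothetical data: each sample lists the codon positions of its mutations
# (an empty list means no mutation anywhere in the gene).
samples = {
    "S1": [100],
    "S2": [],
    "S3": [250],
    "S4": [],
    "S5": [100],
}


def observed_frequency(samples, region=None):
    """Fraction of samples with at least one detectable mutation.

    region is an inclusive (start, end) codon range that was screened;
    None means the whole gene was screened.
    """
    def detected(positions):
        if region is None:
            return bool(positions)
        start, end = region
        return any(start <= p <= end for p in positions)

    hits = sum(1 for positions in samples.values() if detected(positions))
    return hits / len(samples)


whole_gene = observed_frequency(samples)
hotspot_only = observed_frequency(samples, region=(90, 110))
print(whole_gene, hotspot_only)
```

In this invented data set the mutation at codon 250 is missed when only codons 90 to 110 are screened, so the observed frequency is lower than the whole-gene figure.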
Are all the mutations real?
For many putative somatic mutations that have been reported in the published literature, definitive evidence that they are somatically acquired (through demonstration of their absence in normal DNA from the same individual as the tumour) is not available. Therefore, occasional germline variants may have inadvertently been represented in publications as somatic mutations and entered in the database. In addition, simple laboratory errors which result in an incorrect normal DNA sample (i.e. from a different individual) being analysed as a control for a particular tumour sample may provide apparently persuasive, but misleading, evidence of somatic origin. Finally, DNA amplification methods have an intrinsic error rate, and these errors may subsequently be interpreted as somatic mutations. There is some evidence that this may be a particular problem in analyses of archival formalin-fixed, paraffin-embedded material.
The classification of tumour types and subtypes with somatic mutations in the published literature is extremely variable. Classification systems and terminologies differ between reports and indeed may have changed over time. Rather than simply entering a neoplasm using the term employed in the published report, COSMIC uses its own internal classification system to provide tissue and histology consistency within the database and reduce redundancy. The tissue and histology information in the reviewed papers is translated using the COSMIC classification system before entry into the database. It is possible that in some instances we have misunderstood terminology and hence misclassified mutations. Moreover, some users may not favour our classification. In general, however, we have aimed to retain as much useful information as possible, whilst providing a relatively simple classification with generally understood terminology.
The COSMIC classification system is available as a tab-delimited text or Excel file in the Download section below. Every sample is defined by both tissue and histology. The example below shows how a paper definition would be translated into a COSMIC definition.
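As a hypothetical illustration of this kind of translation, the sketch below maps free-text terms from a paper to a (tissue, histology) pair. The lookup entries are invented for the sketch and are not copied from the COSMIC classification file, and the function name is an assumption.

```python
# Hypothetical lookup: free-text paper term -> (tissue, histology).
# The entries are invented for illustration; the real terms are those in the
# classification file from the Download section.
PAPER_TO_COSMIC = {
    "cutaneous melanoma": ("skin", "malignant_melanoma"),
    "melanoma of the skin": ("skin", "malignant_melanoma"),
    "colorectal adenocarcinoma": ("large_intestine", "carcinoma"),
}


def translate(paper_term):
    """Normalise case and whitespace, then look the term up.

    Returns None for terms with no mapping, which a curator would
    need to classify by hand.
    """
    key = " ".join(paper_term.lower().split())
    return PAPER_TO_COSMIC.get(key)


print(translate("Cutaneous  Melanoma"))
print(translate("Melanoma of the skin"))
print(translate("unclassified lesion"))
```

Two different paper terms resolving to the same pair is what removes redundancy: samples described differently in the literature end up under one tissue and histology.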
The COSMIC classification system was created in close collaboration with Adrienne Flanagan and Ahmet Dogan from the Royal Free and University College Medical School.
Our most recent publication: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer (Forbes et al, 2011).
Detailed description of COSMIC: The Catalogue of Somatic Mutations in Cancer (COSMIC) (Forbes et al, 2008).
Classification system downloads: Excel spreadsheet; tab-delimited text file.
Mutation information and FASTA files for each of the genes in COSMIC can be downloaded from our FTP site.