What is COSMIC?

All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities (mutations), many of which ultimately confer a growth advantage upon the cells in which they have occurred. There is a vast amount of information available in the published scientific literature about these changes. COSMIC is designed to store and display somatic mutation information and related details, and contains information relating to human cancers.

Some key features of COSMIC are:

  • Contains information on publications, samples and mutations. Includes samples which have been found to be negative for mutations during screening, enabling frequency data to be calculated for mutations in different genes in different cancer types.
  • Samples entered include benign neoplasms and other benign proliferations, in situ and invasive tumours, recurrences, metastases and cancer cell lines.

The mutation data and associated information are extracted from the primary literature and entered into the COSMIC database. To provide a consistent view of the data, a histology and tissue ontology has been created and all mutations are mapped to a single version of each gene. The data can be queried by tissue, histology or gene and displayed as a graph or a table, or exported in various formats.


How does COSMIC work?

Gene selection

We have assembled a list of genes that are somatically mutated in human cancer (Futreal et al., 2004). From this list we are selecting genes for entry into COSMIC, with an emphasis on genes for which there are no existing databases.


Gene sequences

All of the mutations in COSMIC are mapped to a single version of each gene sequence. The gene sequences are held in COSMIC and are available in the Downloads section below.


Selecting papers from the literature

To identify papers reporting somatic mutations, PubMed is searched broadly for papers containing relevant mutation data (example search: (ras OR genes, ras) AND human AND mutation). Papers identified from their abstracts as including somatic mutation information relating to cancer or pre-cancerous conditions are then selected for curation. After examination of the full text of the paper, the sample and mutation data are extracted. Any papers containing incomplete data (e.g. mutations that are reported but not fully described) or data of insufficient quality (e.g. errors identified in the data) are not fully curated but are added to a list of "additional references containing somatic mutation information".
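
As a rough illustration of the example search above, the following Python sketch builds a query URL for the NCBI E-utilities esearch endpoint. This is an assumption made for illustration only: the text does not say which PubMed interface the curators use, and the helper name build_pubmed_search_url and the retmax default are hypothetical. The code only constructs the URL and does not contact PubMed.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed for this sketch; the tooling
# actually used for COSMIC curation is not described in the text).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL for a PubMed query string (no network access)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The example search quoted in the text.
example_term = "(ras OR genes, ras) AND human AND mutation"
url = build_pubmed_search_url(example_term)
print(url)
```

A search this broad returns many papers that contain no usable mutation data, which is why the abstracts and then the full text are reviewed before anything is entered.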


Mutation frequency

A central aim of COSMIC is to provide somatic mutation frequencies. These are available in the main display windows. However, it is important to understand how they are calculated and possible limitations of the data.
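
A minimal sketch of the calculation, assuming that the frequency for a gene in a tissue is the number of samples with a reported mutation divided by the number of samples screened, with mutation-negative samples included in the denominator. The record layout and the sample values below are invented for illustration and are not COSMIC data.

```python
from collections import defaultdict

# Hypothetical screening records: (gene, tissue, sample_name, mutation_found).
# These values are made up for illustration; they are not COSMIC data.
records = [
    ("BRAF", "skin", "S1", True),
    ("BRAF", "skin", "S2", False),
    ("BRAF", "skin", "S3", True),
    ("BRAF", "large intestine", "S4", False),
    ("BRAF", "large intestine", "S5", True),
]


def mutation_frequencies(rows):
    """Return {(gene, tissue): (mutated, screened, frequency)}."""
    counts = defaultdict(lambda: [0, 0])  # [mutated, screened]
    for gene, tissue, _sample, mutated in rows:
        counts[(gene, tissue)][1] += 1      # every screened sample counts
        if mutated:
            counts[(gene, tissue)][0] += 1  # only mutation-positive samples
    return {key: (m, n, m / n) for key, (m, n) in counts.items()}


for (gene, tissue), (m, n, freq) in mutation_frequencies(records).items():
    print(f"{gene} / {tissue}: {m}/{n} = {freq:.2f}")
```

Without the mutation-negative samples there would be no denominator, which is why they are recorded alongside the positive ones.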


Has the sample been screened before?
There are examples where the same data are reported twice, perhaps in a follow-up study with reference to further data, or as a positive control (for example, using cell lines with known mutations). Where possible we have noted sample names and have removed any redundancy within papers. However, between papers it is not possible to confirm that two samples with the same name are indeed the same sample. We have therefore included both samples and both results in COSMIC. If you want to review this information, the sample name, mutation and paper reference are displayed in the Mutation Details view.
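
Because both records are kept, a user reviewing frequencies may want to flag sample names that occur in more than one paper. The sketch below assumes the user has collected (sample name, paper reference) pairs; the layout, names and identifiers are hypothetical. A shared name is only a hint that the sample may have been counted twice, not proof.

```python
from collections import defaultdict

# Hypothetical (sample_name, paper_id) pairs; names and IDs are invented.
entries = [
    ("CELL-LINE-A", "paper_1"),
    ("CELL-LINE-A", "paper_2"),
    ("TUMOUR-07", "paper_1"),
    ("TUMOUR-07", "paper_1"),  # repeated within a single paper
]


def names_shared_between_papers(pairs):
    """Return {sample_name: sorted paper IDs} for names seen in 2+ papers."""
    papers_by_name = defaultdict(set)
    for name, paper in pairs:
        papers_by_name[name].add(paper)
    return {n: sorted(p) for n, p in papers_by_name.items() if len(p) > 1}


print(names_shared_between_papers(entries))
```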
 
What mutation detection method was employed?
Mutation screening methods differ in their sensitivity and the sensitivity of a particular method can vary from laboratory to laboratory. Most methods identify all classes of small intragenic mutation (base substitutions and small insertions/deletions). However, the protein truncation test will not detect mutations that cause missense amino acid substitutions.
 
Was the whole gene screened?
Some genes are characterised by mutation hot spots, for example BRAF, RAS and TP53. These genes are often screened for somatic mutations only in the region most likely to contain mutations. This strategy will obviously miss mutations located elsewhere in the gene and hence will provide a distorted view of the distribution of mutations in the gene and perhaps underestimate the frequency of mutations.
 
Are all the mutations real?
For many putative somatic mutations that have been reported in the published literature, definitive evidence that they are somatically acquired (through demonstration of their absence in normal DNA from the same individual as the tumour) is not available. Therefore, occasional germline variants may have inadvertently been represented in publications as somatic mutations and entered in the database. In addition, simple laboratory errors which result in an incorrect normal DNA sample (i.e. from a different individual) being analysed as a control for a particular tumour sample may provide apparently persuasive, but misleading, evidence of somatic origin. Finally, DNA amplification methods have an intrinsic error rate, and these errors may subsequently be interpreted as somatic mutations. There is some evidence that this may be a particular problem in analyses of archival formalin-fixed, paraffin-embedded material.
 
 

Classification system

The classification of tumour types and subtypes with somatic mutations in the published literature is extremely variable. Classification systems and terminologies differ between reports and indeed may have changed over time. Rather than simply entering a neoplasm using the term employed in the published report, COSMIC uses its own internal classification system to provide tissue and histology consistency within the database and reduce redundancy. The tissue and histology information in the reviewed papers is translated using the COSMIC classification system before entry into the database. It is possible that in some instances we have misunderstood terminology and hence misclassified mutations. Moreover, some users may not favour our classification. In general, however, we have aimed to retain as much useful information as possible, whilst providing a relatively simple classification with generally understood terminology.

The COSMIC classification system is available as a tab-delimited text or Excel file in the Downloads section below. Every sample is defined by both tissue and histology. The example below shows how a paper definition would be translated into a COSMIC definition.

                     Paper Definition   COSMIC Definition
Site primary         colon              large intestine
Site subtype 1       descending         colon
Site subtype 2       NS                 descending
Histology            carcinoma          carcinoma
Histology subtype 1  polypoid type      adenocarcinoma
Histology subtype 2  with adenoma       NS
Histology subtype 3  NS                 NS
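
The translation step can be pictured as a lookup from the terms used in a paper to the COSMIC terms. The sketch below is illustrative only: it holds just the single example from the table above, the field names are hypothetical, and it makes no assumption about the layout of the real classification file.

```python
# Field names are hypothetical labels for the seven rows of the example table.
FIELDS = (
    "site_primary", "site_subtype_1", "site_subtype_2",
    "histology", "histology_subtype_1", "histology_subtype_2",
    "histology_subtype_3",
)

# Lookup from a paper definition to a COSMIC definition. Only the one
# example from the table is included; the real classification is far larger.
PAPER_TO_COSMIC = {
    ("colon", "descending", "NS",
     "carcinoma", "polypoid type", "with adenoma", "NS"):
    ("large intestine", "colon", "descending",
     "carcinoma", "adenocarcinoma", "NS", "NS"),
}


def translate(paper_definition: dict) -> dict:
    """Map a paper's tissue/histology terms to the COSMIC terms."""
    key = tuple(paper_definition[field] for field in FIELDS)
    cosmic = PAPER_TO_COSMIC[key]  # raises KeyError for an unknown definition
    return dict(zip(FIELDS, cosmic))


paper = dict(zip(FIELDS, ("colon", "descending", "NS", "carcinoma",
                          "polypoid type", "with adenoma", "NS")))
print(translate(paper))
```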

The COSMIC classification system was created in close collaboration with Adrienne Flanagan and Ahmet Dogan from the Royal Free and University College Medical School.


Cite us

Our most recent publication: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer (Forbes et al., 2011).
COSMIC detailed description: The Catalogue of Somatic Mutations in Cancer (COSMIC) (Forbes et al., 2008).



Downloads


Classification Information
Available as an Excel spreadsheet and as a tab-delimited text file.

FTP site

Mutation information and FASTA files for each of the genes in COSMIC can be downloaded from our FTP site.

ftp://ftp.sanger.ac.uk/pub/CGP/cosmic

webmaster@sanger.ac.uk

Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

Last Modified Tue Jan 29 09:48:05 2013

Genome Research Limited is a charity registered in England with number 1021457
