Recording Structural Variations in COSMIC
A. Format of COSMIC Structural Variant & Breakpoint Tables
The accurate description and annotation of structural variants can be complex. This is partly because variants are reported at different resolutions, from traditional cytogenetic coordinates down to actual base pair positions. Furthermore, multiple rearrangements in a single area of the genome can make cataloguing and interpreting their effects challenging. To overcome these issues, information on COSMIC structural variants is summarised in two tables, the Structural Variant table and the Sample Breakpoint table. The Structural Variant table gives a summary list of the variants described for a given sample, obtained by interpreting the breakpoint information. The Sample Breakpoint table contains more detail: it summarises the structural variants along with the actual breakpoint information that was used to derive each variant. The breakpoint information can be thought of as the raw data, with minimal interpretation applied. You can toggle between the two tables by using the buttons on the navigation bar above the table (Figures 1 & 2). Both tables can be exported in HTML, Excel, or tab-delimited format by clicking on the Export button.
The Structural Variant table gives summary information for each variant and has the following fields: (COSMIC) Mutation ID, Mutation Description and Annotation (Name) (Figure 1). The Mutation Description is a short textual description of the variant (e.g. tandem duplication, deletion, translocation); the controlled ontology of Mutation Descriptions is given in section B below. The Annotation Name gives a detailed description of the variant and its location (e.g. chr11:g.36585231_76606618del, a deletion of roughly 40Mb on chromosome 11). The syntax is based on the HGVS mutation nomenclature recommendations (http://www.hgvs.org/rec.html). A link to more details for each variant is provided on the right-hand side of this table.
The Breakpoint Summary table describes the one or more breakpoints that make up a structural variant (Figure 2). A breakpoint is defined as a region or point where the sample sequence differs from the reference sequence. Minimal interpretation is made of these data. One variant event can consist of one or multiple breakpoints. Each structural variant has a summary row (coloured light blue) that gives the start and stop positions of the variant along with the Mutation Description. Underneath each summary row is the list of breakpoints that make up the variant; these rows have no Mutation ID and a white background. Below is a summary of the fields in this table.
B. Mutation Description Ontology
To help with the interpretation of structural variants in COSMIC, each variant is assigned a Mutation Description and an Annotation Name. This assignment involves an interpretation of the data and of the currently known breakpoints in the region. If not all breakpoints have been characterised, the mutation may not be fully characterised either. Below is a description of the Mutation Description Ontology with the associated Annotation Names.
B.1 Tandem Duplication
A Tandem Duplication is characterised by a duplication of a segment of the genome which is adjacent to the original sequence. The Annotation Name takes the following format: chr2:g.124629221_125036287dup, where chr2: denotes the chromosome involved, g. indicates that genomic coordinates are used, 124629221_125036287 gives the start and end of the variant, and dup indicates a tandem duplication.
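As an illustration only, an Annotation Name of this shape can be split into its parts with a regular expression. The pattern and the function name parse_range_annotation below are assumptions made for this sketch, not part of COSMIC; the pattern covers only the single-chromosome range form shown in this help page.

```python
import re

# Assumed shape: chr<chromosome>:g.<start>_<end><type>, where <type> is one of
# the suffixes used in this help page for range variants (dup, del, inv).
RANGE_RE = re.compile(
    r"^chr(?P<chrom>[0-9A-Za-z]+):g\.(?P<start>\d+)_(?P<end>\d+)(?P<kind>dup|del|inv)$"
)


def parse_range_annotation(name):
    """Split an annotation such as chr2:g.124629221_125036287dup into its parts."""
    match = RANGE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised range annotation: {name!r}")
    return {
        "chrom": match["chrom"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        "kind": match["kind"],
    }


parsed = parse_range_annotation("chr2:g.124629221_125036287dup")
span = parsed["end"] - parsed["start"] + 1  # inclusive length in base pairs
```

For the example above, span works out to 407067 base pairs.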
The breakpoint information characterising the variant is in the breakpoint table. For a tandem duplication the breakpoint is characterised by upstream sequence mapping downstream of where it should map on the genome. In this case position 125036287 maps before 124629221, which is the signature of a tandem duplication.
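The signature can be expressed as a simple comparison. The following is a minimal sketch, assuming both sides of the junction lie on the same chromosome and strand; the function name and its arguments are hypothetical and do not reflect how COSMIC derives its annotations.

```python
def tandem_dup_annotation(chrom, left_of_junction, right_of_junction):
    """Return a dup annotation if the junction has the tandem duplication signature.

    left_of_junction: reference position of the last base before the junction.
    right_of_junction: reference position of the first base after the junction.
    Simplifying assumption: both positions are on the same chromosome and strand.
    """
    if left_of_junction <= right_of_junction:
        return None  # the sequence continues forwards, so this is not the signature
    # The later reference position precedes the earlier one in the sample.
    return f"chr{chrom}:g.{right_of_junction}_{left_of_junction}dup"


example = tandem_dup_annotation("2", 125036287, 124629221)
```

With the positions from the example, the function returns chr2:g.124629221_125036287dup.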
B.2 Deletion
The Annotation Name takes the following format: chr11:g.36585231_76606618del, where chr11: denotes the chromosome involved, g. indicates genomic coordinates, 36585231 is the deletion start point, 76606618 is the deletion end point and del indicates a deletion event.
The breakpoint information characterising the variant is in the breakpoint table. For a deletion the breakpoint is characterised by two distant points in the reference genome lying next to each other in the sample. In this example position 36585230 is next to 76606619. The region between these points is assumed to be deleted. Because the breakpoint gives the last observed nucleotide on each side, the deletion coordinates are offset by +1 and -1, so the deletion ranges from 36585231 to 76606618.
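The +1 and -1 adjustment can be checked with a few lines of arithmetic. This is a sketch only; deletion_from_breakpoint is a hypothetical helper written for this example.

```python
def deletion_from_breakpoint(chrom, last_before, first_after):
    """Derive the deleted range from the two observed breakpoint positions.

    last_before: last observed nucleotide before the junction.
    first_after: first observed nucleotide after the junction.
    Returns the annotation string and the deletion length in base pairs.
    """
    if first_after - last_before < 2:
        raise ValueError("no bases lie between the two breakpoint positions")
    start = last_before + 1  # first deleted base
    end = first_after - 1    # last deleted base
    return f"chr{chrom}:g.{start}_{end}del", end - start + 1


annotation, length = deletion_from_breakpoint("11", 36585230, 76606619)
```

For the example breakpoint this gives chr11:g.36585231_76606618del and a length of 40021388 base pairs, that is, roughly 40Mb.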
B.3 Inversion
An inversion indicates the reversal of a piece of genome sequence. The Annotation Name takes the following format: chr1:g.115340245_115346449inv, where chr1: denotes the chromosome involved, g. indicates that genomic coordinates are used, 115340245_115346449 gives the range of the inversion, and inv indicates an inversion.
The breakpoint information characterising the variant is in the breakpoint table. Two breakpoints can be detected for this mutation although only one is required to fully characterise the mutation.
B.4 Translocation
A Translocation is characterised by the fusion of two chromosomes. The Annotation Name takes the following format: chr8:g.63669858_chr14:22298219trans[?], where chr8:g.63669858 denotes the breakpoint on one chromosome, chr14:22298219 the breakpoint on the other chromosome, and trans indicates a translocation event. The bracketed suffix indicates any change in copy number associated with the mutation; [?] indicates that this is not known. Strand information is often given in the Annotation Name to describe which end of each chromosome actually forms the translocation (see section E).
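A translocation annotation of this shape could be taken apart as follows. This is a sketch under stated assumptions: the pattern and the name parse_translocation are invented for illustration, and the sketch does not handle the strand operator described in section E.

```python
import re

# Assumed shape: chrA:g.<pos>_chrB:<pos>trans, optionally followed by a
# bracketed copy number suffix such as [?] or [11-26].
TRANS_RE = re.compile(
    r"^chr(?P<chrom_a>[0-9A-Za-z]+):g\.(?P<pos_a>\d+)"
    r"_chr(?P<chrom_b>[0-9A-Za-z]+):(?P<pos_b>\d+)trans"
    r"(?:\[(?P<copy>[^\]]*)\])?$"
)


def parse_translocation(name):
    """Return both breakpoints and the raw copy number suffix (None if absent)."""
    match = TRANS_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised translocation annotation: {name!r}")
    return {
        "chrom_a": match["chrom_a"],
        "pos_a": int(match["pos_a"]),
        "chrom_b": match["chrom_b"],
        "pos_b": int(match["pos_b"]),
        "copy": match["copy"],  # text inside the brackets, left uninterpreted
    }


result = parse_translocation("chr8:g.63669858_chr14:22298219trans[?]")
```

Here result["copy"] is the string "?", which the text above describes as copy number not known.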
The breakpoint information characterising the variant is in the breakpoint table.
B.5 Complex Substitution
A Complex Substitution is defined as a region which has been deleted and replaced with another region of the genome. The Annotation Name takes the following format: chr8:g.55512043_63659930>chr13:22017510_22017585, where chr8: denotes the chromosome involved, g. indicates genomic coordinates, 55512043_63659930 indicates the region deleted, > represents "replaced with", and chr13:22017510_22017585 indicates the region inserted.
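To make the two halves of the annotation concrete, the sketch below separates the deleted and inserted regions and compares their lengths. The Region type, the pattern and the function name are assumptions made for this example.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    def length(self):
        return self.end - self.start + 1  # inclusive coordinates


# Assumed shape: chrA:g.<start>_<end>>chrB:<start>_<end>
SUBST_RE = re.compile(
    r"^chr(?P<dc>[0-9A-Za-z]+):g\.(?P<ds>\d+)_(?P<de>\d+)"
    r">chr(?P<ic>[0-9A-Za-z]+):(?P<is_>\d+)_(?P<ie>\d+)$"
)


def parse_complex_substitution(name):
    """Return (deleted_region, inserted_region) for a complex substitution."""
    match = SUBST_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised complex substitution: {name!r}")
    deleted = Region(match["dc"], int(match["ds"]), int(match["de"]))
    inserted = Region(match["ic"], int(match["is_"]), int(match["ie"]))
    return deleted, inserted


deleted, inserted = parse_complex_substitution(
    "chr8:g.55512043_63659930>chr13:22017510_22017585"
)
```

In this example the deleted region is 8147888 base pairs long and the inserted region is 76 base pairs long.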
B.6 Complex Amplicon
A Complex Amplicon is a region of a genome which has been amplified and undergone multiple rearrangements. Due to the complexity of these regions the amplicon breakpoints are listed but no interpretation is made of the data.
The Annotation Name gives the range of the amplicon within which the multiple rearrangements occur. An example is chr8:g.(61857345-?_129022677+?)[(10-40)], where chr8: denotes the chromosome involved, g. indicates genomic coordinates, and 61857345-?_129022677+? indicates the range of the amplicon, with -? and +? indicating that the precise positions of the start (-?) and end (+?) are not currently known. [(10-40)] indicates the approximate copy number of this region, between 10 and 40 copies in this case.
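The uncertain boundaries and the copy number range can be extracted as in the sketch below. The pattern is an assumption based solely on the single example above, and parse_amplicon is a hypothetical name.

```python
import re

# Assumed shape: chrN:g.(<start>-?_<end>+?)[(<low>-<high>)]
# The -? and +? markers are treated as optional flags for inexact boundaries.
AMPLICON_RE = re.compile(
    r"^chr(?P<chrom>[0-9A-Za-z]+):g\."
    r"\((?P<start>\d+)(?P<start_open>-\?)?_(?P<end>\d+)(?P<end_open>\+\?)?\)"
    r"\[\((?P<cn_low>\d+)-(?P<cn_high>\d+)\)\]$"
)


def parse_amplicon(name):
    """Return the amplicon range, boundary exactness flags and copy number range."""
    match = AMPLICON_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised amplicon annotation: {name!r}")
    return {
        "chrom": match["chrom"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        "start_exact": match["start_open"] is None,
        "end_exact": match["end_open"] is None,
        "copy_number": (int(match["cn_low"]), int(match["cn_high"])),
    }


amplicon = parse_amplicon("chr8:g.(61857345-?_129022677+?)[(10-40)]")
```

For the example, both start_exact and end_exact are False and copy_number is (10, 40).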
B.7 Amplicon Breakpoint(s)
An amplicon breakpoint is defined as a breakpoint within an amplified region whose boundaries are unknown, so the mutation cannot be accurately interpreted. In these cases the breakpoint is simply described. The Annotation Name takes the following format: chr14:g.28412748_chr14:28419493bkpt, where chr14: denotes the chromosome involved, g. indicates genomic coordinates, 28412748 is the end of the sequence to the left of the breakpoint, 28419493 is the sequence coordinate to the right of the breakpoint, and bkpt indicates a breakpoint. Where a copy number suffix is present (see section D), it gives the approximate copy number in the area.
C. Sequence Fragment
Structural variants can have additional sequence from elsewhere in the genome. The example below is a translocation with 2 additional fragments from chromosome 12, one is 21 base pairs and the other 335 base pairs.
D. Copy Number Information
Approximate copy number data are given when the region is non-diploid and the data are available. The Mutation Description is prefixed with "amplified" or "amplicon" if there is a variation in copy number. For example, chr8:g.63669858_chr14:22298219trans[11-26] denotes a translocation with a copy number increase of approximately 11-26. A copy number of 2 would indicate diploid normality.
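As a rough illustration, the bracketed copy number suffix can be read from the end of an Annotation Name as follows. The function parse_copy_number and its pattern are assumptions for this sketch; it treats a missing suffix and [?] alike, returning None for both.

```python
import re

# Assumed shape: a trailing [..] or [(..)] holding "?", a single number,
# or a low-high range.
COPY_RE = re.compile(r"\[\(?(?P<body>[^\]\)]*)\)?\]$")


def parse_copy_number(annotation):
    """Return (low, high) copy number, or None when absent or unknown."""
    match = COPY_RE.search(annotation)
    if match is None or match["body"] == "?":
        return None
    parts = match["body"].split("-")
    # A single value such as [2] yields the same number for low and high.
    return int(parts[0]), int(parts[-1])


copy_range = parse_copy_number("chr8:g.63669858_chr14:22298219trans[11-26]")
```

For the translocation example above, copy_range is (11, 26).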
E. Strand Information
In certain situations it is important to provide strand information to describe a variant. The HGVS "o" identifier, denoting the opposite strand, is used for this. The diagram below shows the use of this operator in describing translocations.