PICNIC (Predicting Integral Copy Numbers In Cancer) is an algorithm designed to identify copy number segments and genotypes in cancer using a SNP6 'cel' file as input.

All PICNIC code has been made available under a BSD license and shall continue to be developed under this agreement. This code requires Matlab. To use the algorithm without Matlab, use picnic_gui_full. To just normalize a .CEL file, use picnic_gui_short.

PICNIC has now been updated to handle primary tissues, which contain normal contamination, as well as cell lines.



Download Description
picnic_gui_full README file
picnic_gui_short README file
PICNIC download Public domain ftp repository
picnic_gui_full download Public domain ftp repository
picnic_gui_short download Public domain ftp repository
figViewer README file
figViewer download (unix) Public domain ftp repository
figViewer download (windows) Public domain ftp repository
PICNIC_Primaries README file
PICNIC_Primaries Matlab Version (ftp download)

CGP Software License

Copyright © 2006 Genome Research Ltd.
Author: Cancer Genome Project, cgpit@sanger.ac.uk

This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License.

Any redistribution or derivation in whole or in part including any substantial portion of this code must include this copyright and permission notice.

Algorithm description

The Affymetrix SNP6 array contains over 906,600 SNPs together with 946,000 copy number probes, interrogating over 1.85 million loci in a single experiment. Each SNP on the array is represented by six or eight features: three or four for allele A and the same number for allele B. The features for each allele are technical replicates. Of the non-polymorphic copy number probes, 202,000 target known copy number variations and the remaining 744,000 are evenly distributed across the genome. Each of these loci is represented by a single feature. A more detailed description of the array design can be obtained from the Affymetrix website.
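
The probe counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative Python, not part of PICNIC; the variable names are invented for this example and the figures are the rounded ones given in the text.

```python
# Probe counts as quoted in the text (rounded figures).
snp_probes = 906_600
cn_probes_known_cnv = 202_000     # copy number probes targeting known CNVs
cn_probes_genome_wide = 744_000   # copy number probes spread across the genome

cn_probes = cn_probes_known_cnv + cn_probes_genome_wide
total_loci = snp_probes + cn_probes

print(cn_probes)    # 946000
print(total_loci)   # 1852600, i.e. "over 1.85 million loci"
```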

PICNIC (Predicting Integral Copy Numbers In Cancer) was written specifically for use with Affymetrix SNP6 data. It provides a more refined analysis than has previously been applied to the Affymetrix 10K data, including improved normalisation and determination of the underlying copy number for each segment by genome-wide analysis of allele ratio and signal strength data. The data is subsequently rescaled and plotted onto its predicted underlying integer value, and segmentation is applied. Note that rescaling the raw data to the underlying absolute copy number can affect the spread of the data points.

Analysis of the data in this way also allows a genotype to be assigned to each SNP. Because these genotypes are based on the ratio of the two alleles, they can be more complex than the traditional AA, AB and BB assignments and may include genotypes such as AAB. Regions of loss of heterozygosity (LOH) can also be determined.
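
To make the idea of extended genotypes concrete, the sketch below lists the genotypes a SNP could take in a segment with a given total and minor copy number. It is an illustration only, not PICNIC's own code: the function name `possible_genotypes` is hypothetical, and it assumes each SNP was AA, AB or BB in the germline and that whole parental alleles were gained or lost.

```python
def possible_genotypes(total_cn: int, minor_cn: int) -> list[str]:
    """List the SNP genotypes consistent with a segment's copy numbers."""
    if minor_cn < 0 or 2 * minor_cn > total_cn:
        raise ValueError("need 0 <= minor_cn <= total_cn / 2")
    if total_cn == 0:
        return []  # homozygous deletion: no alleles left to genotype
    major_cn = total_cn - minor_cn
    # Number of A alleles: every copy (germline AA), the major or the minor
    # parental copy count (germline AB), or none (germline BB).
    a_counts = {total_cn, major_cn, minor_cn, 0}
    return ["A" * a + "B" * (total_cn - a) for a in sorted(a_counts, reverse=True)]


print(possible_genotypes(2, 1))  # ['AA', 'AB', 'BB']
print(possible_genotypes(3, 1))  # ['AAA', 'AAB', 'ABB', 'BBB']
print(possible_genotypes(2, 0))  # ['AA', 'BB']
```

Under these assumptions a segment with minor copy number 0 has no heterozygous genotypes left, which is how LOH shows up.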

Three plots are available for SNP6 data from the CGH Viewer webpage:

  • Absolute copy number: This plot shows the normalised data (grey dots) for each genomic locus on the array together with segmentation information. The normalised data is rescaled to the underlying copy number with dark blue lines indicating total copy number for each genomic region and light blue giving the predicted copy number of the minor allele. Minor allele values of zero are indicative of loss of heterozygosity (LOH).
  • Probability: This plot shows the probability of a change in state for copy number, heterozygosity or both.
  • Genotype intensity: This plot shows the ratio of the two allelic intensities for SNPs on the array. Balanced heterozygotes give a ratio of 0.5, while homozygous calls give values of ~0.8 (AA) and ~0.2 (BB). Skewed allele ratios can result in up to four bands on the genotype intensity plot. The data is again segmented, with black lines indicating regions of heterozygosity and red lines indicating regions of homozygosity (loss of heterozygosity, LOH).
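
One way to see why homozygous calls sit near 0.8 and 0.2 rather than 1 and 0 is a toy model in which every allele picks up the same background signal. The sketch below is an assumption made for illustration, not PICNIC's normalisation: the function `expected_allele_ratio` is hypothetical, and the `background` value of 2/3 is chosen only so that a diploid AA call lands on 0.8.

```python
def expected_allele_ratio(a_copies: int, b_copies: int,
                          background: float = 2 / 3) -> float:
    """Toy model: A / (A + B) signal with equal background on each allele."""
    if a_copies < 0 or b_copies < 0 or background <= 0:
        raise ValueError("copy numbers must be >= 0 and background > 0")
    return (a_copies + background) / (a_copies + b_copies + 2 * background)


# Diploid genotypes, plus the empty genotype of a homozygous deletion.
for genotype in ["AA", "AB", "BB", ""]:
    ratio = expected_allele_ratio(genotype.count("A"), genotype.count("B"))
    print(repr(genotype), round(ratio, 2))
# 'AA' 0.8
# 'AB' 0.5
# 'BB' 0.2
# '' 0.5
```

In this model a region with no copies of either allele gives 0.5, because only the background contributes to both signals.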

For example, the following plot shows Chromosome 9 of sample CMK. Four illustrative segments, labeled A, B, C and D, are described below.

[Figure: plots for Chromosome 9 of sample CMK, with segments A, B, C and D marked]


Segment A: 0 - 5 Mb, Total copy number (Dark blue) 4, Minor copy number (Light blue) 2. That is, each parental allele has been duplicated. The state change probability plot indicates the end of the segment. There are three black lines in the genotype intensity plot. From top to bottom, SNPs with points near these lines have genotypes AAAA, AABB and BBBB.

Segment B: 5 - 21 Mb, Total copy number 2, Minor copy number 0. That is, one parental allele has been lost (LOH) and the other has been duplicated. The genotype intensity plot has two lines, corresponding to genotypes AA and BB.

Segment C: 21 - 23 Mb, Total copy number 0, Minor copy number 0. That is, both parental alleles have been lost, resulting in a homozygous deletion. The genotype intensity plot has a single line at 0.5, resulting from equal signal intensity from both alleles due to background hybridisation.

Segment D: 24 - 27 Mb, Total copy number 6, Minor copy number 2. That is, one parental allele is present in two copies and the other in four copies, giving a total copy number of six. The genotype intensity plot has four lines, corresponding to genotypes AAAAAA, AAAABB, AABBBB and BBBBBB.
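
The four segments can also be written down as data and classified from their total and minor copy numbers alone. This is a minimal sketch for illustration: the `Segment` class and its `state` labels are hypothetical and do not reflect PICNIC's output format. The coordinates and copy numbers are those given for segments A to D above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start_mb: float
    end_mb: float
    total_cn: int   # dark blue line on the absolute copy number plot
    minor_cn: int   # light blue line on the absolute copy number plot

    @property
    def major_cn(self) -> int:
        return self.total_cn - self.minor_cn

    def state(self) -> str:
        if self.total_cn == 0:
            return "homozygous deletion"
        if self.minor_cn == 0:
            return "LOH"
        return "heterozygous"


chr9_cmk = [
    Segment("A", 0, 5, 4, 2),
    Segment("B", 5, 21, 2, 0),
    Segment("C", 21, 23, 0, 0),
    Segment("D", 24, 27, 6, 2),
]

for seg in chr9_cmk:
    print(f"{seg.name}: {seg.start_mb}-{seg.end_mb} Mb, "
          f"major {seg.major_cn} + minor {seg.minor_cn} -> {seg.state()}")
```

Keeping the major copy number as a derived property means only the two plotted values need to be stored for each segment.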
