Global CNV assessment (Redon et al, 2006)

Analysis of copy number variation in the HapMap samples using array-based comparative genomic hybridization on a Whole Genome TilePath (WGTP) array consisting of ~27,000 large-insert clones.

[Matt Hurles, Genome Research Limited]

We have produced an array-based comparative genomic hybridization dataset from EBV-transformed lymphoblastoid cell lines from all 270 HapMap individuals used in phases I and II of the project (populations: CEU, CHB, JPT and YRI).

Our groups aim to perform various types of global analysis on this dataset, such as:

  • generating a genome-wide map of copy number variation
  • mapping the genome-wide CNV map onto functional annotation of the genome
  • testing for associations with SNP and haplotype variation
  • testing for associations with gene expression variation
  • quantifying population differentiation for copy number variation

The data were generated using the WGTP array with dye-swap for each individual and using a single male reference: HapMap individual NA10851.

Browse data

Browse Data by Individual

Individual chromosome genome assembly browser

How to browse a sample

Choose the sample, the chromosome, the genome assembly and the browser that you would like to use, then click on "Browse".

You will then be redirected to the first 1 Mb of the chosen chromosome.

UCSC

The UCSC browser will show up to four tracks:

  • The clones on the WGTP array (Track "Sanger_WGTP_clones")
  • The log2ratio of copy number at each clone (Track "DyeSwap Ratios for ...")
  • The clones called as losses relative to the reference DNA (Track "Loss calls ...")
  • The clones called as gains relative to the reference DNA (Track "Gain calls ...")

To visualise the distribution of log2ratios across all 270 HapMap samples for a given clone, click on that clone in the track "Sanger_WGTP_Clones" to open its details page, then follow the link entitled "Outside Link".

For help with using the UCSC browser, please visit UCSC help.

Ensembl

Displaying the CNVs for an individual in the Ensembl browser shows four tracks:

  • The clones on the WGTP array (Track "Sanger_WGTP_clones")
  • A histogram track of the probe/clone intensity log2ratios
  • The clones called as losses relative to the reference DNA (Track "cnvs_loss_NAxxxxx")
  • The clones called as gains relative to the reference DNA (Track "cnvs_gain_NAxxxxx")

Clicking on any clone feature in the Sanger_WGTP_clones track and following the 'details' link shows the distribution of log2ratio scores for that clone across the whole dataset of 270 HapMap individuals.

Note that, due to software issues with the Ensembl website, the histogram track of log2ratios appears as a track of vertical bars when viewing CNVs on the NCBI35 assembly. Mousing over a bar displays its log2ratio as a tooltip. Where possible, use the NCBI36 assembly display, which is much more illustrative.

For help with using the Ensembl browser, please visit Ensembl help.

Browse Copy Number Variable regions identified within the 270 HapMap samples

Hapmap CNVs genome assembly browser

  • WGTP_CNVs - CNVs identified using the WGTP array among all 270 HapMap samples
  • 500KEA_CNVs - CNVs identified using the Affymetrix 500k GeneChip Early Access arrays among all 270 HapMap samples
  • Redon_CNVs - merged CNVs from both the 500KEA and WGTP platforms for the 270 HapMap samples

Data download

Data access summary

The data can be downloaded in four formats:

  • A single text file containing the mean dye-swap intensity for each clone in each individual
  • Raw extracted intensities for each image (BlueFuse format) in Excel-compatible text files
  • Normalized extracted intensities, with low-intensity spots removed and log2ratios calculated
  • Sample image files from dye-swap experiments, and a mapping file for use with the image files

Download data

The data were generated using the WGTP array, with dye-swap, for 269 HapMap individuals against a single male reference: HapMap individual NA10851.

WGTP Intensities Data

Alternatively, you can download all the pre- and post-processed intensities for all 269 individuals:

Or you can download the intensities for an individual of interest:

Hapmap sample

Download WGTP Individual Sample CNVs

Hapmap CNVs (sample level)

Download Copy Number Variable regions identified within the 270 HapMap samples

Hapmap CNVs genome assembly

To browse data within a genome browser, please visit the 'Browse Data' tab.

Related files

Validation data

Two types of experiments are available: replicates and add_in experiments.

Their pre-processed and post-processed intensities (as defined in the data description) are fully available. Only a sample of the raw scanner image files is available; to download other raw scanner images, please contact Matthew Hurles.

Validation data:

All the validation data files are in the same format as those provided for the WGTP array.

Pre- and post-processed intensity data
Replicate experiments: All replicates data (~322 MB) FTP
Add_in experiments: Human chromosome validation (~380 MB); Hamster, Mouse & self experiments validation (~118 MB) FTP
Replicates Raw Images
NA15510A reference_red reference_green sample_green sample_red FTP
NA12144A reference_red reference_green sample_green sample_red FTP
self_A reference_red reference_green sample_green sample_red FTP
Add_in experiments Raw Images
chrom10 FTP
chrom10 exp 1489-16 chr10_red_1489-16 chr10_green_1489-16
chrom10 exp 1489-17 chr10_red_1489-17 chr10_green_1489-17
chrom10 exp 1489-18 chr10_red_1489-18 chr10_green_1489-18
chrom10 exp 1489-19 chr10_red_1489-19 chr10_green_1489-19
chrom11 FTP
chrom11 exp 1489-20 chr11_red_1489-20 chr11_green_1489-20
chrom11 exp 1489-21 chr11_red_1489-21 chr11_green_1489-21
chrom11 exp 1489-22 chr11_red_1489-22 chr11_green_1489-22
chrom11 exp 1489-23 chr11_red_1489-23 chr11_green_1489-23

Data release

Sample

This is the HapMap individual that was analysed on the WGTP array.

Mapping files

The two GAL files provide the mapping for the WGTP array.

Use WGTP_array_map1.gal for experiments run up to 23/09/2005, and WGTP_array_map2.gal for experiments run from 27/09/2005 onwards.
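When processing many experiments, the choice of GAL file can be made programmatically. The following is a minimal sketch only: the function name gal_file_for is hypothetical, the dates are read as dd/mm/yyyy, and because no rule is given for experiments dated between the two cut-offs the sketch refuses to guess.

```python
from datetime import date

# Cut-off dates as stated in the text (read as dd/mm/yyyy).
MAP1_LAST_DAY = date(2005, 9, 23)
MAP2_FIRST_DAY = date(2005, 9, 27)


def gal_file_for(experiment_date: date) -> str:
    """Return the GAL file name for an experiment run on the given date.

    Hypothetical helper; not part of the released data or software.
    """
    if experiment_date <= MAP1_LAST_DAY:
        return "WGTP_array_map1.gal"
    if experiment_date >= MAP2_FIRST_DAY:
        return "WGTP_array_map2.gal"
    # The text does not say which map applies between the two dates.
    raise ValueError(f"no GAL file documented for {experiment_date.isoformat()}")


if __name__ == "__main__":
    print(gal_file_for(date(2005, 6, 1)))
    print(gal_file_for(date(2005, 10, 1)))
```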

The text file WGTP_clone_map_NCBIxx.txt contains the mapping of the human clones (for both assemblies: NCBI35 (May 2004) and NCBI36 (March 2006)).

Log2ratios Intensities for the 269 Hapmap Individuals

This single text file contains the mean dye-swap intensity for each clone in each individual.

All experimental artefacts, as provided in this list, have been removed from this file.
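As a rough illustration of what a mean dye-swap value represents, the sketch below averages the log2 ratios of the two hybridisations in a dye-swap pair. The function name and its four intensity arguments are hypothetical (named after the image files sample_green, reference_red, sample_red and reference_green), and the exact calculation used to produce the file is not described here, so this is an assumption rather than the actual pipeline.

```python
import math


def mean_dye_swap_log2ratio(sample_green: float, reference_red: float,
                            sample_red: float, reference_green: float) -> float:
    """Average the sample/reference log2 ratio over a dye-swap pair.

    Assumption: one hybridisation labels the sample green and the
    reference red, and the other swaps the dyes. Both ratios are
    oriented sample over reference before averaging.
    """
    forward = math.log2(sample_green / reference_red)
    swapped = math.log2(sample_red / reference_green)
    return (forward + swapped) / 2


if __name__ == "__main__":
    # Equal sample and reference signal in both hybridisations gives 0.
    print(mean_dye_swap_log2ratio(150.0, 150.0, 300.0, 300.0))
```

Averaging the two orientations is the usual reason for a dye-swap design: a bias tied to one dye pushes the two ratios in opposite directions and tends to cancel in the mean.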

Pre-processed (raw) intensities (from BlueFuse)

For each individual there are two files, one per intensity signal (red, green).

Fluorescence intensities and log2 ratio values were extracted from the raw scanner images using the BlueFuse software (Bluegnome Ltd).

Post-processed intensities (from BlueFuse)

For each individual there are two files, one per intensity signal (red, green).

These Excel-compatible files are derived from the raw intensity files by post-processing in BlueFuse.

Post-processing consisted of:

  • Any spot giving low signal intensities ("amplitude" < 100 in both channels) or inconsistent fluorescence patterns ("confidence" < 0.5 or "quality" = 0) was excluded from further analysis.
  • Log2 ratio values were then normalised by median block values, again within BlueFuse.
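The two post-processing steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the BlueFuse implementation: the Spot record and its field names are hypothetical, and "normalised by median block values" is interpreted here as subtracting each block's median log2 ratio from the spots in that block.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from statistics import median


@dataclass(frozen=True)
class Spot:
    """Hypothetical record for one array spot (field names are assumptions)."""
    clone: str
    block: int
    amplitude_ch1: float
    amplitude_ch2: float
    confidence: float
    quality: int
    log2ratio: float


def keep_spot(s: Spot) -> bool:
    """Apply the exclusion criteria quoted in the text."""
    low_signal = s.amplitude_ch1 < 100 and s.amplitude_ch2 < 100
    inconsistent = s.confidence < 0.5 or s.quality == 0
    return not (low_signal or inconsistent)


def normalise_by_block_median(spots: list[Spot]) -> list[Spot]:
    """Subtract each block's median log2 ratio (assumed interpretation)."""
    by_block: dict[int, list[float]] = defaultdict(list)
    for s in spots:
        by_block[s.block].append(s.log2ratio)
    medians = {block: median(values) for block, values in by_block.items()}
    return [replace(s, log2ratio=s.log2ratio - medians[s.block]) for s in spots]


def post_process(spots: list[Spot]) -> list[Spot]:
    """Filter first, then normalise, in the order given in the text."""
    return normalise_by_block_median([s for s in spots if keep_spot(s)])
```

Note that a spot is dropped for low signal only when both channels fall below 100; a spot with one weak channel is kept, as the criterion is worded.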

Raw images green/red

For each individual there are two files, one per intensity signal (red, green).

These images are the raw output from the laser scanner (Agilent Technologies) after the experiment.

Examples of raw scanner WGTP image files:

Sample raw scanner images directory
NA07019 sample_green reference_red sample_red reference_green FTP
NA07022 sample_green reference_red sample_red reference_green FTP
NA07029 sample_green reference_red sample_red reference_green FTP
NA12057 reference_red sample_green reference_green sample_red FTP
NA18500 reference_red sample_green reference_green sample_red FTP
NA18501 reference_red sample_green reference_green sample_red FTP
NA18502 reference_red sample_green reference_green sample_red FTP
NA18547 reference_red sample_green reference_green sample_red FTP
NA18558 reference_red sample_green reference_green sample_red FTP
NA18555 reference_red sample_green reference_green sample_red FTP

To download other image files, please contact Matthew Hurles.

Validation data

All the validation data files are in the same format as those provided for the WGTP array.

Contact

This project was a collaborative effort of the groups of:

Acknowledgements

Nigel Carter, Richard Redon, Heike Fiegler, Lyndal Montgomery, Matthew Hurles, Chris Tyler-Smith, Tatiana Zerjal, Daniel Andrews, Armand Valsesia, Fengtang Yang, Dimitrios Kalaitzopoulos, Charles Lee, Steve Scherer

Funding was provided by the Wellcome Trust.

Publication

  • Global variation in copy number in the human genome.

    Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW and Hurles ME

    Nature 2006;444;7118;444-54

CNV project pages

Software

  • CNVFinder - an algorithm designed to detect copy number variants (CNVs) in the human population from large-insert clone DNA microarrays
  • CNVTools - a collection of packages useful in the analysis of copy number variants (CNVs)