The Copy Number Variation (CNV) Project FAQ


CNV project related questions :
  • What is the CNV project?
  • What is the data release policy?
  • How do I contact the CNV project researchers?

CNV data browse related questions :
  • Where can I download/browse CNV data?
  • Where can I find the data description?
  • Which internet browsers have been tested for CNV data display?
  • What is UCSC?
  • What is Ensembl?
  • What is the version of the human genome assembly on which the data are displayed?
  • What is the list of artefacts?

Technical related questions :
  • Why has my internet browser timed out when browsing the CNV map?
  • Why has my internet browser timed out when retrieving the PDF histogram: "distribution of copy number among all 270 HapMap samples"?
  • Why can I not view the quantitative clone log2ratio intensities on Ensembl?
  • Why is there no data for a chromosome of a given sample?
  • Why does UCSC keep displaying previously uploaded tracks?
  • Why can I not browse all chromosomes for a sample on a genome browser without going back to the CNV page?
  • How can I view the PDF histogram: "distribution of copy number among all 270 HapMap samples", and what does it represent?
  • How do I add/compare my data with the CNV data from the genome browser?

What is the CNV project? ^

Wellcome Trust Sanger Institute researchers, as part of an international collaboration (see below), have generated the most complete map yet of regions within the human genome that vary in copy number between apparently healthy individuals. Today, this map is released to the research community to accelerate our understanding of genome function and the genetic basis of disease.

The human genome consists of about 3 billion bases, and it is the order, or sequence, of these bases that contains the genetic information (genes) to make proteins, which in turn carry out all biological functions. Both the detailed sequence of the genome and its larger-scale organisation vary among all humans, and it is these variants that cause us to differ from one another both in appearance and in susceptibility to diseases. Copy number variation is a form of structural variation in which a segment of DNA, at least 500 bases in length, is found at different copy numbers in different people as a result of deletion or duplication events.

"We already know that Copy Number Variation can cause, or affect our risk of getting diseases; however, our understanding of this phenomenon has been inhibited by a lack of knowledge about which regions of the human genome vary in this manner," said Dr Matthew Hurles , one of the lead researchers on the project. "By looking at the location of these variants with regard to known genes, and interrogating their evolutionary history, we can prioritise those variants most likely to have an impact on human health and disease."

Copy Number Variation was detected in the genomes of 270 individuals (the HapMap collection) with ancestry in Europe, Africa and East Asia. The results have been analysed in detail and a scientific report will be published in the coming months.

Additional information about the project can be found at: http://www.sanger.ac.uk/humgen/cnv/

Data access is available at http://www.sanger.ac.uk/humgen/cnv/data/



What is the data release policy? ^

The release of pre-publication data from large resource-generating scientific projects was the subject of a meeting held in January 2003, the "Fort Lauderdale meeting", sponsored by the Wellcome Trust, one of the Project funders.
The report from that meeting can be viewed at http://www.wtccc.org.uk/docs/wtd003207.pdf

The recommendations of the Fort Lauderdale meeting address the roles and responsibilities of data producers, data users, and funders of "community resource projects", with the aim of establishing and maintaining an appropriate balance between the interests of data users in rapid access to data and the needs of data producers to receive recognition for their work.

The conclusion of the attendees at the meeting was that responsible use of the data is necessary to ensure that first-rate data producers will continue to participate in such projects and produce and quickly release valuable large-scale data sets.
"Responsible use" was defined as allowing the data producers to have the opportunity to publish the initial global analyses of the data, as articulated at the outset of the project.
Doing so also will ensure that the data generated are fully described.



How do I contact the CNV project researchers? ^

For any questions regarding the CNV project data and the data release policy please contact:

- Nigel Carter npc@sanger.ac.uk
- Matthew Hurles meh@sanger.ac.uk


For website bug reports or data problems, please check this FAQ first.
If you cannot find an answer, you can contact our CNV webmaster, Armand Valsesia, using this form.

Your comments and feedback are also very welcome using this form.




Where can I download/browse CNV data? ^


WGTP data are available from http://www.sanger.ac.uk/humgen/cnv/data/cnv_data/ .

This includes :

  • WGTP Raw & Processed Data
  • WGTP CNV data

CNV Data can be browsed from http://www.sanger.ac.uk/humgen/cnv/data/cnv_data/display/


Affy 500KEA raw data are available from either:
  • the HapMap website
  • GEO, under accession IDs GSE5013 and GSE5173



Where can I find the data description? ^


The files available for download from the website are described here.
Description for all files available from our FTP site can be found here.



Which internet browsers have been tested for CNV data display? ^

Internet Explorer (6.0)
Mozilla (1.7.8)
Mozilla Firefox (1.5)
Safari (2.0.3)



What is UCSC? ^

UCSC is a genome browser developed by the Genome Bioinformatics Group at the University of California, Santa Cruz.
The UCSC website is at http://genome.ucsc.edu/



What is Ensembl? ^

Ensembl is a joint project between EMBL - EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes. Ensembl is primarily funded by the Wellcome Trust.
The Ensembl website is at http://www.ensembl.org/



What is the version of the human genome assembly on which the data are displayed? ^

Both genome assemblies NCBI35, May 2004 and NCBI36, March 2006 are available.



Why has my internet browser timed out when browsing the CNV map? ^

Because we are displaying genome-scale data, we split the data by sample and by chromosome to speed up the upload onto the genome browsers.
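
As a rough illustration of this kind of split, the sketch below groups interval records into one chunk per (sample, chromosome) pair. The record layout, sample identifiers and coordinates are hypothetical and are not taken from the CNV project files.

```python
from collections import defaultdict


def split_by_sample_and_chromosome(records):
    """Group (sample, chromosome, start, end) records by (sample, chromosome)."""
    groups = defaultdict(list)
    for sample, chrom, start, end in records:
        groups[(sample, chrom)].append((start, end))
    return dict(groups)


# Hypothetical records: the sample IDs and coordinates are for illustration only.
records = [
    ("sample_A", "chr1", 100, 200),
    ("sample_A", "chr2", 50, 80),
    ("sample_B", "chr1", 300, 400),
    ("sample_A", "chr1", 500, 900),
]
chunks = split_by_sample_and_chromosome(records)
```

Each value in `chunks` is small enough to be handled on its own, which is the point of splitting before upload.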

On the UCSC genome browser, data files are uploaded in compressed form, and the default tracks displayed are limited to clone coverage, CNVs classified as losses or gains relative to the reference DNA, and the quantitative log2ratio intensities for individual clones.

On the Ensembl genome browser, certain tracks (such as Vega genes and repeats) cannot be hidden when using a URL-file based upload. To improve the upload speed, users can specify their Ensembl default preferences (see the Ensembl User documentation).

It may also be worth asking your system/network administrator to check your internet browser settings and proxy/firewall configuration to improve the data display speed.



Why has my internet browser timed out when retrieving the PDF histogram: "distribution of copy number among all 270 HapMap samples"? ^

Ask your system/network administrator to check your internet browser settings and proxy/firewall configuration. Please check that you are allowing popup windows and cookies for the CNV data website.
Because these PDF files are retrieved from our FTP site, you will also need an up-to-date internet browser that supports FTP download or integrates an FTP explorer application.
If the above tips do not help, you will find all these histograms on the CNV project FTP site: ftp://ftp.sanger.ac.uk/pub/cnv_project/intensities/histograms/
Each sub-directory corresponds to specific chromosome(s), indicated by the directory name. You can retrieve a specific clone histogram directly (as clone_name.pdf) or get all the clones for the chromosome(s) using the compressed file ( Histograms_chromosome-name.zip ).
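
The naming scheme above can be sketched as a pair of small URL builders. This is only an illustration: the function names are made up here, and the directory, clone and chromosome names passed in are placeholders, since the actual sub-directory names are not listed in this FAQ.

```python
from urllib.parse import quote

# FTP location of the histograms, as given in this FAQ.
HISTOGRAM_ROOT = "ftp://ftp.sanger.ac.uk/pub/cnv_project/intensities/histograms/"


def clone_histogram_url(directory, clone_name):
    """URL of a single clone histogram, following the clone_name.pdf pattern."""
    return f"{HISTOGRAM_ROOT}{quote(directory)}/{quote(clone_name)}.pdf"


def chromosome_archive_url(directory, chromosome_name):
    """URL of the archive following the Histograms_chromosome-name.zip pattern."""
    return f"{HISTOGRAM_ROOT}{quote(directory)}/Histograms_{quote(chromosome_name)}.zip"


# Placeholder arguments: substitute a real sub-directory and clone name.
pdf_url = clone_histogram_url("DIRECTORY_NAME", "CLONE_NAME")
zip_url = chromosome_archive_url("DIRECTORY_NAME", "CHROMOSOME_NAME")
```

The resulting URLs can then be passed to any FTP-capable download tool.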



Why can I not view the quantitative clone log2ratio intensities on Ensembl? ^

The quantitative (wiggle) plot as implemented on the UCSC browser is not currently available on the Ensembl browser.
The Ensembl web team is currently implementing it and this feature will be available very soon.



Why is there no data for a chromosome of a given sample? ^

When a displayed chromosome for a sample seems to contain no data, there are several possible cases:

  • This chromosome has no predicted CNVs, so the tracks Hapmap deletions and/or Hapmap duplications are empty.
  • This chromosome has no CNVs and no clone coverage. This means that this specific sample had experimental artefacts on this (and maybe some other) chromosome, so the data have been removed from the analysis.
    It should be listed in the artefact list here.



What is the list of artefacts? ^

The exclusion list can be downloaded from here and is also included in the FTP README here.



Why does UCSC keep displaying previously uploaded tracks? ^

This is a feature of the UCSC browser, not the upload mechanism, and enables multiple tracks to be compared. However, you can remove previously uploaded tracks either by clicking on "Manage custom tracks" and then removing the unwanted tracks, or by going to http://genome.ucsc.edu/cgi-bin/hgGateway and clicking on "Click here to reset" to reset the browser interface to its defaults.



Why can I not browse all chromosomes for a sample on a genome browser without going back to the CNV page? ^

Due to technical upload limitations, we have had to split data by sample and by chromosome.
To do otherwise, for example by uploading all chromosomes for a given sample or all chromosomes for all samples, is not possible on UCSC without a very rapid internet connection and is not possible at all on Ensembl.





How can I view the PDF histogram: "distribution of copy number among all 270 HapMap samples", and what does it represent? ^


The "distribution of copy number among all 270 HapMap samples" is a histogram that represents, for a given clone, the log2ratio intensities for all 270 HapMap samples.
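
To make the idea concrete, here is a minimal sketch of how such a histogram could be computed for one clone. It assumes that the log2ratio is log2(test intensity / reference intensity); the function names, the bin width and the example values are illustrative and do not describe the project's actual processing.

```python
import math
from collections import Counter


def log2_ratio(test_intensity, reference_intensity):
    """Assumed definition: log2 of the test signal over the reference signal."""
    return math.log2(test_intensity / reference_intensity)


def histogram(values, bin_width=0.1):
    """Count values per bin; each key is the lower edge of a bin."""
    counts = Counter(math.floor(value / bin_width) for value in values)
    return {round(index * bin_width, 10): n for index, n in sorted(counts.items())}


# Hypothetical (test, reference) intensities for one clone in four samples.
pairs = [(1.0, 1.0), (2.0, 1.0), (0.5, 1.0), (1.5, 1.0)]
ratios = [log2_ratio(test, ref) for test, ref in pairs]
bins = histogram(ratios, bin_width=0.5)
```

Under this assumption, samples with values near zero carry the same copy number as the reference, while clearly positive or negative values suggest a gain or a loss.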

On UCSC, the distribution of copy number among all 270 HapMap samples can be visualised for each clone on the array by clicking on the desired clone in the track "Sanger_WGTP_Clones" to show the details page for that clone and then clicking on the link entitled "Outside Link".

On Ensembl, the distribution of copy number among all 270 HapMap samples can be visualised for each clone on the array by clicking on the desired clone in the track "Sanger_WGTP_Clones" and then clicking on the link "details".

If you have any problem browsing this histogram, please see:
Why has my internet browser timed out when retrieving the PDF histogram: "distribution of copy number among all 270 HapMap samples"?



How do I add/compare my data with the CNV data from the genome browser? ^



In both the Ensembl and UCSC browsers, users can add their own data.

UCSC :
You can add your data by clicking on "Manage custom tracks" and then on "Add custom tracks", which allows you to upload your data files.
Please see the UCSC documentation on creating custom tracks: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks
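
As a minimal sketch, a custom track file in BED format (a "track" line followed by tab-separated chromosome, start, end and name fields) could be written as follows. The track name, description and intervals are hypothetical; consult the UCSC documentation above for the full set of supported formats and track-line attributes.

```python
def bed_custom_track(track_name, description, intervals):
    """Build BED custom track text from (chrom, start, end, name) tuples."""
    lines = [f'track name="{track_name}" description="{description}"']
    for chrom, start, end, name in intervals:
        # BED start coordinates are 0-based and end coordinates are exclusive.
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


# Hypothetical regions, for illustration only.
my_regions = [
    ("chr1", 1000, 5000, "region_1"),
    ("chr1", 20000, 26000, "region_2"),
]
track_text = bed_custom_track("my_cnvs", "My CNV calls", my_regions)
```

The resulting text can be saved to a file and uploaded through "Add custom tracks".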

You can take advantage of the UCSC capabilities to compare your data with the CNV data, for example by generating the union or intersection of the two tracks.
This is possible using the UCSC table browser (through "Manage custom tracks" then "access in table browser" or directly from the genome browser by clicking on "Tables"). You can also export data using it.
More information about using the UCSC table browser is available at http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
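
For readers who prefer to run the comparison offline, the sketch below shows what the union and intersection of two interval tracks mean on a single chromosome. It is an illustration only, not the table browser's implementation; intervals are assumed to be (start, end) pairs with an exclusive end, and the example coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(track_a, track_b):
    """Regions covered by at least one of the two tracks."""
    return merge_intervals(list(track_a) + list(track_b))


def intersection(track_a, track_b):
    """Regions covered by both tracks."""
    a, b = merge_intervals(track_a), merge_intervals(track_b)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical tracks on one chromosome.
my_track = [(100, 200), (300, 400)]
cnv_track = [(150, 350)]
both = intersection(my_track, cnv_track)
either = union(my_track, cnv_track)
```

Tracks spanning several chromosomes would need to be compared one chromosome at a time.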

Ensembl :
By clicking on the "DAS source" menu, you can either add your own DAS track (if you have a DAS server) or add URL-based information with "URL based data". There is also a link on the left-hand side: "upload your own data".
To compare data, you can use BioMart (link "DataMining[BioMart]"), which also allows you to export data.

More information can be found at :
BioMart page: http://www.ensembl.org/Multi/martview
Displaying custom data in Ensembl: http://www.ensembl.org/info/data/index.html#import
Setting up DAS with Ensembl: http://www.ensembl.org/info/data/external_data/das/index.html



webmaster@sanger.ac.uk

Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK  Tel:+44 (0)1223 834244

Last Modified Thu Nov 6 17:17:12 2008

Genome Research Limited is a charity registered in England with number 1021457
