Project description
The Genome Structural Variation Consortium has conducted a CNV discovery project to identify common CNVs greater than 500 bp in size using array comparative genomic hybridisation (array-CGH) at tiling resolution on isothermal oligonucleotide arrays. We analysed 20 females with European ancestry and 20 females with African ancestry, against a single male reference sample.
We analysed 20 CEU HapMap samples, 20 YRI HapMap samples and one Polymorphism Discovery Resource sample for CNVs by array-CGH using a set of NimbleGen arrays that tile across the assayable portion of the genome with approximately 42 million probes spread across twenty 2.1 million probe (HD2) arrays.
Samples used were: NA06985, NA07037, NA07045, NA11894, NA11931, NA11993, NA11995, NA12004, NA12006, NA12044, NA12156, NA12239, NA12287, NA12414, NA12489, NA12749, NA12776, NA12828, NA12878, NA15510, NA18502, NA18505, NA18508, NA18511, NA18517, NA18523, NA18858, NA18861, NA18907, NA18909, NA18916, NA19099, NA19108, NA19114, NA19129, NA19147, NA19190, NA19225, NA19240 and NA19257. Reference sample was NA10851.
Data release
These data are being released freely to the scientific community and can be considered a community resource. However, the data generators reserve the right to be the first to publish on the bulk data as indicated by the Fort Lauderdale meeting report (see data release policy below). Our groups are performing various global analyses in this dataset, including:
- generating a genome-wide map of copy number variation
- mapping the genome-wide CNV map onto functional annotation of the genome
- testing associations with SNP and haplotype variation
- testing associations with gene expression variation
- quantifying population differentiation for copy number variation
- investigating mechanisms of CNV formation
Authors who use data from this project for presentation and/or publication should acknowledge the project. Below is a sample acknowledgement statement:
This study makes use of data generated by the Genome Structural Variation Consortium (PIs Nigel Carter, Matthew Hurles, Charles Lee and Stephen Scherer) whom we thank for pre-publication access to their CNV discovery [and/or] genotyping data, made available through the websites http://www.sanger.ac.uk/humgen/cnv/42mio/ and http://projects.tcag.ca/variation/ as a resource to the community. Funding for the project was provided by the Wellcome Trust [Grant No. 077006/Z/05/Z], Canada Foundation for Innovation and Ontario Innovation Trust, Canadian Institutes of Health Research, Genome Canada/Ontario Genomics Institute, the McLaughlin Centre for Molecular Medicine, Ontario Ministry of Research and Innovation, the Hospital for Sick Children Foundation, the Department of Pathology at Brigham and Women's Hospital and the National Institutes of Health grants HG004221 and GM081533.
Users should note that the Consortium bears no responsibility for the further analysis or interpretation of these data, over and above that published by the Consortium.
Download data
- Normalised intensity data from the CNV discovery array-CGH are available in 5 Mb genomic windows
- Validated CNVs called from these normalised data are available here
- CNV genotyping data on a subset of the CNVs discovered
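Because the normalised intensity data are split into 5 Mb genomic windows, users will often need to work out which window contains a region of interest. The following is a minimal sketch in Python, not part of the Consortium's tooling. It assumes that windows are contiguous, non-overlapping and start at base 1 of each chromosome; the function name `window_for_position` is hypothetical, and the actual window boundaries and file naming should be checked against the downloaded files.

```python
from typing import Tuple

# 5 Mb window size, as stated in the download description above.
WINDOW_SIZE = 5_000_000


def window_for_position(chrom: str, pos: int,
                        window_size: int = WINDOW_SIZE) -> Tuple[str, int, int]:
    """Return (chrom, start, end) of the window containing a 1-based position.

    Assumption (not confirmed by the project page): windows are contiguous,
    non-overlapping, and the first window of each chromosome starts at base 1.
    """
    if pos < 1:
        raise ValueError("pos must be a 1-based coordinate (>= 1)")
    index = (pos - 1) // window_size      # zero-based window index
    start = index * window_size + 1       # first base of the window
    end = start + window_size - 1         # last base of the window (inclusive)
    return chrom, start, end


if __name__ == "__main__":
    # Example: find the window covering position 12,345,678 on chromosome 1.
    print(window_for_position("chr1", 12_345_678))
```

Under these assumptions the returned window may extend past the end of the chromosome for the final window, so the end coordinate should be treated as an upper bound.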
Data release policy
The release of pre-publication data from large resource-generating scientific projects was the subject of a meeting held in January 2003, the "Fort Lauderdale meeting", sponsored by the Wellcome Trust, one of the Project funders. The report from that meeting can be accessed here.
The recommendations of the Fort Lauderdale meeting address the roles and responsibilities of data producers, data users, and funders of "community resource projects", with the aim of establishing and maintaining an appropriate balance between the interests of data users in rapid access to data and the needs of data producers to receive recognition for their work.
The conclusion of the attendees at the meeting was that responsible use of the data is necessary to ensure that first-rate data producers will continue to participate in such projects and produce and quickly release valuable large-scale data sets. "Responsible use" was defined as allowing the data producers to have the opportunity to publish the initial global analyses of the data, as articulated at the outset of the project. Doing so also will ensure that the data generated are fully described.
Please contact Matt Hurles (matthew.hurles@sanger.ac.uk) if you have any queries.
Acknowledgments
Wellcome Trust Sanger Institute: Don Conrad, Richard Redon, Tomas Fitzgerald, Nelo Onyiah, Jan Aerts, Chris Tyler-Smith, Nigel Carter, Matthew Hurles
The Centre for Applied Genomics: Steve Scherer, Lars Feuk, Dalila Pinto
Harvard Medical School, Brigham and Women's Hospital: Charles Lee, Omer Gokcumen





