Enabling the analysis of rare variants in large-scale case-control and quantitative trait association studies.
CCRaVAT (Case-Control Rare Variant Analysis Tool) and QuTie (Quantitative Trait) are software packages that enable efficient large-scale analysis of rare variants across specific regions or genome-wide.
These programs implement a rare variant super-locus or collapsing method that investigates the accumulation of rare variant alleles in either a case-control or quantitative trait study design.
Recent advances in high-throughput genotyping have made large-scale genetic association studies possible. Genome-wide association scans (GWAS) for complex disease have met with unprecedented success in identifying common susceptibility variants. However, the discovered common single nucleotide polymorphism (SNP) associations do not account for a large proportion of the genetic component of disease. The field is now focusing on the analysis of low frequency and rare variants (i.e. minor allele frequency (MAF) ≤0.05) to find the missing heritability in complex disease etiology (Bodmer and Bonilla, 2008; Manolio et al., 2009). While the sample sizes currently investigated are large enough for a well-powered GWAS of common variants, they are not large enough to provide sufficient power for the single-point analysis of rare variants with small to moderate effect sizes (Morris and Zeggini, 2009). We have developed rare variant analysis software, CCRaVAT and QuTie, which allow the large-scale analysis of low MAF polymorphisms by pooling rare variants within defined regions and treating them as a single 'super-locus'. This method helps identify regions that contain a significantly higher proportion of rare minor alleles in the disease cases or controls, or within groups of individuals with significantly different quantitative trait means. Collapsing multiple rare minor alleles into a single locus across pre-defined regions (either genes or sliding windows of defined sequence length) can substantially increase power for detecting association (Li and Leal, 2008; Morris and Zeggini, 2009). This approach, implemented in CCRaVAT and QuTie, can be applied to data arising from the targeted examination of specific regions or at the genome-wide scale.
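The super-locus idea described above can be illustrated with a minimal Python sketch. This is not the CCRaVAT or QuTie source code: the function names `sliding_windows` and `collapse_region`, the genotype coding (minor allele counts 0, 1 or 2 per individual and variant) and the absence of missing-genotype handling are all assumptions made for illustration. The MAF threshold of 0.05 follows the definition of low frequency and rare variants given in the text.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(start: int, end: int, size: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (window_start, window_end) intervals covering start..end."""
    pos = start
    while pos <= end:
        yield pos, min(pos + size - 1, end)
        pos += step


def collapse_region(
    genotypes: Sequence[Sequence[int]],  # rows: individuals, columns: variants (hypothetical layout)
    mafs: Sequence[float],               # minor allele frequency of each variant
    positions: Sequence[int],            # base-pair position of each variant
    region: Tuple[int, int],             # inclusive (start, end) of the region
    maf_threshold: float = 0.05,
) -> List[int]:
    """Return 1 for each individual carrying at least one rare minor allele
    inside the region, and 0 otherwise (the collapsed 'super-locus' genotype)."""
    lo, hi = region
    # Keep only variants that are rare and lie inside the region.
    rare = [
        j for j, (maf, pos) in enumerate(zip(mafs, positions))
        if maf <= maf_threshold and lo <= pos <= hi
    ]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]
```

For example, with four individuals and three variants at positions 100, 250 and 900 with MAFs 0.20, 0.01 and 0.03, only the second and third variants are collapsed; the common variant at position 100 is ignored regardless of the window it falls in.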
The statistical properties of the rare variant super-locus or collapsing method that we have implemented are described by Li and Leal (2008) and Morris and Zeggini (2009). The first step in implementing this approach is to define the regions in which rare variant minor alleles are collapsed. These chromosomal regions can be either sliding windows of predefined length or genic regions defined by intervals on either side of the transcriptional start and stop sites of genes. CCRaVAT and QuTie use the same approach up to this point; they differ in the study design analyzed and in how significance is determined. CCRaVAT analyzes case-control data and, for each region, constructs a 2×2 contingency table of the presence or absence of rare variant minor alleles in cases and controls. Differences in the proportion of cases and controls carrying rare variant minor alleles are tested using Pearson's chi-squared test, or Fisher's exact test when cell counts are small. CCRaVAT also allows users to generate empirical p-values by permuting case-control status a predefined number of times and repeating the analysis for each replicate. QuTie analyzes quantitative traits by comparing the trait means of individuals carrying at least one rare variant minor allele within the defined region with those of individuals carrying none. The quantitative trait values in the two groups are compared using linear regression and a Student's t-test.
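The two analyses described above can be sketched in Python using only the standard library. This is an illustrative reconstruction under stated assumptions, not the programs' actual implementation. All function names are hypothetical. The rule for switching to Fisher's exact test (smallest expected cell count below 5) is a common convention, since the text does not state the threshold used. The "+1" correction in the empirical p-value is likewise an assumption. For the quantitative trait, the sketch computes the pooled-variance two-sample t statistic, which is identical to the t statistic for the slope when the trait is regressed on a binary carrier indicator; the p-value would then be read from a t distribution with the returned degrees of freedom, and both groups must be non-empty.

```python
import math
import random


def table_2x2(carrier, is_case):
    """Counts: case carriers, case non-carriers, control carriers, control non-carriers."""
    a = sum(1 for c, k in zip(carrier, is_case) if k and c)
    b = sum(1 for c, k in zip(carrier, is_case) if k and not c)
    c_ = sum(1 for c, k in zip(carrier, is_case) if not k and c)
    d = sum(1 for c, k in zip(carrier, is_case) if not k and not c)
    return a, b, c_, d


def chi2_test(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and 1-df p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 df


def fisher_exact_pvalue(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum of table probabilities <= the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)))


def region_test(carrier, is_case, min_expected=5.0):
    """Chi-squared test, or Fisher's exact test when the smallest expected count is low."""
    a, b, c, d = table_2x2(carrier, is_case)
    n = a + b + c + d
    smallest_expected = min(a + b, c + d) * min(a + c, b + d) / n
    if smallest_expected < min_expected:  # assumed threshold, not from the source
        return fisher_exact_pvalue(a, b, c, d), "fisher"
    return chi2_test(a, b, c, d)[1], "chi2"


def empirical_pvalue(carrier, is_case, n_perm=1000, seed=None):
    """Permute case-control labels and count replicates at least as extreme."""
    rng = random.Random(seed)
    p_obs = region_test(carrier, is_case)[0]
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if region_test(carrier, labels)[0] <= p_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # assumed +1 correction


def carrier_t_statistic(carrier, trait):
    """Pooled-variance t statistic and degrees of freedom for carriers vs non-carriers."""
    g1 = [y for c, y in zip(carrier, trait) if c]
    g0 = [y for c, y in zip(carrier, trait) if not c]
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss = sum((y - m1) ** 2 for y in g1) + sum((y - m0) ** 2 for y in g0)
    df = n1 + n0 - 2
    se = math.sqrt(ss / df * (1 / n1 + 1 / n0))
    return (m1 - m0) / se, df
```

Here `carrier` is the collapsed 0/1 indicator for one region, `is_case` is the 0/1 case-control status and `trait` is the quantitative phenotype, each with one entry per individual. In a genome-wide run the same functions would be applied to every window or gene in turn.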