Apr 1, 2010 12:00 AM

Common copy number variation doesn't explain much complex disease risk - but why not?

A massive study of common, large-scale DNA rearrangements in 16,000 complex disease patients has revealed... well, not much: it appears that common, large deletions and duplications play a relatively minor role in determining susceptibility to common diseases. But why would this be the case?


Wellcome Trust Case Control Consortium. (2010). Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature, 464(7289), 713-720. DOI: 10.1038/nature08979

The Wellcome Trust Case Control Consortium has just published the results of a massive survey of common, large DNA duplications and deletions (collectively termed copy number variants, or CNVs) in 16,000 patients suffering from complex diseases and 3,000 controls. The results come as no surprise, but are nonetheless disappointing: the study identified absolutely no novel CNVs associated with complex disease. Although three such variants were found to alter disease susceptibility, all three had already been identified in previous studies.

The study's findings suggest that - despite their size - common CNVs play very little role in the etiology of common, complex diseases like rheumatoid arthritis and type 2 diabetes, and researchers will have to look elsewhere to uncover the notorious "missing heritability" for these diseases. The result shouldn't come as a shock: the major conclusions of this study were presaged by a paper published in Nature late last year (and led by colleagues at the Sanger Institute). This earlier study, using far fewer samples, showed that most common CNVs are well "tagged" by nearby single nucleotide polymorphisms (SNPs), suggesting that any substantial effects of common CNVs on disease risk would likely have been picked up by earlier SNP-based genome-wide association studies. It also showed that, despite this tagging, only a handful of common CNVs could be found that were correlated with known disease-associated SNPs. The authors of the earlier paper thus concluded:

> [...] CNV could only explain a small minority of the disease risk already accounted for existing GWAS studies, let alone the larger (for most diseases) bulk of 'missing' heritability that remains unaccounted for [...]
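To make "tagging" concrete, here is a minimal sketch of how the correlation between a CNV and a nearby SNP might be measured. The genotypes are invented for illustration, and squared Pearson correlation (r²) on genotype counts is used here as a simple stand-in; it is not necessarily the statistic used in either paper.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)

# Hypothetical data for ten individuals (not taken from any study).
cnv_copies   = [2, 2, 1, 2, 0, 1, 2, 2, 1, 2]  # copy number at a common deletion
tag_snp      = [0, 0, 1, 0, 2, 1, 0, 0, 1, 1]  # minor-allele count at a nearby SNP
unlinked_snp = [0, 1, 0, 2, 0, 1, 1, 0, 0, 1]  # minor-allele count at an unrelated SNP

print(f"r^2 with nearby SNP:   {r_squared(cnv_copies, tag_snp):.2f}")
print(f"r^2 with unlinked SNP: {r_squared(cnv_copies, unlinked_snp):.2f}")
```

When r² is high, a SNP-based association study is effectively testing the CNV as well, which is why a well-tagged CNV with a substantial effect would be expected to have shown up in earlier SNP-based scans.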

The new results from the WTCCC should thus really be viewed as extremely large-scale validation of a known negative result, confirming an already-suspected limited role for common CNVs in complex disease susceptibility.

So, no real surprises here. But still, an obvious question remains: __why don't common CNVs play a major role in complex disease susceptibility?__ This result certainly seems counter-intuitive: many CNVs are large, sometimes deleting or duplicating many thousands of bases of DNA, and one might thus expect a priori that such variants will be much more likely to have a functional effect on the genome than single-base SNPs. Yet based on their results, the WTCCC conclude:

> Having completed these analyses the hypothesis that, a priori, an arbitrary common CNV is much more likely than an arbitrary common SNP to affect disease susceptibility is not supported by our data.

How can it possibly be the case that a deletion or duplication of thousands of bases has the same probability of having a functional impact as a variant affecting a single nucleotide?

That question is not explicitly addressed in the paper. However, the key term in the quoted sentence above is "common": for any variant to reach the population frequency required to be detectable in this study (around 5%), it has to have run the gauntlet of purifying natural selection. Genetic variants - either SNPs or CNVs - with sizeable effects on disease risk will (in most cases) have been prevented by selection from ever reaching this frequency in the population.

That means that although a brand new CNV would be predicted, on average, to have a substantially larger effect on fitness than a brand new SNP, we should expect *common* CNVs to show roughly the same distribution of effects on disease risk as common SNPs, due to the ruthless filtering out of seriously deleterious variants in both classes by selection. So conditioning on a variant being common, its predicted effect on disease susceptibility will be small regardless of whether it is a CNV or a SNP. And __because there are far, far fewer common CNVs in the population than common SNPs, the total__ contribution of CNVs to disease risk is substantially less. So the negative outcome of this very large study was fairly predictable - although that's easy to say in hindsight, of course!

Where to next? The field has already moved on with a new focus on rare variants, which (given the selection-based argument above) seem far more likely to yield useful findings. This year will see the launch of several very large studies taking a variety of approaches to dig into the lower end of the frequency spectrum: imputation using existing data-sets; new genome-wide association chips containing larger numbers of rare SNPs; and large-scale sequencing of candidate genes, whole exomes and even entire genomes. Rare variant discovery has already proved successful in the CNV field, and it seems likely that the next round of CNV association studies will prove enormously more fruitful than this study.
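The selection argument above can be illustrated with a toy calculation. This is only a sketch under loudly stated assumptions, none of which come from the paper: a Moran-type population of 20,000 chromosomes, new variants whose fitness costs are exponentially distributed, CNVs that are on average ten times more costly than SNPs when they first arise, and fitness cost used as a crude proxy for effect on disease risk. The chance that a single new copy drifts up to 5% frequency before being lost is the classic gambler's-ruin probability.

```python
import math
import random

COPIES = 20_000             # assumed number of chromosomes in the population
THRESHOLD = COPIES // 20    # copies needed to reach 5% frequency

def prob_common(cost, copies_needed=THRESHOLD):
    """Chance that one new copy reaches copies_needed before being lost.

    Gambler's-ruin result for a Moran-type model in which carriers have
    relative fitness 1 - cost, with 0 <= cost < 1.
    """
    if cost <= 0.0:
        return 1.0 / copies_needed           # neutral variant
    step = -math.log1p(-cost)                # log odds of losing vs gaining a copy
    if copies_needed * step > 700.0:         # expm1 would overflow; chance is ~0
        return 0.0
    return math.expm1(step) / math.expm1(copies_needed * step)

def mean_costs(mean_new_cost, n_variants, rng):
    """Return (mean cost of new variants, mean cost of those that become common)."""
    costs = [min(rng.expovariate(1.0 / mean_new_cost), 0.99)
             for _ in range(n_variants)]
    weights = [prob_common(c) for c in costs]
    mean_new = sum(costs) / n_variants
    # Weight each variant by its chance of surviving the filter of selection.
    mean_common = sum(c * w for c, w in zip(costs, weights)) / sum(weights)
    return mean_new, mean_common

rng = random.Random(1)
snp_new, snp_common = mean_costs(0.005, 200_000, rng)  # hypothetical SNP mean cost
cnv_new, cnv_common = mean_costs(0.05, 200_000, rng)   # hypothetical CNV mean cost

print(f"new variants:    CNV/SNP mean cost ratio = {cnv_new / snp_new:.1f}")
print(f"common variants: CNV/SNP mean cost ratio = {cnv_common / snp_common:.1f}")
```

Under these assumptions the roughly tenfold gap between new CNVs and new SNPs shrinks to well under twofold among variants that reach 5% frequency. The exact numbers depend entirely on the invented parameters; the point is only that conditioning on "common" strips out most of the difference between the two classes.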