New-technology DNA sequencing provider Complete Genomics will provide near-complete genome sequences of 100 individuals to the Institute for Systems Biology, driving the first-ever association study for a complex trait using whole-genome sequencing (WGS). Here's the press release, and GenomeWeb has some additional information.
This is pretty exciting stuff.
The project focuses on Huntington's disease. Its goal is not to identify the mutations that cause the disease (its genetic basis is already extremely well-characterised), but rather to look for novel variants that alter its progression - usually called "disease modifiers". In other words, the goal here is to uncover genetic variants that explain variation between Huntington's patients in traits such as age of onset or the speed with which the disease progresses.
The major novelty of this study is that the target trait is complex (i.e. is likely determined by multiple genes), whereas the small number of WGS disease studies reported to date have focused on much more tractable Mendelian diseases (those in which disease status is conferred by the presence of a single, disastrous mutation).
You can expect to see plenty of similar announcements over the next twelve months as the cost of sequencing drops to the point that WGS on moderately large cohorts becomes feasible (Complete Genomics is currently offering the service for around $20,000 per genome).
This project is somewhat unusual in its focus on disease-modifying variants rather than disease-causing variants; it's likely that most of the early WGS studies will actually aim to identify new, rare large-effect risk factors for complex diseases such as type 1 diabetes.
At the American Society of Human Genetics meeting we started to get a sense of how early WGS projects in complex diseases will look:
- Individuals selected from the extremes of the distribution (e.g. particularly early-onset or severe manifestations of disease);
- A focus on individuals with a strong family history of disease;
- Sequencing of both patients and unaffected family members;
- In some cases, experimental designs employing low-coverage sequencing of many individuals rather than high-quality sequencing of a smaller cohort.
The first two features will enrich the target population for the types of rare, large-effect variants that WGS is uniquely capable of detecting, while the addition of unaffected family members will make it easier to differentiate between disease risk variants and the benign polymorphisms that litter all of our genomes. The final feature - low-coverage rather than high-quality sequence - is still controversial, but was strongly advocated by Richard Durbin and Gonçalo Abecasis at the meeting; this is the approach currently being taken by the 1000 Genomes Project. I plan to write more about this strategy soon.
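To get a rough feel for the low-coverage trade-off, here is a back-of-envelope sketch in Python. Everything in it is an assumption made for illustration rather than anything presented at the meeting: read depth at a site is treated as Poisson, a heterozygous variant is counted as "seen" if at least two reads carry it, and the budget of 3,000 genome-folds (equivalent to 100 genomes at 30x) and the function names `prob_het_detected` and `cohort_size` are hypothetical. It also ignores sequencing error and any pooling of information across individuals, so it understates what a real low-coverage design can recover.

```python
import math


def prob_het_detected(mean_depth, min_alt_reads=2):
    """Chance that at least `min_alt_reads` reads carry the variant allele
    at a heterozygous site in one individual.

    Assumption: reads carrying the variant allele ~ Poisson(mean_depth / 2),
    since roughly half the reads at a heterozygous site come from each copy.
    """
    lam = mean_depth / 2.0
    p_too_few = sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def cohort_size(total_budget, mean_depth):
    """Number of individuals a fixed budget (in genome-folds) can cover."""
    return int(total_budget // mean_depth)


# Hypothetical budget: the equivalent of 100 genomes sequenced at 30x.
budget = 3000

for depth in (4, 30):
    n = cohort_size(budget, depth)
    p = prob_het_detected(depth)
    print(f"{depth}x coverage: {n} individuals, "
          f"per-individual detection probability {p:.3f}")
```

Under these assumptions, dropping the depth buys a much larger cohort at the price of missing a sizeable fraction of heterozygous sites in any single individual, which is the tension behind the controversy.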
Anyway, here we are: the technology has finally arrived that makes WGS-based studies feasible for complex traits. Now the real challenge - coming up with ways of handling the massive volumes of data generated by these technologies, and of finding true causal variants amongst the noise of sequencing artefacts and benign polymorphisms - starts to bite.