The problem of missing heritability in genetic studies (primarily genome-wide association studies, or GWAS) has been a major focus of interest in genetics journals during the past year. Many recent articles cite Manolio et al. (Nature, 8 Oct. 2009) for stating the problem:
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability.
In April, I cited an excellent article by McClellan and King, who argued that "many rare alleles account for common diseases". Now, Johansen et al. (Nature Genetics, Aug. 2010), in "An excess of rare variants in genes identified by genome-wide association study of hypertriglyceridemia", provide evidence that many such rare variants can be found by sequencing a small number of candidate genes in affected individuals. Although the variants described in this study increase the proportion of genetic variation explained only incrementally, the authors have likely only skimmed the surface, since sequencing was limited to coding regions. This article follows similar results from earlier candidate-gene sequencing (Romeo et al. 2009). "Pooled association [statistical] tests for rare variants in exon-resequencing studies" have already been developed (Price et al. 2010), and it is reasonable to believe that these methods can be extended to complete genome sequencing data. Thus, it appears that the analysis of rare variants will become increasingly common and will explain much of the missing heritability.
However, progress has also been made in the more straightforward approach of looking at ever larger samples. In this week's Nature, Teslovich et al. describe a study of over 100,000 individuals that yielded an amazing 95 loci affecting blood lipids at p values of less than 5 × 10⁻⁸ ("Biological, clinical and population relevance of 95 loci for blood lipids"). The authors take this position:
It has recently been suggested that conducting genetic studies with increasingly larger cohorts will be relatively uninformative for the biology of complex human disease, particularly if initial studies have failed to explain a sizable fraction of the heritability of the disease in question (Goldstein 2009). As the reasoning goes, analysis of a few thousand individuals will uncover the common variants with the strongest effect on phenotype. Larger studies will suffer from a plateau phenomenon in which either no additional common variants will be found or any common variants that are identified will have too small an effect to be of biological interest.
Our study provides strong empirical evidence against this assertion. We extended a GWAS for plasma lipids from ~20,000 to ~100,000 individuals and identified 95 loci (of which 59 are novel) that, in aggregate, explain 10–12% of the total variance (representing ~25–30% of the genetic variance). ... We expect that future investigations of the new loci (for example, resequencing efforts to identify low-frequency and rare variants, or functional experiments in cells and animal models, as demonstrated for SORT1 in a separate study reported in the accompanying paper [Musunuru et al.]) will uncover additional important new genes.
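The percentages in that passage can be turned into a rough estimate of how much heritability remains unexplained. The sketch below is only back-of-the-envelope arithmetic on the quoted figures, not a calculation from the paper: it assumes the two ranges pair up low with low and high with high, and the function name `implied_heritability` is made up for illustration.

```python
def implied_heritability(total_explained, genetic_fraction):
    """Total heritability implied when `total_explained` of the trait variance
    corresponds to `genetic_fraction` of the genetic variance."""
    return total_explained / genetic_fraction

# Figures quoted from Teslovich et al.: the 95 loci explain 10-12% of the
# total variance, which represents ~25-30% of the genetic variance.
# Assumption: low end pairs with low end, high end with high end.
h2_low = implied_heritability(0.10, 0.25)
h2_high = implied_heritability(0.12, 0.30)

# Share of the genetic variance still unexplained by the 95 loci.
missing_min = 1 - 0.30
missing_max = 1 - 0.25

print(f"Implied heritability: about {h2_low:.0%} to {h2_high:.0%}")
print(f"Genetic variance still missing: about {missing_min:.0%} to {missing_max:.0%}")
```

Under that pairing, both ends of the range imply a total heritability of about 40%, and roughly 70 to 75% of the genetic variance is still unaccounted for even with 95 loci in hand.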
This recent work gives reason for optimism that the heritability underlying complex human genetic disease will be found, in the form of both more genes and more variants per gene.