We are entering the era of national genomics. Last week, Nature Genetics published "the Dutch genome" ("a panel of 998 unique haplotypes" from 250 parent-offspring families), produced by The Genome of the Netherlands Consortium (GoNL). What I especially like about this study is that the authors took care to determine individual haplotypes. They did this by sequencing families and phasing by transmission, that is, working out which parent each of the child's alleles came from. This approach allowed them to discover many new variants with high confidence. Phase information appears to be especially useful for describing novel short indels (panel at right, which is Fig. 1B from the paper). Virtually all of the insertions of 30-100 nucleotides that they discovered were novel, implying that earlier methods had missed them. Furthermore, rare variants are especially likely to affect health, and imputation from population data is especially tricky for rare variants, so having phase information is invaluable for interpreting personal genomes.
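To make "phasing by transmission" concrete, here is a minimal Python sketch of the Mendelian logic for one parent-offspring trio. It is not the GoNL pipeline, and the function names are hypothetical. It assumes biallelic sites and error-free genotypes coded as alt-allele counts (0, 1, 2). At each site it enumerates which allele each parent could have passed on and keeps only the combinations that reproduce the child's genotype. When exactly one combination survives, the child's paternal and maternal alleles are known. When child and both parents are heterozygous, transmission alone cannot resolve phase.

```python
from typing import List, Optional, Sequence, Tuple

# Alleles each parental genotype can transmit (0 = ref, 1 = alt).
# Genotypes are unphased alt-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
# Assumption: biallelic sites only.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def phase_by_transmission(child: int, father: int,
                          mother: int) -> Optional[Tuple[int, int]]:
    """Return the child's (paternal, maternal) alleles at one site.

    Returns None when transmission cannot resolve phase, and raises
    ValueError on a Mendelian inconsistency (e.g. a genotyping error
    or a de novo mutation).
    """
    solutions = {
        (p, m)
        for p in TRANSMITTABLE[father]
        for m in TRANSMITTABLE[mother]
        if p + m == child
    }
    if not solutions:
        raise ValueError("Mendelian inconsistency")
    if len(solutions) == 1:
        return solutions.pop()
    return None  # only happens when child and both parents are heterozygous


def phase_trio(sites: Sequence[Tuple[int, int, int]]
               ) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Phase (child, father, mother) genotypes site by site into two haplotypes."""
    paternal: List[Optional[int]] = []
    maternal: List[Optional[int]] = []
    for child, father, mother in sites:
        phased = phase_by_transmission(child, father, mother)
        paternal.append(phased[0] if phased is not None else None)
        maternal.append(phased[1] if phased is not None else None)
    return paternal, maternal


print(phase_trio([(1, 0, 1), (1, 1, 0), (1, 1, 1), (2, 1, 1)]))
# ([0, 1, None, 1], [1, 0, None, 1])
```

The `None` marks the all-heterozygous site, which would need another source of phase information. A production tool would also have to cope with genotyping errors, multi-allelic sites and de novo mutations rather than simply rejecting them.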
Knowing the phase allowed GoNL to make some surprising observations. Perhaps the most significant is that many alleles in monogenic disease genes that the Human Gene Mutation Database describes as deleterious are apparently benign. This result poses a puzzle. Were these false positives? (Perhaps an undiscovered mutation in the same gene was responsible for the phenotype in the original report.) Is this genetic epistasis? Is there a previously undescribed environmental component to these diseases?
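Why phase matters for this kind of interpretation can be shown with a small illustration. This is a minimal sketch, assuming a fully penetrant recessive disorder and phased alleles for one person at the variant sites of one gene. The names are hypothetical and not taken from the paper. Under that assumption, a person is expected to be affected only if both copies of the gene carry a reported-deleterious allele. Two heterozygous variants look identical in unphased genotypes whether they sit on the same haplotype (cis, leaving one intact copy) or on different haplotypes (trans, leaving none).

```python
from typing import Iterable, List, Sequence


def both_copies_hit(hap_a: Sequence[int], hap_b: Sequence[int],
                    flagged: Iterable[int]) -> bool:
    """Return True if each haplotype carries at least one flagged alt allele.

    hap_a, hap_b: phased alleles (0 = ref, 1 = alt) at one gene's variant sites.
    flagged: indices of sites whose alt allele is reported as deleterious.
    """
    sites = list(flagged)  # materialize so it can be scanned twice
    hit_a = any(hap_a[i] == 1 for i in sites)
    hit_b = any(hap_b[i] == 1 for i in sites)
    return hit_a and hit_b


def unphased(hap_a: Sequence[int], hap_b: Sequence[int]) -> List[int]:
    """Collapse two haplotypes into alt-allele counts, discarding phase."""
    return [a + b for a, b in zip(hap_a, hap_b)]


# Two reported-deleterious variants at sites 0 and 1, in two configurations.
cis = ([1, 1], [0, 0])    # both on one copy; the other copy is intact
trans = ([1, 0], [0, 1])  # one on each copy; no intact copy

print(unphased(*cis), unphased(*trans))  # [1, 1] [1, 1]
print(both_copies_hit(*cis, [0, 1]), both_copies_hit(*trans, [0, 1]))  # False True
```

Under these assumptions, a healthy person for whom `both_copies_hit` returns `True` is hard to reconcile with both alleles being pathogenic. That is exactly the kind of observation the questions above try to explain, and without phase the cis and trans cases cannot be told apart.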
However the puzzle gets resolved, the implications for public health genomics are enormous, and we never would have known without phasing by transmission.