Monday, August 11, 2014

Go Netherlands - Dutch genomes phased by transmission.

We are entering the era of national genomics. Last week, Nature Genetics published "the Dutch genome" ("a panel of 998 unique haplotypes" from 250 parent-offspring families). This was done by The Genome of the Netherlands Consortium (aka GoNL). What I especially like about this study is that they took care to determine individual haplotypes. They did this by sequencing families and phasing by transmission. This method allowed the discovery of many new variants with high confidence. Apparently, having phase information is especially useful in describing novel short indels (panel at right - which is Fig. 1B from the paper). Virtually all of the variants that they discovered involving insertions of 30-100 nucleotides were novel, implying that all previous methods had missed them. Furthermore, because rare variants are especially likely to impact health, and imputation from population data is especially tricky for rare variants, having phase information is invaluable for the interpretation of personal genomes.
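For readers who have not seen phasing by transmission spelled out, here is a minimal sketch of the idea in Python. It is my own toy illustration, not the GoNL pipeline: it assumes biallelic sites, perfectly called genotypes and no recombination between sites, and the function names (phase_child, phase_sites) are hypothetical. At each site, the child's two alleles are assigned to the mother and the father whenever only one assignment is consistent with the parental genotypes.

```python
def phase_child(child, mother, father):
    """Phase one site of a child using the parents' genotypes.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    Returns (maternal_allele, paternal_allele), or None when both
    assignments are possible (e.g. all three individuals heterozygous).
    Raises ValueError if the trio is inconsistent with Mendelian inheritance.
    """
    a, b = child
    options = set()
    # Try both ways of assigning the child's alleles to the two parents.
    for maternal, paternal in ((a, b), (b, a)):
        if maternal in mother and paternal in father:
            options.add((maternal, paternal))
    if not options:
        raise ValueError("Mendelian inconsistency in trio genotypes")
    if len(options) == 1:
        return options.pop()
    return None  # ambiguous: transmission alone cannot resolve the phase


def phase_sites(child_gts, mother_gts, father_gts):
    """Build the child's maternal and paternal haplotypes across sites.

    Sites that cannot be phased by transmission are marked with "?".
    """
    maternal_hap, paternal_hap = [], []
    for child, mother, father in zip(child_gts, mother_gts, father_gts):
        phased = phase_child(child, mother, father)
        if phased is None:
            maternal_hap.append("?")
            paternal_hap.append("?")
        else:
            maternal_hap.append(phased[0])
            paternal_hap.append(phased[1])
    return "".join(maternal_hap), "".join(paternal_hap)


if __name__ == "__main__":
    child = [("A", "G"), ("C", "C"), ("T", "C")]
    mother = [("A", "A"), ("C", "T"), ("T", "C")]
    father = [("A", "G"), ("C", "C"), ("T", "C")]
    print(phase_sites(child, mother, father))
```

The third site in the toy example, where all three family members are heterozygous, is the case that transmission alone cannot resolve; a real pipeline would have to bring in other information for such sites.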
Knowing the phase allowed GoNL to make some surprising observations. Perhaps the most significant is that many alleles in genes for monogenic disease that have been described as deleterious in the Human Gene Mutation Database are apparently benign. This result poses a puzzle. Were these false positives? (Perhaps an undiscovered mutation in the same gene is responsible for the phenotype in the original report).  Is this genetic epistasis? Is there a previously undescribed environmental component to these diseases?
However the puzzle gets resolved, the implications for public health genomics are enormous, and we never would have known without phasing by transmission.

Sunday, April 01, 2012

The genetics of Caenorhabditis elegans.

Genetics, Vol. 77, No. 1. (1 May 1974), pp. 71-94

This paper is recommended by Marty Chalfie (Columbia University), who writes:
The Brenner paper is a classic; when does someone have an opportunity to outline the genetics of an entire organism? It is also a terrific paper to go over basic genetic ideas.

Because it's brief, I'm going to quote the entire abstract from this classic:

Methods are described for the isolation, complementation and mapping of mutants of Caenorhabditis elegans, a small free-living nematode worm. About 300 EMS-induced mutants affecting behavior and morphology have been characterized and about one hundred genes have been defined. Mutations in 77 of these alter the movement of the animal. Estimates of the induced mutation frequency of both the visible mutants and X chromosome lethals suggests that, just as in Drosophila, the genetic units in C. elegans are large.

Functional enhancers at the gene-poor 8q24 cancer-linked locus

Functional enhancers at the gene-poor 8q24 cancer-linked locus.

PLoS genetics, Vol. 5, No. 8. (14 August 2009), e1000597, doi:10.1371/journal.pgen.1000597
This paper is recommended by Diane Robins (University of Michigan), who writes that this paper

"is the first functional analysis of a SNP in a gene desert that proves to be a variation in a FOX binding site making it a better androgen-responsive long-range enhancer for myc.  Follow-up paper shows interaction over 500 kb."

Selection at linked sites shapes heritable phenotypic variation in C. elegans

Selection at linked sites shapes heritable phenotypic variation in C. elegans.

Science (New York, N.Y.), Vol. 330, No. 6002. (15 October 2010), pp. 372-376, doi:10.1126/science.1194208

This paper is recommended by Eric Haag (University of Maryland), who writes:

This paper is a real gem of interdisciplinary genetics thinking. Its key insight is that how natural selection impacts gene expression is highly subject to the overall frequency and chromosomal distribution of recombination (to the point of outweighing the biological processes affected).   Rockman shows that in mostly selfing nematodes like C. elegans, the central 50% of each autosome more or less acts like a "supergene" that harbors very little variation affecting gene expression genome-wide.  In contrast, the terminal 1/4 on either side is where nearly all the variation affecting gene expression resides. The available evidence suggests this occurs because selective sweeps wipe out the variation over much of the central domain of each autosome.

Competition between ADAR and RNAi pathways for an extensive class of RNA targets

Competition between ADAR and RNAi pathways for an extensive class of RNA targets.

Nature structural & molecular biology, Vol. 18, No. 10. (11 October 2011), pp. 1094-1101, doi:10.1038/nsmb.2129

This paper is recommended by Antony Jose (University of Maryland), who writes:

This work is a particularly elegant illustration of the powerful combination of computational methods with biochemical isolation of RNA from different genetic backgrounds. The discovery of numerous sites genome-wide where the transcribed RNA is edited by ADAR is noteworthy.

Saturday, March 24, 2012

Genome sequence, comparative analysis and haplotype structure of the domestic dog

I use this paper in my graduate genetics course because it is the third complete mammalian genome. Being the third genome allowed the first application of a range of comparative methods, revealing several aspects of mammalian genome structure for the first time.  The paper illustrates the methods of comparative genomics, and the application of population genetics to genomics and vice versa.

Functions of the nonsense-mediated mRNA decay pathway in Drosophila development.

I use this paper in my graduate genetics course to illustrate techniques for analysis of gene expression in Drosophila.  It's also a very nice example of serendipity, whereby mutations in a set of genes for a fundamental process (nonsense-mediated decay) were discovered while looking for something else entirely.

SCNM1, a putative RNA splicing factor that modifies disease severity in mice

I use this paper in my graduate genetics course.  It describes the use of inbred strains to map and molecularly identify genes and illustrates a case of strain-specific phenotypes and genetic interactions.  It simultaneously illustrates the power of working with inbred strains and the caveat that phenotypes can be strain-specific.

Novel and expanded roles for MAPK signaling in Arabidopsis stomatal cell fate revealed by cell type-specific manipulations

Novel and expanded roles for MAPK signaling in Arabidopsis stomatal cell fate revealed by cell type-specific manipulations.

The Plant cell, Vol. 21, No. 11. (November 2009), pp. 3506-3517, doi:10.1105/tpc.109.070110
I use this paper in my graduate genetics course. It describes a genetic analysis of stomatal cell fate using epistasis analysis. The paper illustrates epistasis analysis, methods for ectopic expression (in this case, cell type-specific expression of constitutively active and dominant negative alleles of kinases) and a bit of plant biology.

A DNA integrity network in the yeast Saccharomyces cerevisiae

I use this paper in my graduate genetics course.  It describes a global screen for synthetic defects involving DNA integrity, which reveals a network of 16 functional modules.  The paper illustrates screens based on genetic interactions (in this case, synthetic lethality or fitness defects) and the systems biology used to evaluate the results of such a screen.  It also illustrates the use of Saccharomyces cerevisiae as a model system.

A whole-genome RNAi Screen for C. elegans miRNA pathway genes

A whole-genome RNAi Screen for C. elegans miRNA pathway genes.

Current biology : CB, Vol. 17, No. 23. (4 December 2007), pp. 2013-2022, doi:10.1016/j.cub.2007.10.058

I use this paper in my graduate genetics course. It describes a whole-genome screen of C. elegans using RNA interference, provides an example of whole-genome parallel reverse genetic screens, and illustrates both the microRNA biosynthesis pathway and a bit of C. elegans biology.

Friday, August 05, 2011

All 4096 hexamers evaluated as exonic splicing elements

Exon sequences have a large effect on splicing efficiency. Specific sequences can act as ESEs (exonic splicing enhancers) to promote splicing, or as ESSs (exonic splicing suppressors) to reduce splicing. In the August 2011 issue of Genome Research, Ke et al. describe a comprehensive quantitative measure of the splicing impact of all 4,096 6-mer sequences using an Illumina Genome Analyzer to compare spliced transcripts with an input library. They tested five positions within two different internal exons in a minigene system and sequenced millions of successfully spliced transcripts after transfection of human cells. Specific hexamers had different effects in different positions, but these were correlated, and the effect on splicing of each 6-mer could be quantified. Many complications (secondary structure, synergy, effect on chromatin) are addressed by this study, which provides a huge data set. However, this is just the beginning. This paper examines only a single cell type, and it concerns only a single type of alternative splicing - exon inclusion. This high throughput approach, which captures the power of high throughput sequencing, will certainly be extended to other contexts in alternative splicing, and may prove useful for defining other nucleic acid regulatory motifs.
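To make the scale of this experiment concrete, here is a small Python sketch that enumerates all 4,096 hexamers and scores each one by comparing its frequency among spliced transcripts with its frequency in the input library. This is only an illustration under my own assumptions: the log2 ratio with a pseudocount is a generic enrichment score, not necessarily the measure used by Ke et al., and the function names and toy counts are made up.

```python
from itertools import product
from math import log2


def all_kmers(k=6, alphabet="ACGT"):
    """Return every k-mer over the alphabet (4**6 = 4096 hexamers for k=6)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def log2_enrichment(input_counts, spliced_counts, pseudocount=1.0):
    """Score each hexamer as log2(spliced frequency / input frequency).

    Positive scores suggest enhancer-like behavior and negative scores
    suggest silencer-like behavior. The pseudocount avoids division by
    zero for hexamers that were never observed.
    """
    kmers = all_kmers()
    total_in = sum(input_counts.get(k, 0) + pseudocount for k in kmers)
    total_sp = sum(spliced_counts.get(k, 0) + pseudocount for k in kmers)
    scores = {}
    for kmer in kmers:
        freq_in = (input_counts.get(kmer, 0) + pseudocount) / total_in
        freq_sp = (spliced_counts.get(kmer, 0) + pseudocount) / total_sp
        scores[kmer] = log2(freq_sp / freq_in)
    return scores


if __name__ == "__main__":
    # Toy read counts, invented purely for illustration.
    input_counts = {"GAAGAA": 10, "TTTTTT": 10}
    spliced_counts = {"GAAGAA": 40, "TTTTTT": 2}
    scores = log2_enrichment(input_counts, spliced_counts)
    print(len(scores), scores["GAAGAA"] > 0, scores["TTTTTT"] < 0)
```

Because the sequence space is only 4^6, every hexamer can be scored directly from read counts, which is what makes an exhaustive survey like this feasible.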

Sunday, March 06, 2011

Assembling haplotypes in diploid sequencing projects

Shotgun sequencing has been the dominant mode of genome sequencing since the beginning of genomics. However, assembly of a complete genome can be complicated when the two haploid genomes present within the individual being sequenced are quite different, in which case the coverage is reduced by half and the two haploid genomes must be assembled separately. Problems also arise when two haploid genomes diverge over only some of their length. For example, Barrière et al. (2009) find that, despite inbreeding designed to generate a fully homozygous sample, "approximately 10% and 30% of the Caenorhabditis remanei and C. brenneri genomes, respectively, are represented by two alleles in the assemblies."

A similar problem arises when attempting to resolve the haplotypes within an individual. In the January issue of Nature Biotechnology, Kitzman et al. describe the "Haplotype-resolved genome sequencing of a Gujarati Indian individual." Sequencing pools of large-insert clones provides information about individual haplotypes across most of the genome. The power of combining "the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning" allows parallel assembly of distinct sequence from large-insert clones to provide information about genome structure that might otherwise be very difficult to tease out of a mixed assembly.

What interests me about the method of Kitzman et al. is that it can be applied directly to cases of widespread structural polymorphism, and I expect to see it used for a variety of problems in the coming years. With this approach, or similar approaches, intractably complex genomes (e.g. Drosophila subobscura - see Sánchez-Gracia and Rozas 2011), asexual species and even metagenomic samples will yield their secrets.

Saturday, August 07, 2010

Missing heritability found?

The problem of missing heritability in genetic studies (primarily genome-wide association studies, or GWAS) has been a major focus of interest in genetics journals during the past year. Many recent articles cite Manolio et al. (Nature, 8 Oct. 2009) for stating the problem:
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability.
In April, I cited an excellent article by McClellan and King, who argued that "many rare alleles account for common diseases". Now, Johansen et al. (Nature Genetics Aug. 2010), in "An excess of rare variants in genes identified by genome-wide association study of hypertriglyceridemia" provide evidence that many such rare variants can be found by sequencing a small number of candidate genes in affected individuals. Although the variants described in this study increase the proportion of genetic variation explained only incrementally, it is likely that they have only skimmed the surface (since sequencing was limited to coding regions). This article follows similar results looking at candidate genes (Romeo et al. 2009). "Pooled association [statistical] tests for rare variants in exon-resequencing studies" have already been developed (Price et al. 2010), and it is reasonable to believe that these methods can be extended to complete genome sequencing data. Thus, it appears that the analysis of rare variants will be increasingly common, and will explain much of the missing heritability.

However, progress has also been made in the more straightforward approach of looking at ever larger samples. In this week's Nature, Teslovich et al. describe a study of over 100,000 individuals that yielded an amazing 95 loci affecting blood lipids at p values of less than 5 x 10^-8 ("Biological, clinical and population relevance of 95 loci for blood lipids"). The authors take this position:
It has recently been suggested that conducting genetic studies with increasingly larger cohorts will be relatively uninformative for the biology of complex human disease, particularly if initial studies have failed to explain a sizable fraction of the heritability of the disease in question (Goldstein 2009). As the reasoning goes, analysis of a few thousand individuals will uncover the common variants with the strongest effect on phenotype. Larger studies will suffer from a plateau phenomenon in which either no additional common variants will be found or any common variants that are identified will have too small an effect to be of biological interest.

Our study provides strong empirical evidence against this assertion. We extended a GWAS for plasma lipids from ~20,000 to ~100,000 individuals and identified 95 loci (of which 59 are novel) that, in aggregate, explain 10–12% of the total variance (representing ~25–30% of the genetic variance). ... We expect that future investigations of the new loci (for example, resequencing efforts to identify low-frequency and rare variants, or functional experiments in cells and animal models, as demonstrated for SORT1 in a separate study reported in the accompanying paper [Musunuru et al.]) will uncover additional important new genes.
This recent work provides optimism that the heritability underlying complex human genetic disease will be found, in the form of both more genes and more variants per gene.
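As a back-of-the-envelope check on the numbers quoted from Teslovich et al., the two percentages can be combined to see what heritability they imply and how much is still missing. The Python sketch below is my own arithmetic on the quoted figures, not a calculation from the paper, and the function name is hypothetical: if the loci explain a fraction of the total variance and that same amount is a fraction of the genetic variance, the ratio of the two gives the implied heritability.

```python
def implied_heritability(frac_total_variance, frac_genetic_variance):
    """Heritability implied when the same loci explain both fractions.

    explained = frac_total_variance * V_total
              = frac_genetic_variance * V_genetic
    so V_genetic / V_total = frac_total_variance / frac_genetic_variance.
    """
    return frac_total_variance / frac_genetic_variance


if __name__ == "__main__":
    # Lower and upper ends of the ranges quoted above (10-12% and 25-30%).
    for total, genetic in ((0.10, 0.25), (0.12, 0.30)):
        h2 = implied_heritability(total, genetic)
        missing = 1 - genetic
        print(f"implied heritability ~{h2:.2f}; "
              f"genetic variance still unexplained ~{missing:.0%}")
```

Both ends of the quoted ranges imply a heritability of about 0.4, with roughly 70-75% of the genetic variance still unaccounted for even after 95 loci.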

Tuesday, July 27, 2010

The GAO, Congress and the public on genetic testing

The Subcommittee on Oversight and Investigations of the House of Representatives Committee on Energy and Commerce held a hearing on "Direct-To-Consumer Genetic Testing and the Consequences to the Public Health" on Thursday. I haven't seen any news after the fact, but it was covered by the 23andMe blog "The Spittoon," where I got the link to the committee's own website (which has copies of the testimony). The hearing included a report from the GAO which made a strong case for regulation of direct-to-consumer testing, including an extremely disturbing "video" which appeared on YouTube ("video" in quotes because it's actually just recordings of telephone calls with the words printed on screen). I think that it is almost certain that new regulations will emerge soon.

23andMe customers have responded, and there is a petition calling for continued access to genetic information. I signed it myself, making this statement:
I certainly recognize the need to ensure that test results are valid. However, I'm not sure that goes beyond CLIA certification. I also recognize the need to protect consumers from misinformation and bad advice from the unqualified. However, I'm not sure that is within the FDA's purview. My main point is that secure and private access to reliable personal genetic information is a valuable thing that does not put the consumer at undue risk.
That said, regulations that protect consumers from bad advice may be appropriate. However, it's going to be tricky, because we're talking about regulation of speech and education. I hope that the regulations are written in a way that encourages the broad dissemination of genetic knowledge from the many reliable sources currently available.

Thursday, July 22, 2010

FDA's public hearing on genetic tests

The FDA had a public hearing on our campus this week (Monday and Tuesday) about oversight of “laboratory developed tests” (including direct-to-consumer genetic tests). I dropped by for the last two sessions (on direct-to-consumer tests and education and outreach). Each included presentations from interested parties (the “public” in this case being representatives from companies selling direct-to-consumer tests, professional and educational organizations), a panel discussion and comments. FDA officials sat on the podium but did not speak. There were a few hundred people there, and it was a sophisticated group (at least the speakers felt no need to define BRCA, GWAS, CLIA, QSR, DTC, LDT, IVDMIA, NIH, CDC, FTC, NIST, NSGC or many other acronyms). I spent a lot of the time Googling jargon with my phone.

It was very interesting. I got the feeling that genotyping companies such as 23andMe will not be able to continue to operate as they have for a lot longer. I heard several people call for physicians being involved in "ordering the test" and "interpreting the test." Already, New York and Maryland prohibit people from obtaining their own genetic information from direct-to-consumer companies.

Of course one problem is that the typical family care physician probably knows even less about genetics and how to interpret the results of these tests than the typical 23andMe customer, and that was not lost on many of the people there. Everyone recognized the need for educating both consumers and physicians. One particularly original idea was that consumers should have to pass a test analogous to a driver’s test (as is currently done at the Personal Genome Project).

My reaction was to come home and download my personal data from 23andMe so that I can be assured of continued access to it. I am concerned to see what the FDA decides to do.

Monday, July 19, 2010

The FDA and me, and you, and our genetic information

The FDA is holding a public meeting on "Oversight of Laboratory Developed Tests (LDTs)" on my campus (University of Maryland) today and tomorrow. Registration is closed, but it's possible to view the proceedings via webcast (here). I am doing that now (I'm watching Judith Wilber right now). I stopped by briefly this morning and hope to go back tomorrow afternoon. The topic has been in the news a bit lately (see my links under right to know on delicious), and there are legitimate issues to address.

I have expressed my opinion in favor of our right to know our own genetic makeup on my blog and will only briefly restate those arguments here. My preference would be that oversight and approval of genotype "tests" (including whole-genome sequencing) be limited to technical standards (Are the genotype results accurate?), but that rules and oversight are appropriate for interpretation and guidance. A genotype is information, and it may be best to separate genotyping tests from the recommendations based on them to the extent possible. Asking a genotyping service to imagine the clinical and medical utility of their tests is a bit like asking the manufacturer of bathroom scales to prove that knowing one's weight is medically useful. Clinical utility should not be a criterion for genotype tests that provide accurate and reliable information. I'm not saying that the FDA should not protect consumers from bad advice, but that in the case of genotype information, this can and should be separated from the regulation of tests per se.

Saturday, April 24, 2010

Many rare alleles account for common genetic diseases

The last three years have witnessed an explosion of genome-wide association studies. A catalog of results at the National Human Genome Research Institute currently lists 545 publications associating 2,664 single nucleotide polymorphisms with human traits, and top journals in the field (such as Nature Genetics) have devoted themselves almost entirely to the publication of GWAS results. However, as summarized in an excellent review in the current issue of Cell (McClellan and King, "Genetic Heterogeneity in Human Disease"), it appears that "common risk variants fail to explain the vast majority of genetic heritability for any human disease, either individually or collectively (Manolio et al., 2009)." Instead, while "most human variation is ancient and shared," most alleles, including those that cause disease, are recent and rare. "Rare large-effect mutations are now recognized as the causes of many different common medical conditions." This makes sense in that deleterious alleles should be eliminated relatively quickly by selection, but leaves us unsure of how to interpret the available GWAS data. The good news is that genome sequencing will soon be used to discover rare variants in many people. At least for the researchers, "it will be fun to sort out."

Saturday, February 27, 2010

Conservation of expression without conservation of regulatory sequences

Conservation is a reliable indicator of what sequence features have a function. Very often, in comparative genomics, conservation is the only clue available. However, there are many examples of highly conserved sequence for which no function can be identified, and many of these appear to be non-essential. A recent review by Weirauch and Hughes ("Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same." Trends in Genetics 26:64, PMID 20083321) considers the opposite case: when expression is conserved but regulatory sequences are not. They list 17 examples of genes whose expression pattern is conserved despite divergent cis-regulatory sequences. In many of these cases the regulatory mechanisms are well known, and the review also includes a discussion of mechanisms that allow expression patterns to persist without conservation of the cis-regulatory signals.

Saturday, January 09, 2010

Water fleas have new introns

The study of intron gain and loss can be frustrating, because such events are very rare. Documented cases of intron gain have been particularly elusive. That makes the recent study by Li et al. especially exciting ("Extensive, Recent Intron Gains in Daphnia populations," in Science 2009, from Michael Lynch's group at Indiana). They have found that intron gain is remarkably common in Daphnia, and that the new introns lack features expected from most hypothesized mechanisms of intron gain. The independent gain of introns in parallel at the same site in different lineages is also observed, and also unexpected. These authors hypothesize that intron gain may arise fortuitously as a consequence of DNA damage, but this remains to be established. Whatever the mechanism, the observation that new introns can arise at reasonable rates in at least one species provides both an important clue to the origins of introns and a system for further investigation.

A consideration of the allele-frequency spectrum suggests that these new introns in Daphnia (also known as the water flea) are indeed deleterious, bringing to mind a famous poem by Jonathan Swift ("On Poetry: A Rhapsody", pub. 1733):
"So nat'ralists observe, a flea
Hath smaller fleas that on him prey,
And these have smaller fleas that bite 'em,
And so proceed ad infinitum."