Friday, August 05, 2011

All 4096 hexamers evaluated as exonic splicing elements

Exon sequences have a large effect on splicing efficiency. Specific sequences can act as ESEs (exonic splicing enhancers) to promote splicing, or as ESSs (exonic splicing silencers) to reduce it. In the August 2011 issue of Genome Research, Ke et al. describe a comprehensive, quantitative measure of the splicing impact of all 4,096 6-mer sequences, using an Illumina Genome Analyzer to compare spliced transcripts with an input library. They tested five positions within two different internal exons in a minigene system and sequenced millions of successfully spliced transcripts after transfection of human cells. A given hexamer had different effects at different positions, but these effects were correlated, and the effect of each 6-mer on splicing could be quantified. Many complications (secondary structure, synergy, effect on chromatin) are addressed by this study, which provides a huge data set.

However, this is just the beginning. This paper examines only a single cell type, and it concerns only a single type of alternative splicing: exon inclusion. This approach, which captures the power of high-throughput sequencing, will certainly be extended to other contexts in alternative splicing, and may prove useful for defining other nucleic acid regulatory motifs.
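To make the idea of comparing spliced transcripts with an input library concrete, here is a minimal Python sketch. It is not the scoring procedure used by Ke et al.; it assumes a simple log2 ratio of hexamer frequencies with a pseudocount, and the function names (`all_hexamers`, `enrichment_scores`) and the toy counts are made up for illustration.

```python
from itertools import product
from math import log2


def all_hexamers():
    """Return all 4**6 = 4096 DNA 6-mers in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def enrichment_scores(input_counts, spliced_counts, pseudocount=1.0):
    """Hypothetical score: log2 of each hexamer's frequency among spliced
    transcripts divided by its frequency in the input library.

    Positive values suggest enhancer-like behaviour, negative values
    silencer-like behaviour. The pseudocount avoids division by zero.
    """
    hexamers = all_hexamers()
    n = len(hexamers)
    in_total = sum(input_counts.get(h, 0) for h in hexamers) + pseudocount * n
    sp_total = sum(spliced_counts.get(h, 0) for h in hexamers) + pseudocount * n
    scores = {}
    for h in hexamers:
        f_in = (input_counts.get(h, 0) + pseudocount) / in_total
        f_sp = (spliced_counts.get(h, 0) + pseudocount) / sp_total
        scores[h] = log2(f_sp / f_in)
    return scores


# Toy data (invented): a uniform input library, with one hexamer
# over-represented and one under-represented among spliced reads.
input_counts = {h: 100 for h in all_hexamers()}
spliced_counts = dict(input_counts)
spliced_counts["GAAGAA"] = 400
spliced_counts["TTAGGG"] = 25

scores = enrichment_scores(input_counts, spliced_counts)
```

In a real analysis the counts would come from sequencing reads at each tested exon position, and scores from different positions would need to be combined in some way; none of that is modelled here.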

Sunday, March 06, 2011

Assembling haplotypes in diploid sequencing projects

Shotgun sequencing has been the dominant mode of genome sequencing since the beginning of genomics. However, assembly of a complete genome can be complicated when the two haploid genomes present within the sequenced individual are quite different. In that case the coverage of each is reduced by half, and the two haploid genomes must be assembled separately. Problems also arise when the two haploid genomes diverge over only some of their length. For example, Barrière et al. (2009) find that, despite inbreeding designed to generate a fully homozygous sample, "approximately 10% and 30% of the Caenorhabditis remanei and C. brenneri genomes, respectively, are represented by two alleles in the assemblies."

A similar problem arises when attempting to resolve the haplotypes within an individual. In the January 2011 issue of Nature Biotechnology, Kitzman et al. describe the "Haplotype-resolved genome sequencing of a Gujarati Indian individual." Sequencing pools of large-insert clones provides information about individual haplotypes across most of the genome. Combining "the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning" allows distinct sequences from large-insert clones to be assembled in parallel, providing information about genome structure that might otherwise be very difficult to tease out of a mixed assembly.
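The contiguity argument can be illustrated with a toy Python sketch. This is not the pipeline of Kitzman et al.; it assumes that every clone derives from a single haplotype, that heterozygous sites have already been called and coded as allele 0 or 1, and that clones sharing a site can therefore be chained into phased blocks. The function name `phase_from_clones` and the example data are hypothetical.

```python
def phase_from_clones(clones):
    """Chain haploid clones into phased blocks.

    clones: list of dicts mapping site position -> allele (0 or 1).
    Returns (blocks, conflicts). Each block maps site -> allele for one
    haplotype of that block; the other haplotype is the complement.
    Clones that contradict earlier evidence are only counted, not resolved.
    """
    parent, parity = {}, {}

    def find(s):
        # Return (root, parity of s relative to root), compressing the path.
        if parent[s] == s:
            return s, 0
        root, p = find(parent[s])
        parent[s] = root
        parity[s] ^= p
        return root, parity[s]

    conflicts = 0
    for clone in clones:
        sites = sorted(clone)
        for s in sites:
            parent.setdefault(s, s)
            parity.setdefault(s, 0)
        for a, b in zip(sites, sites[1:]):
            rel = clone[a] ^ clone[b]  # 0: same allele code, 1: different
            ra, pa = find(a)
            rb, pb = find(b)
            if ra == rb:
                if pa ^ pb != rel:
                    conflicts += 1
            else:
                # Merge the two blocks so that a and b keep relation `rel`.
                parent[rb] = ra
                parity[rb] = pa ^ pb ^ rel

    blocks = {}
    for s in sorted(parent):
        root, p = find(s)
        blocks.setdefault(root, {})[s] = p
    return list(blocks.values()), conflicts


# Invented example: three clones, two of which overlap at site 200.
clones = [
    {100: 0, 200: 1},   # from one haplotype
    {200: 0, 300: 0},   # from the other haplotype
    {900: 1, 950: 1},   # no overlap with the clones above
]
blocks, conflicts = phase_from_clones(clones)
```

Here sites 100, 200 and 300 end up in one phased block because two clones share site 200, while sites 900 and 950 form a separate block with no link to the first. Real data would also require handling sequencing errors and pools that happen to contain both haplotypes of a region.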

What interests me about the method of Kitzman et al. is that it can be applied directly to cases of widespread structural polymorphism, and I expect to see it used for a variety of problems in the coming years. With this approach, or similar approaches, intractably complex genomes (e.g. Drosophila subobscura - see Sánchez-Gracia and Rozas 2011), asexual species and even metagenomic samples will yield their secrets.