Clustering of mRNA-Seq data for detection of alternative splicing patterns

By Marla Johnson, Elizabeth F Purdom

Posted 30 Jun 2015
bioRxiv DOI: 10.1101/021733

Current mRNA sequencing can provide estimates of the levels of individual isoforms within the cell, where isoforms are the distinct mRNA products or proteins created by a gene. Many standard statistical methods commonly used for analyzing gene expression levels have yet to be adapted to take advantage of this additional information. One novel question is whether we can find groupings, or clusters, of samples that are distinguished not by their gene expression but by their isoform usage. Such clusters in tumors, for example, could result from a shared disruption to the splicing system that creates the different isoforms. We propose a novel approach to clustering mRNA-Seq data that identifies clusters of samples with common isoform usage. We show via simulation that our methods are more sensitive in detecting clusters of similar alternative splicing patterns than standard clustering techniques applied directly to the estimates of isoform levels. We further demonstrate that clustering on isoform usage is more accurate than clustering directly on isoform levels by examining real data containing a technical artifact that caused different batches to have different isoform usage patterns.
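
The distinction between isoform levels and isoform usage can be illustrated with a small sketch. This is not the authors' clustering method, which the abstract does not describe. It assumes, purely for illustration, that usage is the share of a gene's total level contributed by each isoform, and that samples are compared by Euclidean distance. The function names, the gene and isoform labels, and the numbers are all hypothetical.

```python
import math
from collections import defaultdict


def isoform_usage(levels, gene_of):
    """Convert isoform levels to within-gene proportions (assumed definition of usage).

    levels:  dict mapping isoform name -> list of per-sample level estimates
    gene_of: dict mapping isoform name -> name of the gene it belongs to
    """
    n_samples = len(next(iter(levels.values())))
    # Total level of each gene in each sample (sum over its isoforms).
    totals = defaultdict(lambda: [0.0] * n_samples)
    for iso, values in levels.items():
        for j, value in enumerate(values):
            totals[gene_of[iso]][j] += value
    usage = {}
    for iso, values in levels.items():
        gene_total = totals[gene_of[iso]]
        # Usage is set to 0 where the gene is not expressed in a sample.
        usage[iso] = [
            value / gene_total[j] if gene_total[j] > 0 else 0.0
            for j, value in enumerate(values)
        ]
    return usage


def sample_distance(table, a, b):
    """Euclidean distance between samples a and b across all isoforms."""
    return math.sqrt(sum((values[a] - values[b]) ** 2 for values in table.values()))


# Hypothetical data: one gene, two isoforms, three samples.
# Samples 0 and 1 split the gene 25%/75% but differ tenfold in expression;
# sample 2 has expression similar to sample 0 but the opposite split.
levels = {"iso1": [10.0, 100.0, 30.0], "iso2": [30.0, 300.0, 10.0]}
gene_of = {"iso1": "geneA", "iso2": "geneA"}
usage = isoform_usage(levels, gene_of)

print("levels: d(0,1) =", sample_distance(levels, 0, 1),
      " d(0,2) =", sample_distance(levels, 0, 2))
print("usage:  d(0,1) =", sample_distance(usage, 0, 1),
      " d(0,2) =", sample_distance(usage, 0, 2))
```

On the raw levels, sample 0 is closer to sample 2 because their overall expression is similar. On the proportions, sample 0 is closest to sample 1 because they share the same splicing pattern. A distance-based clustering would therefore group the samples differently depending on which representation it is given.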

Download data

  • Downloaded 899 times
  • Download rankings, all-time:
    • Site-wide: 21,094
    • In bioinformatics: 2,590
  • Year to date:
    • Site-wide: 16,673
  • Since beginning of last month:
    • Site-wide: 56,065
