
Interpretable, scalable, and transferrable functional projection of large-scale transcriptome data using constrained matrix decomposition

By Nicolas Panchy, Kazuhide Watanabe, Tian Hong

Posted 14 Apr 2021
bioRxiv DOI: 10.1101/2021.04.13.439654

Large-scale transcriptome data, such as single-cell RNA-sequencing data, have provided unprecedented resources for studying biological processes at the systems level. Numerous dimensionality reduction methods have been developed to visualize and analyze these data. In addition, several existing methods allow inference of functional variations among samples using gene sets with known biological functions. However, it remains challenging to analyze transcriptomes in reduced dimensions that are both interpretable, in terms of their dimensions and directionalities, and transferrable to new data. In this study, we used gene set non-negative principal component analysis (gsPCA) and gene set non-negative matrix factorization (gsNMF) to analyze large-scale transcriptome datasets. We found that these methods provide quantitative, low-dimensional information about the progression of biological processes, and that their performance is comparable to that of existing functional variation analysis methods in distinguishing multiple cell states and samples from multiple conditions. Remarkably, after training on a subset of the data, these methods can predict the locations in the functional space of data from experimental conditions to which the models were not exposed. Specifically, our models predicted the extent of progression and reversion for cells along the epithelial-mesenchymal transition (EMT) continuum. These methods also revealed a conserved EMT program across multiple types of single cells and tumor samples. Finally, we provide several recommendations on the choice between the two linear methods and on optimal algorithmic parameters. Our results show that simple constrained matrix decomposition can produce low-dimensional information in a functionally interpretable and transferrable space, and that it can be widely useful for analyzing large-scale transcriptome data.
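
The train-then-project idea in the abstract can be illustrated with a minimal gene-set NMF sketch. This is not the authors' gsNMF implementation, and gsPCA is not shown. The sketch assumes an expression matrix of samples x genes that is non-negative. It restricts the matrix to the columns of one gene set and factors it as X ~ W @ H, where the rows of W are sample scores (locations in the functional space) and H holds gene loadings. Unseen samples are then placed in the same space by freezing H and fitting only W. The solver is a swapped-in choice, plain Lee-Seung multiplicative updates. The function names `nmf`, `project`, and `gene_set_columns`, the gene names, the gene set, and the synthetic data are all hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def nmf(X, k, n_iter=2000, seed=0):
    """Factor a non-negative matrix X (samples x genes) as X ~ W @ H.

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss,
    so W (sample scores) and H (gene loadings) both stay non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + EPS
    H = rng.random((k, m)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def project(X_new, H, n_iter=2000, seed=0):
    """Place unseen samples in the trained space: H is frozen and only
    the non-negative sample scores W are fitted."""
    rng = np.random.default_rng(seed)
    W = rng.random((X_new.shape[0], H.shape[0])) + EPS
    HHt = H @ H.T
    XHt = X_new @ H.T
    for _ in range(n_iter):
        W *= XHt / (W @ HHt + EPS)
    return W


def gene_set_columns(genes, gene_set):
    """Indices of the expression-matrix columns whose genes are in the set."""
    return [i for i, g in enumerate(genes) if g in gene_set]


# Usage on synthetic data (hypothetical genes, gene set, and expression values).
rng = np.random.default_rng(1)
genes = [f"g{i}" for i in range(8)]
gene_set = {"g0", "g1", "g2", "g3", "g4", "g5"}
cols = gene_set_columns(genes, gene_set)

# The gene-set columns are built from two non-negative "programs".
H_true = rng.random((2, len(cols)))
expr = rng.random((40, len(genes)))
expr[:, cols] = rng.random((40, 2)) @ H_true

# Train on 30 samples, then project 10 samples the model never saw.
train, held_out = expr[:30, cols], expr[30:, cols]
W_train, H = nmf(train, k=2)
W_new = project(held_out, H)
rel_err = np.linalg.norm(held_out - W_new @ H) / np.linalg.norm(held_out)
print(W_new.shape, round(float(rel_err), 4))
```

Freezing H is what makes the learned space transferable in this sketch. New samples are described only in terms of the gene programs learned from the training subset, so their scores are directly comparable to the training scores.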

Download data

  • Downloaded 220 times
  • Download rankings, all-time:
    • Site-wide: 155,434
    • In bioinformatics: 11,578
  • Year to date:
    • Site-wide: 131,607
  • Since beginning of last month:
    • Site-wide: 132,226
