
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 65,152 bioRxiv papers from 288,686 authors.

Varying-Censoring Aware Matrix Factorization for Single Cell RNA-Sequencing

By F. William Townes, Stephanie C Hicks, Martin Aryee, Rafael A. Irizarry

Posted 21 Jul 2017
bioRxiv DOI: 10.1101/166736

Single cell RNA-Seq (scRNA-Seq) has become the most widely used high-throughput technology for gene expression profiling of individual cells. The potential of being able to measure cell-to-cell variability at a high-dimensional genomic scale opens numerous new lines of investigation in basic and clinical research. For example, by identifying groups of cells with expression profiles unlike those observed in cells with known phenotypes, new cell types may be discovered. Dimension reduction followed by unsupervised clustering is the quantitative approach typically used to facilitate such discoveries. However, a challenge for this approach is that most scRNA-Seq datasets are sparse, with the percentages of measurements reported as zero ranging from 35% to 99% across cells, and these zeros are partially explained by experimental inefficiencies that lead to censored data. Furthermore, the observed across-cell differences in the percentages of zeros are partly due to technical artifacts rather than biological differences. Unfortunately, standard dimension reduction approaches treat these censored values as true zeros, which leads to the identification of distorted low-dimensional factors. When these factors are used for clustering, the distortion leads to incorrect identification of biological groups. Here, we propose an approach that accounts for cell-specific censoring with a varying-censoring aware matrix factorization (VAMF) model that permits the identification of factors in the presence of the above-described systematic bias. We demonstrate the advantages of our approach on published scRNA-Seq data and confirm these on simulated data.
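The distortion described in the abstract can be illustrated with a small simulation. The sketch below is not the VAMF model and does not reproduce the authors' analysis. It only shows, under assumed settings, how treating censored values as true zeros can make the leading principal component track each cell's fraction of zeros. The function names, the Gaussian latent expression, the uniform detection rates (chosen so that zero fractions span roughly 35% to 99%), and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_censored(n_cells=200, n_genes=500, seed=0):
    """Simulate cells with NO biological structure but cell-specific censoring.

    All distributions and parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Every cell shares the same gene-level means, so there are no true groups.
    gene_means = rng.normal(5.0, 1.0, size=n_genes)
    latent = gene_means + rng.normal(0.0, 0.5, size=(n_cells, n_genes))
    # Technical, cell-specific detection rate (assumed uniform).
    detect_rate = rng.uniform(0.01, 0.65, size=n_cells)
    detected = rng.random((n_cells, n_genes)) < detect_rate[:, None]
    # Undetected (censored) measurements are recorded as zero.
    observed = np.where(detected, latent, 0.0)
    return observed, latent


def first_pc_scores(x):
    """Scores on the first principal component, via SVD of the centered matrix."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


y, latent = simulate_censored()
zero_frac = (y == 0).mean(axis=1)  # per-cell fraction of zeros

# Absolute correlation between PC1 and the fraction of zeros per cell.
corr_censored = abs(np.corrcoef(first_pc_scores(y), zero_frac)[0, 1])
corr_latent = abs(np.corrcoef(first_pc_scores(latent), zero_frac)[0, 1])

print(f"PC1 vs. zero fraction, zeros treated as real: {corr_censored:.2f}")
print(f"PC1 vs. zero fraction, uncensored latent data: {corr_latent:.2f}")
```

In this toy setting the simulated cells have no biological groups, so any strong association between the first component and the zero fraction comes from the censoring alone. This is the kind of technical factor that a censoring-aware model would need to account for.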

Download data

  • Downloaded 1,507 times
  • Download rankings, all-time:
    • Site-wide: 3,771 out of 65,152
    • In genomics: 690 out of 4,449
  • Year to date:
    • Site-wide: 36,335 out of 65,152
  • Since beginning of last month:
    • Site-wide: 32,343 out of 65,152

