Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters
Nathaniel T Hawkins,
Posted 04 Feb 2019
bioRxiv DOI: 10.1101/539833 (published DOI: 10.1186/s12859-019-2951-x)
Background: Single cell RNA sequencing (scRNA-seq) brings unprecedented opportunities for mapping the heterogeneity of complex cellular environments such as bone marrow, and provides insight into many cellular processes. Single cell RNA-seq, however, has a far larger fraction of missing data reported as zeros (dropouts) than traditional bulk RNA-seq. This complicates not only the clustering of cells, but also the assignment of the resulting clusters to predefined cell types based on known molecular signatures, such as the expression of characteristic cell surface markers.
Results: We present a computational tool for processing single cell RNA-seq data that uses a voting algorithm to identify cells based on approval votes received by known molecular markers. Using a stochastic procedure that accounts for biases due to dropout errors and imbalances in the number of known molecular signatures for different cell types, the method computes the statistical significance of the final approval score and automatically assigns a cell type to clusters without an expert curator. We demonstrate the utility of the tool in the analysis of eight samples of bone marrow from the Human Cell Atlas. The tool provides a systematic identification of cell types in bone marrow based on a recently published, manually curated cell marker database, and incorporates a suite of visualization tools that can be overlaid on a t-SNE representation. The software is freely available as a Python package at https://github.com/sdomanskyi/DigitalCellSorter.
Conclusions: This methodology ensures that extensive marker-to-cell-type matching information is taken into account in a systematic way when assigning cell clusters to cell types. Moreover, the method allows for high-throughput processing of multiple scRNA-seq datasets, since it does not involve an expert curator, and it can be applied recursively to obtain cell sub-types. The software is designed to allow the user to substitute the marker-to-cell-type matching information and apply the methodology to different cellular environments.
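The voting idea described in the Results can be illustrated with a minimal sketch. This is not the DigitalCellSorter implementation or its API: the function name `vote_cell_types`, the boolean input matrices, the per-type normalization, the label-shuffling null model, and the z-score cutoff of 2 are all assumptions chosen for illustration. Each known marker casts an approval vote for its cell types in every cluster where it is expressed, votes are normalized by the number of markers per type, and a permutation null gives a rough significance estimate.

```python
import numpy as np

def vote_cell_types(cluster_expr, marker_matrix, n_null=1000, z_cutoff=2.0, seed=0):
    """Illustrative approval voting (hypothetical helper, not the package API).

    cluster_expr  : (n_markers, n_clusters) bool, marker counted as expressed in cluster
    marker_matrix : (n_markers, n_types) bool, marker is a known signature of cell type
    Returns (labels, z): labels[c] is the winning type index, or -1 if not significant.
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(cluster_expr, dtype=float)
    markers = np.asarray(marker_matrix, dtype=float)

    # Normalize votes so that cell types with many known markers do not dominate.
    per_type = np.maximum(markers.sum(axis=0, keepdims=True), 1.0)
    weights = markers / per_type

    # Observed approval score for every (cell type, cluster) pair.
    scores = weights.T @ expr

    # Null distribution: shuffle which marker each expression row belongs to.
    null = np.empty((n_null,) + scores.shape)
    for i in range(n_null):
        perm = rng.permutation(expr.shape[0])
        null[i] = weights.T @ expr[perm]

    mu = null.mean(axis=0)
    sd = null.std(axis=0)
    # Where the null has no spread (e.g. nothing expressed), the z-score is set to 0.
    z = (scores - mu) / np.where(sd > 0, sd, np.inf)

    best = z.argmax(axis=0)
    labels = np.where(z.max(axis=0) >= z_cutoff, best, -1)
    return labels, z

# Toy data (made up): 12 markers, 3 cell types with 4 markers each, 3 clusters.
marker_matrix = np.zeros((12, 3), dtype=bool)
marker_matrix[0:4, 0] = True
marker_matrix[4:8, 1] = True
marker_matrix[8:12, 2] = True

cluster_expr = np.zeros((12, 3), dtype=bool)
cluster_expr[0:4, 0] = True   # cluster 0 expresses the markers of type 0
cluster_expr[4:8, 1] = True   # cluster 1 expresses the markers of type 1
# cluster 2 expresses no markers (e.g. lost to dropout)

labels, z = vote_cell_types(cluster_expr, marker_matrix)
print(labels)
```

In this toy setup a cluster with no expressed markers stays unassigned (label -1) rather than being forced into a type, which mirrors the role of the significance test in the abstract. A real analysis would start from normalized expression values rather than hand-built boolean matrices.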