Controlling the quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. The predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at <https://github.com/salbrec/seqQscorer>.