Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,747 bioRxiv papers from 278,434 authors.
Next Generation Sequencing is a powerful technology that is highly relevant in biomedical research and the pharmaceutical industry. Applied in combination with molecular assays, it provides detailed insights into genomic properties such as chromatin accessibility or protein-DNA interactions. However, the biological relevance of results from these assays is extremely sensitive to the quality of the sequencing data. So far, quality control tools have required extensive computational resources and manual inspection. This is a growing problem given the increasing amount of sequencing data produced as costs decrease. In this study, we investigated whether the quality of a large set of raw sequencing data in FASTQ format can be classified automatically by using state-of-the-art machine learning algorithms and a comprehensive grid search to tune their parameters. The results showed high classification accuracy in discriminating between low- and high-quality files. Gradient boosting machines performed best in most of the tested scenarios. Furthermore, some species-specific models generalized to data files from other species. The results suggest that the studied algorithms have high potential to be used routinely in biomedical and clinical applications to classify sequencing data according to quality. We provide a script that applies the pre-trained classification models to new data: https://github.com/salbrec/seqQscorer .
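The abstract does not say which features are derived from each FASTQ file before classification. As a rough illustration of how a raw file can be turned into numbers a classifier could consume, the sketch below computes a few summary statistics over per-base Phred scores. It assumes the common Phred+33 encoding and four-line records; the function names (`parse_fastq`, `phred_scores`, `file_features`) and the chosen statistics are hypothetical and are not taken from the seqQscorer code.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from an iterable of FASTQ lines.

    Assumes plain four-line records (header, sequence, '+', quality).
    """
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(cleaned) - 3, 4):
        header, seq, plus, qual = cleaned[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at entry {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def file_features(lines):
    """Summarize a FASTQ file as a small dict of illustrative quality features."""
    per_read = [mean(phred_scores(qual)) for _, _, qual in parse_fastq(lines)]
    return {
        "n_reads": len(per_read),
        "mean_quality": mean(per_read),
        "min_read_quality": min(per_read),
    }


if __name__ == "__main__":
    # Two toy reads: one with uniformly high scores, one with uniformly low scores.
    sample = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!!"]
    print(file_features(sample))
```

A feature dictionary like this, computed once per file, is the kind of input a gradient boosting classifier and a parameter grid search would operate on; the actual feature set used in the study may differ substantially.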
- Downloaded 123 times
- Download rankings, all-time:
  - Site-wide: 54,275 out of 62,747
  - In bioinformatics: 5,748 out of 6,251
- Year to date:
  - Site-wide: 33,891 out of 62,747
- Since beginning of last month:
  - Site-wide: 2,427 out of 62,747