Automated quality control of next generation sequencing data using machine learning

By Steffen Albrecht, Miguel A. Andrade-Navarro, Jean-Fred Fontaine

Posted 14 Sep 2019
bioRxiv DOI: 10.1101/768713

Controlling quality of next generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at the following URL: <https://github.com/salbrec/seqQscorer>.

Download data

  • Downloaded 701 times
  • Download rankings, all-time:
    • Site-wide: 18,530 out of 84,502
    • In bioinformatics: 2,753 out of 8,105
  • Year to date:
    • Site-wide: 2,408 out of 84,502
  • Since beginning of last month:
    • Site-wide: 3,056 out of 84,502
