
Toward perfect reads: short reads correction via mapping on compacted de Bruijn graphs

By Antoine Limasset, Jean-Francois Flot, Pierre Peterlongo

Posted 28 Feb 2019
bioRxiv DOI: 10.1101/558395 (published DOI: 10.1093/bioinformatics/btz102)

Motivations: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well to large datasets or treat reads as mere sequences of k-mers, without taking into account the information carried by the full-length read.

Results: We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered first by k-mer abundance and then by unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference onto which the reads are mapped in order to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while scaling to human-size genomic datasets and beyond.

Availability and Implementation: The implementation is open source and available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package.

Contact: Antoine Limasset (antoine.limasset@gmail.com), Jean-Francois Flot (jflot@ulb.ac.be), Pierre Peterlongo (pierre.peterlongo@inria.fr)
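The graph-cleaning idea described in the abstract can be illustrated with a toy sketch. This is not Bcool's implementation: the function names (`count_kmers`, `solid_kmers`, `compact`), the abundance threshold, and the toy reads are all assumptions made for illustration. For brevity the sketch ignores reverse complements and omits both the unitig-abundance filter and the read-mapping step. It only shows how discarding low-abundance k-mers removes an error-induced branch, so that the remaining k-mers compact into a single unitig.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_abundance):
    """Keep k-mers seen at least min_abundance times (hypothetical threshold)."""
    return {kmer for kmer, n in counts.items() if n >= min_abundance}

def successors(kmer, kmers):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]

def predecessors(kmer, kmers):
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]

def compact(kmers):
    """Merge non-branching paths of the de Bruijn graph into unitigs."""
    unitigs, seen = [], set()
    for start in sorted(kmers):
        if start in seen:
            continue
        # Walk left to the first k-mer of the non-branching path.
        first, visited = start, {start}
        while True:
            preds = predecessors(first, kmers)
            if len(preds) != 1:
                break
            p = preds[0]
            if len(successors(p, kmers)) != 1 or p in visited:
                break
            first = p
            visited.add(p)
        # Walk right, appending one base per k-mer.
        unitig, cur = first, first
        seen.add(first)
        while True:
            succs = successors(cur, kmers)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if len(predecessors(nxt, kmers)) != 1 or nxt in seen:
                break
            unitig += nxt[-1]
            seen.add(nxt)
            cur = nxt
        unitigs.append(unitig)
    return unitigs

# Toy data: three error-free reads and one read with a single substitution.
genome = "ATGGCGTGCAATC"
reads = [genome] * 3 + ["ATGGCGAGCAATC"]

counts = count_kmers(reads, k=4)
solid = solid_kmers(counts, min_abundance=2)
print(compact(solid))
```

On this toy input the erroneous k-mers each occur once and are filtered out, so the cleaned graph compacts to one unitig equal to the original sequence, whereas the unfiltered graph contains a bubble and splits into several unitigs. A real corrector would then align each read to paths of the cleaned graph, which this sketch does not attempt.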

Download data

  • Downloaded 571 times
  • Download rankings, all-time:
    • Site-wide: 31,840 out of 106,159
    • In bioinformatics: 4,139 out of 9,474
  • Year to date:
    • Site-wide: 67,396 out of 106,159
  • Since beginning of last month:
    • Site-wide: 49,513 out of 106,159
