
REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

By Camille Marchet, Zamin Iqbal, Daniel Gautheret, Mikael Salson, Rayan Chikhi

Posted 30 Mar 2020
bioRxiv DOI: 10.1101/2020.03.29.014159 (published DOI: 10.1093/bioinformatics/btaa487)

Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

Results: We used REINDEER to index the abundances of sequences within 2,585 human RNA-seq experiments in 45 hours using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ~4 billion distinct k-mers across 2,585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph (DBG) of each dataset, then conceptually merges those DBGs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances.

Availability: https://github.com/kamimrcht/REINDEER

Contact: camille.marchet@univ-lille.fr

Competing Interest Statement: The authors have declared no competing interest.
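To make concrete the kind of query the abstract describes (k-mer presence and abundance across several datasets), here is a minimal toy sketch in Python. It is not REINDEER's implementation: it uses a plain in-memory dictionary rather than compacted de Bruijn graphs or monotigs, and the function names (`canonical`, `count_kmers`, `build_index`, `query`, `group_by_counts`) and the tiny example reads are hypothetical, chosen only for illustration. Grouping k-mers that share an identical count vector is only a loose analogy for the idea of grouping k-mers of similar abundances.

```python
from collections import Counter, defaultdict

# Toy illustration only: NOT the REINDEER data structure.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    rc = "".join(COMPLEMENT[base] for base in reversed(kmer))
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers over all reads of one dataset."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def build_index(datasets, k):
    """Map each k-mer to a tuple of counts, one per dataset (0 = absent)."""
    per_dataset = [count_kmers(reads, k) for reads in datasets]
    all_kmers = set().union(*per_dataset)
    return {km: tuple(c[km] for c in per_dataset) for km in all_kmers}


def query(index, sequence, k):
    """Return one count vector per k-mer of the query (None if never seen)."""
    return [
        index.get(canonical(sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    ]


def group_by_counts(index):
    """Group k-mers sharing the same count vector (a loose analogy only)."""
    groups = defaultdict(list)
    for kmer, counts in index.items():
        groups[counts].append(kmer)
    return {counts: sorted(kmers) for counts, kmers in groups.items()}


if __name__ == "__main__":
    # Two hypothetical datasets, each a list of reads.
    datasets = [["ACGTAC"], ["ACGTT", "ACGTT"]]
    index = build_index(datasets, k=3)
    for kmer, counts in sorted(index.items()):
        print(kmer, counts)
    print(query(index, "ACGT", 3))
```

A dictionary like this stores every k-mer and every count explicitly, so it would not scale to billions of distinct k-mers; it only shows what a presence/abundance lookup returns.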

Download data

  • Downloaded 830 times
  • Download rankings, all-time:
    • Site-wide: 18,692 out of 106,159
    • In bioinformatics: 2,755 out of 9,474
  • Year to date:
    • Site-wide: 3,862 out of 106,159
  • Since beginning of last month:
    • Site-wide: 10,392 out of 106,159

