
AllSome Sequence Bloom Trees

By Chen Sun, Robert S. Harris, Rayan Chikhi, Paul Medvedev

Posted 02 Dec 2016
bioRxiv DOI: 10.1101/090464 (published DOI: 10.1089/cmb.2017.0258)

The ubiquity of next generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39-85%. Notably, it can query a batch of 198,074 queries in under 8 hours (compared to around two days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 minutes.
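The abstract names the Sequence Bloom Tree without describing it, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary tree in which each leaf holds the k-mers of one experiment, each internal node holds the union of its children, and a query descends only into nodes that contain at least a fraction `theta` of the query k-mers. Plain Python sets stand in for Bloom filters, and all names here (`Node`, `build`, `query`, `theta`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Node:
    kmer_set: Set[str]              # assumption: a set stands in for a Bloom filter
    name: Optional[str] = None      # set only on leaves (one experiment each)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(leaves: List[Node]) -> Node:
    """Pair nodes bottom-up; each parent holds the union of its children.

    Expects at least one leaf.
    """
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            nxt.append(Node(a.kmer_set | b.kmer_set, left=a, right=b))
        if len(level) % 2:
            nxt.append(level[-1])   # odd node out is carried up unchanged
        level = nxt
    return level[0]


def query(node: Node, q: Set[str], theta: float) -> List[str]:
    """Return names of leaves containing at least theta of the query k-mers."""
    if len(q & node.kmer_set) < theta * len(q):
        return []                   # prune: no leaf below can pass the threshold
    if node.name is not None:
        return [node.name]
    hits: List[str] = []
    for child in (node.left, node.right):
        if child is not None:
            hits += query(child, q, theta)
    return hits
```

Pruning is safe in this sketch because a parent's set is a superset of every leaf below it, so a leaf can never match more query k-mers than its ancestors do. A real Bloom filter would add false positives on top of this, which exact sets do not model.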

Download data

  • Downloaded 1,907 times
  • Download rankings, all-time:
    • Site-wide: 7,497
    • In bioinformatics: 923
  • Year to date:
    • Site-wide: 61,493
  • Since beginning of last month:
    • Site-wide: 53,786

