
SeqOthello: Query over RNA-seq experiments at scale

By Ye Yu, Jinpeng Liu, Xinan Liu, Yi Zhang, Eamonn Magner, Chen Qian, Jinze Liu

Posted 01 Feb 2018
bioRxiv DOI: 10.1101/258772

We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello requires only five minutes to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets on a standard computer with 19.1 GB of memory. The query recovers 92.7% of the tier-1 fusions curated by the TCGA Fusion Gene Database and further reveals 270 novel fusion occurrences, all of which appear to be tumor-specific. The entire index occupies only 76 GB, a 700:1 compression ratio relative to the original sequencing data, which makes it highly portable. This is the first sequence search index constructed at the scale of TCGA data. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs. SeqOthello is currently available at https://github.com/LiuBioinfo/SeqOthello.
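The abstract describes SeqOthello only at a high level. As a rough illustration of what a reference-free, alignment-free sequence query over many experiments can look like, here is a minimal sketch assuming a simple k-mer presence index: every length-k substring seen in an experiment's reads maps to the set of experiments that contain it, and a query sequence is scored by the fraction of its k-mers found in each experiment. It uses a plain Python dict and does not reproduce SeqOthello's actual compressed data structure; the function names, the value of k, and the toy reads are all hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (k-mer) in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(experiments: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of experiment IDs whose reads contain it.

    Simplification (assumption): k-mers are taken as-is, with no
    reverse-complement canonicalization and no error correction.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for exp_id, reads in experiments.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(exp_id)
    return dict(index)


def query(index: dict[str, set[str]], seq: str, k: int,
          experiment_ids) -> dict[str, float]:
    """Return, per experiment, the fraction of the query's distinct k-mers it contains."""
    query_kmers = set(kmers(seq, k))
    if not query_kmers:  # query shorter than k: nothing to look up
        return {e: 0.0 for e in experiment_ids}
    hits = {e: 0 for e in experiment_ids}
    for kmer in query_kmers:
        for exp_id in index.get(kmer, ()):
            hits[exp_id] += 1
    return {e: hits[e] / len(query_kmers) for e in experiment_ids}


# Hypothetical toy data: one read per experiment, k = 5.
experiments = {
    "tumor_A": ["AAACCCGGGTTT"],
    "normal_B": ["AAACCCAAAAAA"],
}
idx = build_index(experiments, k=5)
print(query(idx, "AACCCGGG", 5, experiments))  # {'tumor_A': 1.0, 'normal_B': 0.25}
```

In this toy run, a query spanning a hypothetical junction is fully supported by `tumor_A` and only partially by `normal_B`. A production index would also need to handle reverse-complement k-mers, sequencing errors, and aggressive compression at the scale the abstract reports, none of which this sketch attempts.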

Download data

  • Downloaded 648 times
  • Download rankings, all-time:
    • Site-wide: 31,644 out of 118,665
    • In bioinformatics: 3,667 out of 9,595
  • Year to date:
    • Site-wide: 104,608 out of 118,665
  • Since beginning of last month:
    • Site-wide: 108,675 out of 118,665
