Rxivist logo

Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem

By Yutong Qiu, Cong Ma, Han Xie, Carl Kingsford

Posted 09 Jul 2019
bioRxiv DOI: 10.1101/697367 (published DOI: 10.1186/s13015-020-00170-5)

Transcriptomic structural variants (TSVs) -- structural variants that affect expressed regions -- are common, especially in cancer. Detecting TSVs is a challenging computational problem. Sample heterogeneity (including differences between alleles in diploid organisms) is a critical confounding factor when identifying TSVs. To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangement Problem (MCAP), which seeks k genome rearrangements to maximize the number of reads that are concordant with at least one rearrangement.This directly models the situation of a heterogeneous or diploid sample. We prove that MCAP is NP-hard and provide an 1/4-approximation algorithm for k=1 and a 3/4-approximation algorithm for the diploid case (k=2) assuming an oracle for k=1. Combining these, we obtain a 3/16-approximation algorithm for MCAP when k=2 (without an oracle). We also present an integer linear programming formulation for general k. We completely characterize the graph structures that require k>1 to satisfy all edges and show such structures are prevalent in cancer samples. We evaluate our algorithms on 381 TCGA samples and 2 cancer cell lines and show improved performance compared to the state-of-the-art TSV-calling tool, SQUID.

Download data

  • Downloaded 228 times
  • Download rankings, all-time:
    • Site-wide: 61,203 out of 84,956
    • In bioinformatics: 6,455 out of 8,136
  • Year to date:
    • Site-wide: 59,548 out of 84,956
  • Since beginning of last month:
    • Site-wide: 33,582 out of 84,956

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)