Superbubbles, Ultrabubbles and Cacti

By Benedict Paten, Adam M. Novak, Erik Garrison, Glenn Hickey

Posted 18 Jan 2017
bioRxiv DOI: 10.1101/101493 (published DOI: 10.1089/cmb.2017.0251)

A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are a generalization of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. Here we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which we show encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without need for any single reference genome coordinate system. Furthermore, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats, e.g. VCF.
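To make the superbubble notion in the abstract concrete, the following is a minimal Python sketch that tests whether a given source/sink pair bounds a superbubble-like subgraph in an ordinary directed graph. It is not the paper's algorithm: it does not handle bidirected or biedged graphs, does not use cactus graphs, and is not efficient. The function names (`reachable`, `is_acyclic`, `is_superbubble_candidate`) are hypothetical, and the conditions checked (the sink is reachable from the source, the vertices reachable forward from the source match those reaching the sink backward, and the induced subgraph is acyclic) are an assumption about the usual digraph definition, with the minimality condition deliberately omitted.

```python
from collections import defaultdict


def reachable(adj, start, blocked):
    """Vertices reachable from `start`; `blocked` is included but never expanded."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        if v == blocked:
            continue  # record the boundary vertex, but do not walk past it
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_acyclic(adj, vertices):
    """Kahn's algorithm on the subgraph induced by `vertices`."""
    indegree = {v: 0 for v in vertices}
    for v in vertices:
        for w in adj[v]:
            if w in indegree:
                indegree[w] += 1
    queue = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in adj[v]:
            if w in indegree:
                indegree[w] -= 1
                if indegree[w] == 0:
                    queue.append(w)
    # Every vertex is removed exactly when the induced subgraph has no cycle.
    return removed == len(vertices)


def is_superbubble_candidate(edges, s, t):
    """Simplified check (assumption: minimality is NOT tested)."""
    if s == t:
        return False
    fwd = defaultdict(list)
    rev = defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    forward = reachable(fwd, s, t)   # from s, without passing through t
    if t not in forward:
        return False
    backward = reachable(rev, t, s)  # into t, without passing through s
    if forward != backward:
        return False                 # something enters or leaves the interior
    return is_acyclic(fwd, forward)


# A two-allele "bubble": s -> a -> t and s -> b -> t.
bubble = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
print(is_superbubble_candidate(bubble, "s", "t"))
# An edge leaving the interior breaks the single-source/single-sink property.
print(is_superbubble_candidate(bubble + [("a", "x")], "s", "t"))
```

In this toy example the two paths s, a, t and s, b, t play the role of two alternative sequences at one site, which is the interpretation the abstract gives to paths through a superbubble.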

Download data

  • Downloaded 1,117 times
  • Download rankings, all-time:
    • Site-wide: 14,424 out of 118,130
    • In bioinformatics: 1,847 out of 9,572
  • Year to date:
    • Site-wide: 73,739 out of 118,130
  • Since beginning of last month:
    • Site-wide: 78,199 out of 118,130
