Succinct dynamic variation graphs

By Jordan Eizenga, Adam M. Novak, Emily Kobayashi, Flavia Villani, Cecilia Cisar, Simon Heumos, Glenn Hickey, Vincenza Colonna, Benedict Paten, Erik Garrison

Posted 25 Apr 2020
bioRxiv DOI: 10.1101/2020.04.23.056317

Motivation: Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and pangenomic data sets can be challenging to work with at scale. These challenges have impeded progress in the field.

Results: Here we present a stack of two C++ libraries, libbdsg and libhandlegraph, which use a simple, field-proven interface designed to expose elementary features of these graphs while preventing common graph manipulation mistakes. The libraries also provide a Python binding. Using a diverse collection of pangenome graphs, we demonstrate that these tools allow for efficient construction and manipulation of large genome graphs with dense variation. For instance, speed and memory usage are up to an order of magnitude better than those of the prior graph implementation in the vg toolkit, which has now transitioned to using libbdsg's implementations.

Availability: libhandlegraph and libbdsg are available under an MIT License from https://github.com/vgteam/libhandlegraph and https://github.com/vgteam/libbdsg.

Competing Interest Statement: The authors have declared no competing interest.

