Computational Pan-Genomics: Status, Promises and Challenges
By
The Computational Pan-Genomics Consortium,
Tobias Marschall,
Manja Marz,
Thomas Abeel,
Louis Dijkstra,
Bas E. Dutilh,
Ali Ghaffaari,
Paul J Kersey,
Wigard P. Kloosterman,
Veli Mäkinen,
Adam M Novak,
Benedict Paten,
David Porubsky,
Eric Rivals,
Can Alkan,
Jasmijn Baaijens,
Paul I. W. De Bakker,
Valentina Boeva,
Raoul J P Bonnal,
Francesca Chiaromonte,
Rayan Chikhi,
Francesca D Ciccarelli,
Robin Cijvat,
Erwin Datema,
Cornelia M Van Duijn,
Evan E. Eichler,
Corinna Ernst,
Eleazar Eskin,
Erik Garrison,
Mohammed El-Kebir,
Gunnar W. Klau,
Jan Korbel,
Eric-Wubbo Lameijer,
Benjamin Langmead,
Marcel Martin,
Paul Medvedev,
John C. Mu,
Pieter Neerincx,
Klaasjan Ouwens,
Pierre Peterlongo,
Nadia Pisanti,
S. Rahmann,
Ben Raphael,
Knut Reinert,
Dick de Ridder,
Jeroen de Ridder,
Matthias Schlesner,
Ole Schulz-Trieglaff,
Ashley D. Sanders,
Siavash Sheikhizadeh,
Carl Shneider,
Sandra Smit,
Daniel Valenzuela,
Jiayin Wang,
Lodewyk Wessels,
Ying Zhang,
Victor Guryev,
Fabio Vandin,
Kai Ye,
Alexander Schönhuth
Posted 12 Mar 2016
bioRxiv DOI: 10.1101/043430
(published DOI: 10.1093/bib/bbw089)
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this paper, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
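To make the string-to-graph transition mentioned in the abstract concrete, the following is a minimal sketch, not the authors' data structure. It assumes a toy model in which nodes hold short sequence segments, directed edges connect adjacent segments, and each genome in the collection is stored as a named path through the graph. The class `SequenceGraph`, the helper `build_example` and the sample names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy pan-genome graph (illustrative assumption, not a published format)."""

    nodes: dict[int, str] = field(default_factory=dict)           # node id -> sequence segment
    edges: set[tuple[int, int]] = field(default_factory=set)      # directed adjacencies
    paths: dict[str, list[int]] = field(default_factory=dict)     # genome name -> walk of node ids

    def add_node(self, node_id: int, sequence: str) -> None:
        self.nodes[node_id] = sequence

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Store the walk and add an edge for every consecutive pair of nodes.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        # Recover the linear genome string by concatenating the segments on its path.
        return "".join(self.nodes[n] for n in self.paths[name])


def build_example() -> SequenceGraph:
    # Two hypothetical genomes that differ by a single-base substitution.
    graph = SequenceGraph()
    graph.add_node(1, "ACGT")
    graph.add_node(2, "A")    # allele carried by the first genome
    graph.add_node(3, "G")    # allele carried by the second genome
    graph.add_node(4, "TTC")
    graph.add_path("sample_ref", [1, 2, 4])
    graph.add_path("sample_alt", [1, 3, 4])
    return graph


if __name__ == "__main__":
    g = build_example()
    print(g.spell("sample_ref"))
    print(g.spell("sample_alt"))
```

In this sketch the shared flanking segments are stored once, and the variant site appears as two alternative nodes. A single linear reference string cannot express that without privileging one of the two genomes.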
Download data
- Downloaded 4,549 times
- Download rankings, all-time:
  - Site-wide: 1,978
  - In genomics: 225
- Year to date:
  - Site-wide: 22,459
- Since beginning of last month:
  - Site-wide: 16,589