Experimenting with reproducibility in bioinformatics
By Yang-Min Kim, Jean-Baptiste Poline, Guillaume Dumas
Posted 20 Jun 2017
bioRxiv DOI: 10.1101/143503
(published DOI: 10.1093/gigascience/giy077)
Reproducibility has been shown to be limited in many scientific fields. Reproducibility is a fundamental tenet of scientific activity, but the related issues of reusability of scientific data are poorly documented. Here, we present a case study of our attempt to reproduce a promising bioinformatics method [1] and illustrate the challenges of using a published method for which code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the method in Python to avoid dependency on a MATLAB licence and to ease execution of the code on a High-Performance Computing Cluster (HPCC). Third, we assessed the reusability of our reimplementation and the quality of our documentation. Then, we experimented with our own software and tested how easy it would be to start from our implementation to reproduce the results, thereby attempting to estimate the robustness of the reproducibility. Finally, in a second part, we draw on this case study and other observations to propose solutions for improving reproducibility and research efficiency at the individual and collective levels.
Download data
- Downloaded 2,393 times
- Download rankings, all-time:
- Site-wide: 9,080
- In bioinformatics: 905
- Year to date:
- Site-wide: 81,519
- Since beginning of last month:
- Site-wide: 140,620