Benchmarking of computational error-correction methods for next-generation sequencing data
By Keith Mitchell, Jaqueline J. Brito, Igor Mandric, Qiaozhen Wu, Sergey Knyazev, Sei Chang, Lana S. Martin, Aaron Karlsberg, Ekaterina Gerasimov, Russell Littman, Brian L. Hill, Nicholas C. Wu, Harry Yang, Kevin Hsieh, Linus Chen, Eli Littman, Taylor Shabani, German Enik, Douglas Yao, Ren Sun, Jan Schroeder, Eleazar Eskin, Alex Zelikovsky, Pavel Skums, Mihai Pop, Serghei Mangul
Posted 20 May 2019
bioRxiv DOI: 10.1101/642843
(published DOI: 10.1186/s13059-020-01988-3)
Background: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error-correction algorithms remains unknown.

Results: In this paper, we evaluate the ability of error-correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods.

Conclusions: In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
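To make the precision/sensitivity trade-off concrete, here is a minimal sketch of how base-level precision and sensitivity could be computed for a single corrected read. The abstract does not spell out its metric definitions, so everything here is an assumption for illustration: the function names (`classify_bases`, `precision`, `sensitivity`) are hypothetical, reads are assumed to be the same length with substitution errors only, and each base is classified by comparing the raw and corrected reads against a known true sequence.

```python
from collections import Counter


def classify_bases(true_read, raw_read, corrected_read):
    """Count per-base outcomes of error correction (illustrative definitions).

    TP: raw base was wrong and the corrected base matches the truth.
    FN: raw base was wrong and the corrected base is still wrong.
    FP: raw base was right but the corrector changed it to a wrong base.
    TN: raw base was right and was left unchanged.
    """
    if not (len(true_read) == len(raw_read) == len(corrected_read)):
        raise ValueError("this sketch assumes equal-length reads (no indels)")
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for t, r, c in zip(true_read, raw_read, corrected_read):
        if r != t:  # the sequencer made an error at this position
            counts["TP" if c == t else "FN"] += 1
        else:       # the raw base was already correct
            counts["TN" if c == t else "FP"] += 1
    return counts


def precision(counts):
    """Fraction of the corrector's scored changes that were right: TP / (TP + FP)."""
    denom = counts["TP"] + counts["FP"]
    return counts["TP"] / denom if denom else float("nan")


def sensitivity(counts):
    """Fraction of true sequencing errors that were fixed: TP / (TP + FN)."""
    denom = counts["TP"] + counts["FN"]
    return counts["TP"] / denom if denom else float("nan")


# Toy example (made-up sequences, not data from the study).
true_read = "ACGTACGTAC"
raw_read = "ACGAACGTTC"        # two sequencing errors (positions 3 and 8)
corrected_read = "ACGTACGTTG"  # fixes one, misses one, introduces one

counts = classify_bases(true_read, raw_read, corrected_read)
print(dict(counts), precision(counts), sensitivity(counts))
```

In this toy case the corrector fixes one of two errors and introduces one new error, so both metrics come out at 0.5. A more aggressive corrector would tend to raise sensitivity at the cost of precision, which is the balance the conclusions refer to.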
Download data
- Downloaded 1,413 times
- Download rankings, all-time:
  - Site-wide: 11,737
  - In bioinformatics: 1,459
- Year to date:
  - Site-wide: 47,642
- Since beginning of last month:
  - Site-wide: 44,921