Valection: Design Optimization for Validation and Verification Studies
Christopher I Cooper,
Dorota H. Sendorek,
Takafumi N. Yamaguchi,
SMC-DNA Challenge Participants,
Adam A. Margolin,
Robert G Bristow,
Joshua M. Stuart,
Paul C. Boutros
Posted 28 Jan 2018
bioRxiv DOI: 10.1101/254839 (published DOI: 10.1186/s12859-018-2391-z)
Posted 28 Jan 2018
Background: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. In disciplines that rely heavily on high-throughput data generation, such as genomics, reducing the impact of false positive and false negative rates in results is a top priority. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. To determine how to create subsets of predictions for validation that maximize inference of global error profiles, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. Results: To evaluate these selection strategies, we obtained 261 sets of somatic mutation calls from a single-nucleotide variant caller benchmarking challenge where 21 teams competed on whole-genome sequencing datasets of three computationally-simulated tumours. By using synthetic data, we had complete ground truth of the tumours' mutations and, therefore, we were able to accurately determine how estimates from the selected subset of verification candidates compared to the complete prediction set. We found that selection strategy performance depends on several verification study characteristics. In particular the verification budget of the experiment (i.e. how many candidates can be selected) is shown to influence estimates. Conclusions: The Valection framework is flexible, allowing for the implementation of additional selection algorithms in the future. Its applicability extends to any discipline that relies on experimental verification and will benefit from the optimization of verification candidate selection.
- Downloaded 222 times
- Download rankings, all-time:
- Site-wide: 54,918 out of 77,108
- In bioinformatics: 5,943 out of 7,442
- Year to date:
- Site-wide: 70,992 out of 77,108
- Since beginning of last month:
- Site-wide: 62,743 out of 77,108
Downloads over time
Distribution of downloads per paper, site-wide
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!