
Valection: Design Optimization for Validation and Verification Studies

By Christopher I. Cooper, Delia Yao, Dorota H. Sendorek, Takafumi N. Yamaguchi, Christine P’ng, Cristian Caloian, Michael Fraser, SMC-DNA Challenge Participants, Kyle Ellrott, Adam A. Margolin, Robert G. Bristow, Joshua M. Stuart, Paul C. Boutros

Posted 28 Jan 2018
bioRxiv DOI: 10.1101/254839 (published DOI: 10.1186/s12859-018-2391-z)

Background: Platform-specific error profiles necessitate confirmatory studies, in which predictions made on data generated using one technology are verified by processing the same samples on an orthogonal technology. In disciplines that rely heavily on high-throughput data generation, such as genomics, reducing the impact of false positives and false negatives on results is a top priority. However, verifying all predictions can be costly and redundant, so a subset of findings is often tested to estimate the true error profile. To determine how to create subsets of predictions for validation that maximize inference of global error profiles, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates.

Results: To evaluate these selection strategies, we obtained 261 sets of somatic mutation calls from a single-nucleotide variant caller benchmarking challenge in which 21 teams competed on whole-genome sequencing datasets of three computationally simulated tumours. Because the data were synthetic, we had complete ground truth of the tumours' mutations and could therefore determine accurately how estimates from the selected subset of verification candidates compared to the complete prediction set. We found that selection strategy performance depends on several verification study characteristics. In particular, the verification budget of the experiment (i.e. how many candidates can be selected) is shown to influence estimates.

Conclusions: The Valection framework is flexible, allowing for the implementation of additional selection algorithms in the future. Its applicability extends to any discipline that relies on experimental verification and will benefit from the optimization of verification candidate selection.
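
To make the idea of a verification budget concrete, here is a minimal Python sketch of estimating one caller's precision from a randomly chosen subset of its calls and comparing it with the value computed on the full call set. It is an illustration only: uniform random sampling is an assumed, simplest-possible strategy and is not presented as any of Valection's selection algorithms, and the function names, variant identifiers, and numbers below are all hypothetical.

```python
import random


def true_precision(calls, truth):
    """Fraction of all calls that are real variants (needs full ground truth)."""
    return sum(1 for v in calls if v in truth) / len(calls)


def estimate_precision(calls, truth, budget, seed=0):
    """Estimate precision from at most `budget` randomly selected calls.

    calls  : list of predicted variants (hypothetical string identifiers)
    truth  : set of real variants; in a real study this would be the
             outcome of the orthogonal verification experiment
    budget : maximum number of candidates that can be verified
    """
    if not calls or budget <= 0:
        raise ValueError("need at least one call and a positive budget")
    rng = random.Random(seed)           # seeded for reproducibility
    k = min(budget, len(calls))         # cannot verify more than was called
    selected = rng.sample(calls, k)     # uniform sampling without replacement
    confirmed = sum(1 for v in selected if v in truth)
    return confirmed / k


# Hypothetical caller: 1,000 calls, of which the first 800 are real.
calls = [f"chr1:{i}" for i in range(1000)]
truth = set(calls[:800])

full = true_precision(calls, truth)
for budget in (10, 100, 1000):
    est = estimate_precision(calls, truth, budget)
    print(f"budget={budget:4d}  estimate={est:.3f}  full-set value={full:.3f}")
```

When the budget covers every call, the estimate equals the full-set value by construction; with smaller budgets it varies with the random draw, which is the kind of dependence on budget that the abstract describes.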

Download data

  • Downloaded 222 times
  • Download rankings, all-time:
    • Site-wide: 54,918 out of 77,108
    • In bioinformatics: 5,943 out of 7,442
  • Year to date:
    • Site-wide: 70,992 out of 77,108
  • Since beginning of last month:
    • Site-wide: 62,743 out of 77,108


