Comparison of three variant callers for human whole genome sequencing

By Anna Supernat, Oskar Valdimar Vidarsson, Vidar M. Steen, Tomasz Stokowy

Posted 05 Nov 2018
bioRxiv DOI: 10.1101/461798 (published DOI: 10.1038/s41598-018-36177-7)

Testing of patients with genetics-related disorders is in the process of shifting from single-gene assays to gene panel sequencing, whole-exome sequencing (WES) and whole-genome sequencing (WGS). Since WGS is becoming a new foundation for molecular analyses, we compared three currently used tools for variant calling of human whole-genome sequencing data. We tested DeepVariant, a new TensorFlow machine learning-based variant caller, and compared it to GATK 4.0 and SpeedSeq, using 30x, 15x and 10x WGS data of the well-known NA12878 DNA reference sample. In our comparison, SNV calling performance was nearly identical in 30x data, with all three variant callers reaching F-Scores (i.e. the harmonic mean of recall and precision) of 0.98. In contrast, DeepVariant was more accurate in indel calling than GATK and SpeedSeq, as demonstrated by F-Scores of 0.94, 0.90 and 0.84, respectively. We conclude that DeepVariant has great potential for analysis of WGS data in medical genetics.
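The F-Score used in the abstract can be illustrated with a minimal sketch. The function name `f_score` and the variant counts below are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' actual benchmarking procedure is not described on this page.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall.

    Counts are assumed to come from comparing a call set against a
    truth set; how that comparison is done is outside this sketch.
    """
    called = true_positives + false_positives   # all variants the caller reported
    truth = true_positives + false_negatives    # all variants in the truth set
    if called == 0 or truth == 0 or true_positives == 0:
        return 0.0  # no correct calls: define the score as zero

    precision = true_positives / called  # fraction of reported calls that are correct
    recall = true_positives / truth      # fraction of true variants that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    print(round(f_score(true_positives=90, false_positives=10, false_negatives=10), 4))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a caller cannot reach a high F-Score by trading away recall for precision or the reverse.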

Download data

  • Downloaded 685 times
  • Download rankings, all-time:
    • Site-wide: 20,643 out of 89,678
    • In genomics: 2,398 out of 5,709
  • Year to date:
    • Site-wide: 41,550 out of 89,678
  • Since beginning of last month:
    • Site-wide: 53,329 out of 89,678

