ProSolo: Accurate Variant Calling from Single Cell DNA Sequencing Data

By David Lähnemann, Johannes Köster, Ute Fischer, Arndt Borkhardt, Alice C. McHardy, Alexander Schönhuth

Posted 28 Apr 2020
bioRxiv DOI: 10.1101/2020.04.27.064071

Obtaining accurate mutational profiles from single cell DNA is essential for the analysis of genomic cell-to-cell heterogeneity at the finest level of resolution. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. As a result, single cell DNA sequencing data violates the assumptions of variant callers developed for bulk sequencing, which, when applied to single cells, generate significant numbers of false positives and false negatives. Only dedicated models accounting for amplification bias and errors will be able to provide more accurate calls. We present ProSolo, a probabilistic model for calling single nucleotide variants from multiple displacement amplified single cell DNA sequencing data. It introduces a mechanistically motivated empirical model of amplification bias that improves the quantification of genotyping uncertainty. To account for amplification errors, it jointly models the single cell sample with a bulk sequencing sample from the same cell population, which also enables a biologically relevant imputation of missing genotypes for the single cell. Through these innovations, ProSolo achieves substantially higher performance in calling and genotyping single nucleotide variants in single cells in comparison to all state-of-the-art tools. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly, not only for single nucleotide variant calls, but also for artefacts of single cell methodology that one may wish to identify, such as allele dropout. ProSolo's model is implemented in a flexible framework, encouraging extensions. The source code and usage instructions are available at: https://github.com/prosolo/prosolo

Competing Interest Statement

The authors have declared no competing interest.
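The abstract's idea of controlling the false discovery rate over probabilistic calls can be illustrated with a generic sketch. The code below is not ProSolo's implementation. It assumes each candidate call comes with a posterior error probability (the probability that the call is wrong) and keeps the largest set of calls whose mean error probability stays at or below a chosen threshold. The function name `select_calls_at_fdr` and the input values are hypothetical.

```python
from typing import List, Sequence


def select_calls_at_fdr(peps: Sequence[float], alpha: float) -> List[int]:
    """Return indices of calls kept so that the expected FDR is at most alpha.

    peps:  posterior error probability of each candidate call (assumed input).
    alpha: target false discovery rate, e.g. 0.05.
    """
    # Rank calls from most to least confident (smallest error probability first).
    order = sorted(range(len(peps)), key=lambda i: peps[i])

    kept: List[int] = []
    running_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        running_sum += peps[idx]
        # The mean error probability of the kept set estimates its expected FDR.
        # It never decreases along the ranking, so we can stop at the first excess.
        if running_sum / rank > alpha:
            break
        kept.append(idx)
    return kept


if __name__ == "__main__":
    example_peps = [0.001, 0.2, 0.01, 0.5, 0.03]  # made-up values
    print(select_calls_at_fdr(example_peps, alpha=0.05))
```

With these made-up values, the three most confident calls are kept, and adding the fourth would push the mean error probability above 0.05. The same selection could in principle be applied to any event with a posterior probability, such as allele dropout, though how ProSolo does this is described in its own documentation.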

Download data

  • Downloaded 418 times
  • Download rankings, all-time:
    • Site-wide: 46,385 out of 106,159
    • In bioinformatics: 5,475 out of 9,474
  • Year to date:
    • Site-wide: 11,257 out of 106,159
  • Since beginning of last month:
    • Site-wide: 17,880 out of 106,159
