Accurate and efficient detection of copy number variants (CNVs) is critically important because of their significant association with complex genetic diseases. Although algorithms that work on whole genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole exome sequencing (WES) data has mostly been a losing game, with extremely high false discovery rates. This is unfortunate, as WES data is cost-efficient, compact and relatively ubiquitous. The bottleneck is primarily due to the non-contiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses matched WES and WGS data and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that the model consistently improves performance in a (i) sequencing technology, (ii) exome capture kit and (iii) CNV caller independent manner. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets and expand its application. The code and the models are available at https://github.com/ciceklab/DECoNT.

### Competing Interest Statement

The authors have declared no competing interest.
- Downloaded 395 times
- Download rankings, all-time:
  - Site-wide: 49,179 out of 103,749
  - In bioinformatics: 5,674 out of 9,474
- Year to date:
  - Site-wide: 12,147 out of 103,749
- Since beginning of last month:
  - Site-wide: 11,250 out of 103,749