A Simple Deep Learning Approach for Detecting Duplications and Deletions in Next-Generation Sequencing Data

By Tom Hill, Robert L Unckless

Posted 03 Jun 2019
bioRxiv DOI: 10.1101/657361

Copy number variants (CNVs) are associated with phenotypic variation in several species. However, accurately detecting changes in sequence copy number remains a difficult problem, especially in lower-quality or lower-coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low-coverage data, machine learning appears to be more powerful at detecting CNVs than gold-standard methods or coverage estimation alone, and equally powerful in high-coverage data. We also demonstrate how replicating training sets allows more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long-read data.

Download data

  • Downloaded 952 times
  • Download rankings, all-time:
    • Site-wide: 30,600
    • In bioinformatics: 3,374
  • Year to date:
    • Site-wide: 117,354
  • Since beginning of last month:
    • Site-wide: 81,864
