
TADA - a Machine Learning Tool for Functional Annotation based Prioritisation of Putative Pathogenic CNVs

By J. Hertzberg, Stefan Mundlos, Martin Vingron, Giuseppe Gallone

Posted 01 Jul 2020
bioRxiv DOI: 10.1101/2020.06.30.180711

The computational prediction of disease-associated genetic variation is of fundamental importance to the genomics, genetics and clinical research communities. Whereas the mechanisms and disease impact of coding single nucleotide polymorphisms (SNPs) and small insertions/deletions (InDels) have been the focus of intense study, little is known about the corresponding impact of structural variants (SVs), which are challenging to detect, phase and interpret. Few methods have been developed to prioritise larger chromosomal alterations such as Copy Number Variants (CNVs) by their pathogenicity. We address this gap with TADA, a method that prioritises pathogenic CNVs through manual filtering and automated classification, based on an extensive catalogue of functional annotations supported by rigorous enrichment analysis. We demonstrate that our machine-learning classifiers for deletions and duplications accurately predict pathogenic CNVs (AUC: 0.8042 and 0.7869, respectively) and produce a well-calibrated pathogenicity score. Together, the enrichment analysis and classification results suggest that prioritising pathogenic CNVs based on functional annotation is a promising approach to support clinical diagnostics and to further the understanding of the mechanisms that control the disease impact of larger genomic alterations.
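The abstract reports discrimination as an AUC and describes the pathogenicity score as well calibrated. As a rough illustration of what those two properties mean, and not TADA's own evaluation code, the sketch below computes ROC AUC as the probability that a randomly chosen pathogenic CNV receives a higher score than a randomly chosen benign one, and tabulates a simple binned calibration check. The function names `roc_auc` and `calibration_table`, the toy labels and scores, and the equal-width binning are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its Mann-Whitney interpretation (hypothetical helper).

    Returns the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(
    labels: Sequence[int], scores: Sequence[float], n_bins: int = 5
) -> List[Tuple[int, int, float, float]]:
    """Binned calibration check (hypothetical helper; scores assumed in [0, 1]).

    For each non-empty equal-width bin, returns
    (bin index, count, mean predicted score, observed fraction of positives).
    A well-calibrated score has mean score close to the observed fraction.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    table = []
    for i, b in enumerate(bins):
        if not b:
            continue
        mean_score = sum(s for _, s in b) / len(b)
        frac_pos = sum(y for y, _ in b) / len(b)
        table.append((i, len(b), mean_score, frac_pos))
    return table


if __name__ == "__main__":
    # Toy data (invented): 1 = pathogenic, 0 = benign.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
    print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
    for row in calibration_table(labels, scores):
        print(row)
```

On this toy data, eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic and only meant to make the definition explicit; a rank-based formula or a library routine would be the usual choice for real cohorts.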

Download data

  • Downloaded 224 times
  • Download rankings, all-time:
    • Site-wide: 96,780
    • In genetics: 4,396
  • Year to date:
    • Site-wide: 25,994
  • Since beginning of last month:
    • Site-wide: 21,131
