Ontology-Aware Deep Learning Enables Ultrafast, Accurate and Interpretable Source Tracking among Sub-Million Microbial Community Samples from Hundreds of Niches

By Kang Ning, Yuguo Zha, Hui Chong, Hao Qiu, Kai Kang, Yuzheng Dun, Zhixue Chen, Xuefeng Cui

Posted 02 Nov 2020
bioRxiv DOI: 10.1101/2020.11.01.364208

The taxonomic structure of a microbial community sample is highly habitat-specific, which makes it possible to trace a sample back to the niche from which it originated. Current methods struggle when the numbers of samples and niches are orders of magnitude larger than those in current use: under such circumstances they cannot source-track samples accurately in a timely manner, which limits knowledge discovery from sub-million heterogeneous samples. Here, we introduce ONN4MST (https://github.com/HUST-NingKang-Lab/ONN4MST), a deep learning method based on an Ontology-aware Neural Network approach, which takes into consideration the ontology structure of niches and the relationships among samples from these ontologically organized niches. ONN4MST's superiority in accuracy, speed and robustness has been demonstrated, for example with an accuracy of 0.99 and an AUC of 0.97 in a microbial source tracking experiment involving 125,823 samples and 114 niches. Moreover, ONN4MST has been applied to several source tracking tasks, showing that it can provide highly interpretable results for samples from previously less-studied niches, detect microbial contaminants, and identify similar samples from ontologically remote niches, with high fidelity.

Competing Interest Statement

The authors have declared no competing interest.

Download data

  • Downloaded 242 times
  • Download rankings, all-time:
    • Site-wide: 136,390
    • In bioinformatics: 10,789
  • Year to date:
    • Site-wide: 97,268
  • Since beginning of last month:
    • Site-wide: 62,580
