Motivation: Zoonosis, the natural transmission of infections from animals to humans, is a far-reaching global problem. The recent outbreaks of Zika virus, Ebola virus and coronavirus are examples of viral zoonosis, which occurs more frequently due to globalization. In the case of a virus outbreak, it is helpful to know which host organism was the original carrier of the virus. Once the reservoir or intermediate host is known, it can be isolated to prevent further spread of the viral infection. Recent approaches aim to predict a viral host based on the viral genome, often in combination with the potential host genome and arbitrarily selected features. These methods are clearly limited in either the number of different hosts they can predict or the accuracy of their prediction.

Results: Here, we present a fast and accurate deep learning approach for viral host prediction, which is based on the viral genome sequence only. To ensure high prediction accuracy, we developed an effective selection approach for the training data to avoid biases due to a highly unbalanced number of known sequences per virus-host combination. We tested our deep neural network on three different virus species (influenza A, rabies lyssavirus, rotavirus A). For each virus species, we reached an AUC between 0.93 and 0.98, outperforming previous approaches and allowing highly accurate predictions while using only fractions (100-400 bp) of the viral genome sequences. We show that deep neural networks are suitable for predicting the host of a virus, even with a limited number of sequences and highly unbalanced available data. The deep neural networks trained for this approach form the core of the virus-host prediction tool VIDHOP (Virus Deep learning HOst Prediction).

Availability: The trained models for predicting the host of the viruses influenza A, rabies lyssavirus and rotavirus A are implemented in the tool VIDHOP. This tool is freely available at <https://github.com/flomock/vidhop>.

Supplementary information: Supplementary data are available at DOI 10.17605/OSF.IO/UXT7N
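
The two data-preparation ideas named in the abstract, predicting from short genome fractions and avoiding bias from unbalanced host classes, can be illustrated with a minimal Python sketch. This is not the VIDHOP implementation: the function names (`fragments`, `one_hot`, `balance_by_host`) are hypothetical, the one-hot encoding is an assumed input representation, and plain per-host undersampling stands in for the selection approach, whose details the abstract does not give. Only the fragment length range (100-400 bp) comes from the text.

```python
import random
from collections import defaultdict

NUCLEOTIDES = "ACGT"


def fragments(sequence, length):
    """Cut a sequence into non-overlapping fragments of exactly `length`
    bases; a shorter remainder at the end is dropped."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, length)]


def one_hot(fragment):
    """Encode each base as a 4-element 0/1 row in A, C, G, T order.
    Assumption: ambiguous bases such as N become an all-zero row."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES]
            for base in fragment.upper()]


def balance_by_host(records, per_host, seed=0):
    """Keep at most `per_host` (sequence, host) records for each host.

    Assumption: simple random undersampling, used here only as a
    stand-in for the training data selection the abstract mentions."""
    rng = random.Random(seed)
    by_host = defaultdict(list)
    for sequence, host in records:
        by_host[host].append(sequence)
    balanced = []
    for host, seqs in sorted(by_host.items()):
        chosen = rng.sample(seqs, min(per_host, len(seqs)))
        balanced.extend((seq, host) for seq in chosen)
    return balanced
```

In such a setup, each record kept by `balance_by_host` would be cut with `fragments(sequence, 100)` (or any length up to 400) and passed through `one_hot` before being fed to a network; the hosts and sequences themselves would come from the user's own labelled data.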