Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype
Rick A. A. van der Spek,
Bas E. Dutilh,
Posted 29 Jan 2019
bioRxiv DOI: 10.1101/533679 (published DOI: 10.1093/bioinformatics/btz369)
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of the heritability remains unexplained. ALS is believed to have a complex genetic basis in which non-additive combinations of variants cause disease; such combinations cannot be detected by the linear models used in classical genotype-phenotype association studies. Deep learning, on the other hand, is highly promising for identifying such complex relations. We therefore developed a deep-learning-based approach for classifying ALS patients versus healthy individuals from the Dutch cohort of the ProjectMinE dataset. Based on the recent insight that regulatory regions of the genome play a major role in ALS, we employ a two-step approach: first, promoter regions that are likely to be associated with ALS are identified; second, individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to those parts of the data where this makes sense from a genomics perspective. Our approach identifies potential ALS-associated genetic variants and generally outperforms other classification methods. Test results support the hypothesis that ALS is caused by non-additive combinations of variants. Our method can be applied to large-scale whole-genome data. We consider this a first step towards genotype-phenotype association with deep learning that is tailored to genomics and can deal with genome-sized data.
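To make the idea of restricting convolution to genomically meaningful parts of the data more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the abstract does not specify layers, encodings, or hyperparameters, so everything here is an assumption made for illustration, including the 0/1/2 genotype encoding, the single shared kernel, the ReLU and max-pooling steps, and the hypothetical names `conv_within_regions` and `classify`. The only point it illustrates is that the sliding window moves inside one promoter region at a time and never spans two regions.

```python
import math


def conv_within_regions(regions, kernel):
    """Slide `kernel` along each region separately.

    regions: list of lists, one list of genotype values per promoter region
             (assumed encoding: 0, 1 or 2 copies of the alternative allele).
    kernel:  list of weights shared by all regions (hypothetical, untrained).
    Assumes every region holds at least len(kernel) variants.
    Returns one feature map (list of floats) per region.
    """
    k = len(kernel)
    feature_maps = []
    for region in regions:
        # Windows are taken from a single region only, so no window
        # ever mixes variants from two different promoter regions.
        feature_maps.append([
            sum(w * g for w, g in zip(kernel, region[i:i + k]))
            for i in range(len(region) - k + 1)
        ])
    return feature_maps


def classify(regions, kernel, weights, bias):
    """Return a probability-like score for one individual (illustrative)."""
    feature_maps = conv_within_regions(regions, kernel)
    # ReLU followed by max-pooling per region: max(relu(x)) == max(0, max(x)).
    pooled = [max(0.0, max(fm)) for fm in feature_maps]
    # A single dense unit combines the per-region summaries.
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy input: two regions with four variants each (made-up values).
regions = [[0, 1, 2, 0], [2, 2, 0, 1]]
score = classify(regions, kernel=[1.0, -1.0], weights=[0.5, -0.5], bias=0.0)
```

Because each region is processed independently, regions of different lengths can be handled without padding. In a two-step setup like the one described above, the per-region summaries could in principle serve both to rank regions and to classify individuals, but how the authors do this is not stated in the abstract.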