G2P: Using machine learning to understand and predict genes causing rare neurological disorders

By Juan A. Botía, Sebastian Guelfi, David Zhang, Karishma D’Sa, Regina H Reynolds, Daniel Onah, Ellen M. McDonagh, Antonio Rueda Martin, Arianna Tucci, Augusto Rendon, Henry Houlden, John Hardy, Mina Ryten

Posted 27 Mar 2018
bioRxiv DOI: 10.1101/288845

To facilitate precision medicine and neuroscience research, we developed a machine-learning technique that scores the likelihood that a gene, when mutated, will cause a neurological phenotype. We analysed 1126 genes relating to 25 subtypes of Mendelian neurological disease defined by Genomics England (March 2017), together with 154 gene-specific features capturing genetic variation, gene structure, and tissue-specific expression and co-expression. We randomly re-sampled genes with no known disease association to develop bootstrapped decision-tree models, which were integrated to generate a decision-tree-based ensemble for each disease subtype. Genes generating larger numbers of distinct transcripts and with a higher probability of having missense mutations in normal individuals were significantly more likely to cause neurological diseases. Using mouse-mutant phenotypic data, we tested the accuracy of gene-phenotype predictions. For 88% of all disease subtypes, there was a significant enrichment of relevant phenotypic abnormalities when predicted genes were mutated in mice, and in many cases mutations produced specific and matching phenotypes. Furthermore, using only newly identified genes included in the Genomics England November 2017 release, we assessed our gene-phenotype predictions and showed an 8.3-fold enrichment relative to chance for correct predictions. Thus, we demonstrate both the explanatory and predictive power of machine-learning-based models in neurological disease.
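The resampling scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the tree algorithm, the number of bootstrap rounds, or how the models were integrated, so everything here is an assumption: a depth-1 decision tree (a stump) stands in for a full decision tree, the ensemble score is a simple vote fraction, and the two toy features (transcript count and a missense probability) and all gene values are invented for illustration.

```python
import random

def fit_stump(rows, labels):
    """Fit a depth-1 decision tree: the (feature, threshold, direction) with fewest errors."""
    best = None  # (errors, feature_index, threshold, sign)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for sign in (1, -1):
                # sign=1 predicts "disease" when value >= t; sign=-1 when value < t
                errors = sum(
                    ((r[j] >= t) if sign == 1 else (r[j] < t)) != bool(y)
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda r: (r[j] >= t) if sign == 1 else (r[j] < t)

def fit_ensemble(disease, unlabeled, n_models=25, seed=0):
    """Train one stump per round, each on all disease genes plus a fresh
    random sample of genes with no known disease association."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        negatives = rng.sample(unlabeled, k=min(len(disease), len(unlabeled)))
        rows = disease + negatives
        labels = [1] * len(disease) + [0] * len(negatives)
        models.append(fit_stump(rows, labels))
    return models

def score(models, features):
    """Fraction of models that predict the gene as disease-causing."""
    return sum(m(features) for m in models) / len(models)

# Hypothetical toy data: [number of transcripts, missense probability]
disease = [[9, 0.7], [12, 0.8], [8, 0.6], [10, 0.9]]
unlabeled = [[2, 0.1], [3, 0.2], [4, 0.3], [5, 0.2], [1, 0.1], [3, 0.4]]

models = fit_ensemble(disease, unlabeled)
high = score(models, [11, 0.75])  # gene resembling the toy disease genes
low = score(models, [2, 0.1])     # gene resembling the toy unlabeled genes
```

Drawing a new set of unlabeled genes in each round keeps any one arbitrary choice of "negative" genes from dominating the ensemble, which is one plausible motivation for the re-sampling the abstract describes.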

Download data

  • Downloaded 1,980 times
  • Download rankings, all-time:
    • Site-wide: 7,547
    • In genetics: 355
  • Year to date:
    • Site-wide: 29,581
  • Since beginning of last month:
    • Site-wide: 25,725
