Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 67,594 bioRxiv papers from 298,187 authors.
Electronic Health Records Based Prediction of Future Incidence of Alzheimer's Disease Using Machine Learning
Background: Accurate prediction of future incidence of Alzheimer's disease (AD) may facilitate intervention strategies to delay disease onset. Existing AD risk prediction models require collection of biospecimens (genetic, CSF, or blood samples), cognitive testing, or brain imaging. In contrast, electronic health records (EHR) provide an opportunity to build a completely automated risk prediction model based on individuals' history of health and healthcare. We tested machine learning models to predict future incidence of AD using administrative EHR in individuals aged 65 or older.

Methods: We obtained de-identified EHR from Korean elders aged over 65 years (N=40,736), collected between 2002 and 2010 in the Korean National Health Insurance Service database system. Drawn from the Participant Insurance Eligibility database, the Healthcare Utilization database, and the Health Screening database, our EHR contain 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. Our event of interest was new incidence of AD, defined from the EHR based on AD codes and prescription of anti-dementia medication. Two definitions were considered: a more stringent one requiring both a diagnosis and dementia medication (n=614 cases; definite AD) and a more liberal one requiring only diagnostic codes (n=2,026; probable AD). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD in the 1, 2, 3, and 4 subsequent years using the EHR available since 2002. The length of the EHR used in the models ranged from 1,571 to 2,239 days. Model training, validation, and testing were done using iterative (5 times), nested, stratified 5-fold cross-validation.

Results: Average duration of EHR was 1,936 days in AD and 2,694 days in controls. For predicting future incidence of AD using the definite AD outcome, the machine learning models performed best in 1-year prediction, with an AUC of 0.781 (2-year, 0.739; 3-year, 0.686; 4-year, 0.662). Using the probable AD outcome, the models again performed best in 1-year prediction, with an AUC of 0.730 (2-year, 0.645; 3-year, 0.575; 4-year, 0.602). Important clinical features selected in logistic regression included hemoglobin level (b=-0.902), age (b=0.689), urine protein level (b=0.303), prescription of Lodopin (antipsychotic drug) (b=0.303), and prescription of Nicametate Citrate (vasodilator) (b=-0.297).

Conclusion: This study demonstrates that EHR can detect risk for incident AD. This approach could enable risk-specific stratification of elders for better targeted clinical trials.
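The evaluation scheme named in the Methods (repeated, stratified 5-fold cross-validation scored by AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code. The function names (`stratified_folds`, `auc`, `cross_validated_auc`, `fit_mean_difference`) and the one-feature toy model are hypothetical, and the inner loop of the nested scheme (hyperparameter tuning inside each training split) is omitted for brevity.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps a similar case/control ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share of that class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def fit_mean_difference(train_x, train_y):
    """Toy one-feature model (an assumption for illustration): orient the
    feature so that cases tend to receive the higher score."""
    cases = [x for x, y in zip(train_x, train_y) if y == 1]
    controls = [x for x, y in zip(train_x, train_y) if y == 0]
    sign = 1.0 if sum(cases) / len(cases) >= sum(controls) / len(controls) else -1.0
    return lambda x: sign * x


def cross_validated_auc(features, labels, fit, k=5, repeats=5):
    """Mean held-out AUC over `repeats` reshuffled stratified k-fold splits."""
    aucs = []
    for r in range(repeats):
        for test_idx in stratified_folds(labels, k, seed=r):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            # Fit on the training folds only, then score the held-out fold.
            score = fit([features[i] for i in train_idx],
                        [labels[i] for i in train_idx])
            aucs.append(auc([labels[i] for i in test_idx],
                            [score(features[i]) for i in test_idx]))
    return sum(aucs) / len(aucs)
```

Stratification matters here because cases are rare (614 definite AD cases among 40,736 individuals): without it, a fold could end up with very few cases, which would make its AUC unstable. In a full nested design, `fit` would itself run an inner cross-validation on the training folds to choose hyperparameters before scoring the held-out fold.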
- Downloaded 188 times
- Download rankings, all-time:
  - Site-wide: 50,428 out of 67,591
  - In bioinformatics: 5,527 out of 6,655
- Year to date:
  - Site-wide: 27,987 out of 67,591
- Since beginning of last month:
  - Site-wide: 21,094 out of 67,591