Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 65,436 bioRxiv papers from 289,855 authors.
Semi-Supervised Learning of the Electronic Health Record for Phenotype Stratification
Brett K Beaulieu-Jones,
Casey S. Greene,
the Pooled Resource Open-Access ALS Clinical Trials Consortium
Posted 18 Feb 2016
bioRxiv DOI: 10.1101/039800 (published DOI: 10.1016/j.jbi.2016.10.007)
Posted 18 Feb 2016
Patient interactions with health care providers result in entries to electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes, which can be paired with genetic data to identify genes underlying common human diseases. This task remains challenging: high quality phenotyping is costly and requires physician review; many fields in the records are sparsely filled; and our definitions of diseases are continuing to improve over time. Here we develop and evaluate a semi-supervised learning method for EHR phenotype extraction using denoising autoencoders for phenotype stratification. By combining denoising autoencoders with random forests we find classification improvements across multiple simulation models and improved survival prediction in ALS clinical trial data. This is particularly evident in cases where only a small number of patients have high quality phenotypes, a common scenario in EHR-based research. Denoising autoencoders perform dimensionality reduction enabling visualization and clustering for the discovery of new subtypes of disease. This method represents a promising approach to clarify disease subtypes and improve genotype-phenotype association studies that leverage EHRs.
- Downloaded 2,417 times
- Download rankings, all-time:
- Site-wide: 1,677 out of 65,436
- In bioinformatics: 336 out of 6,486
- Year to date:
- Site-wide: 17,232 out of 65,436
- Since beginning of last month:
- Site-wide: 16,599 out of 65,436
Downloads over time
Distribution of downloads per paper, site-wide
- Top preprints of 2018
- Paper search
- Author leaderboards
- Overall metrics
- The API
- Email newsletter
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!