
Augmented Intelligence with Natural Language Processing Applied to Electronic Health Records is Useful for Identifying Patients with Non-Alcoholic Fatty Liver Disease at Risk for Disease Progression

By Tielman T. Van Vleck, Lili Chan, Steven G Coca, Catherine K Craven, Ron Do, Stephen B Ellis, Joseph L Kannry, Ruth J.F. Loos, Peter A Bonis, Judy Cho, Girish N Nadkarni

Posted 11 Jan 2019
bioRxiv DOI: 10.1101/518217 (published DOI: 10.1016/j.ijmedinf.2019.06.028)

Objective: Electronic health record (EHR) systems contain both structured data and unstructured documentation. Clinical insights can be derived from analyzing both, but optimal methods for doing so have not been studied extensively. We compared several approaches to analyzing EHR data for non-alcoholic fatty liver disease (NAFLD).

Materials and Methods: We compared analysis of structured and unstructured EHR data using natural language processing (NLP), free-text search, and diagnostic (ICD) codes, with expert adjudication as the reference standard.

Results: Out of 38,575 patients, we identified 2,281 with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP was more sensitive than ICD codes and text search (NLP 0.93 vs. ICD 0.28 vs. text search 0.81) and had a higher F2 score (NLP 0.92 vs. ICD 0.34 vs. text search 0.81). In 619 patients, suspected NAFLD was documented in radiology notes but not acknowledged in other forms of clinical documentation. Of these, 232 (37.5%) were found to have more advanced liver disease after a median of 1,057 days.

Discussion: NLP-based approaches identify NAFLD within the EHR more accurately than ICD- or text-search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation, and many such patients are later found to have more advanced liver disease.

Conclusion: For identifying NAFLD, NLP performed better than the alternative selection modalities and facilitated follow-on analysis of information flow. If its accuracy is shown to persist across clinical domains, NLP could identify patient phenotypes for biomedical research accurately and at high throughput.
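The abstract reports sensitivity and F2 for each selection method against expert adjudication. The sketch below shows how these metrics are conventionally computed when the set of patients flagged by one method is compared with a reference set of confirmed cases. The function names and patient IDs are hypothetical toy values, not the study's data or pipeline. F2 is the F-beta score with beta = 2, which weights recall (sensitivity) more heavily than precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    sensitivity: float  # recall: TP / (TP + FN)
    precision: float    # positive predictive value: TP / (TP + FP)
    f2: float           # F-beta with beta = 2 (recall weighted over precision)


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 favors recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(flagged: set[str], reference: set[str]) -> Metrics:
    """Compare patients flagged by one method (hypothetical helper) with adjudicated cases."""
    tp = len(flagged & reference)   # flagged and confirmed
    fp = len(flagged - reference)   # flagged but not confirmed
    fn = len(reference - flagged)   # confirmed but missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return Metrics(sensitivity, precision, f_beta(precision, sensitivity))


if __name__ == "__main__":
    # Hypothetical toy IDs for illustration only.
    reference = {"p1", "p2", "p3", "p4"}
    print(evaluate({"p1", "p2", "p3", "p5"}, reference))  # an "NLP-like" method
    print(evaluate({"p1"}, reference))                    # a precise but insensitive "code-like" method
```

In the second toy case, precision is perfect but sensitivity is only 0.25, so F2 stays low (about 0.29). This mirrors why a recall-weighted score penalizes a method that misses most true cases.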

