UK phenomics platform for developing and validating EHR phenotypes: CALIBER
Natalie K Fitzpatrick,
Richard J Dobson,
Laurence J Howe,
R. Tom Lumbers,
Riyaz S Patel,
Anoop D. Shah,
Aroon D. Hingorani,
Cathie LM Sudlow,
Posted 04 Feb 2019
bioRxiv DOI: 10.1101/539403
Objective: Electronic Health Records (EHR) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems and collected for purposes other than medical research. We describe an approach for developing, validating and sharing reproducible phenotypes from national structured EHR in the United Kingdom (UK), with applications for translational research.

Materials and Methods: We implemented a rule-based phenotyping framework with up to six validation approaches. We applied the framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (e.g. blood pressure), medication information, and coded diagnoses, symptoms, procedures and referrals, recorded using five controlled clinical terminologies: Read (primary care; a subset of SNOMED-CT); ICD-9 and ICD-10, the International Classification of Diseases 9th and 10th Revisions (secondary care diagnoses and cause of death); OPCS-4, the OPCS Classification of Interventions and Procedures (hospital surgical procedures); and DM+D prescription codes.

Results: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers and lifestyle risk factors, each with up to six validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (<https://www.caliberresearch.org/portal>) and have been used by 40 national and international research groups in 60 peer-reviewed publications.

Conclusion: We describe a UK EHR phenomics approach within the CALIBER EHR data platform, with initial evidence of its validity and use, as an important step towards the international use of UK EHR data for health research.
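To make the idea of a rule-based phenotype over several linked terminologies concrete, here is a minimal sketch. It assumes that records from primary care, hospital and death data have already been harmonised into (patient, terminology, code, date) rows, and it uses a simple "any qualifying code, earliest date wins" rule. The `Record` and `Phenotype` classes, the prefix-matching rule and the example code prefixes are hypothetical illustrations only, not CALIBER's actual implementation or its curated code lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Record:
    """One coded event from a linked EHR source (hypothetical harmonised format)."""
    patient_id: str
    terminology: str  # e.g. "Read", "ICD-10", "OPCS-4"
    code: str
    event_date: date


@dataclass
class Phenotype:
    """Hypothetical rule: a record qualifies if its code starts with a listed prefix
    for its terminology. Real phenotype algorithms are usually richer than this."""
    name: str
    code_prefixes: Dict[str, Tuple[str, ...]]

    def matches(self, record: Record) -> bool:
        # Terminologies without a code list never qualify.
        prefixes = self.code_prefixes.get(record.terminology, ())
        return any(record.code.startswith(p) for p in prefixes)


def first_event_dates(phenotype: Phenotype, records: Iterable[Record]) -> Dict[str, date]:
    """Return the earliest qualifying event date per patient across all sources."""
    earliest: Dict[str, date] = {}
    for rec in records:
        if phenotype.matches(rec):
            current = earliest.get(rec.patient_id)
            if current is None or rec.event_date < current:
                earliest[rec.patient_id] = rec.event_date
    return earliest


# Illustrative code prefixes only (not a validated code list).
mi = Phenotype(
    name="myocardial infarction (illustrative)",
    code_prefixes={"Read": ("G30",), "ICD-10": ("I21", "I22")},
)
records = [
    Record("p1", "Read", "G30..", date(2005, 3, 1)),
    Record("p1", "ICD-10", "I21.9", date(2004, 11, 20)),
    Record("p2", "ICD-10", "J45", date(2010, 1, 5)),
]
print(first_event_dates(mi, records))  # {'p1': datetime.date(2004, 11, 20)}
```

Taking the earliest date across primary care and hospital codes reflects the point made above that information is fragmented across sources: no single source is assumed to record the first occurrence.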