
An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis

By Dominic Cushnan, Oscar Bennett, Rosalind Berka, Ottavia Bertolli, Ashwin Chopra, Samie Dorgham, Alberto Favaro, Tara Ganepola, Mark Halling-Brown, Gergely Imreh, Joseph Jacob, Emily Jefferson, François Lemarchand, Daniel Schofield, Jeremy C Wyatt, NCCID Collaborative

Posted 03 Mar 2021
medRxiv DOI: 10.1101/2021.03.02.21252444

The National COVID-19 Chest Imaging Database (NCCID) is a centralised database containing chest X-rays, chest Computed Tomography (CT) scans and cardiac Magnetic Resonance Images (MRI) from patients across the UK, jointly established by NHSX, the British Society of Thoracic Imaging (BSTI), Royal Surrey NHS Foundation Trust (RSNFT) and Faculty. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and the development of machine learning (ML) technologies that will improve care for patients hospitalised with a severe COVID-19 infection. The NCCID is now accumulating data from 20 NHS Trusts and Health Boards across England and Wales, with a total contribution of approximately 25,000 imaging studies in the training set (at time of writing), and is actively being used as a research tool by several organisations. This paper introduces the training dataset, including a snapshot analysis performed by NHSX covering the completeness of clinical data, the availability of image data for the various use cases (diagnosis, prognosis and longitudinal risk) and potential model confounders within the imaging data. The aim is to inform both existing and potential data users of the NCCID's suitability for developing diagnostic and prognostic models. In addition, a cohort analysis was performed to measure how representative the NCCID is of the wider COVID-19 affected population. Three major aspects were included: geographic, demographic and temporal coverage. The analysis revealed good alignment in some categories (e.g., sex) and identified areas for improvement in data collection methods, particularly with respect to geographic coverage. All analyses and discussions are focused on the implications for building ML tools that will generalise well to the clinical use cases.

Download data

  • Downloaded 757 times
  • Download rankings, all-time:
    • Site-wide: 49,250
    • In radiology and imaging: 89
  • Year to date:
    • Site-wide: 54,537
  • Since beginning of last month:
    • Site-wide: 71,018
