Sample Size Analysis for Machine Learning Clinical Validation Studies

By Daniel M Goldenholz, Haoqi Sun, Wolfgang Ganglberger, M Brandon Westover

Posted 27 Oct 2021
medRxiv DOI: 10.1101/2021.10.26.21265541

OBJECTIVE: Before new machine learning (ML) algorithms are integrated into clinical practice, they must undergo validation. Validation studies require sample size estimates. Unlike hypothesis-testing studies, which seek a p-value, studies that validate predictive models aim to obtain estimates of model performance. Our aim was to provide a standardized approach to sample size calculations for validation studies of predictive ML models, one that is agnostic to both data distribution and model.

MATERIALS AND METHODS: Sample Size Analysis for Machine Learning (SSAML) was tested on three previously published models: brain age to predict mortality (Cox proportional hazards), COVID hospitalization risk prediction (ordinal regression), and seizure risk forecasting (deep learning). The SSAML steps are: 1) Specify performance metrics for model discrimination and calibration. For discrimination, we use area under the receiver operating characteristic curve (AUC) for classification and Harrell's C-statistic for survival models. For calibration, we employ calibration slope and calibration-in-the-large. 2) Specify the required precision and accuracy (<=0.5 normalized confidence interval width and +/-5% accuracy). 3) Specify the required coverage probability (95%). 4) For increasing sample sizes, calculate the expected precision and bias that are achievable. 5) Choose the minimum sample size that meets all requirements.

RESULTS: Minimum sample sizes were obtained in each dataset using standardized criteria.

DISCUSSION: SSAML provides a formal expectation of precision and accuracy at a desired confidence level.

CONCLUSION: SSAML is open-source and agnostic to data type and ML model. It can be used for clinical validation studies of ML models.
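
The five steps above can be illustrated with a minimal Python sketch. This is not the authors' SSAML code. It is one plausible reading of the abstract, restricted to AUC (calibration metrics are omitted) and run on synthetic data. All function names (`auc`, `make_pool`, `bootstrap_ci`, `evaluate_sample_size`, `minimum_sample_size`), the synthetic score model, the use of a percentile bootstrap, and the choice to treat the AUC of the full pool as the reference value are assumptions made for illustration.

```python
import random

def auc(data):
    """AUC from (label, score) pairs; assumes continuous scores (no cross-class ties)."""
    n_pos = sum(y for y, _ in data)
    n_neg = len(data) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # AUC is undefined with a single class
    neg_seen = wins = 0
    for _, y in sorted((s, y) for y, s in data):
        if y == 1:
            wins += neg_seen  # negatives scored below this positive
        else:
            neg_seen += 1
    return wins / (n_pos * n_neg)

def make_pool(size, rng, shift=1.2):
    """Hypothetical synthetic data: positives score higher on average."""
    pool = []
    for _ in range(size):
        y = 1 if rng.random() < 0.5 else 0
        pool.append((y, rng.gauss(shift * y, 1.0)))
    return pool

def bootstrap_ci(sample, n_boot, rng):
    """Percentile bootstrap 95% confidence interval for the AUC."""
    stats = []
    for _ in range(n_boot):
        a = auc(rng.choices(sample, k=len(sample)))
        if a is not None:
            stats.append(a)
    stats.sort()
    lo = stats[int(0.025 * len(stats))]
    hi = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return lo, hi

def evaluate_sample_size(pool, n, reference, n_repeats, n_boot, rng):
    """Step 4: expected precision, bias, and coverage at sample size n."""
    widths, biases, covered = [], [], 0
    for _ in range(n_repeats):
        sample = rng.choices(pool, k=n)  # simulate one validation study
        est = auc(sample)
        if est is None:
            continue
        lo, hi = bootstrap_ci(sample, n_boot, rng)
        widths.append((hi - lo) / est)              # normalized CI width
        biases.append((est - reference) / reference)  # relative bias
        covered += lo <= reference <= hi
    if not widths:
        return None
    k = len(widths)
    return {"width": sum(widths) / k, "bias": sum(biases) / k, "coverage": covered / k}

def minimum_sample_size(pool, candidates, n_repeats=40, n_boot=100, seed=0,
                        max_width=0.5, max_bias=0.05, min_coverage=0.95):
    """Step 5: smallest candidate meeting all criteria from steps 2 and 3."""
    rng = random.Random(seed)
    reference = auc(pool)  # assumption: pool AUC stands in for the true value
    for n in sorted(candidates):
        r = evaluate_sample_size(pool, n, reference, n_repeats, n_boot, rng)
        if (r and r["width"] <= max_width and abs(r["bias"]) <= max_bias
                and r["coverage"] >= min_coverage):
            return n
    return None  # no candidate satisfied every requirement

if __name__ == "__main__":
    pool = make_pool(1000, random.Random(42))
    print(minimum_sample_size(pool, [50, 100, 200]))
```

With few repeats the coverage estimate is coarse, so the sketch may return `None` even when a candidate is adequate; a real analysis would use many more repeats and bootstrap draws, and would add the calibration metrics and, for survival models, the C-statistic.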

