
I TRIED A BUNCH OF THINGS: THE DANGERS OF UNEXPECTED OVERFITTING IN CLASSIFICATION

By Michael Powell, Mahan Hosseini, John Collins, Chloe Callahan-Flintoft, William Jones, Howard Bowman, Brad Wyble

Posted 03 Oct 2016
bioRxiv DOI: 10.1101/078816

Machine learning is a powerful set of techniques that has enhanced the abilities of neuroscientists to interpret information collected through EEG, fMRI, and MEG data. With these powerful techniques comes the danger of overfitting of hyper-parameters, which can render results invalid and cause a failure to generalize beyond the data set. We refer to this problem as ‘over-hyping’ and show that it is pernicious despite commonly used precautions. In particular, over-hyping occurs when an analysis is run repeatedly with slightly different analysis parameters and one set of results is selected based on the outcome of those analyses. When this is done, the resulting method is unlikely to generalize to a new dataset, rendering it a partially or perhaps even completely spurious result that will not be valid outside of the data used in the original analysis. While it is commonly assumed that cross-validation is an effective protection against such spurious results generated through overfitting or over-hyping, this is not actually true. In this article, we show that both one-shot and iterative optimization of an analysis are prone to over-hyping, despite the use of cross-validation. We demonstrate that non-generalizable results can be obtained even on non-informative (i.e., random) data by modifying hyper-parameters in seemingly innocuous ways. We recommend a number of techniques for limiting over-hyping, such as lock-boxes, blind analyses, pre-registrations, and nested cross-validation. These techniques are common in other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques in the neurosciences.
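The over-hyping effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis: the nearest-centroid classifier, the use of a random feature subset as the "hyper-parameter", the data sizes, and all function names are assumptions chosen only to keep the example short and dependent on nothing beyond the Python standard library. The data are pure noise with coin-flip labels, so any accuracy above chance is spurious. Many configurations are scored with cross-validation, the best one is selected, and that configuration is then evaluated once on a lock-box that played no part in the selection.

```python
import random


def make_random_data(n_samples, n_features, rng):
    """Pure-noise features with coin-flip labels: there is no signal to find."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Fit class centroids on the chosen features, then score on the test rows."""
    groups = {}
    for row, label in zip(X_train, y_train):
        groups.setdefault(label, []).append(row)
    centroids = {
        label: [sum(r[f] for r in rows) / len(rows) for f in features]
        for label, rows in groups.items()
    }
    correct = 0
    for row, label in zip(X_test, y_test):
        # Predict the class whose centroid is closest in squared distance.
        pred = min(
            centroids,
            key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += int(pred == label)
    return correct / len(y_test)


def cross_val_accuracy(X, y, features, k=5):
    """Plain k-fold cross-validation; fold i holds out rows i, i+k, i+2k, ..."""
    n = len(y)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        scores.append(nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, features))
    return sum(scores) / k


def run_demo(seed=0, n_configs=200, n_features=20, subset_size=5):
    rng = random.Random(seed)
    X, y = make_random_data(120, n_features, rng)
    # Set the lock-box aside before any analysis choices are made.
    X_dev, y_dev = X[:80], y[:80]
    X_lock, y_lock = X[80:], y[80:]

    # "I tried a bunch of things": score many configurations by cross-validation.
    configs = [rng.sample(range(n_features), subset_size) for _ in range(n_configs)]
    cv_scores = [cross_val_accuracy(X_dev, y_dev, cfg) for cfg in configs]
    best = max(range(n_configs), key=cv_scores.__getitem__)

    # Open the lock-box exactly once, with the selected configuration.
    lockbox = nearest_centroid_accuracy(X_dev, y_dev, X_lock, y_lock, configs[best])
    return {
        "best_cv": cv_scores[best],
        "mean_cv": sum(cv_scores) / n_configs,
        "lockbox": lockbox,
    }


if __name__ == "__main__":
    result = run_demo()
    print("best cross-validated accuracy:", round(result["best_cv"], 3))
    print("mean cross-validated accuracy:", round(result["mean_cv"], 3))
    print("lock-box accuracy of the selected configuration:", round(result["lockbox"], 3))
```

In a sketch like this, the selected configuration's cross-validated accuracy is the maximum of many noisy estimates, so it tends to sit above chance even though the labels are random. The lock-box score is not subject to that selection, which is why it gives a fairer estimate of how the chosen analysis would generalize.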

Download data

  • Downloaded 4,708 times
  • Download rankings, all-time:
    • Site-wide: 875 out of 92,062
    • In neuroscience: 92 out of 16,376
  • Year to date:
    • Site-wide: 2,161 out of 92,062
  • Since beginning of last month:
    • Site-wide: 2,767 out of 92,062
