Addressing Inaccurate Nosology in Mental Health: A Multi Label Data Cleansing Approach for Detecting Label Noise from Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders

By Hooman Rokham, Godfrey Pearlson, Anees Abrol, Haleh Falakshahi, Sergey M. Plis, V. D. Calhoun

Posted 08 May 2020
bioRxiv DOI: 10.1101/2020.05.06.081521 (published DOI: 10.1016/j.bpsc.2020.05.008)

Background: Mental health diagnostic approaches are seeking to identify biological markers to work alongside advanced machine learning approaches. It is difficult to identify a biological marker of disease when the traditional diagnostic labels themselves are not necessarily valid.

Methods: We worked with T1 structural magnetic resonance imaging data on mood and psychosis disorders, collected from over 1400 individuals comprising healthy controls, psychosis patients and their unaffected first-degree relatives: 176 bipolar probands, 134 schizoaffective probands, 240 schizophrenia probands, 581 patient relatives and 362 controls. We assumed there might be noise in the diagnostic labeling process. We detected label noise by classifying the data multiple times using a support vector machine classifier, and then flagged those individuals whom all classifiers unanimously mislabeled. Next, we assigned a new diagnostic label to these individuals, based on the biological data (MRI), using an iterative data cleansing approach.

Results: Simulation results showed our method was highly accurate in identifying label noise. About 65% of the diagnostic labels and about 63% of the Biotype labels were identified as noisy, with the largest amount of relabeling occurring between healthy controls and individuals with bipolar disorder and schizophrenia, as well as in the unaffected close relatives. The extraction of imaging features highlighted regional brain changes associated with each group.

Conclusions: This approach represents an initial step towards developing strategies that need not assume existing mental health diagnostic categories are always valid, but rather allow us to leverage this information while also acknowledging that there are misassignments.

### Competing Interest Statement

The authors have declared no competing interest.
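The label-noise detection and relabeling steps described in the Methods can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy two-class dataset, uses a nearest-centroid classifier as a dependency-free stand-in for the support vector machine, and interprets "classifying the data multiple times" as repeated k-fold cross-validation. All function names and parameters (`flag_unanimous_mislabels`, `iterative_cleanse`, `n_repeats`, `n_folds`, `max_rounds`) are hypothetical.

```python
import random
from collections import Counter


def centroid_fit(X, y):
    """Fit a nearest-centroid model (stand-in for the SVM): per-class means."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], x))
    return min(sorted(centroids), key=dist2)


def flag_unanimous_mislabels(X, y, n_repeats=10, n_folds=5, seed=0):
    """Return {index: predictions} for samples whose held-out prediction
    disagrees with the given label in every one of n_repeats repeats."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[] for _ in range(n)]
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        for k in range(n_folds):
            test_idx = order[k::n_folds]  # each sample is held out once per repeat
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = centroid_fit([X[i] for i in train_idx],
                                 [y[i] for i in train_idx])
            for i in test_idx:
                votes[i].append(centroid_predict(model, X[i]))
    return {i: v for i, v in enumerate(votes) if all(p != y[i] for p in v)}


def iterative_cleanse(X, y, max_rounds=5, **kwargs):
    """Assumed relabeling rule: give each flagged sample its most frequently
    predicted label, then repeat until nothing is flagged."""
    y = list(y)
    for _ in range(max_rounds):
        flagged = flag_unanimous_mislabels(X, y, **kwargs)
        if not flagged:
            break
        for i, preds in flagged.items():
            y[i] = Counter(preds).most_common(1)[0][0]
    return y


# Toy data: two well-separated 2-D clusters with two deliberately flipped labels.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
     + [[data_rng.gauss(10, 1), data_rng.gauss(10, 1)] for _ in range(20)])
y_true = [0] * 20 + [1] * 20
y_noisy = list(y_true)
y_noisy[0], y_noisy[25] = 1, 0  # inject label noise

flagged = sorted(flag_unanimous_mislabels(X, y_noisy))
y_clean = iterative_cleanse(X, y_noisy)
print("flagged indices:", flagged)
print("labels restored:", y_clean == y_true)
```

Requiring unanimous disagreement makes the filter conservative: a sample is flagged only if no repeat ever agrees with its given label, which lowers the chance of relabeling a correctly labeled but hard-to-classify individual.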

Download data

  • Downloaded 163 times
  • Download rankings, all-time:
    • Site-wide: 114,721
    • In neuroscience: 17,722
  • Year to date:
    • Site-wide: 79,190
  • Since beginning of last month:
    • Site-wide: 95,708
