Addressing Inaccurate Nosology in Mental Health: A Multi-Label Data Cleansing Approach for Detecting Label Noise from Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders
Background: Mental health diagnostic research is seeking biological markers that can be used alongside advanced machine learning approaches. However, it is difficult to identify a biological marker of disease when the traditional diagnostic labels themselves are not necessarily valid.

Methods: We analyzed T1-weighted structural magnetic resonance imaging (MRI) data from over 1400 individuals, comprising healthy controls, patients with mood and psychosis disorders, and their unaffected first-degree relatives: 176 bipolar probands, 134 schizoaffective probands, 240 schizophrenia probands, 581 relatives of patients, and 362 controls. Assuming that the diagnostic labeling process might contain noise, we detected label noise by classifying the data multiple times with a support vector machine classifier and flagging the individuals whom all classifiers unanimously misclassified. We then assigned new diagnostic labels to these individuals based on the biological (MRI) data, using an iterative data cleansing approach.

Results: Simulations showed that our method identified label noise with high accuracy. Approximately 65% of the diagnostic-category labels and 63% of the Biotype-category labels were identified as noisy, with the largest amount of relabeling occurring between healthy controls and individuals with bipolar disorder or schizophrenia, as well as among the unaffected close relatives. The extracted imaging features highlighted regional brain changes associated with each group.

Conclusions: This approach is an initial step toward strategies that do not assume existing mental health diagnostic categories are always valid, but instead leverage this information while acknowledging that some individuals are misassigned.

Competing Interest Statement: The authors have declared no competing interest.
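The detection-and-relabeling loop described in the Methods can be illustrated with a small, dependency-free Python sketch. The abstract does not state the cross-validation scheme, number of repeats, relabeling rule, or stopping criterion, so this sketch assumes repeated k-fold cross-validation (each subject is predicted only by models that did not train on it), relabels a flagged subject to its most frequent predicted label, and stops once no subject is flagged. All function names are hypothetical. To avoid external dependencies, it substitutes a nearest-centroid classifier for the support vector machine used in the study; any trainer returning a predict function (for example, a wrapper around scikit-learn's `SVC`) could be passed instead. The one-dimensional toy data are invented for illustration and are not study data.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], str]
# Hypothetical interface: a trainer fits on (X, y) and returns a predictor.
Trainer = Callable[[Sequence[Vector], Sequence[str]], Predictor]


def nearest_centroid_trainer(X: Sequence[Vector], y: Sequence[str]) -> Predictor:
    """Stand-in for the SVM (NOT the study's classifier): predict the nearest class mean."""
    sums: Dict[str, List[float]] = {}
    counts = Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x: Vector) -> str:
        # Squared Euclidean distance to each class centroid.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def flag_unanimous_noise(X, y, train: Trainer = nearest_centroid_trainer,
                         n_repeats: int = 5, n_folds: int = 5, seed: int = 0) -> Dict[int, str]:
    """Repeated k-fold: each subject is predicted once per repeat by a model that
    never saw it. A subject is flagged only when ALL of its out-of-fold
    predictions disagree with its current label (unanimous vote).
    Returns {subject index: most frequent predicted label}."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    preds: List[List[str]] = [[] for _ in y]
    for _ in range(n_repeats):
        rng.shuffle(order)
        for k in range(n_folds):
            held_out = order[k::n_folds]
            held = set(held_out)
            train_idx = [i for i in order if i not in held]
            model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in held_out:
                preds[i].append(model(X[i]))
    return {
        i: Counter(p).most_common(1)[0][0]
        for i, p in enumerate(preds)
        if all(label != y[i] for label in p)
    }


def iterative_cleanse(X, y, max_iter: int = 10, **kwargs) -> List[str]:
    """Relabel flagged subjects and re-run detection until nothing is flagged.
    The stopping rule and max_iter cap are assumptions, not the paper's settings."""
    labels = list(y)
    for _ in range(max_iter):
        noisy = flag_unanimous_noise(X, labels, **kwargs)
        if not noisy:
            break
        for i, new_label in noisy.items():
            labels[i] = new_label
    return labels


# Toy example: two well-separated groups, with subject 0 deliberately mislabeled.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [5.1], [5.2], [5.3], [5.4]]
y = ["SZ"] + ["HC"] * 4 + ["SZ"] * 5
print(flag_unanimous_noise(X, y))  # → {0: 'HC'}
print(iterative_cleanse(X, y))
```

Requiring a unanimous vote makes the filter conservative: a subject is relabeled only when every held-out model disagrees with its label, which limits false flags on genuinely ambiguous cases at the cost of missing some noisy labels.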