Background: Effective classification of cancer patients into groups with differential survival remains an important and unsolved challenge. Biomarkers have been developed based on mRNA abundance data, but their replicability and clinical utility are modest. Integrating functional information, such as pathway data, has been suggested to improve biomarker performance. To date, however, the advantages of subnetwork-based biomarkers have not been quantified.

Results: We deeply sampled the population of prognostic gene-based and subnetwork-based biomarkers in a breast cancer meta-dataset of 4,960 patients. Analysing the performance and robustness of 22,000,000 gene biomarkers and 6,250,000 subnetwork biomarkers across twenty different training:testing cohort partitions of the meta-dataset revealed that subnetwork biomarkers exhibit superior overall performance and higher concordance across partitions. We find evidence of an upper bound for optimal biomarker size of ~200 genes or ~100 subnetworks. Additionally, for both biomarker feature types, larger biomarkers tend to perform less consistently across partitions, which suggests over-fitting. Finally, we quantify the effects of training cohort size by evaluating a range of cohort sizes.

Conclusions: Many groups are developing techniques for exploiting network-based representations of biological pathways to characterize cancer and other diseases. By considering the distribution of gene- and subnetwork-based biomarkers, we show that pathway data improve performance and replicability, and that smaller biomarkers are more robust across patient cohorts. These insights may facilitate development of clinically useful biomarkers.