For prostate cancer patients, the timing and intensity of therapy are adjusted according to prognosis. Clinical and pathological factors and, more recently, gene expression-based signatures have been shown to predict metastatic prostate cancer. Previous studies used labelled datasets, i.e. those with information on the metastasis outcome, to discover gene signatures that predict metastasis. Due to the steady progression of prostate cancer, datasets for this cancer have a limited number of labelled samples but more unlabelled samples. In addition, the high dimensionality of gene expression data makes it difficult to train a classifier that predicts metastasis accurately. In this study, we aim to boost prediction accuracy by using labelled and unlabelled datasets together. We propose Deep Genomic Signature (DGS), a method based on Denoising Auto-Encoders (DAEs) and transfer learning. DGS has the following steps. First, we train a DAE on a large unlabelled gene expression dataset to extract the most salient features of its samples. Then, we train another DAE on a small labelled dataset for a similar purpose. Since the labelled dataset is small, we employ a transfer learning approach and use the parameters learned by the first DAE in the second one. This approach enables us to train a large DAE on a small dataset. After training the second DAE, we obtain the list of genes with high weights by applying a standard deviation filter to the transferred and learned weights. Finally, we train an elastic net logistic regression model on the expression of the selected genes to predict metastasis. Because of the elastic net regularization, only some of the selected genes have non-zero coefficients in the classifier; we consider these genes to be the DGS gene signature for metastasis. We apply DGS to six labelled prostate cancer datasets and one large unlabelled prostate cancer dataset.
Results on five validation datasets indicate that DGS outperforms state-of-the-art gene signatures (obtained from labelled datasets only) in terms of prediction accuracy. Survival analyses demonstrate the potential clinical utility of our gene signature, which adds novel prognostic information to well-established clinical factors and state-of-the-art gene signatures. Finally, pathway analysis reveals that the DGS gene signature captures the hallmarks of prostate cancer metastasis. These results suggest that our method helps to identify a robust gene signature that may improve patient management.
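The gene-selection step described above (a standard deviation filter on the DAE weights) could be sketched roughly as follows. The abstract does not specify the exact form of the filter, so this sketch assumes that a gene is kept when at least one of its weights lies more than `k` standard deviations from the mean of all weights. The function name `select_genes_by_weight`, the threshold `k`, and the toy weights and gene names are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def select_genes_by_weight(weights, gene_names, k=2.0):
    """Keep genes with at least one weight more than k std devs from the mean.

    weights: list of rows, one row per gene and one column per hidden unit.
    This thresholding rule is an assumption; the abstract only says that a
    standard deviation filter is applied to the weights.
    """
    flat = [w for row in weights for w in row]
    mu = mean(flat)
    sigma = pstdev(flat)  # population standard deviation over all weights
    selected = []
    for name, row in zip(gene_names, weights):
        if any(abs(w - mu) > k * sigma for w in row):
            selected.append(name)
    return selected


# Toy example: 4 hypothetical genes connected to 3 hidden units.
toy_weights = [
    [0.01, -0.02, 0.00],
    [0.90, 0.01, -0.01],
    [0.02, 0.00, 0.01],
    [-0.01, 0.02, -0.85],
]
toy_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

selected = select_genes_by_weight(toy_weights, toy_genes, k=1.5)
print(selected)  # ['GENE_B', 'GENE_D']
```

In a full pipeline, the expression values of the genes retained here would then be passed to the elastic net logistic regression step, which can shrink some of their coefficients to zero and so narrows the list further.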