
Deep Genomic Signature for early metastasis prediction in prostate cancer

By Hossein Sharifi-Noghabi, Yang Liu, Nicholas Erho, Raunak Shrestha, Mohammed Alshalalfa, Elai Davicioni, Colin C. Collins, Martin Ester

Posted 04 Mar 2018
bioRxiv DOI: 10.1101/276055

For prostate cancer patients, the timing and intensity of therapy are adjusted according to prognosis. Clinical and pathological factors and, more recently, gene expression-based signatures have been shown to predict metastatic prostate cancer. Previous studies discovered gene signatures for metastasis from labelled datasets, i.e. those with information on the metastasis outcome. Because prostate cancer progresses steadily, datasets for this cancer contain a limited number of labelled samples but more unlabelled samples. In addition, the high dimensionality of gene expression data makes it difficult to train a classifier that predicts metastasis accurately. In this study, we aim to boost prediction accuracy by using labelled and unlabelled datasets together. We propose Deep Genomic Signature (DGS), a method based on Denoising Auto-Encoders (DAEs) and transfer learning. DGS has the following steps. First, we train a DAE on a large unlabelled gene expression dataset to extract the most salient features of its samples. Then, we train another DAE on a small labelled dataset for a similar purpose. Since the labelled dataset is small, we employ a transfer learning approach and use the parameters learned by the first DAE in the second one, which enables us to train a large DAE on a small dataset. After training the second DAE, we obtain the list of genes with high weights by applying a standard deviation filter to the transferred and learned weights. Finally, we train an elastic net logistic regression model on the expression of the selected genes to predict metastasis. Because of the elastic net regularization, only some of the selected genes have non-zero coefficients in the classifier; we consider these genes the DGS gene signature for metastasis. We apply DGS to six labelled prostate cancer datasets and one large unlabelled prostate cancer dataset.
Results on five validation datasets indicate that DGS outperforms state-of-the-art gene signatures (obtained from labelled datasets only) in terms of prediction accuracy. Survival analyses demonstrate the potential clinical utility of our gene signature, which adds novel prognostic information to well-established clinical factors and state-of-the-art gene signatures. Finally, pathway analysis reveals that the DGS gene signature captures the hallmarks of prostate cancer metastasis. These results suggest that our method helps to identify a robust gene signature that may improve patient management.
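
The four DGS steps described in the abstract can be illustrated with a minimal NumPy sketch on synthetic data. This is not the authors' implementation. The abstract does not give the network architecture, the exact form of the standard deviation filter, or the optimizer, so everything below is an assumption made for illustration: a single-layer tied-weight DAE with masking noise, a filter that keeps a gene if any of its weights deviates from the mean weight by more than `k` standard deviations, a proximal-gradient elastic net solver, and all names and hyperparameters (`train_dae`, `select_genes`, `fit_elastic_net_logreg`, `k`, `alpha`, `l1_ratio`).

```python
import numpy as np

def train_dae(X, n_hidden, W=None, b=None, noise=0.2, lr=0.02, epochs=100, seed=0):
    """Tied-weight denoising auto-encoder (assumed architecture), full-batch gradient descent.
    Passing W and b warm-starts training, which is how the transfer step is sketched here."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    if W is None:
        W = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
        b = np.zeros(n_hidden)
    else:
        W, b = W.copy(), b.copy()
    c = np.zeros(n_genes)  # decoder bias, not transferred
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)   # masking noise
        H = np.tanh(X_noisy @ W + b)                  # encode
        X_hat = H @ W.T + c                           # decode with tied weights
        err = (X_hat - X) / n_samples                 # gradient of mean squared error / 2
        dA = (err @ W) * (1.0 - H ** 2)               # backprop through tanh
        W -= lr * (X_noisy.T @ dA + err.T @ H)        # encoder + decoder contributions
        b -= lr * dA.sum(axis=0)
        c -= lr * err.sum(axis=0)
    return W, b

def select_genes(W, k=1.5):
    """Assumed filter: keep genes with any weight more than k std devs from the mean weight."""
    return np.flatnonzero((np.abs(W - W.mean()) > k * W.std()).any(axis=1))

def fit_elastic_net_logreg(X, y, alpha=0.05, l1_ratio=0.5, lr=0.1, epochs=500):
    """Elastic net logistic regression via proximal gradient descent (soft-thresholding)."""
    w, b0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b0)))
        w = w - lr * (X.T @ (p - y) / len(y) + alpha * (1.0 - l1_ratio) * w)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * alpha * l1_ratio, 0.0)  # L1 proximal step
        b0 -= lr * np.mean(p - y)
    return w, b0

# Synthetic stand-ins for the expression matrices (samples x genes).
rng = np.random.default_rng(0)
X_unlab = rng.normal(size=(200, 30))                   # large unlabelled dataset
X_lab = rng.normal(size=(40, 30))                      # small labelled dataset
y_lab = (X_lab[:, 0] + X_lab[:, 1] > 0).astype(float)  # hypothetical metastasis labels

W, b = train_dae(X_unlab, n_hidden=8)                  # step 1: DAE on unlabelled data
W, b = train_dae(X_lab, n_hidden=8, W=W, b=b)          # step 2: transfer and continue training
genes = select_genes(W)                                # step 3: standard deviation filter
coef, intercept = fit_elastic_net_logreg(X_lab[:, genes], y_lab)  # step 4: classifier
signature = genes[coef != 0]                           # genes with non-zero coefficients
```

In this sketch the transfer step is a warm start: the second DAE begins from the weights of the first instead of a random initialization. The L1 part of the elastic net penalty is what drives some coefficients exactly to zero, so `signature` is a subset of the filtered `genes`.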

Download data

  • Downloaded 1,264 times
  • Download rankings, all-time:
    • Site-wide: 13,033
    • In bioinformatics: 1,636
  • Year to date:
    • Site-wide: 72,315
  • Since beginning of last month:
    • Site-wide: 44,477
