Mining the forest: uncovering biological mechanisms by interpreting Random Forests

By Julian de Ruiter, Theo Knijnenburg, Jeroen de Ridder

Posted 10 Nov 2017
bioRxiv DOI: 10.1101/217695

Biological datasets are large and complex. Machine learning models are therefore essential to capture relationships in the data. Unfortunately, the inferred models are often difficult to understand, and interpretation is limited to a list of features ranked by their importance in the model. We propose a computational approach, called Foresight, which enables interpretation of the patterns uncovered by Random Forest models trained on biological datasets. Foresight exploits the correlation structure in the data to uncover relevant groups of features and the interactions between them. This facilitates interpretation of the computational model and can provide more detailed insight into the underlying biological relationships than simply ranking features. We demonstrate Foresight on both an artificial dataset and a large gene expression dataset of breast cancer patients. Using the latter dataset, we show that our approach retrieves biologically relevant features and provides a rich description of the interactions and correlation structure between these features.
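To make the idea of using correlation structure to find feature groups concrete, here is a minimal sketch in Python. It is not the Foresight implementation, whose details are not given in this abstract. It assumes that per-feature importances are already available (the values below are hypothetical stand-ins for Random Forest output), links features whose absolute Pearson correlation reaches an arbitrarily chosen threshold of 0.8, and sums importance within each group. The gene names, data, and function names are all illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_groups(columns, threshold=0.8):
    """Group features linked by |correlation| >= threshold (union-find)."""
    parent = {name: name for name in columns}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def group_importance(groups, importances):
    """Sum the per-feature importances within each group."""
    return {tuple(g): sum(importances[f] for f in g) for g in groups}


# Toy expression data (hypothetical): gene_b closely tracks gene_a.
columns = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Hypothetical importances, standing in for a trained Random Forest's output.
importances = {"gene_a": 0.30, "gene_b": 0.25, "gene_c": 0.45}

groups = correlated_groups(columns)
scores = group_importance(groups, importances)
```

In this toy case, gene_c ranks first when features are scored individually, but the correlated pair gene_a and gene_b ranks first once their importance is pooled. This is one way a flat importance ranking can understate a group of related features.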

Download data

  • Downloaded 680 times
  • Download rankings, all-time:
    • Site-wide: 25,171 out of 105,737
    • In bioinformatics: 3,454 out of 9,474
  • Year to date:
    • Site-wide: 43,819 out of 105,737
  • Since beginning of last month:
    • Site-wide: 47,448 out of 105,737
