
Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima

By Gang Li, Kersten S. Rabe, Jens Nielsen, Martin KM Engqvist

Posted 16 Jan 2019
bioRxiv DOI: 10.1101/522342 (published DOI: 10.1021/acssynbio.9b00099)

Enzymes that catalyze chemical reactions at high temperatures are used for industrial biocatalysis, applications in molecular biology, and as highly evolvable starting points for protein engineering. The optimal growth temperature (OGT) of organisms is commonly used to estimate the stability of enzymes encoded in their genomes, but the number of experimentally determined OGT values is limited, particularly for thermophilic organisms. Here, we report on the development of a machine learning model that can accurately predict OGT for bacteria, archaea and microbial eukaryotes directly from their proteome-wide 2-mer amino acid composition. The trained model is made freely available for re-use. In a subsequent step we use OGT data in combination with the amino acid composition of individual enzymes to develop a second machine learning model for prediction of enzyme catalytic temperature optima (Topt). The resulting model generates enzyme Topt estimates that are far superior to those obtained using OGT alone. Finally, we predict Topt for 6.5 million enzymes, covering 4,447 enzyme classes, and make the resulting dataset available for researchers. This work enables simple and rapid identification of enzymes that are potentially functional at extreme temperatures.
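To make the "proteome-wide 2-mer amino acid composition" feature concrete, the following is a minimal Python sketch of one way such a feature vector could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function name `dipeptide_composition`, the restriction to the 20 standard amino acids, the use of overlapping 2-mers pooled across all proteins, and the normalization to relative frequencies are all assumptions made here, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_SET = set(AMINO_ACIDS)

# Fixed feature order: all 20 x 20 = 400 possible 2-mers (dipeptides).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(proteins):
    """Return a 400-element list of relative 2-mer frequencies.

    `proteins` is an iterable of amino acid sequences (strings) that
    together make up one proteome. Overlapping 2-mers are pooled over
    all sequences; 2-mers containing a non-standard residue are skipped.
    """
    counts = Counter()
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair[0] in AA_SET and pair[1] in AA_SET:
                counts[pair] += 1

    total = sum(counts.values())
    if total == 0:
        # No valid 2-mers found: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    # Toy "proteome" of two short sequences, for illustration only.
    features = dipeptide_composition(["MKV", "KVA"])
    print(len(features))
    print(features[DIPEPTIDES.index("KV")])
```

A vector of this kind, computed once per organism, could serve as the input to a regression model that predicts OGT; the same function applied to a single enzyme sequence would give a per-enzyme composition.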

Download data

  • Downloaded 1,703 times
  • Download rankings, all-time:
    • Site-wide: 15,125
    • In bioinformatics: 1,610
  • Year to date:
    • Site-wide: 38,707
  • Since beginning of last month:
    • Site-wide: 64,197
