Proximal Exploration for Model-guided Protein Sequence Design

By Zhizhou Ren, Jiahan Li, Fan Ding, Yuan Zhou, Jianzhu Ma, Jian Peng

Posted 13 Apr 2022
bioRxiv DOI: 10.1101/2022.04.12.487986

Designing protein sequences with a particular biological function is a long-standing challenge in protein engineering. Recent machine-learning-guided approaches build a surrogate sequence-function model to reduce the burden of expensive in-lab experiments. In this paper, we study the exploration mechanism of model-guided sequence design. We leverage a natural property of the protein fitness landscape: a concise set of mutations to the wild-type sequence is usually sufficient to enhance the desired function. Using this property, we propose the Proximal Exploration (PEX) algorithm, which prioritizes the evolutionary search for high-fitness mutants with low mutation counts. In addition, we develop a specialized model architecture, called Mutation Factorization Network (MuFacNet), to predict low-order mutational effects, which further improves the sample efficiency of model-guided evolution. In experiments, we extensively evaluate our method on a suite of in-silico protein sequence design tasks and demonstrate substantial improvement over baseline algorithms.
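
To make the idea of prioritizing high-fitness mutants with low mutation counts concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes one simple way to express the preference: rank candidates by predicted fitness minus a penalty proportional to the Hamming distance from the wild type. The function names, the `penalty` parameter, and the toy fitness table standing in for a surrogate model are all hypothetical.

```python
from typing import Callable, Dict, List


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def proximal_select(
    candidates: List[str],
    wild_type: str,
    predict: Callable[[str], float],
    penalty: float,
    k: int,
) -> List[str]:
    """Return the k candidates with the best distance-penalized score.

    Assumed scoring rule (illustrative only):
        score(s) = predict(s) - penalty * hamming(s, wild_type)
    """
    def score(seq: str) -> float:
        return predict(seq) - penalty * hamming(seq, wild_type)

    return sorted(candidates, key=score, reverse=True)[:k]


# Hypothetical surrogate predictions for a toy 4-residue protein.
WILD_TYPE = "MKTL"
TOY_FITNESS: Dict[str, float] = {
    "MKTL": 1.0,  # wild type, 0 mutations
    "MKTV": 1.8,  # 1 mutation
    "AKTV": 2.0,  # 2 mutations
    "AGSV": 2.1,  # 4 mutations
}

if __name__ == "__main__":
    pool = list(TOY_FITNESS)
    # With a distance penalty, near-wild-type mutants are preferred.
    print(proximal_select(pool, WILD_TYPE, TOY_FITNESS.__getitem__, 0.3, 2))
    # Without a penalty, ranking follows predicted fitness alone.
    print(proximal_select(pool, WILD_TYPE, TOY_FITNESS.__getitem__, 0.0, 2))
```

In this toy setting, the penalized ranking favors the single and double mutants over the four-mutation variant, even though the latter has the highest raw predicted fitness. How strongly to weight the distance term is a design choice that this sketch leaves as a free parameter.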

Download data

  • Downloaded 527 times
  • Download rankings, all-time:
    • Site-wide: 82,231
    • In bioinformatics: None
  • Year to date:
    • Site-wide: 4,585
  • Since beginning of last month:
    • Site-wide: 5,401
