Rxivist logo

Accelerating Protein Design Using Autoregressive Generative Models

By Adam Riesselman, Jung-Eun Shin, Aaron Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew C. Kruse, Debora S. Marks

Posted 05 Sep 2019
bioRxiv DOI: 10.1101/757252

A major biomedical challenge is the interpretation of genetic variation and the ability to design functional novel sequences. Since the space of all possible genetic variation is enormous, there is a concerted effort to develop reliable methods that can capture genotype to phenotype maps. State-of-art computational methods rely on models that leverage evolutionary information and capture complex interactions between residues. However, current methods are not suitable for a large number of important applications because they depend on robust protein or RNA alignments. Such applications include genetic variants with insertions and deletions, disordered proteins, and functional antibodies. Ideally, we need models that do not rely on assumptions made by multiple sequence alignments. Here we borrow from recent advances in natural language processing and speech synthesis to develop a generative deep neural network-powered autoregressive model for biological sequences that captures functional constraints without relying on an explicit alignment structure. Application to unseen experimental measurements of 43 deep mutational scans predicts the effect of insertions and deletions while matching state-of-art missense mutation prediction accuracies. We then test the model on single domain antibodies, or nanobodies, a complex target for alignment-based models due to the highly variable complementarity determining regions. We fit the model to a naïve llama immune repertoire and generate a diverse, optimized library of 105 nanobody sequences for experimental validation. Our results demonstrate the power of the 'alignment-free' autoregressive model in mutation effect prediction and design of traditionally challenging sequence families.

Download data

  • Downloaded 3,051 times
  • Download rankings, all-time:
    • Site-wide: 2,177 out of 106,159
    • In systems biology: 54 out of 2,616
  • Year to date:
    • Site-wide: 1,210 out of 106,159
  • Since beginning of last month:
    • Site-wide: 1,694 out of 106,159

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


PanLingua

Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News