DeCoDe: degenerate codon design for complete protein-coding DNA libraries

By Tyler C. Shimko, Polly M. Fordyce, Yaron Orenstein

Posted 17 Oct 2019
bioRxiv DOI: 10.1101/809004 (published DOI: 10.1093/bioinformatics/btaa162)

Motivation: High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more non-functional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity.

Results: We introduce a novel algorithm for total DC library optimization, DeCoDe, based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.
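To make the notion of a degenerate codon concrete, the following minimal Python sketch expands a DC written in IUPAC nucleotide codes into its concrete codons and the amino acids they encode under the standard genetic code. It also shows how combining DCs across covarying positions can produce undesired variants. The function names (`expand_codon`, `encoded_amino_acids`, `library_variants`) and the two-residue target set are hypothetical illustrations; this is not DeCoDe's integer linear programming formulation or its interface.

```python
from itertools import product

# IUPAC nucleotide codes: each symbol stands for a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in the conventional TCAG ordering ('*' is a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_codon(dc):
    """Return every concrete codon covered by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in dc.upper()))]


def encoded_amino_acids(dc):
    """Return the set of amino acids (and '*' for stop) a degenerate codon encodes."""
    return {CODON_TABLE[c] for c in expand_codon(dc)}


def library_variants(dcs):
    """Return all protein variants produced by one DC per position (hypothetical helper)."""
    per_position = [sorted(encoded_amino_acids(dc)) for dc in dcs]
    return {"".join(p) for p in product(*per_position)}


if __name__ == "__main__":
    # NNK is a commonly used DC: 32 codons covering all 20 amino acids plus one stop.
    print(len(expand_codon("NNK")), sorted(encoded_amino_acids("NNK")))

    # Hypothetical targets that covary at two positions: "AK" and "GE".
    targets = {"AK", "GE"}
    variants = library_variants(["GST", "RAA"])
    print("library:", sorted(variants))
    print("undesired:", sorted(variants - targets))
```

In this toy case, a single pair of DCs covers both targets but also yields the off-target combinations of the two positions. The abstract describes this effect as growing quickly when sequences covary at many positions.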

Download data

  • Downloaded 347 times
  • Download rankings, all-time:
    • Site-wide: 48,680 out of 92,880
    • In bioinformatics: 5,585 out of 8,696
  • Year to date:
    • Site-wide: 28,596 out of 92,880
  • Since beginning of last month:
    • Site-wide: 42,160 out of 92,880
