Rxivist logo

Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’

By Zhi Ping, Shihong Chen, Guangyu Zhou, Xiaoluo Huang, Sha Joe Zhu, Chen Chai, Haoling Zhang, Henry H Lee, Tsan-Yu Chiu, Tai Chen, Huanming Yang, Xun Xu, George Church, Yue Shen

Posted 05 Nov 2019
bioRxiv DOI: 10.1101/829721

Motivation DNA has been reported as a promising medium of data storage for its remarkable durability and space-efficient storage capacity. Here, we propose a robust DNA-based data storage method based on a new codec algorithm, namely ‘Yin-Yang’. Results Using this strategy, we successfully stored different file formats in a single synthetic DNA oligonucleotide pool. Compared to most well-established DNA-based data storage coding schemes presented to date, this codec system can achieve a variety of user goals (e.g. reduce homopolymer length to 3 or 4 at most, maintain balanced GC content between 40% and 60% and simple secondary structure with the Gibbs free energy above −30 kcal/mol). It also shows enhanced robustness in transcoding of different data structure and practical feasibility. We tested this codec with an end-to-end experiment including encoding, DNA synthesis, sequencing and decoding. Through successful retrieval of 3 files totaling 2.02 Megabits after sequencing and decoding, our strategy exhibits great qualities of achieving high storing capacity per nucleotide (427.1 PB/gram) and high fidelity of data recovery.

Download data

  • Downloaded 657 times
  • Download rankings, all-time:
    • Site-wide: 24,404 out of 100,591
    • In synthetic biology: 402 out of 917
  • Year to date:
    • Site-wide: 8,287 out of 100,591
  • Since beginning of last month:
    • Site-wide: 8,807 out of 100,591

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)


  • 20 Oct 2020: Support for sorting preprints using Twitter activity has been removed, at least temporarily, until a new source of social media activity data becomes available.
  • 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
  • 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
  • 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
  • 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
  • 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
  • 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
  • 22 Jan 2019: Nature just published an article about Rxivist and our data.
  • 13 Jan 2019: The Rxivist preprint is live!