Privacy-preserving generative deep neural networks support clinical data sharing

By Brett K. Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P Bhavnani, James Brian Byrd, Casey S. Greene

Posted 05 Jul 2017
bioRxiv DOI: 10.1101/159756 (published DOI: 10.1161/CIRCOUTCOMES.118.005122)

Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic "participants" that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants' data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.
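The abstract does not spell out how differential privacy enters training. One widely used approach, shown here only as an illustrative sketch and not necessarily the exact procedure used in the paper, is to bound each participant's influence on a gradient step by clipping per-example gradients and then adding Gaussian noise before averaging. The function names `clip_gradient` and `private_mean_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, chosen for this example.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Average per-example gradients after clipping and adding Gaussian noise.

    Assumption for this sketch: noise with standard deviation
    noise_multiplier * clip_norm is added to the summed, clipped gradients.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    # Clipping limits how much any single record can move the update.
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise masks the remaining contribution of any one record.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / n for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 1.0]]  # toy per-example gradients
    print(private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Setting `noise_multiplier` to 0 recovers a plain clipped average, which is useful for checking the clipping step in isolation. The privacy guarantee actually achieved depends on the noise level, batch sampling, and number of training steps, none of which this sketch accounts for.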

Download data

  • Downloaded 12,912 times
  • Download rankings, all-time:
    • Site-wide: 181 out of 100,856
    • In bioinformatics: 20 out of 9,260
  • Year to date:
    • Site-wide: 1,663 out of 100,856
  • Since beginning of last month:
    • Site-wide: 1,994 out of 100,856
