

Synthesizer: Expediting synthesis studies from context-free data with natural language processing

By Lisa Gandy, Jordan Gumm, Benjamin Fertig, Michael J Kennish, Sameer Chavan, Ann Thessen, Luigi Marchionni, Xiaoxan Xia, Shambhavi Shankrit, Elana J. Fertig

Posted 16 May 2016
bioRxiv DOI: 10.1101/053629

Today's low-cost digital data provides unprecedented opportunities for scientific discovery from synthesis studies. For example, the medical field is revolutionizing patient care by creating personalized treatment plans based upon mining electronic medical records, imaging, and genomics data. Standardized annotations are essential to subsequent analyses for synthesis studies. However, accurately combining records from diverse studies requires tedious and error-prone human curation, posing a significant barrier to synthesis studies. We propose a novel natural language processing (NLP) algorithm, Synthesize, to merge data annotations automatically. Application to patient characteristics for diverse human cancers and to ecological datasets demonstrates the accuracy of Synthesize in diverse scientific disciplines. This NLP approach is implemented in an open-source software package, Synthesizer. Synthesizer is a generalized, user-friendly system for error-free data merging.

Download data

  • Downloaded 400 times
  • Download rankings, all-time:
    • Site-wide: 29,541 out of 70,836
    • In bioinformatics: 3,881 out of 6,933
  • Year to date:
    • Site-wide: 65,004 out of 70,836
  • Since beginning of last month:
    • Site-wide: 27,527 out of 70,836
