
Synthesizer: Expediting synthesis studies from context-free data with natural language processing

By Lisa Gandy, Jordan Gumm, Benjamin Fertig, Michael J Kennish, Sameer Chavan, Ann Thessen, Luigi Marchionni, Xiaoxan Xia, Shambhavi Shankrit, Elana J. Fertig

Posted 16 May 2016
bioRxiv DOI: 10.1101/053629

Today's low-cost digital data provides unprecedented opportunities for scientific discovery from synthesis studies. For example, the medical field is revolutionizing patient care by creating personalized treatment plans based on mining electronic medical records, imaging, and genomics data. Standardized annotations are essential for the downstream analyses in synthesis studies. However, accurately combining records from diverse studies requires tedious and error-prone human curation, which poses a significant barrier to synthesis studies. We propose a novel natural language processing (NLP) algorithm, Synthesize, that merges data annotations automatically. Applications to patient characteristics across diverse human cancers and to ecological datasets demonstrate the accuracy of Synthesize in diverse scientific disciplines. This NLP approach is implemented in an open-source software package, Synthesizer, a generalized, user-friendly system for error-free data merging.
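The abstract does not describe how Synthesize works internally. As a rough illustration of the annotation-merging task only (not the Synthesize algorithm), the following sketch maps inconsistent values of one annotation field from several studies onto a single standard vocabulary, using a hand-written synonym table followed by fuzzy string matching from Python's standard-library `difflib`. The vocabulary, the synonym table, the function names (`normalize`, `map_to_standard`, `merge_annotations`), and the 0.8 similarity cutoff are all hypothetical assumptions chosen for the example.

```python
import difflib
from typing import Optional

# Hypothetical controlled vocabulary for one annotation field ("sex").
# Illustrative assumption only; not taken from Synthesizer.
STANDARD_TERMS = ["male", "female", "unknown"]

# Hypothetical synonym table for abbreviations that string similarity would miss.
SYNONYMS = {
    "m": "male", "f": "female",
    "man": "male", "woman": "female",
    "na": "unknown", "n/a": "unknown",
}


def normalize(value: str) -> str:
    """Lowercase, trim whitespace, and drop trailing periods so trivial variants compare equal."""
    return value.strip().lower().rstrip(".")


def map_to_standard(value: str, cutoff: float = 0.8) -> Optional[str]:
    """Map one raw annotation to a standard term, or return None if no match clears the cutoff."""
    token = normalize(value)
    if token in SYNONYMS:
        return SYNONYMS[token]
    # Fuzzy match (SequenceMatcher ratio) catches near-misses such as typos.
    matches = difflib.get_close_matches(token, STANDARD_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def merge_annotations(*studies: list[str]) -> list[Optional[str]]:
    """Concatenate annotation columns from several studies, mapped onto the standard vocabulary."""
    return [map_to_standard(v) for study in studies for v in study]


if __name__ == "__main__":
    study_a = ["M", "Female "]
    study_b = ["male", "femal", "N/A"]
    print(merge_annotations(study_a, study_b))
```

In this sketch, exact synonyms are resolved first and fuzzy matching only handles near-misses such as the typo "femal"; values with no confident match return `None`, so a curator can review them instead of having them silently mislabeled.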

Download data

  • Downloaded 457 times
  • Download rankings, all-time:
    • Site-wide: 37,236 out of 94,912
    • In bioinformatics: 4,613 out of 8,837
  • Year to date:
    • Site-wide: 76,299 out of 94,912
  • Since beginning of last month:
    • Site-wide: 48,274 out of 94,912
