CGAT-core: a python framework for building scalable, reproducible computational biology workflows

By Adam Cribbs, Sebastian Luna-Valero, Charlotte George, Ian M Sudbery, Antonio J Berlanga-Taylor, Steven N Sansom, Thomas Smith, Nicholas E Ilott, Jethro Johnson, Jakub Scaber, Katherine Brown, David Sims, Andreas Heger

Posted 18 Mar 2019
bioRxiv DOI: 10.1101/581009 (published DOI: 10.12688/f1000research.18674.1)

In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNA-seq data using pseudo-alignment.
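The abstract describes workflows built from multiple dependent steps with several branches, where independent branches can be processed in parallel. The sketch below illustrates only that general idea, using the Python standard library (`graphlib`, available from Python 3.9, and `concurrent.futures`). It is not CGAT-core's API: the step names, the `WORKFLOW` dictionary and the `run_step` placeholder are hypothetical, and a framework like CGAT-core would dispatch each step to a cluster job rather than a local thread.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical RNA-seq-style workflow: each step maps to the set of steps it
# depends on. Step names are illustrative only and do not come from CGAT-core.
WORKFLOW = {
    "build_index": set(),
    "quantify_sample1": {"build_index"},
    "quantify_sample2": {"build_index"},
    "merge_counts": {"quantify_sample1", "quantify_sample2"},
    "report": {"merge_counts"},
}


def run_step(name):
    # Placeholder for real work, e.g. running a command or submitting a cluster job.
    return name


def run_workflow(graph, max_workers=4):
    """Run steps in dependency order, executing independent steps concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    completed = []
    pending = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every step whose dependencies have all finished.
            for step in sorter.get_ready():
                pending[pool.submit(run_step, step)] = step
            # Block until at least one running step finishes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                step = pending.pop(future)
                future.result()  # re-raise any exception raised by the step
                completed.append(step)
                sorter.done(step)  # may unlock downstream steps
    return completed


if __name__ == "__main__":
    print(run_workflow(WORKFLOW))
```

Here the two quantification steps share a single dependency and so may run at the same time, while the merge step waits for both. The exact interleaving of the parallel branch can vary between runs; only the dependency order is guaranteed.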

Download data

  • Downloaded 708 times
  • Download rankings, all-time:
    • Site-wide: 34,403
    • In genomics: 2,997
  • Year to date:
    • Site-wide: 99,528
  • Since beginning of last month:
    • Site-wide: 114,406
