HPCI: A Perl module for writing cluster-portable bioinformatics pipelines

By John M Macdonald, Christopher M Lalansingh, Christopher I Cooper, Anqi Yang, Felix Lam, Paul C. Boutros

Posted 05 Sep 2018
bioRxiv DOI: 10.1101/408666

Background: Most biocomputing pipelines are run on clusters of computers. Each type of cluster has its own API (application programming interface), which defines how a program must submit jobs to the cluster, describe their content and monitor them. Sometimes it is desirable to run the same pipeline on different types of cluster. This can happen when:

- different labs are collaborating but do not use the same type of cluster;
- a pipeline is released to other labs as open source or commercial software;
- a lab has access to multiple types of cluster and wants to choose between them for scaling, cost or other purposes;
- a lab is migrating its infrastructure from one cluster type to another;
- a pipeline needs to run on a single computer, for example during testing or travelling.

However, because each type of cluster has its own API, code that runs jobs on one type of cluster must be rewritten before it can run on a different type. To resolve this problem, we created a software module to generalize the submission of pipelines across computing environments, including local compute, clouds and clusters.

Results: HPCI (High Performance Computing Interface) is a Perl module that provides the interface to a standardized generic cluster. When the HPCI module is used, it accepts a parameter that specifies the cluster type, and uses it to load a driver, HPCD::<cluster>. The driver translates the abstract HPCI interface to the cluster's specific software interface. Simply by changing the cluster parameter, the same pipeline can be run on a different type of cluster with no other changes.

Conclusion: The HPCI module assists in writing Perl programs that can be run in different lab environments, with different site configuration requirements and different types of hardware clusters. Rather than having to rewrite portions of the program, it is only necessary to change a configuration file. Using HPCI, an application can manage collections of jobs to be run, specify ordering dependencies, detect the success or failure of jobs, and automatically retry failed jobs (optionally with a changed configuration, such as when the original attempt specified an inadequate memory allotment).

Keywords: portability; cluster; environment; pipeline
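The driver mechanism, ordering dependencies and retry behaviour described in the abstract can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not HPCI's actual Perl API: the names `Stage`, `Group`, `DRIVERS`, `render`, `fake_runner` and `build` are all hypothetical, the drivers only build a submission string rather than contacting a scheduler, and doubling the memory request on retry is an assumed policy chosen to mirror the "inadequate memory allotment" case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One job in the pipeline (hypothetical structure, not HPCI's)."""
    name: str
    command: str
    memory_gb: int = 1
    max_retries: int = 1
    deps: List[str] = field(default_factory=list)


class LocalDriver:
    def render(self, s: Stage) -> str:
        return f"sh -c '{s.command}'"


class SgeDriver:
    def render(self, s: Stage) -> str:
        return f"qsub -N {s.name} -l h_vmem={s.memory_gb}G -b y {s.command}"


class SlurmDriver:
    def render(self, s: Stage) -> str:
        return f'sbatch --job-name={s.name} --mem={s.memory_gb}G --wrap="{s.command}"'


# Cluster name -> driver class; loosely analogous to loading HPCD::<cluster>.
DRIVERS = {"local": LocalDriver, "sge": SgeDriver, "slurm": SlurmDriver}


class Group:
    """A collection of stages executed in dependency order."""

    def __init__(self, cluster: str, runner: Callable[[str, Stage], bool]):
        self.driver = DRIVERS[cluster]()  # the only cluster-specific choice
        self.runner = runner              # returns True if the job succeeded
        self.stages: Dict[str, Stage] = {}

    def stage(self, **kwargs) -> None:
        s = Stage(**kwargs)
        self.stages[s.name] = s

    def _run_with_retry(self, s: Stage) -> str:
        for _ in range(s.max_retries + 1):
            if self.runner(self.driver.render(s), s):
                return "pass"
            s.memory_gb *= 2  # assumed policy: retry with a larger allotment
        return "fail"

    def execute(self) -> Dict[str, str]:
        results: Dict[str, str] = {}
        pending = dict(self.stages)
        while pending:
            # A stage is ready once all of its prerequisites have a result.
            ready = [s for s in pending.values()
                     if all(d in results for d in s.deps)]
            if not ready:
                raise ValueError("dependency cycle or unknown prerequisite")
            for s in ready:
                del pending[s.name]
                if all(results[d] == "pass" for d in s.deps):
                    results[s.name] = self._run_with_retry(s)
                else:
                    results[s.name] = "skipped"
        return results


def fake_runner(submission: str, stage: Stage) -> bool:
    """Stand-in for a real scheduler: jobs 'succeed' with at least 4 GB."""
    return stage.memory_gb >= 4


def build(cluster: str) -> Group:
    g = Group(cluster, fake_runner)
    g.stage(name="align", command="echo align", memory_gb=2, max_retries=2)
    g.stage(name="call", command="echo call", memory_gb=4, deps=["align"])
    return g


if __name__ == "__main__":
    # Changing only the cluster name switches the driver.
    for cluster in ("local", "sge", "slurm"):
        print(cluster, build(cluster).execute())
```

In this sketch the pipeline definition in `build` is identical for every cluster type; only the string passed to `Group` changes, which is the portability property the abstract attributes to HPCI's cluster parameter.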

Download data

  • Downloaded 192 times
  • Download rankings, all-time:
    • Site-wide: 59,434 out of 76,820
    • In bioinformatics: 6,280 out of 7,425
  • Year to date:
    • Site-wide: 58,626 out of 76,820
  • Since beginning of last month:
    • Site-wide: 59,357 out of 76,820
