Workflow Tools¶
Supporting data-centric science involves moving data, multi-stage processing, and visualization at scales where manual control becomes prohibitive and automation is needed. Workflow tools can improve the productivity and efficiency of data-centric science by orchestrating and automating these steps. We advise against writing your own workflow manager from the ground up. Many of the workflow tools on this page are open source and written in Python, so they can easily be modified or extended to suit your needs.
Let us help you find the right tool!
Do you have questions about how to choose the right workflow tool for your application? Are you unsure which tools will work on NERSC systems? Please open a ticket at help.nersc.gov, explain that you would like help choosing a workflow tool, and your ticket will be routed to experts who can help you.
| Tool | Main Feature | Workflow Language | NERSC Documentation | Tool Documentation |
|---|---|---|---|---|
| GNU Parallel | Simple to start using | Shell Scripts | Docs | Docs |
| Parsl/funcX | Extensible with Globus Compute | Python API | Docs | Docs |
| FireWorks | Tracks tasks in a database and web GUI | Python API / YAML files | Docs | Docs |
| Snakemake | Easy to manage many shell commands as tasks | Snakemake files | Docs | Docs |
Don't run `srun` in a loop

Running `srun` in a loop, or launching many `srun` commands within the same job, can cause contention in the scheduler, affecting your tasks as well as other users' tasks running on the system.
Workflow and Cron QOSes¶
Some workflow tools require a main process to coordinate tasks or manage Slurm resources. These services can be started as long-running jobs in the workflow QOS. Some automated workflows use cron jobs to check for new data and start new workflow runs. Both kinds of tasks can be run on Perlmutter login nodes using scrontab. Scrontab combines the functionality of cron with the batch system, so processes can start on any of the available login nodes, which allows for resilient services.
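As a minimal sketch (the QOS name, time limit, and script path below are placeholder values, not NERSC-specific recommendations), a scrontab entry edited with `scrontab -e` might look like:

```
# sbatch-style options for the scheduled job go on #SCRON lines (example values)
#SCRON -q cron
#SCRON -t 00:10:00
# every 15 minutes, run a (hypothetical) script that checks for new data
*/15 * * * * $HOME/bin/check_for_new_data.sh
```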
GNU Parallel¶
GNU Parallel is a shell tool for executing commands in parallel and in sequence on a single node. Parallel is a very usable and effective tool for running high-throughput computing workloads without data dependencies at NERSC. Following simple Slurm command patterns, parallel can also scale up to run tasks in job allocations with multiple nodes.
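As a minimal sketch of that pattern (assuming GNU Parallel is available in your environment, for example via a `parallel` module, and using a hypothetical `./my_task.sh` in place of your real task), a single-node batch script might look like:

```bash
#!/bin/bash
#SBATCH -q regular
#SBATCH -C cpu
#SBATCH -N 1
#SBATCH -t 00:30:00

module load parallel   # assumes a GNU Parallel module is available

# Run 256 independent tasks, at most 128 at a time on this node;
# parallel substitutes each value produced by seq for {}.
seq 1 256 | parallel -j 128 ./my_task.sh {}
```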
Parsl¶
Parsl is a Python library for programming and executing data-oriented workflows in parallel. It lets you express complicated workflows with task and data dependencies in a single Python script. Parsl is made with HPC in mind, scales well, and runs on many HPC platforms. Under the hood, Parsl uses a driver (main) process to orchestrate the work. Data and tasks are serialized and communicated bidirectionally with worker processes using ZeroMQ sockets. The workers are organized into worker pools and launched on the compute infrastructure.
Fireworks¶
FireWorks is a free, open-source code for defining, managing, and executing scientific workflows. It can be used to automate calculations over arbitrary computing resources, including those that have a queueing system. Some features that distinguish FireWorks are dynamic workflows, failure-detection routines, and built-in tools and execution modes for running high-throughput computations at large computing centers. It uses a centralized server model, where the server manages the workflows and workers run the jobs.
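As a rough sketch of that model (assuming a launchpad configuration pointing at your own MongoDB instance and a hypothetical `my_workflow.yaml` workflow definition), the FireWorks command-line tools are used roughly like this:

```bash
# initialize the FireWorks database described by your launchpad configuration
lpad reset

# add a workflow definition to the database
lpad add my_workflow.yaml

# pull ready-to-run jobs from the database and execute them on the current node
rlaunch rapidfire
```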
Snakemake¶
Snakemake is a tool that combines the power of Python with shell scripting. It allows users to define workflows with complex dependencies; users can easily visualize the job dependency graph and track which tasks have completed and which are still pending. Snakemake works best at NERSC for single-node jobs.
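For example, inside a single-node allocation, and assuming Snakemake is installed in your environment (e.g. via conda) with a `Snakefile` in the working directory, the workflow can be run with:

```bash
# preview which rules would run, without executing anything
snakemake --cores 128 --dry-run

# execute the workflow, using up to 128 cores on the node
snakemake --cores 128
```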
Community Supported Tools¶
If you find that these tools don't meet your needs, you can check out some of the community supported workflow tools. The community supported workflow tools page is a place for workflow developers and avid workflow users to share instructions for using workflow tools on NERSC systems.