Workflow QOS
Some workflow tools require a lightweight main process to coordinate tasks, manage Slurm resources, or interact with the Perlmutter scratch file system. On Perlmutter, tasks with these requirements can be run in the workflow QOS. If your task doesn't need these resources, NERSC also offers Spin. Jobs in the workflow QOS are often managed and restarted automatically by using scrontab.
Getting access to the workflow QOS
To request access to the Perlmutter workflow QOS, please fill out the Workflow QOS Request Form at the NERSC help desk.
To help us decide if your use case is appropriate for the workflow QOS, you will be required to enter:
User:
Email:
Project name:
Statement of purpose:
Estimated memory usage:
Estimated CPU usage:
Estimated data usage:
Estimated I/O:
Frequency / length of process:
Need external resources (yes/no):
Example workflow scrontab script
Scripts must include traditional Slurm flags like -q, -A, and -t. Below is an example workflow scrontab script with a walltime of 30 days that runs once every hour. Note the #SCRON --open-mode=append line, which instructs Slurm to append any new output to the output file:
#SCRON -q workflow
#SCRON -C cron
#SCRON -A <account>
#SCRON -t 30-00:00:00
#SCRON -o output-%j.out
#SCRON --open-mode=append
0 */1 * * * <full_path_to_your_script>
Scrontab times are in UTC
Currently, scrontab times on Perlmutter are in UTC.
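Because scrontab times are interpreted in UTC, a local start time has to be converted before it goes into the time fields of an entry. The following is a minimal sketch, assuming GNU date (coreutils) is available. The function name local_to_utc_cron is hypothetical and is not part of Slurm or any NERSC tooling. It prints the minute and hour fields in UTC for a given time zone and local date and time.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm or NERSC command).
# Usage: local_to_utc_cron <time zone> <local date and time>
# Prints "<minute> <hour>" in UTC, suitable for the first two
# fields of a scrontab entry. Assumes GNU date.
local_to_utc_cron() {
    # GNU date accepts a TZ="..." prefix inside the -d string to set
    # the zone of the input time; -u makes the output UTC.
    # %-M and %-H print minute and hour without zero padding.
    date -u -d "TZ=\"$1\" $2" '+%-M %-H'
}
```

For example, `local_to_utc_cron America/Los_Angeles '2024-01-15 09:00'` prints `0 17`, so the corresponding entry would begin with `0 17 * * *`. Since the entry is fixed in UTC, the local time at which it runs shifts by an hour when daylight saving time starts or ends.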
Workflow QOS details
Jobs in the workflow QOS may request a walltime of up to 90 days and up to one quarter of the resources (CPU and/or memory) of a Perlmutter login node.
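A requested walltime can be checked against the 90-day limit before a scrontab entry is installed. This is a sketch only: the helper names walltime_seconds and within_workflow_limit are hypothetical, and the code handles just the D-HH:MM:SS form used with -t in the example script, not the other time formats Slurm accepts.

```shell
#!/bin/sh
# Hypothetical helper: convert a walltime in D-HH:MM:SS form to seconds.
# Prints the number of seconds, or returns 2 if the format is not recognized.
walltime_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        # Accept only the days-hours:minutes:seconds form.
        /^[0-9]+-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
            ok = 1
        }
        END { exit (ok ? 0 : 2) }'
}

# Hypothetical helper: succeed (exit status 0) if the walltime does not
# exceed 90 days, fail with 1 if it does, or 2 if it cannot be parsed.
within_workflow_limit() {
    secs=$(walltime_seconds "$1") || return 2
    [ "$secs" -le $((90 * 86400)) ]
}
```

A call such as `within_workflow_limit 30-00:00:00 && echo ok` could be used as a guard in a script that generates scrontab entries.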
Known issues
If an scrontab job is canceled for any reason (e.g., by the user or ahead of an upcoming maintenance), the user must manually re-enable the job by editing their scrontab file with scrontab -e and removing the comment characters that Slurm inserted when the job was canceled.