scrontab (Slurm crontab)
Traditional cron functionality has been replaced at NERSC with the Slurm crontab tool called scrontab. This combines the functionality of cron with the resiliency of the batch system. Jobs are run on a pool of login nodes, so unlike with regular cron, a single node going down won't keep your scrontab job from running. You can also find and modify your scrontab job on any login node.
You can edit your scrontab script with
scrontab -e
Once you save your script, it will automatically be scheduled by the batch system. By default, vi is the editor for scrontab; if you prefer a different editor, you can set the EDITOR environment variable (e.g. export EDITOR=/global/common/software/nersc/bin/emacs).
You can view your existing scripts with
scrontab -l
Example scrontab Script
Each script should include traditional Slurm flags like -A and -t. Here's an example scrontab job script that will run every three hours (note the #SCRON --open-mode=append line, which tells Slurm to append any new output to the output file):
#SCRON -q cron
#SCRON -C cron
#SCRON -A <account>
#SCRON -t 00:30:00
#SCRON -o output-%j.out
#SCRON --open-mode=append
0 */3 * * * <full_path_to_your_script>
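Because output is appended rather than overwritten, it can help to make each run identifiable in the log. The following is a minimal sketch of what the script behind <full_path_to_your_script> might contain. The log_run helper and its message format are illustrative assumptions, not part of scrontab or Slurm.

```shell
#!/bin/bash
# Hypothetical script body; log_run is an illustrative helper, not a Slurm tool.

log_run() {
    # Prefix each message with a UTC timestamp so appended runs can be told apart.
    printf '[%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
}

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "unknown" elsewhere.
log_run "run started (job ${SLURM_JOB_ID:-unknown})"
```

Each run then begins with a line carrying the time and job ID, which makes a shared, appended output file easier to read.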
scrontab times are in UTC
Currently, scrontab times on Perlmutter are in UTC.
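Since the hour field is interpreted as UTC, a local wall-clock hour has to be shifted before it goes into a scrontab entry. A minimal sketch of that arithmetic follows. local_hour_to_utc is a hypothetical helper, and it assumes a whole-hour UTC offset that you supply yourself (it does not track daylight saving changes).

```shell
#!/bin/bash
# Hypothetical helper: convert a local hour (0-23) to the UTC hour for a cron field.
# $1 = local hour, $2 = local offset from UTC in whole hours (e.g. -8 for UTC-8).
local_hour_to_utc() {
    local hour=$((10#$1))   # force base 10 so "09" is not read as octal
    local offset=$2
    # Subtract the offset, then wrap the result into the range 0-23.
    echo $(( ((hour - offset) % 24 + 24) % 24 ))
}

local_hour_to_utc 09 -8   # 09:00 at UTC-8 is 17:00 UTC, so this prints 17
```

If the conversion crosses midnight, the day-of-month and day-of-week fields of the entry shift as well.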
Long-Running scrontab Jobs
Projects often need long-running processes to manage their work at NERSC (e.g. a listener process to facilitate external data movement). For now we are supporting these via the workflow QOS, which allows a much longer run time. However, jobs in this QOS may get interrupted by maintenance or by login nodes going offline. Since it's generally desirable to have these jobs restart as soon as possible, we recommend that you schedule the job to start fairly frequently (e.g., once an hour) and add the singleton dependency to that scrontab job's flags:
#SCRON --qos=workflow
#SCRON --account=<account>
#SCRON --time=30-00:00:00
#SCRON --dependency=singleton
#SCRON --job-name=my_data_movement_helper
0 * * * * <full_path_to_your_script>
This means Slurm will check every hour whether an instance of your job is running, and if not, it will start it.
Use singleton for long-running jobs
You must use --dependency=singleton for long-running jobs to avoid Slurm starting multiple instances of the same job every time your scrontab file is edited.
Monitoring Your scrontab Jobs
You can monitor your scrontab jobs with
squeue --me -q cron -O JobID,EligibleTime
This will show the next time the batch system will run your job. If the scrontab job is set to repeat, the system will automatically reschedule the next job. Additionally, if you modify your scrontab job, Slurm will automatically cancel the old job and resubmit a new one.
Canceling a scrontab job
To remove a scrontab job from your running jobs, you can edit the scrontab file with scrontab -e and comment out all the lines associated with the entry.
Using scancel on a scrontab job
The scancel command will give a warning when attempting to remove a job started with scrontab.
perlmutter$ scancel 555
scancel: error: Kill job error on job id 555: Cannot scancel a scrontab job without the --hurry flag, or modify scrontab jobs through scontrol
If you cancel a scrontab job with the --hurry flag, the entry in the scrontab file will be prepended with #DISABLED. These comments will need to be removed before the job will be able to start again.
Using scrontab to submit other batch jobs
scrontab can be used to submit batch jobs at regular intervals, often as part of a larger workflow. It is important to note that scrontab jobs set certain Slurm-related environment variables which may be inherited by batch jobs submitted from the scrontab job.
A notable example is that scrontab jobs set a default SLURM_MEM_PER_CPU=2048, which can cause errors when inherited into batch jobs, often of the form srun: error: Unable to create step for job <id>: More processors requested than permitted.
A known workaround to avoid this is to set
if [[ ! -z "${SLURM_MEM_PER_CPU}" ]]; then
    unset SLURM_MEM_PER_CPU
    unset SLURM_OPEN_MODE
fi
in the scrontab file to handle that specific environment variable, or to use unset ${!SLURM_@}; to unset all Slurm-related environment variables in the file.
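As an illustration of the second workaround, the sketch below clears every SLURM_-prefixed variable before a batch job is submitted. The clear_slurm_env function name and the job script path in the comment are hypothetical, and the sbatch call is left commented out so the snippet runs without Slurm.

```shell
#!/bin/bash
# Hypothetical helper wrapping the "unset ${!SLURM_@}" workaround.
clear_slurm_env() {
    # ${!SLURM_@} expands to the names of all variables starting with SLURM_.
    local name
    for name in "${!SLURM_@}"; do
        unset "$name"
    done
}

clear_slurm_env
# sbatch /path/to/your_job.sh   # hypothetical submission, made after the cleanup
```

Unsetting everything is the blunter option: it also removes variables such as SLURM_JOB_ID, so read any values the script still needs before calling the helper.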