Interactive Jobs

Allocation

salloc is used to allocate resources in real time to run an interactive job. Typically, it allocates resources and spawns a shell; the shell is then used to execute srun commands to launch parallel tasks.
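For example, a minimal interactive session might look like the following sketch (mxxxx is a placeholder project name):

salloc --nodes 2 --qos interactive --time 00:30:00 --constraint cpu --account mxxxx
# ... once the allocation is granted, a shell prompt returns inside the allocation ...
srun --ntasks 4 hostname   # launch 4 parallel tasks across the allocated nodes
exit                       # release the allocation when finished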

"interactive" QOS on Perlmutter

Perlmutter has a dedicated interactive QOS to support medium-length interactive work. This QOS is intended to deliver nodes for interactive use within 6 minutes of the job request.

Warning

On Perlmutter, if you have not set a default account, salloc may fail with the following error message:

salloc: error: Job request does not match any supported policy.
salloc: error: Job submit/allocate failed: Unspecified error

Perlmutter GPU nodes

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint gpu --gpus 4 --account mxxxx_g

When using srun, you must explicitly request GPU resources

One must use the --gpus (-G), --gpus-per-node, or --gpus-per-task flag to make the allocated node's GPUs visible to your srun command.

Otherwise, you will see errors similar to:

 no CUDA-capable device is detected

 No Cuda device found
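For example, inside an interactive GPU allocation, either of the following makes the node's 4 GPUs visible to the application (./my_gpu_app is a hypothetical executable):

# bind one GPU to each of 4 tasks
srun --ntasks 4 --gpus-per-task 1 ./my_gpu_app

# make all 4 GPUs on the node visible to every task
srun --ntasks 4 --gpus-per-node 4 ./my_gpu_app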

When requesting an interactive node on the Perlmutter GPU compute nodes

One must use the project name that ends in _g (e.g., mxxxx_g) to submit any jobs to run on the Perlmutter GPU nodes. The constraint flag must also be set to gpu for any interactive jobs (-C gpu or --constraint gpu).

Otherwise, you will see errors such as:

sbatch: error: Job request does not match any supported policy.
sbatch: error: Batch job submission failed: Unspecified error

Perlmutter CPU nodes

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint cpu --account mxxxx
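Once the allocation is granted, srun launches parallel tasks as usual. A minimal sketch (./my_cpu_app is a hypothetical executable; the task and CPU counts are illustrative):

# 64 MPI tasks with 4 logical CPUs reserved per task
srun --ntasks 64 --cpus-per-task 4 ./my_cpu_app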

Limits

If resources are not readily available for the requested interactive job, the job is automatically canceled after 6 minutes. To allow a job to wait longer for resources, use the optional --immediate flag to specify the number of seconds the job should wait for resources to become available:

# wait for up to 10 minutes
salloc --nodes 1 --qos interactive --time 01:00:00 --constraint <node type> --account mxxxx --immediate=600

There is a limit of 4 nodes for interactive jobs on both the cpu and gpu partitions. For more details, see QOS Limits and Charges.