
Queues and Charges

This page details the QOS and queue usage policies, with examples for each type of Slurm job.


When a job runs on a NERSC supercomputer, charges accrue against one of the user's projects. The unit of accounting for these charges is the "Node Hour", based on the performance of the nodes on Perlmutter. The total number of charged hours for a job is a function of:

  • the number of nodes and the walltime used by the job,
  • the QOS of the job, and
  • the "charge factor" for the system upon which the job was run.

Job charging policies, including the intended use of each queue, are outlined in more detail under "Policies". This page summarizes the limits and charges applicable to each queue.

Selecting a Queue

Jobs are submitted to different queues depending on the queue constraints and the user's desired outcomes. Each queue corresponds to a "Quality of Service" (QOS): each offers a different service level in terms of priority, run and submit limits, walltime limits, node-count limits, and cost. At NERSC, the terms "queue" and "QOS" are often used interchangeably.

Most jobs are submitted to the "regular" queue, but other queues serve special needs. For example, a user who needs fast turnaround while operating a telescope could prearrange with NERSC to use the "realtime" queue for those runs. Such a user incurs a higher-than-regular charge, while a user who can be flexible about runtime is rewarded with a substantial discount.

Note

The "debug" queue is the default queue on Perlmutter and will be applied to any job for which a QOS is not specified.

Assigning Charges

Users who are members of more than one project can select which one should be charged for their jobs by default. In Iris, under the "Compute" tab in the user view, select the project you wish to make default.

To charge to a non-default project, use the -A projectname flag in Slurm, either in the Slurm directives preamble of your script, e.g.,

#SBATCH -A myproject

or on the command line when you submit your job, e.g., sbatch -A myproject ./myscript.sl.
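In context, a minimal batch script with the accounting directive in its preamble might look like the following sketch; the project name, resource requests, and executable are placeholders:

```shell
#!/bin/bash
#SBATCH -A myproject        # charge this job to "myproject" (placeholder)
#SBATCH -q regular          # QOS / queue
#SBATCH -C cpu              # node architecture
#SBATCH -N 2                # number of nodes
#SBATCH -t 01:00:00         # walltime limit

srun ./my_app               # placeholder executable
```

Submitting with sbatch -A otherproject ./myscript.sl would override the -A value in the preamble for that submission.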

Warning

For users who are members of multiple NERSC projects, charges are made to the default project, as set in Iris, unless the #SBATCH --account=<NERSC project> flag has been set.

Calculating Charges

The cost of a job is computed in the following manner: $$ \text{walltime in hours} \times \text{number of nodes} \times \text{QOS factor} \times \text{charge factor} $$.
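As a quick sanity check, the formula can be evaluated from the command line. The snippet below is just a sketch of the arithmetic with illustrative values (a 240-minute, 3-node job at QOS factor 0.5 and charge factor 1):

```shell
# charged hours = (walltime in hours) x nodes x QOS factor x charge factor
awk -v mins=240 -v nodes=3 -v qos=0.5 -v cf=1 \
    'BEGIN { printf "%.1f\n", (mins/60) * nodes * qos * cf }'
# prints: 6.0
```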

Example

The charge for a job that runs for 240 minutes on 3 CPU nodes in the preempt QOS (QOS factor of 0.5) would be calculated as $$ \frac{240\ \text{mins}}{60\ \text{min/hr}} \times 3\ \text{nodes} \times 0.5 \times 1\ \text{charged-hour/node-hour} = 4 \times 3 \times 0.5 \times 1 = 6.0\ \text{charged hours}.$$

Example

A job which ran for 35 minutes on 3 GPU nodes on Perlmutter with the regular QOS would be charged: $$ \frac{35}{60}\ \text{hours} \times 3\ \text{nodes} \times 1 = 1.75\ \text{charged hours} $$

Note

Jobs in the "shared" QOS are only charged for the fraction of the node used.

Example

A job which ran for 10 hours on 2 GPUs in the shared QOS on Perlmutter GPU would be charged: $$ 10\ \text{hours} \times \frac{2\ \text{GPUs}}{4\ \text{GPUs/node}} \times 1 = 5\ \text{charged hours} $$
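The fraction-of-node arithmetic above can be sketched the same way (each Perlmutter GPU node has 4 GPUs; the values mirror the example):

```shell
# shared QOS: charge scales with the fraction of the node's 4 GPUs used
awk -v hours=10 -v gpus=2 'BEGIN { printf "%.1f\n", hours * (gpus/4) * 1 }'
# prints: 5.0
```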

Note

Jobs are charged according to the resources they made unavailable for other jobs, i.e., the number of nodes reserved (regardless of use) and the actual walltime used (regardless of the specified limit).

Charge Factors

Charge factors on Perlmutter are based on the performance of the Perlmutter nodes and are therefore equal to one. Future systems may have different charge factors.

| Architecture | Charge Factor |
|---|---|
| Perlmutter CPU | 1 |
| Perlmutter GPU | 1 |

Note

Perlmutter GPU and CPU allocations are separate pools that can be used only on the respective resource and cannot be exchanged.

QOS Cost Factor: Charge Multipliers and Discounts

A job's QOS cost factor is a function of which QOS it is run in.

| QOS | QOS Factor | Conditions |
|---|---|---|
| regular | 1 | (standard charge factor) |
| overrun | 0 | available only when allocation is exhausted |
| preempt | 0.25 (GPU) or 0.5 (CPU) | guaranteed 2-hr minimum walltime, then subject to preemption |
| premium | 2 | project has used less than 20% of allocation on premium jobs |

preempt QOS jobs are charged for a minimum of 2 hours

Preempt QOS jobs are charged for 2 hours of walltime at a minimum, regardless of actual walltime elapsed. Please see our resource policies for more information.
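The minimum-charge rule amounts to clamping the billed walltime at two hours. A sketch with illustrative numbers (a GPU-node preempt job, QOS factor 0.25, that was preempted after only 30 minutes on 4 nodes):

```shell
# billed walltime = max(elapsed hours, 2)
awk -v elapsed=0.5 -v nodes=4 -v qos=0.25 \
    'BEGIN { billed = (elapsed > 2) ? elapsed : 2;
             printf "%.1f\n", billed * nodes * qos * 1 }'
# prints: 2.0
```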

Big Job Discount

Perlmutter GPU jobs in the regular QOS using 128 or more nodes are charged at a 50% discount. Perlmutter CPU jobs in the regular QOS using 256 or more nodes are also discounted by 50%.

| System and Architecture | Big Job Discount | Conditions |
|---|---|---|
| Perlmutter GPU | 0.5 | Job using 128 or more nodes in regular QOS |
| Perlmutter CPU | 0.5 | Job using 256 or more nodes in regular QOS |
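The discount enters the charging formula as one more multiplicative factor. A sketch with illustrative values (a 10-hour, 128-node GPU job in the regular QOS):

```shell
# big-job discount: regular-QOS GPU jobs with >= 128 nodes pay half
awk -v hours=10 -v nodes=128 'BEGIN {
    discount = (nodes >= 128) ? 0.5 : 1.0;
    printf "%.1f\n", hours * nodes * 1 * 1 * discount }'
# prints: 640.0
```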

QOS Limits and Charges

Perlmutter GPU

| QOS | Max nodes | Max time (hrs) | Min time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|---|
| regular | - | 24 | - | 5000 | - | medium | 1 | 1 |
| interactive¹ | 4 | 4 | - | 2 | 2 | high | 1 | 1 |
| shared_interactive | 0.5 | 4 | - | 2 | 2 | high | 1 | 1 |
| jupyter | 4 | 6 | - | 1 | 1 | high | 1 | 1 |
| debug | 8 | 0.5 | - | 5 | 2 | medium | 1 | 1 |
| shared² | 0.5 | 24 | - | 5000 | - | medium | 1 | 1² |
| preempt³ | 128 | 24 (preemptible after two hours) | 2 | 5000 | - | medium | 0.25 | 0.25³ |
| debug_preempt | 2 | 0.5 (preemptible after five minutes) | - | 5 | 2 | medium | 0.25 | 0.25 |
| premium⁴ | - | 24 | - | 5 | - | high | 2⁴ | 2⁴ |
| overrun | - | 24 (preemptible after two hours) | - | 5000 | - | very low | 0 | 0 |
| shared_overrun | 0.5 | 24 (preemptible after two hours) | - | 5000 | - | very low | 0 | 0 |
| realtime | custom | custom | custom | custom | custom | very high | 1 | 1 |
  • The "debug" QOS is the default.

  • Nodes allocated by a "regular" QOS job are exclusively used by the job.

  • GPU jobs in the shared QOS may request 1 or 2 GPUs and will be allocated a corresponding 16 CPU cores and 64 GB RAM or 32 CPU cores and 128 GB RAM, respectively.

  • NERSC's Jupyter service uses the jupyter QOS to start JupyterLab on compute nodes. Other uses of the QOS are currently not authorized, and the QOS is monitored for unauthorized use.

  • Jobs in the "preempt" QOS must request a minimum of 2 hours of walltime, and these jobs are charged for a minimum of 2 hours of walltime. Preemptible jobs are subject to preemption after two hours. Jobs can be automatically requeued after preemption using the --requeue sbatch flag; see the Preemptible Jobs section for details.

  • Jobs may run on the "standard" Perlmutter GPU nodes or on the subset of GPU nodes which have double the GPU-attached memory. To specifically request these larger-memory nodes, use -C gpu&hbm80g in your job script instead of -C gpu. Jobs with this constraint must use 256 or fewer nodes. To specifically request the "standard" Perlmutter GPU nodes, use -C gpu&hbm40g in your job script.

Specific GPU requests on the command line

If you are submitting a job from the command line (e.g., via salloc) which uses one of the memory-specific GPU constraints, you will need to specify the constraint within quotation marks (e.g., -C "gpu&hbm80g") so that the ampersand in the constraint is not interpreted as a shell symbol.
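The quoting issue is plain shell behavior, independent of Slurm: an unquoted & terminates the command and runs it in the background. A quick demonstration, with printf standing in for the submission command:

```shell
constraint='gpu&hbm80g'
# Quoted, the whole constraint string survives as a single argument:
printf '%s\n' "$constraint"
# prints: gpu&hbm80g
# Unquoted on a command line, e.g.  salloc -C gpu&hbm80g , the shell
# would background "salloc -C gpu" and then try to run "hbm80g".
```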

Perlmutter CPU

| QOS | Max nodes | Max time (hrs) | Min time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|---|
| regular | - | 24 | - | 5000 | - | medium | 1 | 1 |
| interactive¹ | 4 | 4 | - | 2 | 2 | high | 1 | 1 |
| shared_interactive | 0.5 | 4 | - | 2 | 2 | high | 1 | 1 |
| jupyter | 4 | 6 | - | 1 | 1 | high | 1 | 1 |
| debug | 8 | 0.5 | - | 5 | 2 | medium | 1 | 1 |
| shared² | 0.5 | 24 | - | 5000 | - | medium | 1 | 1² |
| preempt³ | 128 | 24 (preemptible after two hours) | 2 | 5000 | - | medium | 0.5 | 0.5³ |
| debug_preempt | 2 | 0.5 (preemptible after five minutes) | - | 5 | 2 | medium | 0.5 | 0.5 |
| premium⁴ | - | 24 | - | 5 | - | high | 2⁴ | 2⁴ |
| overrun | - | 24 (preemptible after two hours) | - | 5000 | - | very low | 0 | 0 |
| shared_overrun | 0.5 | 24 (preemptible after two hours) | - | 5000 | - | very low | 0 | 0 |
| realtime | custom | custom | custom | custom | custom | very high | 1 | 1 |
  • The "debug" QOS is the default.

  • Nodes allocated by a "regular" QOS job are exclusively used by the job.

  • NERSC's Jupyter service uses the "jupyter" QOS to start JupyterLab on compute nodes. Other uses of the QOS are currently not authorized, and the QOS is monitored for unauthorized use.

  • Even though there is no node limit for the regular queue, not all of the projected 3072 nodes are available today. Please check the state of the nodes in the regular queue with sinfo -s -p regular_milan_ss11.

  • Jobs in the "preempt" QOS must request a minimum of 2 hours of walltime, and these jobs are charged for a minimum of 2 hours of walltime. Preemptible jobs are subject to preemption after two hours. Jobs can be automatically requeued after preemption using the --requeue sbatch flag; see the Preemptible Jobs section for details.

Perlmutter Login

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|
| xfer | 1 (login) | 48 | 100 | 15 | low | - | 0 |
| cron | 1/128 (login) | 24 | - | - | low | - | 0 |
| workflow | 0.25 (login) | 2160 | - | - | low | - | 0 |

Discounts

Several QOSes offer reduced charging rates:

  • The "preempt" QOS is charged 50% of the "regular" QOS on CPU nodes and 25% of "regular" QOS on GPU nodes, with a minimum walltime of two hours guaranteed. Jobs in the "preempt" QOS are charged for a minimum of 2 hours of walltime, regardless of actual job runtime.
  • The "overrun" QOS is free of charge and is only available to projects that are out of allocation time. Please refer to the overrun section for more details.

  1. Batch job submission is not enabled; jobs must be submitted via salloc

  2. Shared jobs are only charged for the fraction of the node resources used. 

  3. Jobs in the "preempt" QOS are charged for a minimum of 2 hours of walltime, regardless of actual job runtime, and "preempt" QOS jobs must request a walltime of at least 2 hours. 

  4. The charge factor for premium jobs will increase once a project has used 20 percent of its allocation on premium.