GPU Power Capping on Perlmutter¶
As HPC enters the exascale era, power has become a critical limiting factor. Power capping, one of the most commonly used power management approaches, effectively keeps the system and individual jobs within a preset power limit. To prepare for power-constrained future systems, NERSC encourages users to explore power capping with their production workloads and determine whether they can adopt it without a significant performance penalty.
Perlmutter allows end users to cap GPU power through a SLURM directive. This feature makes the nvidia-smi -pl <power limit> command, which normally requires root privileges, available to users via a SLURM plugin developed at NERSC.
How to Apply a GPU Power Cap¶
The supported power limits on Perlmutter GPU nodes are:
- A100 40 GB GPUs: Power cap range is 100 W - 400 W.
- A100 80 GB GPUs: Power cap range is 100 W - 500 W.
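If you are unsure which range applies to the GPUs your job lands on, you can query the limits directly on a compute node with nvidia-smi (a minimal sketch; the query fields shown are standard nvidia-smi options):
# Show the GPU model and its default/min/max power limits, one line per GPU
nvidia-smi --query-gpu=name,power.default_limit,power.min_limit,power.max_limit --format=csv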
To request a specific power cap value for your job, use the following SLURM directive in your job script:
#SBATCH --gpu-power=200
or
#SBATCH --gpu-power=200W
This will apply a 200 W GPU power cap to all allocated nodes for your job.
Sample job script:¶
#!/bin/bash
#SBATCH -J pc200w
#SBATCH -q regular
#SBATCH -C gpu
#SBATCH -N 2
#SBATCH -G 8
#SBATCH -t 4:00:00
#SBATCH -A mxyz
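# Cap every allocated GPU at 200 W for the duration of the job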
#SBATCH --gpu-power=200
#SBATCH -o %x-%j.out
srun -n 8 -c 32 --cpu-bind=cores -G 8 --gpu-bind=none ./a.out
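Assuming the script above is saved as pc200w.sh (an example filename), submit it as usual:
sbatch pc200w.sh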
Note
- GPU power capping also works for the shared QOS (#SBATCH -q shared) on Perlmutter, where it sets the power limit for the individual GPUs allocated to your job on the shared node.
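For example, a shared-QOS job script with a power cap might look like the following sketch (the account, time limit, and per-GPU CPU count are placeholders; adjust them to your allocation):
#!/bin/bash
#SBATCH -J pc-shared
#SBATCH -q shared
#SBATCH -C gpu
#SBATCH -G 1
#SBATCH -c 32
#SBATCH -t 1:00:00
#SBATCH -A mxyz
#SBATCH --gpu-power=200
#SBATCH -o %x-%j.out
srun -n 1 -c 32 --cpu-bind=cores -G 1 ./a.out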
How to Track Power Cap Usage¶
You can track the GPU power cap usage via the sacct command, which reports the power cap in the AdminComment field on Perlmutter.
Example Workflow¶
- Request an interactive job with a 200 W power cap:
elvis@perlmutter:login34:~> salloc -C gpu -q interactive --gpu-power=200 -A mxyz
...
salloc: Nodes nid001124 are ready for job
- Check the GPU power cap and usage using nvidia-smi:
elvis@nid001124:~> nvidia-smi -q -i 0 -d POWER
Sample output:
============== NVSMI LOG ==============
Timestamp : Mon Dec 16 11:34:42 2024
Driver Version : 535.216.01
CUDA Version : 12.2
Attached GPUs : 4
GPU 00000000:03:00.0
GPU Power Readings
Power Draw : 53.87 W
Current Power Limit : 200.00 W
Requested Power Limit : 200.00 W
Default Power Limit : 400.00 W
Min Power Limit : 100.00 W
Max Power Limit : 400.00 W
Power Samples
Duration : 2.39 sec
Number of Samples : 119
Max : 53.94 W
Min : 53.87 W
Avg : 53.88 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
- Exit the interactive session:
elvis@nid001124:~> exit
exit
salloc: Relinquishing job allocation 33994454
- Check the power cap usage with sacct:
elvis@perlmutter:login34:~> sacct -j 33994454 -XPno admincomment | jq . | fgrep gpuPower
"gpuPower": "200",
"gpuPowerRaw": "200",
The gpuPower field displays the applied power cap (e.g., 200 W in this case). If the #SBATCH --gpu-power=200W directive is used (note the "W" for watts), the gpuPowerRaw field will report "200W", while the gpuPower field will display "200" (without the unit).
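For illustration (based on the description above rather than captured output), the same sacct query for a job submitted with #SBATCH --gpu-power=200W would be expected to show:
"gpuPower": "200",
"gpuPowerRaw": "200W",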
Notes¶
- The power cap applies to all GPUs allocated to your job throughout the job duration (hence to all job steps).
- Currently, the power capping capability is not available via srun (i.e., per job step).
- Ensure you specify a power limit appropriate for the GPU model (e.g., A100 40 GB vs. A100 80 GB). If the requested power falls outside the allowed range, it will be automatically adjusted to the nearest valid value, and your job will proceed with a message similar to the following:
slurmstepd: error: gpu-power: nid001204: requested power 80W less than 100W, setting to 100W
or
slurmstepd: error: gpu-power: nid001021: requested power 600W greater than maximum rating of 400W, setting to 400W
- The nvidia-smi tool can provide detailed power usage metrics, which can be useful for debugging or monitoring power consumption; see the sketch below.
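For example, one lightweight way to log power draw against the enforced limit during a run is to sample it periodically from a background job step (a sketch, not a NERSC-provided tool; the sampling interval and output file name are arbitrary):
# Launch one sampling task per node alongside the application (requires srun --overlap);
# the loop is terminated automatically when the job ends.
srun --overlap -N ${SLURM_NNODES} --ntasks-per-node=1 bash -c \
  'while true; do
     nvidia-smi --query-gpu=timestamp,index,power.draw,power.limit --format=csv,noheader
     sleep 10
   done' > gpu_power_log.csv &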
For more information about the nvidia-smi and sacct options used in this document, please refer to the nvidia-smi help output (nvidia-smi -h) and the sacct man page (man sacct).