Using NVSHMEM with MPI on Perlmutter¶
Overview¶
NVSHMEM is a parallel programming interface for NVIDIA GPUs that enables efficient GPU-to-GPU communication across multiple nodes. A detailed description of the library is available in the NVSHMEM documentation. On Perlmutter, several configuration requirements must be met to get NVSHMEM working correctly alongside MPI. This page documents known pitfalls and recommended configurations.
Common Pitfalls¶
1. Do Not Use the NVSHMEM from the NVIDIA HPC SDK¶
The NVSHMEM library bundled with the NVIDIA HPC SDK (nvidia module) is pre-compiled for InfiniBand networks and Open MPI, neither of which Perlmutter uses. Using it will result in runtime errors such as:
bootstrap_loader.cpp:45: NULL value Bootstrap unable to load 'nvshmem_bootstrap_mpi.so'
nvshmem_bootstrap_mpi.so: undefined symbol: ompi_mpi_comm_world
or in MPI itself behaving incorrectly at runtime.
Always use NERSC's standalone nvshmem module, which is compiled with libfabric support for Perlmutter's Slingshot network and includes MPI bootstrap libraries compatible with Cray-MPICH.
module load nvshmem
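To confirm that a dynamically linked binary resolves NERSC's standalone library rather than the HPC SDK copy, you can inspect its shared-library dependencies. This is a diagnostic sketch: `./your_app` is a placeholder for your executable, and the exact library names and install paths depend on the NVSHMEM version.

```shell
# After loading the module, check which NVSHMEM shared libraries
# the binary resolves at runtime (path shown should point at the
# NERSC nvshmem module install, not the nvhpc SDK tree).
module load nvshmem
ldd ./your_app | grep -i nvshmem
```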
2. NVSHMEM Defaults to PMIx Bootstrap, Not MPI¶
By default, NERSC's nvshmem module bootstraps using PMIx: nvshmem_init obtains its job-launch information from PMIx rather than from MPI. If your application initializes MPI itself and you want NVSHMEM to bootstrap over it, you must explicitly override the default:
export NVSHMEM_BOOTSTRAP=MPI
3. Set the Correct MPI Launch Type¶
SLURM must select the correct MPI plugin for Perlmutter. Set this before launching your job:
export SLURM_MPI_TYPE=cray_shasta
Or pass it directly to srun:
srun --mpi=cray_shasta -n <ntasks> ./your_app
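As a concrete sketch, a multi-node launch with one GPU per task might look like the following. The node count, task geometry, and GPU binding options are illustrative placeholders, not requirements; only `--mpi=cray_shasta` is the point of this example.

```shell
# Illustrative: 2 nodes, 4 tasks per node, one GPU per task.
# --mpi=cray_shasta selects the Cray MPI plugin required on Perlmutter.
srun --mpi=cray_shasta -N 2 -n 8 --ntasks-per-node=4 --gpus-per-node=4 ./your_app
```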
4. GPU-Aware MPI Is Not Required for Basic NVSHMEM Usage¶
If your application uses MPI for CPU/host communication and NVSHMEM for GPU-to-GPU communication, you do not need GPU-aware MPI. Disabling it explicitly avoids potential conflicts:
export MPICH_GPU_SUPPORT_ENABLED=0
Note: If you do need GPU-aware MPI (e.g., direct GPU buffer transfers via MPI), see the GPU-Aware MPI section below.
Recommended Environment Setup¶
Use the following module and environment configuration before building or running any application that combines MPI and NVSHMEM:
# Load NERSC provided NVSHMEM (not the one from nvhpc)
module load nvshmem
# Use MPI (not PMIx) for NVSHMEM bootstrapping
export NVSHMEM_BOOTSTRAP=MPI
# Use the correct MPI type for Perlmutter
export SLURM_MPI_TYPE=cray_shasta
# Disable GPU-aware MPI if not needed
export MPICH_GPU_SUPPORT_ENABLED=0
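Putting the pieces above together, a minimal batch script might look like the following sketch. The account, queue, time limit, and task geometry are placeholders you must adapt; `./your_app` stands in for your executable.

```shell
#!/bin/bash
#SBATCH -A <account>        # placeholder: your NERSC project
#SBATCH -C gpu
#SBATCH -q regular          # placeholder queue
#SBATCH -N 2
#SBATCH --gpus-per-node=4
#SBATCH -t 00:10:00

# Load NERSC provided NVSHMEM (not the one from nvhpc)
module load nvshmem

# Use MPI (not PMIx) for NVSHMEM bootstrapping
export NVSHMEM_BOOTSTRAP=MPI

# Use the correct MPI type for Perlmutter
export SLURM_MPI_TYPE=cray_shasta

# Disable GPU-aware MPI if not needed
export MPICH_GPU_SUPPORT_ENABLED=0

srun -n 8 --ntasks-per-node=4 ./your_app
```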
Summary of Key Environment Variables¶
| Variable | Recommended Value | Purpose |
|---|---|---|
| `NVSHMEM_BOOTSTRAP` | `MPI` | Use MPI (not PMIx) for NVSHMEM init |
| `SLURM_MPI_TYPE` | `cray_shasta` | Correct MPI type for Perlmutter |
| `MPICH_GPU_SUPPORT_ENABLED` | `0` | Disable GPU-aware MPI if not needed |
Getting Help¶
If you continue to experience issues with NVSHMEM or MPI on Perlmutter, open a ticket at help.nersc.gov and include the following information:
- Your full module list (`module list`)
- Your SLURM job script
- The complete error output
- Whether you are using the HPC SDK NVSHMEM or NERSC's standalone `nvshmem` module