Using NVSHMEM with MPI on Perlmutter

Overview

NVSHMEM is a parallel programming interface for NVIDIA GPUs that enables efficient GPU-to-GPU communication across multiple nodes. A detailed description of the library is available in the NVSHMEM documentation. On Perlmutter, several configuration requirements must be met for NVSHMEM to work correctly alongside MPI. This page documents known pitfalls and recommended configurations.

Common Pitfalls

1. Do Not Use the NVSHMEM from the NVIDIA HPC SDK

The NVSHMEM library bundled with the NVIDIA HPC SDK (nvidia module) is pre-compiled for InfiniBand networks and Open MPI, neither of which Perlmutter uses. Using it will result in errors such as:

bootstrap_loader.cpp:45: NULL value Bootstrap unable to load 'nvshmem_bootstrap_mpi.so'
nvshmem_bootstrap_mpi.so: undefined symbol: ompi_mpi_comm_world

or in MPI behaving incorrectly at runtime.

Always use NERSC's standalone nvshmem module, which is compiled with libfabric support for Perlmutter's Slingshot network and includes MPI bootstrap libraries compatible with Cray-MPICH.

module load nvshmem

2. NVSHMEM Defaults to PMIx Bootstrap, Not MPI

By default, NERSC's nvshmem module is configured to bootstrap using PMIx, so a plain call to nvshmem_init attempts a PMIx bootstrap rather than an MPI one. If your application initializes MPI and expects NVSHMEM to attach to the same job, you must explicitly override this:

export NVSHMEM_BOOTSTRAP=MPI
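
For reference, here is a minimal sketch of MPI-first initialization, following the attribute-based example in the NVSHMEM documentation. It uses nvshmemx_init_attr with NVSHMEMX_INIT_WITH_MPI_COMM to hand NVSHMEM an MPI communicator directly; a program that calls plain nvshmem_init instead relies on NVSHMEM_BOOTSTRAP=MPI being set as above:

#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

int main(int argc, char **argv) {
    /* Initialize MPI first; NVSHMEM then attaches to the existing MPI job. */
    MPI_Init(&argc, &argv);

    /* Hand NVSHMEM the MPI communicator explicitly. This selects the MPI
     * bootstrap programmatically; a plain nvshmem_init() would instead
     * depend on NVSHMEM_BOOTSTRAP=MPI in the environment. */
    nvshmemx_init_attr_t attr;
    MPI_Comm comm = MPI_COMM_WORLD;
    attr.mpi_comm = &comm;
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    /* Bind this PE to one GPU on its node. */
    cudaSetDevice(nvshmemx_team_my_pe(NVSHMEMX_TEAM_NODE));

    printf("PE %d of %d initialized\n", nvshmem_my_pe(), nvshmem_n_pes());

    /* Tear down in reverse order of initialization. */
    nvshmem_finalize();
    MPI_Finalize();
    return 0;
}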

3. Set the Correct MPI Launch Type

Slurm must use the correct MPI plugin for Perlmutter. Set this before launching your job:

export SLURM_MPI_TYPE=cray_shasta

Or pass it directly to srun:

srun --mpi=cray_shasta -n <ntasks> ./your_app

4. GPU-Aware MPI Is Not Required for Basic NVSHMEM Usage

If your application uses MPI for CPU/host communication and NVSHMEM for GPU-to-GPU communication, you do not need GPU-aware MPI. Disabling it explicitly avoids potential conflicts:

export MPICH_GPU_SUPPORT_ENABLED=0

Note: If you do need GPU-aware MPI (e.g., direct GPU buffer transfers via MPI), see the GPU-Aware MPI section below.
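
To make the split concrete, here is a minimal sketch adapted from the ring-shift example in the NVSHMEM documentation (the ring_shift kernel and variable names are illustrative). GPU data moves only through NVSHMEM symmetric memory, while MPI touches only host buffers, so disabling GPU-aware MPI is safe:

#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

/* Each PE writes its ID into the symmetric buffer of the next PE.
 * All GPU-to-GPU traffic goes through NVSHMEM, never through MPI. */
__global__ void ring_shift(int *destination) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    nvshmem_int_p(destination, mype, (mype + 1) % npes);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* MPI-bootstrapped NVSHMEM initialization, as in pitfall 2 above. */
    nvshmemx_init_attr_t attr;
    MPI_Comm comm = MPI_COMM_WORLD;
    attr.mpi_comm = &comm;
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    /* Bind this PE to one GPU on its node. */
    cudaSetDevice(nvshmemx_team_my_pe(NVSHMEMX_TEAM_NODE));

    /* Symmetric (NVSHMEM-managed) GPU buffer that remote PEs write into. */
    int *destination = (int *)nvshmem_malloc(sizeof(int));

    ring_shift<<<1, 1>>>(destination);
    cudaDeviceSynchronize();   /* wait for the local kernel to finish */
    nvshmem_barrier_all();     /* complete the puts on every PE */

    int received;
    cudaMemcpy(&received, destination, sizeof(int), cudaMemcpyDeviceToHost);

    /* Host-side coordination uses plain MPI on host buffers only, so
     * GPU-aware MPI (MPICH_GPU_SUPPORT_ENABLED=1) is not required. */
    int sum = 0;
    MPI_Allreduce(&received, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("PE %d received %d; sum over all PEs = %d\n",
           nvshmem_my_pe(), received, sum);

    nvshmem_free(destination);
    nvshmem_finalize();
    MPI_Finalize();
    return 0;
}

Note that device-side NVSHMEM calls such as nvshmem_int_p require compiling with relocatable device code (nvcc -rdc=true) and linking the NVSHMEM device library; check the nvshmem module's notes for the exact build line on Perlmutter.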

Recommended Configuration

Use the following module and environment configuration before building or running any application that combines MPI and NVSHMEM:

# Load the NERSC-provided NVSHMEM (not the one from nvhpc)
module load nvshmem

# Use MPI (not PMIx) for NVSHMEM bootstrapping
export NVSHMEM_BOOTSTRAP=MPI

# Use the correct MPI type for Perlmutter
export SLURM_MPI_TYPE=cray_shasta

# Disable GPU-aware MPI if not needed
export MPICH_GPU_SUPPORT_ENABLED=0

Summary of Key Environment Variables

Variable                   Recommended Value  Purpose
NVSHMEM_BOOTSTRAP          MPI                Use MPI (not PMIx) for NVSHMEM initialization
SLURM_MPI_TYPE             cray_shasta        Correct MPI plugin for Perlmutter
MPICH_GPU_SUPPORT_ENABLED  0                  Disable GPU-aware MPI when not needed

Getting Help

If you continue to experience issues with NVSHMEM or MPI on Perlmutter, open a ticket at help.nersc.gov and include the following information:

  • Your full module list (module list)
  • Your Slurm job script
  • The complete error output
  • Whether you are using the HPC SDK NVSHMEM or NERSC's standalone nvshmem module