Guide to Using Python on Perlmutter

This page provides important information and tips about using Python on Perlmutter. Please be aware that the programming environment on Perlmutter changes quickly, and it may be difficult to keep this page fully up to date. We will do our best, but we welcome you to contact us if you find anything that appears incorrect or deprecated.

mpi4py on Perlmutter

Recent releases of mpi4py include CUDA-aware capabilities. If you intend to use mpi4py to transfer GPU objects, you will need a CUDA-aware build of mpi4py.

The mpi4py you obtain via module load python is CUDA-aware. The mpi4py in module load cray-python is not currently CUDA-aware.

If the mpi4py you are using is CUDA-aware, you must have either cuda or cudatoolkit loaded when using it, even for CPU-only code, because mpi4py will look for CUDA libraries at runtime.
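As a quick sanity check before importing a CUDA-aware mpi4py, you can ask the dynamic loader whether the CUDA runtime is visible. This stdlib-only sketch assumes the usual library name cudart; your installation may differ:

```python
# Check whether the CUDA runtime is visible to the dynamic loader.
# A CUDA-aware mpi4py loads CUDA libraries at import time, so if this
# prints None you likely need `module load cuda` (or cudatoolkit) first.
import ctypes.util

cudart = ctypes.util.find_library("cudart")
print("CUDA runtime library:", cudart)
```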

Building CUDA-aware mpi4py

Using module load cuda

module load PrgEnv-gnu cpe-cuda cuda python
conda create -n cudaaware python=3.9 -y
source activate cudaaware
MPICC="cc -shared" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py 

Using cudatoolkit

module load PrgEnv-gnu cpe-cuda cudatoolkit craype-accel-nvidia80 python
conda create -n cudaaware python=3.9 -y
source activate cudaaware
MPICC="cc -shared" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py

Testing CUDA-aware mpi4py with CuPy

You can test that your CUDA-aware mpi4py installation is working with an example like the following:

from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
print("starting reduce")
sendbuf = cp.arange(10, dtype='i')
recvbuf = cp.empty_like(sendbuf)
print("rank:", rank, "sendbuff:", sendbuf)
print("rank:", rank, "recvbuff:", recvbuf)
assert hasattr(sendbuf, '__cuda_array_interface__')
assert hasattr(recvbuf, '__cuda_array_interface__')
comm.Allreduce(sendbuf, recvbuf)
print("finished reduce")
print("rank:", rank, "sendbuff:", sendbuf)
print("rank:", rank, "recvbuff:", recvbuf)
assert cp.allclose(recvbuf, sendbuf*size)

Save the example above as a script (here assumed to be named test-cuda-aware-mpi4py.py) and test on one node:

MPICH_GPU_SUPPORT_ENABLED=1 srun -C gpu -n 1 --gpus-per-node=1 python test-cuda-aware-mpi4py.py

Test on two nodes:

MPICH_GPU_SUPPORT_ENABLED=1 srun -C gpu -N 2 --ntasks-per-node 2 --gpus-per-node 2 --gpu-bind=single:1 python test-cuda-aware-mpi4py.py

Python modules

NERSC provides semi-custom Anaconda Python installations. You can use them via module load python.

You will also find a Cray-provided Python module, cray-python, which is not conda-based. Note that the mpi4py provided in the cray-python module is not CUDA-aware.

Please note that Python 2.7 was retired in 2020, so NERSC will not provide Python 2 on Perlmutter.

Customizing Python stacks

We strongly encourage users to install and customize their own software stacks at NERSC via conda environments. We also encourage users to customize their Python software stacks via Shifter. If you are interested in installing or using Python in other ways, please contact us so we can help you.
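After activating a conda environment, it is worth confirming that the interpreter you get really comes from your environment rather than the base installation. A minimal check, assuming only that conda's activation scripts export CONDA_PREFIX:

```python
# Show which Python interpreter and conda environment are active.
import os
import sys

print("interpreter:", sys.executable)
print("prefix:", sys.prefix)
# conda's activate scripts export CONDA_PREFIX for the active environment
print("conda env:", os.environ.get("CONDA_PREFIX", "<none active>"))
```

If the interpreter path does not point into your environment, re-run source activate before installing packages.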

Using AMD CPUs on Perlmutter

Python users should be aware that code using the Intel MKL library may run slowly on Perlmutter's AMD CPUs, although MKL is often still faster than OpenBLAS.

We advise users to try our MKL workaround via

module load fast-mkl-amd