
Guide to Using Python on Perlmutter

This is a new page where we provide important information and tips about using Python on Perlmutter. Please be aware that the programming environment on Perlmutter changes quickly and it may be difficult to keep this page fully up to date. We will do our best, but we welcome you to contact us if you find anything that appears incorrect or deprecated.

Tips and known issues

The current biggest Python stumbling block on Perlmutter is related to MPI and mpi4py. Python users should be aware that there are two relevant flavors of MPI: CUDA-aware and non-CUDA-aware. If you intend to use mpi4py to transfer GPU objects (for example, CuPy arrays), you will need CUDA-aware mpi4py.
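The distinction matters because mpi4py recognizes GPU buffers through the __cuda_array_interface__ attribute (the example script later on this page asserts its presence). As a rough sketch of that detection, using a stand-in class rather than a real CuPy array:

```python
# Sketch: how a library can tell a GPU buffer from a host buffer.
# FakeGPUArray is a stand-in for illustration only; a real CuPy
# array exposes __cuda_array_interface__ automatically.

class FakeGPUArray:
    # Minimal imitation of the attribute CuPy arrays provide.
    __cuda_array_interface__ = {"shape": (10,), "typestr": "<i4",
                                "data": (0, False), "version": 2}

def is_gpu_buffer(obj):
    """Return True if obj advertises the CUDA array interface."""
    return hasattr(obj, "__cuda_array_interface__")

print(is_gpu_buffer(FakeGPUArray()))  # True
print(is_gpu_buffer([1, 2, 3]))       # False
```

With a non-CUDA-aware MPI, passing a buffer like this to a communication call will fail or crash, which is why the CUDA-aware build is required for GPU transfers.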

For anyone who would like to use CUDA-aware mpi4py, please be aware that mpi4py does provide this feature, but not yet in an official release. As a result, you'll need to download the current master branch and build mpi4py yourself.

You can obtain CUDA by loading the cudatoolkit module. Please note that the CUDA provided by this module is currently 11.0. For packages like CuPy, the package version you install must match this CUDA version (for example, cupy-cuda110 for CUDA 11.0).
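To make the matching rule concrete, here is a small hypothetical helper (not an official CuPy utility) that assumes CuPy's cupy-cudaXYZ wheel naming convention, where XY.Z is the CUDA version:

```python
# Hypothetical helper mapping a CUDA toolkit version to the matching
# CuPy wheel name, following CuPy's cupy-cudaXYZ naming convention.
# Illustration only; check CuPy's install docs for the current names.

def cupy_wheel_for(cuda_version: str) -> str:
    """Return the pip package name for a CUDA version string,
    e.g. '11.0' -> 'cupy-cuda110'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cupy-cuda{major}{minor}"

print(cupy_wheel_for("11.0"))  # cupy-cuda110
```

Installing a wheel built for a different CUDA version than the loaded cudatoolkit module is a common source of import-time or runtime errors.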

Python modules

On Perlmutter, NERSC provides an Anaconda-based Python module which is identical to the current Cori default Python module: python/3.8-anaconda-2020.11.

You will also find a Cray-provided Python module, cray-python/, but it is not conda-based.

Note that the mpi4py in both of these modules is CPU-only (i.e. is not CUDA-aware). If you need CUDA-aware mpi4py, at the moment you will need to build it yourself in either a conda environment or in a Shifter container (see below for more info).

Please note that Python 2.7 reached end of life in January 2020, so NERSC will not be providing Python 2 on Perlmutter.

Customizing Python stacks

We strongly encourage users to install and customize their own software stacks at NERSC via conda environments. We also encourage users to customize their Python software stacks via Shifter. If you are interested in installing or using Python in other ways, please contact us so we can help you.
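Once you have activated a customized environment, it is worth confirming which interpreter and site-packages directory you are actually using, since module and conda settings both affect your PATH. A minimal stdlib-only check:

```python
# Quick sanity check after activating a conda environment:
# report which Python interpreter and site-packages directory
# are active, to confirm the environment is first on your PATH.
import sys
import sysconfig

print("interpreter:  ", sys.executable)
print("site-packages:", sysconfig.get_paths()["purelib"])
```

If the paths printed do not point into your environment's prefix, the environment is not the one being used.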

Building CUDA-aware mpi4py

Here are the important steps to build CUDA-aware mpi4py (Cython is a prerequisite):

module load PrgEnv-gnu cudatoolkit craype-accel-nvidia80 python
wget https://github.com/mpi4py/mpi4py/archive/master.tar.gz
tar -xvf master.tar.gz
cd mpi4py-master/
python setup.py build --force --mpicc="$(which cc) -shared -lcuda -lcudart -lmpi -lgdrapi"
python setup.py install

Here is a full example of how to build and test CUDA-aware mpi4py with CuPy in a conda environment:

module load PrgEnv-gnu cudatoolkit craype-accel-nvidia80 python
conda create -n cudaaware python=3.9 -y
source activate cudaaware
conda install cython -y
pip install cupy-cuda110
wget https://github.com/mpi4py/mpi4py/archive/master.tar.gz
tar -xvf master.tar.gz
cd mpi4py-master/
python setup.py build --force --mpicc="$(which cc) -shared -lcuda -lcudart -lmpi -lgdrapi"
python setup.py install
MPICH_GPU_SUPPORT_ENABLED=1 srun -C gpu -n 1 --gpus-per-node=1 python test-cuda-aware-mpi4py.py

where test-cuda-aware-mpi4py.py (a filename we choose here) is:

from mpi4py import MPI
import cupy as cp
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
print("starting reduce")
sendbuf = cp.arange(10, dtype='i')
recvbuf = cp.empty_like(sendbuf)
print("rank:", rank, "sendbuff:", sendbuf)
print("rank:", rank, "recvbuff:", recvbuf)
assert hasattr(sendbuf, '__cuda_array_interface__')
assert hasattr(recvbuf, '__cuda_array_interface__')
comm.Allreduce(sendbuf, recvbuf)
print("finished reduce")
print("rank:", rank, "sendbuff:", sendbuf)
print("rank:", rank, "recvbuff:", recvbuf)
assert cp.allclose(recvbuf, sendbuf*size)

Keep in mind that our GPUs are currently in exclusive mode, so you'll need to specify GPU binding to use more GPU resources:

MPICH_GPU_SUPPORT_ENABLED=1 srun -C gpu -N 2 --tasks-per-node 2 --gpus-per-node 2 --gpu-bind=single:1 python