Podman at NERSC
Podman (Pod Manager) is an open-source, OCI-compliant container framework that is under active development by Red Hat. In many ways Podman can be treated as a drop-in replacement for Docker.
Since "out of the box" Podman currently lacks several key capabilities for HPC users, NERSC has been working with Red Hat to adapt Podman for HPC use-cases and has developed an add-on called podman-hpc.
podman-hpc is now available to all users on Perlmutter.
podman-hpc enables improved performance, especially at large scale, and makes using common HPC tools like Cray MPI and NVIDIA CUDA capabilities easy.
podman-hpc at NERSC is experimental
podman-hpc has been recently deployed at NERSC and should not be considered stable or suitable for production. If you encounter what you think could be a problem or bug, please report it to us by filing a NERSC ticket.
Users may be interested in using Podman Desktop on their local machines. It is a free alternative to Docker Desktop.
Users who are comfortable with Shifter, the current NERSC production container runtime, may wonder what advantages Podman offers over Shifter. Here are a few:
podman-hpc doesn't impose many of the restrictions that Shifter does:
- No container modules will be loaded by default.
- Most environment variables will not be automatically propagated into the container.
- Applications that require root permission inside the container will be allowed to run. This is securely enabled via Podman's rootless mode.
- Users can modify the contents of their containers at runtime.
- Users can build images directly on Perlmutter.
- Users can choose to run these images directly via podman-hpc without uploading to an intermediate repository.
- Podman is an OCI-compliant framework (like Docker). Users who are familiar with Docker will find that Podman has very similar syntax and can often be used as a drop-in replacement for Docker. Users may also find that this makes their workflow more portable.
- podman-hpc is a transparent wrapper around Podman. Users will find that they can pass standard unprivileged Podman commands to it.
- Podman is a widely used tool that is not specific to NERSC.
How to use
podman-hpc is available on Perlmutter.
To see all available commands, users can issue the podman-hpc --help command:
elvis@nid001036:~> podman-hpc --help
Manage pods, containers and images ... on HPC!
Description:
  The podman-hpc utility is a wrapper script around the Podman
  container engine. It provides additional subcommands for ease of
  use and configuration of Podman in a multi-node, multi-user high
  performance computing environment.
Usage: podman-hpc [options] COMMAND [ARGS]...
Options:
  --additional-stores TEXT  Specify other storage locations
  --squash-dir TEXT         Specify alternate squash directory location
  --help                    Show this message and exit.
Commands:
  infohpc     Dump configuration information for podman_hpc.
  migrate     Migrate an image to squashed.
  pull        Pulls an image to a local repository and makes a squashed...
  rmsqi       Removes a squashed image.
  shared-run  Launch a single container and exec many threads in it This is...
  ...
Users can issue the podman-hpc images command to see any images that they have built or pulled.
elvis@nid001036:~> podman-hpc images
REPOSITORY  TAG  IMAGE ID  CREATED  SIZE  R/O
elvis@nid001036:~>
This should show there are no images yet.
Users should generate a Containerfile or Dockerfile. (A Containerfile is a more general form of a Dockerfile; they follow the same syntax and can usually be used interchangeably.) Users can build and tag the image in the same directory via a command like:
podman-hpc build -t elvis:test .
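The build step can be sketched end to end as follows. This is only an illustration: the base image, the installed package, and the temporary working directory are assumptions rather than NERSC recommendations, and the build itself only runs where podman-hpc is installed.

```shell
#!/bin/sh
# Hypothetical example: write a small Containerfile and build it as elvis:test.
workdir=$(mktemp -d)

cat > "$workdir/Containerfile" <<'EOF'
# Base image and package are illustrative choices only.
FROM ubuntu:latest
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
CMD ["python3", "--version"]
EOF

# Build only where podman-hpc is installed (for example, a Perlmutter login node).
if command -v podman-hpc >/dev/null 2>&1; then
    podman-hpc build -t elvis:test "$workdir"
else
    echo "podman-hpc not found; Containerfile written to $workdir"
fi
```

Passing the directory as the build context plays the same role as the trailing `.` in the command above.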
podman-hpc images and caches are stored in local storage
podman-hpc build artifacts and cache files will be stored on the login node where the user performed the build. If a user logs onto a different node, they will not have access to these cached files and will need to build from scratch. At the moment we have no purge policy for the local image build storage, although users can likely expect one in the future.
If a user would like their image to be usable in a job, they will need to issue the
podman-hpc migrate elvis:test
command. This will convert the image into a suitable squashfile format for podman-hpc. These images can be directly accessed and used in a job. If you migrate your image, you will notice that two kinds of images are listed:
elvis@perlmutter:login01:/> podman-hpc images
REPOSITORY       TAG   IMAGE ID      CREATED         SIZE     R/O
localhost/elvis  test  f55898589b7a  11 seconds ago  80.3 MB  false
elvis@perlmutter:login01:/> podman-hpc migrate elvis:test
elvis@perlmutter:login01:/> podman-hpc images
REPOSITORY       TAG   IMAGE ID      CREATED         SIZE     R/O
localhost/elvis  test  f55898589b7a  45 seconds ago  80.3 MB  false
localhost/elvis  test  f55898589b7a  45 seconds ago  80.3 MB  true
elvis@perlmutter:login01:/>
The migrated squashfile is listed as read-only (R/O) in this display. However, you will be able to modify the image at runtime since podman-hpc adds an overlay filesystem on top of the squashed image.
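As a rough sketch, a script can check for a migrated copy by looking for a row whose R/O column is true. The helper name has_migrated_copy is hypothetical, and it assumes the column layout shown in the listing above (repository first, tag second, R/O last).

```shell
#!/bin/sh
# has_migrated_copy REPOSITORY TAG   (hypothetical helper, not part of podman-hpc)
# Reads a `podman-hpc images` listing on stdin and succeeds if some row for
# REPOSITORY:TAG has "true" in its last (R/O) column.
has_migrated_copy() {
    awk -v repo="$1" -v tag="$2" '
        $1 == repo && $2 == tag && $NF == "true" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Only query podman-hpc where it is installed.
if command -v podman-hpc >/dev/null 2>&1; then
    if podman-hpc images | has_migrated_copy localhost/elvis test; then
        echo "elvis:test has a migrated (read-only) copy"
    else
        echo "no migrated copy found; try: podman-hpc migrate elvis:test"
    fi
fi
```

Matching on the first, second, and last fields avoids having to parse the CREATED column, which contains spaces.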
Users can pull public images via podman-hpc with no additional configuration.
elvis@perlmutter:login01:/> podman-hpc pull ubuntu:latest
Trying to pull docker.io/library/ubuntu:latest...
Getting image source signatures
Copying blob 2ab09b027e7f skipped: already exists
Copying config 08d22c0ceb done
Writing manifest to image destination
Storing signatures
08d22c0ceb150ddeb2237c5fa3129c0183f3cc6f5eeb2e7aa4016da3ad02140a
INFO: Migrating image to /pscratch/sd/e/elvis/storage
elvis@perlmutter:login01:/>
Images that a user pulls from a registry will be automatically converted into a suitable squashfile format for podman-hpc. These images can be directly accessed and used in a job.
If a user needs to pull an image from a private registry, they must first log in to their registry via podman-hpc. In this case we are logging into Docker Hub.
elvis@nid001036:~> podman-hpc login docker.io
Username: elvis
Password:
Login Succeeded!
The user can then pull the image:
elvis@nid001036:~> podman-hpc pull elvis/hello-world:1.0
Trying to pull docker.io/elvis/hello-world:1.0...
Getting image source signatures
Copying blob sha256:7b1a6ab2e44dbac178598dabe7cff59bd67233dba0b27e4fbd1f9d4b3c877a54
Copying config sha256:0849b79544d682e6149e46977033706b17075be384215ef8a69b5a37037c7231
Writing manifest to image destination
Storing signatures
0849b79544d682e6149e46977033706b17075be384215ef8a69b5a37037c7231
elvis@nid001036:~> podman-hpc images
REPOSITORY                   TAG  IMAGE ID      CREATED        SIZE     R/O
docker.io/elvis/hello-world  1.0  0849b79544d6  16 months ago  75.2 MB  true
podman-hpc as a container runtime
Unlike Shifter, the Slurm --image flag is not required
podman-hpc can be used on a login node or in a job. Unlike Shifter, no Slurm flags are needed to use a podman-hpc image in a job. The only requirement is that the user has either pulled an image (which is automatically migrated) or built and migrated an image.
Users can use podman-hpc as a container runtime. Early benchmarking has shown that in many cases, performance is comparable to Shifter and bare metal.
Our goal has been to design podman-hpc so that standard Podman commands still work. Please check out this page for a full list of podman run capabilities.
Users can use podman-hpc in both interactive and batch jobs without requesting any special resources. They only need to have previously built or pulled an image via podman-hpc. Users may choose to run a container in interactive mode, like in this example:
elvis@nid001036:~> podman-hpc run --rm -it registry.nersc.gov/library/nersc/mpi4py:3.1.3 /bin/bash
root@d23b3ea141ed:/opt# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root@d23b3ea141ed:/opt# exit
exit
elvis@nid001036:~>
Here we see that the container is using the Ubuntu 20.04 (Focal Fossa) OS.
Users may also choose to run a container in standard run mode:
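elvis@nid001036:~> podman-hpc run --rm registry.nersc.gov/library/nersc/mpi4py:3.1.3 echo $SLURM_JOB_ID
198507
elvis@nid001036:~>
Here we print the Slurm job ID from a command run inside the container. Note that, as written, the host shell expands $SLURM_JOB_ID before the container starts.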
podman-hpc does not enable any MPI or GPU capability by default. Users must request the additional utilities they need.
| Flag | Function |
|---|---|
| --mpi | Uses optimized Cray MPI |
| --cuda-mpi | Uses CUDA-aware optimized Cray MPI |
| --gpu | Enable NVIDIA GPU |
| --cvmfs | Enable the CVMFS filesystem |
Note that the
--cuda-mpi flag must be used together with the
More modules will be added soon.
Unlike Shifter, no capabilities are loaded by default
Shifter users may be aware that in Shifter, MPICH and GPU capabilities are loaded by default. In podman-hpc, we take the opposite (and more OCI-compliant) approach, in which users must explicitly request all capabilities they need.
Using Cray MPICH in podman-hpc
Using Cray MPICH in podman-hpc is very similar to what we describe in our MPI in Shifter documentation. To be able to use Cray MPICH at runtime, users must first include a standard implementation of MPICH in their image. If users add the podman-hpc --mpi flag, our current Cray MPICH will be inserted into the container at runtime, replacing the MPICH in their image.
Here is an example of running an MPI-enabled task in podman-hpc in an interactive job:
elvis@nid001037:~> srun -n 2 podman-hpc run --rm --mpi registry.nersc.gov/library/nersc/mpi4py:3.1.3 python3 -m mpi4py.bench helloworld
Hello, World! I am process 0 of 2 on nid001037.
Hello, World! I am process 1 of 2 on nid001041.
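The same MPI test might be submitted as a batch job along the following lines. This is a sketch under assumptions: the #SBATCH resource requests are placeholders to adapt to your own allocation, the image is assumed to have been pulled beforehand, and run_or_echo is a hypothetical helper that prints the command instead of running it when srun is not available.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --constraint=cpu
#SBATCH --qos=debug
#SBATCH --time=00:05:00
# The resource requests above are placeholder assumptions; adjust as needed.

# Image used in the interactive example; pull it with podman-hpc before submitting.
IMAGE=registry.nersc.gov/library/nersc/mpi4py:3.1.3

# Hypothetical helper: run the command if its first word is installed,
# otherwise print it (useful for a dry run away from the cluster).
run_or_echo() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

# srun starts one container per task; --mpi requests the Cray MPI module.
run_or_echo srun podman-hpc run --rm --mpi "$IMAGE" \
    python3 -m mpi4py.bench helloworld
```

No container-specific Slurm flags appear in the header, since podman-hpc does not need any.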
Using NVIDIA GPUs in podman-hpc
Accessing NVIDIA GPUs in a container requires that the NVIDIA CUDA user drivers and other utilities are present in the container at runtime. If users add the podman-hpc --gpu flag, this will ensure all required utilities are enabled at runtime.
Here is an example of running a GPU-enabled task in podman-hpc in an interactive job:
elvis@nid001037:~> srun -n 2 -G 2 podman-hpc run --rm --gpu registry.nersc.gov/library/nersc/mpi4py:3.1.3 nvidia-smi
Sat Jan 14 01:16:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
Sat Jan 14 01:16:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:03:00.0 Off |                    0 |
| N/A   27C    P0    52W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
Sharing the host network with the container
If you would like your container to share its network with the host, try running with the --net host option:
podman-hpc run --net host elvis:test ...
Graphics forwarding in podman-hpc
Here is an example of setting up graphics forwarding with an application running inside a podman-hpc container. In this example we pass the DISPLAY environment variable into the container and set the container to share the host network.
podman-hpc run -it --rm --gpu -e DISPLAY -v /tmp:/tmp --net host --volume="$HOME/.Xauthority:/root/.Xauthority:rw" -v $(pwd):/workspace nvcr.io/hpc/vmd:1.9.4a44
Running a container as a user instead of root
If you wish to run a container as your user rather than as root, try running with the --userns keep-id option:
podman-hpc run --userns keep-id elvis:test ...
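One way to confirm that the option took effect is to compare the UID reported inside the container with the UID on the host. The sketch below assumes the elvis:test image from earlier on this page and assumes that the image provides the id utility; the podman-hpc call only runs where podman-hpc is installed.

```shell
#!/bin/sh
# Compare the UID seen inside the container with the UID on the host.
# elvis:test is the example image name used on this page.
host_uid=$(id -u)

if command -v podman-hpc >/dev/null 2>&1; then
    container_uid=$(podman-hpc run --rm --userns keep-id elvis:test id -u)
    if [ "$container_uid" = "$host_uid" ]; then
        echo "keep-id active: container UID $container_uid matches the host"
    else
        echo "container UID is $container_uid, host UID is $host_uid"
    fi
else
    echo "podman-hpc not found; host UID is $host_uid"
fi
```

A container UID of 0 would indicate that the process is still running as root inside the container.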
Profiling an application in podman-hpc
Profiling a containerized application can be more complex than a bare-metal application. There are several possible approaches:
Option 1- Profile a containerized application using a profiling tool already installed in the container. This may be possible using NVIDIA NGC containers, which often ship with nsys and related tools. The user will need to bind-mount a directory on the host system to a mount point within the running container, so that they can access the output file written by the profiler. For example:
podman-hpc run --gpu --rm -w /work -v $PWD:/work nsys:test nsys profile mytest
Option 2- Profile a containerized application using a profiling tool from the host system mounted into the running container. This may be necessary if the container does not ship with the profiling tool installed and/or the user does not have the source code to build the profiling tool themselves. In this case the user may need to adjust LD_LIBRARY_PATH in the running container so that the profiling tool and its dependencies can be used. For example:
podman-hpc run --gpu --rm -w /work -v $PWD:/work \
    -v $EBROOTNSIGHTMINSYSTEMS/target-linux-x64:$EBROOTNSIGHTMINSYSTEMS/target-linux-x64 \
    -v $EBROOTNSIGHTMINSYSTEMS/host-linux-x64:$EBROOTNSIGHTMINSYSTEMS/host-linux-x64 \
    nsys:test ./profile.sh
Contents of profile.sh:
#!/bin/bash
export LD_LIBRARY_PATH=$EBROOTNSIGHTMINSYSTEMS/target-linux-x64:$EBROOTNSIGHTMINSYSTEMS/host-linux-x64:$LD_LIBRARY_PATH
export PATH=$EBROOTNSIGHTMINSYSTEMS/target-linux-x64:$PATH
nsys profile mytest
Note that in this case nsys should be mounted into the container using the original path on the host to ensure that all dependencies can be correctly found.
Option 3- Profile a containerized application using a profiling tool on the host (i.e. outside the container). This means that the container must be launched with additional settings made to enable the collection of various system and kernel metrics. For example:
strace -e trace=file -f -o podman.strace podman-hpc run --rm --net=host --privileged --cap-add SYS_ADMIN --cap-add SYS_PTRACE openmpi:test date
Note that Option 3 is currently known to work with strace, but not currently with nsys.
The September 28-29, 2023 system maintenance introduced a bug in the podman-hpc images command. Users who wish to display the images they have migrated will need to use the command
podman-hpc images --storage-opt additionalimagestore=$SCRATCH/storage
We have a fix for this and anticipate that it will be resolved in the next maintenance, which is tentatively scheduled for October 10, 2023.
--userns=keep-id may not be working for migrated images. See issue for more information.
OpenMPI using external PMIx is not yet working. We are actively working to enable this.
We have had reports that the screen command mangles podman-hpc commands. We suggest using podman-hpc in a bare shell.
If you discover what appears to be an issue or bug, please let us know in our issue tracker.
podman-hpc can get into a bad configuration state. You can try clearing several storage areas.
- On a local login node you can delete:
- On a compute or login node you can delete:
  - $SCRATCH/storage (and then recreate this directory)
- On a compute node you can delete:
Note that you may need to delete files from inside a podman unshare session, as in the commands below, to obtain the file permissions needed to delete files in the user namespace:
podman unshare
rm -rf /images/<userid>_hpc
exit
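Deleting and then recreating a storage directory such as $SCRATCH/storage could be wrapped in a small helper like the one below. The function name reset_storage is hypothetical, and the operation is destructive: any migrated images in that directory are removed and would need to be pulled or migrated again. If the deletion fails on permissions, the podman unshare approach described above may be needed.

```shell
#!/bin/sh
# reset_storage DIR   (hypothetical helper, not part of podman-hpc)
# Deletes DIR and recreates it as an empty directory.
reset_storage() {
    dir=$1
    # Refuse empty or root paths so an unset variable cannot become "rm -rf /".
    case "$dir" in
        ""|"/")
            echo "refusing to remove '$dir'" >&2
            return 1
            ;;
    esac
    rm -rf "$dir" && mkdir -p "$dir"
}

# Example (destructive), to be run deliberately:
#   reset_storage "$SCRATCH/storage"
```

The call is left commented out so that sourcing the snippet never deletes anything by accident.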
If clearing these areas doesn't fix your issue, please contact us at help.nersc.gov so we can help.