How to use Python in Shifter¶
Do you:
- want better performance at large scales (10+ nodes) by improving library load times?
- want a more portable way to manage your Python stack?
- want an environment that is easy to use on a login node, compute node, or as a Jupyter kernel?
- want much more control over your software stack, for stability or legacy software reasons?
- feel tired of conda environments that make it hard to stay under your filesystem quota?
If any of these apply to you, you may find Shifter a good solution for using Python at NERSC.
We performed a small benchmarking study to compare Python performance on $HOME, /global/common/software, and Shifter. We summarize the results here:
Number of nodes | $HOME | /global/common/software | Shifter |
---|---|---|---|
1 | 0m4.256s | 0m3.894s | 0m3.998s |
10 | 0m10.025s | 0m4.891s | 0m4.274s |
100 | 0m30.790s | 0m17.392s | 0m7.098s |
500 | 4m7.673s | 0m48.916s | 0m14.193s |
This benchmark supports our recommendation that users consider Shifter at job sizes larger than 10 nodes. At large scale (100+ nodes), we strongly urge users to use Shifter. If Shifter is not an option, we suggest using /global/common/software instead.
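If you would like to run a similar comparison yourself, here is a rough sketch of one way to time Python startup and imports across the nodes of a job; the node count, image name, and installation path are placeholders rather than the exact benchmark we ran.
#!/bin/bash
#SBATCH --nodes=10
#SBATCH --time=00:10:00
#SBATCH --image=docker:myuser/mypython:latest
# Time imports from a filesystem-based install (placeholder path)
time srun --ntasks-per-node=1 \
    /global/common/software/myproject/env/bin/python -c "import numpy, scipy"
# Time the same imports from inside the Shifter image
time srun --ntasks-per-node=1 shifter python -c "import numpy, scipy"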
Shifter¶
At NERSC, our current container solutions are Shifter and Podman-HPC. Below, we provide several example Python Dockerfiles intended to help get you started using Python in Shifter.
You should be able to copy and use all of these Dockerfiles to build images on your own system; you will find instructions on building a container in our Building Shifter Images documentation.
You will also find an mpi4py example on our main Shifter page and documentation on integrating Shifter with Jupyter kernels on our Using Shifter with Jupyter page.
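As a quick orientation, the overall workflow looks roughly like the following; the image and account names are placeholders, and the details are covered in the pages linked above.
# On your own system (or with podman-hpc at NERSC): build the image and push it to a registry
docker build -t myuser/mypython:latest .
docker push myuser/mypython:latest
# At NERSC: pull the image so Shifter can use it
shifterimg pull docker:myuser/mypython:latest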
Example Python Dockerfiles¶
Basic Python Dockerfile example¶
First we'll demonstrate a basic container with Python. We'll make it easy by starting from an image where Python 3 is already installed. Note that we are using the latest tag, so if you require a different version, you will need to adjust this tag. We'll install numpy and scipy using pip. If your Python setup is relatively simple, you may find that pip will meet your package installation requirements within an image. If your setup is more complex or if you rely on packages that are only distributed via conda, you'll want to skip ahead to our next example.
FROM docker.io/library/python:latest
WORKDIR /opt
RUN \
    pip3 install \
    --no-cache-dir \
    numpy \
    scipy
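Once the image is built and pulled, a quick way to check that it works is to run Python directly through Shifter; the image name below is a placeholder.
# Run Python from the image on a login or compute node
shifter --image=docker:myuser/mypython:latest python3 -c "import numpy; print(numpy.__version__)"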
Using pip inside a container.
Do not install dependencies with pip from within a running container. This will install the dependencies in the .local folder in your home directory, making them visible to any container you run, so an installation intended for one environment can contaminate another.
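If you want to be sure that packages already sitting in your .local directory are never picked up inside a container, one option is to set Python's standard PYTHONNOUSERSITE variable before launching Shifter; this is a general Python mechanism rather than anything Shifter-specific, and the image name below is a placeholder.
# Tell Python to skip the user site-packages directory (~/.local/...)
export PYTHONNOUSERSITE=1
shifter --image=docker:myuser/mypython:latest python3 -c "import sys; print(sys.flags.no_user_site)"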
Conda environment Dockerfile example¶
For those of you who are used to conda environments, there are a few key concepts that you will find different in containers. First, you won't want to build and activate a separate custom environment. Instead, you'll just want to install the packages you need into the base environment and then make this environment available by adding it to your PATH. We suggest that each image be used for a single Python environment. (If you find yourself needing multiple conda environments in the same image, most likely you'll want multiple images.) To save space, you'll likely want to start with miniconda. In this example, we'll start from an image in which miniconda has already been installed. As in our previous example, we'll install numpy and scipy.
FROM docker.io/continuumio/miniconda3:latest
ENV PATH=/opt/conda/bin:$PATH
RUN /opt/conda/bin/conda install --yes numpy scipy
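To use an image like this in a batch job, one common pattern is to request the image with Slurm's --image directive and launch Python through shifter; the image name, node and task counts, and script name below are placeholders.
#!/bin/bash
#SBATCH --image=docker:myuser/myconda:latest
#SBATCH --nodes=2
#SBATCH --time=00:10:00
# Each task runs Python from the image's base conda environment
srun --ntasks-per-node=4 shifter python3 myscript.py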
Python GPU Dockerfile example¶
If you plan to use Python on GPUs, you will likely find it easiest to start with an NVIDIA-provided image that includes CUDA and related libraries. This example demonstrates how to build an image to use Dask. In our example, we build FROM an NVIDIA CUDA base image. Note that in addition to base, NVIDIA also offers runtime and devel flavors of images.
In this example, we use mamba to speed up the package installation process. You can also see that we attempt to shrink our image by deleting whatever we can when we're done. This will reduce the time it takes to upload to the registry and download via Shifter. Note, however, that the NVIDIA images, even the base image, are quite large.
FROM nvidia/cuda:11.2.1-base-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /opt

RUN \
    apt-get update && \
    apt-get upgrade --yes && \
    apt-get install --yes \
        wget \
        vim && \
    apt-get clean all && \
    rm -rf /var/lib/apt/lists/*

# Install miniconda
# Pin to Python 3.8 for RAPIDS compatibility
ENV installer=Miniconda3-py38_4.9.2-Linux-x86_64.sh
RUN wget https://repo.anaconda.com/miniconda/$installer && \
    /bin/bash $installer -b -p /opt/miniconda3 && \
    rm -rf $installer
ENV PATH=/opt/miniconda3/bin:$PATH

# Use mamba to speed up package resolution
RUN /opt/miniconda3/bin/conda install mamba -c conda-forge -y

RUN \
    /opt/miniconda3/bin/mamba install \
        dask-cuda \
        dask-cudf \
        ipykernel \
        matplotlib \
        seaborn \
        -c rapidsai-nightly -c nvidia -c conda-forge -c defaults -y && \
    /opt/miniconda3/bin/mamba clean -a -y
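As a rough sketch of how you might run this image on a GPU node: the image name is a placeholder, and the exact Slurm flags (constraint, GPU count, account, QOS) depend on your allocation and system.
#!/bin/bash
#SBATCH --image=docker:myuser/mydask:latest
#SBATCH --constraint=gpu
#SBATCH --gpus-per-node=1
#SBATCH --nodes=1
#SBATCH --time=00:10:00
# Sanity check that the GPU Python stack imports inside the container
srun shifter python3 -c "import dask_cuda, cudf; print(cudf.__version__)"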
If you have questions about any of these examples or about how to use Python in Shifter, we encourage you to check our How to use Shifter page or contact NERSC's online help desk.