How to use Python in Shifter

Do you:

  • want better performance at large scales (10+ nodes) by improving library load times?
  • want a more portable way to manage your Python stack?
  • want an environment that is easy to use on a login node, compute node, or as a Jupyter kernel?
  • want much more control over your software stack, for stability or legacy software reasons?
  • feel tired of conda environments that make it hard to stay under your filesystem quota?

If any of these apply to you, you may find Shifter a good solution for using Python at NERSC.

We performed a small benchmarking study to compare Python performance for installations in $HOME, in /global/common/software, and inside a Shifter image. The results are summarized below:

Number of nodes | $HOME     | /global/common/software | Shifter
--------------- | --------- | ----------------------- | ---------
1               | 0m4.256s  | 0m3.894s                | 0m3.998s
10              | 0m10.025s | 0m4.891s                | 0m4.274s
100             | 0m30.790s | 0m17.392s               | 0m7.098s
500             | 4m7.673s  | 0m48.916s               | 0m14.193s

This benchmark supports our recommendation that users consider Shifter at job sizes larger than 10 nodes. At large scale (100+ nodes), we strongly urge users to use Shifter. If Shifter is not an option, we suggest that users consider /global/common/software instead.
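A minimal sketch of this kind of comparison, run from inside a Slurm allocation, is shown below; the choice of scipy as the test import and the image name are illustrative assumptions, not the exact benchmark we ran.

# Time a parallel import from each software location, one task per node.
# Run these from inside an existing Slurm allocation.
time srun --ntasks-per-node=1 python3 -c "import scipy"    # environment installed in $HOME or /global/common/software

time srun --ntasks-per-node=1 shifter --image=docker:myuser/mypython:latest \
    python3 -c "import scipy"                              # same packages baked into a Shifter image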

Shifter

At NERSC, our current container solutions are Shifter and Podman-HPC. Below, we provide several example Python Dockerfiles intended to help you get started using Python in Shifter.

You should be able to copy and use all of these Dockerfiles to build images on your own system; you will find instructions on building a container in our Building Shifter Images documentation.
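As a rough sketch of that workflow (myuser/mypython:latest is a placeholder image name), you might build and push the image with Docker on your own machine and then pull it into Shifter at NERSC:

# On your own machine: build from the Dockerfile in the current
# directory and push the image to a registry such as Docker Hub.
docker build -t myuser/mypython:latest .
docker push myuser/mypython:latest

# At NERSC: pull the image into Shifter and run a quick check on a login node.
shifterimg pull docker:myuser/mypython:latest
shifter --image=docker:myuser/mypython:latest python3 --version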

You will also find an mpi4py example on our main Shifter page and documentation on integrating Shifter with Jupyter kernels on our Using Shifter with Jupyter page.
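For a non-interactive job, running containerized Python is a matter of requesting the image in your batch script and launching it through shifter. A minimal sketch (the image name, constraint, and time limit are placeholders you would adjust) looks like this:

#!/bin/bash
#SBATCH --image=docker:myuser/mypython:latest
#SBATCH --nodes=1
#SBATCH --constraint=cpu
#SBATCH --time=00:05:00

# Run the containerized Python interpreter on the compute node.
srun shifter python3 -c "import numpy; print(numpy.__version__)"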

Example Python Dockerfiles

Basic Python Dockerfile example

First we'll demonstrate a basic container with Python. We'll make it easy by starting from an image where Python 3 is already installed. Note that we are using the latest tag, so if you require a different version, you will need to adjust this tag. We'll install numpy and scipy using pip. If your Python setup is relatively simple, you may find that pip will meet your package installation requirements within an image. If your setup is more complex or if you rely on packages that are only distributed via conda, you'll want to skip ahead to our next example.

FROM docker.io/library/python:latest

WORKDIR /opt

RUN \
    pip3 install            \
        --no-cache-dir      \
        numpy               \
        scipy

Using pip inside a container.

Do not install dependencies with pip from inside a running container. Because your home directory is mounted into the container, the packages will end up in your ~/.local folder, where they are visible to every container you run; one installation can then contaminate another environment and cause unwanted behavior.
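One optional safeguard (not part of the example above) is to tell Python inside the image to ignore user site-packages entirely by setting PYTHONNOUSERSITE in your Dockerfile:

# Optional: make Python ignore ~/.local site-packages so that packages
# installed on the host with "pip install --user" cannot leak into the container.
ENV PYTHONNOUSERSITE=1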

Conda environment Dockerfile example

For those of you who are used to conda environments, there are a few key concepts that you will find different in containers. First, you won't want to build and activate a separate custom environment. Instead, you'll just want to install the packages you need into the base environment and then make this environment available by adding it to your PATH. We suggest that each image be used for a single Python environment. (If you find yourself needing multiple conda environments in the same image, most likely you'll want multiple images.) To save space, you'll likely want to start with miniconda. In this example, we'll start from an image in which miniconda has already been installed. As in our previous example, we'll install numpy and scipy.

FROM docker.io/continuumio/miniconda3:latest

ENV PATH=/opt/conda/bin:$PATH

RUN /opt/conda/bin/conda install --yes numpy scipy
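If you prefer to track your packages in a file, a variant of the same image can install them into the base environment from an environment.yml. This is a sketch; environment.yml is a file you would supply alongside the Dockerfile:

FROM docker.io/continuumio/miniconda3:latest

ENV PATH=/opt/conda/bin:$PATH

# environment.yml sits next to the Dockerfile and lists your packages.
COPY environment.yml /opt/environment.yml

# Install into the base environment rather than creating a named one,
# then clean conda caches to keep the image small.
RUN conda env update --name base --file /opt/environment.yml && \
    conda clean --all --yes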

Python GPU Dockerfile example

If you plan to use Python on GPUs, you will likely find it easiest to start with an NVIDIA-provided image that includes CUDA and related libraries. This example demonstrates how to build an image to use Dask. In our example, we build FROM an NVIDIA CUDA base image. Note that in addition to base, NVIDIA also offers runtime and devel flavors of these images.

In this example, we use mamba to speed up the package installation process. You can also see that we attempt to shrink our image by deleting whatever we can when we're done. This will reduce the time it takes to upload to the registry and download via Shifter. Note however that the NVIDIA images, even the base image, are quite large.

FROM nvidia/cuda:11.2.1-base-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /opt

RUN \
    apt-get update        && \
    apt-get upgrade --yes && \
    apt-get install --yes    \
        wget                 \
        vim              &&  \
    apt-get clean all    &&  \
    rm -rf /var/lib/apt/lists/*

# Install Miniconda
# Pin to Python 3.8 for RAPIDS compatibility
ENV installer=Miniconda3-py38_4.9.2-Linux-x86_64.sh

RUN wget https://repo.anaconda.com/miniconda/$installer && \
    /bin/bash $installer -b -p /opt/miniconda3          && \
    rm -rf $installer

ENV PATH=/opt/miniconda3/bin:$PATH

# Use mamba to speed up package resolution
RUN /opt/miniconda3/bin/conda install mamba -c conda-forge -y

RUN \
    /opt/miniconda3/bin/mamba install \
    dask-cuda \
    dask-cudf \
    ipykernel \
    matplotlib \
    seaborn \
    -c rapidsai-nightly -c nvidia -c conda-forge -c defaults -y && \
    /opt/miniconda3/bin/mamba clean -a -y
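As a rough sketch of running the resulting image on a GPU node (the image name and resource options are placeholders, and you should check our Shifter documentation for the GPU-related settings, such as the gpu module, required on your system):

#!/bin/bash
#SBATCH --image=myuser/dask-rapids:latest
#SBATCH --constraint=gpu
#SBATCH --gpus-per-node=1
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Check that the RAPIDS/Dask stack inside the image can see the GPU libraries.
srun shifter --module=gpu python3 -c "import dask_cuda; print(dask_cuda.__version__)"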

If you have questions about any of these examples or about how to use Python in Shifter, we encourage you to check our How to use Shifter page or contact NERSC's online help desk.