Best Practices for using Python at NERSC

Many of us know and love Python because of its ease of use. But don't be fooled by this friendly language and environment! Like any programming language, there are plenty of ways to shoot yourself in the foot in Python.

Here are some tips to help you avoid common Python problems at NERSC so you can work on your Nature article instead of debugging your code.

Always Load a Python Module

Always use a version of Python provided by NERSC through

module load python
with an optional version suffix. To see which Python modules we offer, type module avail python.

Never use the version of Python found at /usr/bin/python. This is an older version of Python that NERSC does not support for our users.
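
If you are not sure which interpreter a script is actually running under, you can ask Python itself. The snippet below is a minimal sketch; the helper name is_system_python is ours for illustration, and it only checks whether the interpreter sits directly in /usr/bin.

```python
import os
import sys


def is_system_python(executable):
    """Return True if the interpreter path sits directly in /usr/bin.

    Hypothetical helper for illustration; it does not resolve symlinks.
    """
    return os.path.dirname(os.path.abspath(executable)) == "/usr/bin"


if __name__ == "__main__":
    current = sys.executable or ""
    print("Running under:", current)
    if is_system_python(current):
        print("This looks like the system Python; try 'module load python' first.")
```

Running this once before and once after module load python should make it clear whether the module took effect.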

Choose your own adventure: our default module or a custom conda environment

At NERSC there are two major ways of using Python:

  • Simply use our default Python modules via module load python where we have done the work for you by installing the most commonly used Python libraries.

  • Build your own conda environment. You might do this if you need a library that is not in our default modules or require a custom setup. Building a conda environment is fast and easy. You can read more about it here.

The benefits of using either our base Python module (which is actually itself a conda environment) or your custom conda environment are numerous:

  • Conda environments will correctly set your interpreter search path automatically. This means you don't need to set things like PYTHONPATH. For more about this, see below.

  • Conda environments are disposable. If something goes wrong, simply delete your environment and build a new one.

  • Conda environments check to see that your libraries are compatible and will upgrade or downgrade them as necessary when you install new packages.

Select the Right File System for Installing Your Own Packages

  • If you mostly run serial Python jobs or use multiprocessing on a single node, you might be able to just install Python packages in your $HOME directory or on the Community file system without seeing any substantial performance issues.

  • The best-performing shared file system for launching parallel Python applications is /global/common. This file system is mounted read-only on Cori compute nodes with client-side caching enabled. This is the file system that NERSC uses to install software modules (such as Python). Contact NERSC to see if your required packages can be made available on /global/common, either as a NERSC-built module or through Anaconda.

  • There are several interventions that can further improve Python package import times. For more information please see here. Users are advised to consider them and choose one that delivers the performance they desire at the level of invasiveness they are willing to accept.
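
Before choosing an intervention, it helps to measure how long your imports actually take from each file system. The following is a rough sketch using only the standard library; time_import is a name we made up, and "json" is just a stand-in for the package you care about.

```python
import importlib
import time


def time_import(module_name):
    """Return the wall-clock seconds spent importing module_name.

    If the module was already imported in this process, Python reuses the
    cached copy and the result will be close to zero.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is a placeholder; substitute your own package name.
    name = "json"
    print("import %s took %.3f s" % (name, time_import(name)))
```

Run it in a fresh interpreter for each measurement so that cached modules do not hide the cost. For a per-module breakdown, Python 3.7 and later also accept python -X importtime.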

Shell resource files: out of sight, out of mind

Some developers like to add things to their shell resource files (e.g. .bashrc) to avoid having to type things over and over again. OK, we get it: nobody likes unnecessary typing. However, there is something to be said for manually setting environment variables and loading modules because it forces you to think about what you are doing.

Please use your shell resource files judiciously. They can be a good resource but you should periodically check them to see if they need to be changed or updated. Because it is easy to forget they are there, shell resource files can make debugging a challenge.

Setting PYTHONPATH not advised

What is PYTHONPATH anyway? PYTHONPATH is an environment variable that lets you add additional search paths to the standard places your Python interpreter looks for modules. You can read further about the different types of environment variables used by Python here.

In a well-constructed Python setup (like our Python modules or your own custom conda environment), there is almost no reason to set environment variables like PYTHONPATH since your environment will already know exactly where to find everything it needs.

PYTHONPATH should only be used when other alternatives are not feasible.
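
If imports are behaving strangely, it is worth checking whether a forgotten PYTHONPATH is involved. Here is a small sketch that reports what PYTHONPATH contributes and what the interpreter will search; pythonpath_entries is a hypothetical helper name, not part of any NERSC tooling.

```python
import os
import sys


def pythonpath_entries(environ=os.environ):
    """Return the non-empty directories listed in PYTHONPATH ([] if unset)."""
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("Environment prefix:", sys.prefix)
    extra = pythonpath_entries()
    if extra:
        print("PYTHONPATH adds:", extra)
    else:
        print("PYTHONPATH is not set")
    # The full list of places this interpreter looks for modules, in order.
    for entry in sys.path:
        print("  search path:", entry)
```

In a clean module or conda environment you would expect the search paths to sit under the environment prefix, with nothing added by PYTHONPATH.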

pip tips

Cori is a shared system on which none of us (very fortunately) has root access, so users in the default Python environment must append --user

pip install myfavoritepackage --user
to install into their own environment without invoking root privileges.

Sometimes you want to pip install in your custom conda environment. In this case, since you own your custom conda environment, you don't need to append --user. The right syntax is

pip install myfavoritepackage
This can be confusing. It is important to think carefully about which kind of environment you are in before you type.
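
One way to take the guesswork out of this is to check whether you can write to the active environment. The sketch below rests on an assumption of ours: if the environment's prefix is writable by you, it is your own environment and --user is not needed. This is a heuristic rather than an official NERSC rule, and both function names are hypothetical.

```python
import os
import sys


def owns_environment(prefix=sys.prefix):
    """Heuristic (our assumption): you own the environment if you can write to it."""
    return os.access(prefix, os.W_OK)


def pip_install_args(package, own_env):
    """Build pip arguments, adding --user only for environments you do not own."""
    args = ["install", package]
    if not own_env:
        args.append("--user")
    return args


if __name__ == "__main__":
    args = pip_install_args("myfavoritepackage", owns_environment())
    # Only print the command; review it before running it yourself.
    print("pip " + " ".join(args))
```

The script prints the suggested command instead of running it, so you still get the chance to think before you type.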

Sometimes pip will try to re-use packages it has already downloaded, which can cause problems. To be safe, it is good practice to use

pip install myfavoritepackage --user --no-cache-dir
to force pip to re-download all packages.

Upgrade to Python 3

Python 2 was retired on January 1, 2020. This means it is now unsupported by the wider Python developer community. You can continue to use Python 2, but bugs and security issues will not be patched. You can find more information here.
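
If you maintain scripts that others may still launch with an old interpreter, a version guard at the top gives them a clear message instead of a confusing failure. This is a minimal sketch; is_supported is a hypothetical helper name, and the code deliberately avoids Python 3 only syntax so that the guard itself still runs under Python 2.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True for Python 3.0 or newer."""
    return tuple(version_info[:2]) >= (3, 0)


if not is_supported():
    # Exit early with a readable message rather than failing later on.
    sys.exit("This script needs Python 3; found %d.%d" % tuple(sys.version_info[:2]))
```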