PyTorch

PyTorch is a high-productivity deep learning framework based on dynamic computation graphs and automatic differentiation. It is designed to be as close to native Python as possible for maximum flexibility and expressivity.

Availability on Cori

PyTorch is available from the Anaconda Python installations (e.g. via module load python) or from dedicated modules built with distributed support (including MPI) enabled. To list the available versions, run module avail pytorch.

The currently recommended version of PyTorch on Cori Haswell and KNL is the latest version, v1.5.0, which can be loaded with:

module load pytorch/v1.5.0

On Cori-GPU, you can use the corresponding module pytorch/v1.5.0-gpu.
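
Once a module is loaded, a quick check from Python can confirm which build was picked up. The following is a minimal sketch rather than an official NERSC tool: the helper name pytorch_status is hypothetical, and it only reports what the installed torch package itself exposes (version, CUDA visibility, and whether the distributed package and its MPI backend were compiled in).

```python
import importlib.util


def pytorch_status():
    """Report basic facts about the PyTorch build visible to this interpreter.

    Hypothetical helper for illustration; not provided by PyTorch or NERSC.
    """
    # find_spec avoids an ImportError when no PyTorch module has been loaded.
    if importlib.util.find_spec("torch") is None:
        return {"available": False}

    import torch
    import torch.distributed as dist

    has_dist = dist.is_available()
    return {
        "available": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
        "distributed": has_dist,
        # The MPI query is only meaningful when the distributed package was built.
        "mpi": has_dist and dist.is_mpi_available(),
    }


if __name__ == "__main__":
    for key, value in pytorch_status().items():
        print(f"{key}: {value}")
```

On a login node without a PyTorch module loaded, this should simply report that PyTorch is not available.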

Customizing environments

Want to integrate your own packages with PyTorch at NERSC? There are two suggested solutions:

  1. Install your packages on top of our PyTorch + Python installations - You can use the $PYTHONUSERBASE environment variable (set automatically when you load one of our modules) and user installations with pip install --user ... to install your own packages on top of our PyTorch installations.
  2. Install PyTorch into your custom conda environments - You can set up a conda environment as described in our Python documentation and install PyTorch into it. If you do not need distributed support, you can install PyTorch via conda or pip as described at https://pytorch.org/get-started/locally/. If you need distributed support, the build is more involved. We share our build scripts for PyTorch at https://github.com/sparticlesteve/nersc-pytorch-build. Please open a support ticket at http://help.nersc.gov/ for assistance.
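
For the first option, it can help to see where pip install --user will place packages after a module is loaded. Python's standard site module honors $PYTHONUSERBASE, so a small sketch like the one below can show the resulting paths. The function name user_install_paths is hypothetical, and the values printed depend on which module you loaded.

```python
import os
import site
import sys


def user_install_paths():
    """Show where `pip install --user` packages go for this interpreter.

    Hypothetical helper for illustration only.
    """
    user_site = site.getusersitepackages()
    return {
        # Set by the module according to the text above; None if no module is loaded.
        "PYTHONUSERBASE": os.environ.get("PYTHONUSERBASE"),
        # Base directory for user installs (follows PYTHONUSERBASE when it is set).
        "user_base": site.getuserbase(),
        # Directory where user-installed packages are placed.
        "user_site": user_site,
        # Whether that directory is currently on the import path.
        "user_site_on_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_install_paths().items():
        print(f"{key}: {value}")
```

Note that the user site directory is typically added to the import path only once it exists, so user_site_on_path may be False before your first user install.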

Multi-node training

PyTorch makes it fairly easy to get up and running with multi-node training via its included distributed package. Refer to the distributed tutorial for details: https://pytorch.org/tutorials/intermediate/dist_tuto.html
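
As a rough sketch of what initialization can look like under Slurm (the scheduler on Cori), the following maps Slurm's per-task environment variables to the rank and world size that torch.distributed.init_process_group expects. The helper names slurm_dist_config and init_distributed are hypothetical, the gloo backend is only an assumed default, and the env:// method assumes that MASTER_ADDR and MASTER_PORT are exported in your batch script. The examples repository referenced in this page may handle this differently.

```python
import os


def slurm_dist_config(env=None):
    """Derive rank information from Slurm's per-task environment variables.

    Falls back to a single-process configuration when run outside of srun.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("SLURM_PROCID", 0)),        # global task index
        "world_size": int(env.get("SLURM_NTASKS", 1)),  # total number of tasks
        "local_rank": int(env.get("SLURM_LOCALID", 0)), # task index on this node
    }


def init_distributed(backend="gloo"):
    """Initialize torch.distributed for the current task and return its config.

    Assumption: MASTER_ADDR and MASTER_PORT are already set in the environment,
    as required by the env:// initialization method.
    """
    # Imported lazily so slurm_dist_config can be used without PyTorch installed.
    import torch.distributed as dist

    cfg = slurm_dist_config()
    dist.init_process_group(
        backend=backend,
        init_method="env://",
        rank=cfg["rank"],
        world_size=cfg["world_size"],
    )
    return cfg
```

Each task launched by srun would call init_distributed() once before constructing its model. With the MPI-enabled modules, backend="mpi" is another option, in which case rank and world size are determined by the MPI runtime rather than by these arguments.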

Examples

We're putting together a coherent set of example problems, datasets, models, and training code in this repository: https://github.com/NERSC/pytorch-examples

This repository can serve as a template for your research projects, with a flexible layout and code structure. The template branch contains the core layout without the examples, so you can build your code on top of a minimal, fully functional setup. The provided code should minimize your own boilerplate and let you get up and running with distributed training on Cori as quickly and seamlessly as possible.

The examples include:

  • A simple hello-world example
  • MNIST image classification with a simple CNN
  • CIFAR10 image classification with a ResNet50 model
  • DCGAN (currently disabled until update)