
Papermill

Papermill, developed by Netflix, is an open-source tool that allows users to run Jupyter notebooks 1) via the command line and 2) in an easily parameterizable way. Papermill is best suited for Jupyter users who would like to run the same notebook with different input values. One example use case is hyperparameter optimization for machine learning.

For general information about using Jupyter at NERSC, please see our Jupyter documentation.

Strengths of Papermill:

  • Easy to use
  • Allows users to run Jupyter notebooks via the command line
  • Easy to run the same notebook with multiple input parameters
  • Papermill will automatically save each individual completed notebook
  • Provides a framework for reproducible Jupyter workflows
  • Easy to run in Slurm batch jobs

Disadvantages of Papermill:

  • Code must already be in a working Jupyter notebook
  • Workflow structure must be expressible in a wrapper Python script
  • Best suited for serial execution on a single node

How to use Papermill at NERSC

There are two main aspects of using Papermill at NERSC. The first is writing your Jupyter notebook and parameterizing the appropriate cell. The second is building a conda environment that will allow you to use Papermill. We'll provide an example that will show you how to run Papermill at NERSC.

No Papermill needed in your Jupyter notebook itself

Note that you don't need to import Papermill in your Jupyter notebook itself, nor do you need to be using a Jupyter kernel that includes Papermill. The default NERSC Python kernel will do just fine.

The step that transforms your notebook from a normal Jupyter notebook to a Papermill-enabled Jupyter notebook is manually adding a cell tag called parameters. Depending on your version of Jupyter or JupyterLab this might look a little different, but the idea is the same.

JupyterLab >3.0 directions (current NERSC version)

Open a Jupyter notebook in JupyterLab. You'll need to select the cell where you will specify changing input parameters. Note that the Papermill developers advise that all parameters be placed in the same cell to avoid unexpected or undesirable behavior.

Select the cell that contains your parameters. Click the property inspector in the right sidebar (double gear icon). This opens a sidebar window with an Add Tag button. Click it, type parameters, and click the + button. The tag tells Papermill which cell holds the default values that your input parameters will override. Save your notebook.

[Figure: cell tags in JupyterLab 3.0]
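As a rough illustration, a tagged parameters cell usually holds nothing but default values. The sketch below uses the names alpha and beta from the example script on this page; the default values and the final result line are made up for illustration. Papermill does not rewrite the tagged cell. At run time it inserts a new cell directly after it, tagged injected-parameters, which reassigns the same names.

```python
# Contents of the cell tagged "parameters": default values only.
# These defaults are illustrative; they are used when the notebook is run by hand.
alpha = 0
beta = 2

# When Papermill runs the notebook with parameters={'alpha': 3, 'beta': 5},
# it adds a cell right after this one (tagged "injected-parameters")
# that is equivalent to:
#     alpha = 3
#     beta = 5

# Later cells simply use the variables and pick up whichever value was set last.
result = alpha + beta
print(f"alpha={alpha}, beta={beta}, result={result}")
```

Keeping every parameter in this one cell means the injected cell overrides all of them in a single place.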

Legacy JupyterLab directions (JupyterLab <3.0)

Please refer to the official Papermill docs for instructions on adding the parameters tag in older versions of JupyterLab.
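Whichever JupyterLab version you use, the tag ends up in the notebook file's JSON, in the tags list of the cell's metadata. The sketch below shows that structure using only the standard library. The helper add_parameters_tag and the tiny in-memory notebook are hypothetical and serve only to illustrate; this is not a Papermill or NERSC tool.

```python
import json


def add_parameters_tag(notebook: dict, cell_index: int) -> dict:
    """Add the 'parameters' tag to one cell of a notebook dict (nbformat 4 layout)."""
    cell = notebook["cells"][cell_index]
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if "parameters" not in tags:  # avoid adding the tag twice
        tags.append("parameters")
    return notebook


# A minimal stand-in for the parsed contents of a real .ipynb file (hypothetical).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["alpha = 0\n", "beta = 2\n"],
            "outputs": [],
            "execution_count": None,
        }
    ],
}

add_parameters_tag(notebook, cell_index=0)
print(json.dumps(notebook["cells"][0]["metadata"]))
```

For a real notebook you would read the file with json.load and write it back with json.dump. Adding the tag through the JupyterLab interface is the safer route, and it is worth keeping a backup copy before editing a notebook file by hand.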

Building your Papermill environment

Papermill is not installed in the NERSC default Python environment, so if you'd like to use it, you need to build your own Papermill conda environment. This is easy:

module load python
conda create -n mypapermillenv python=3.9
source activate mypapermillenv
conda install ipykernel
conda install -c conda-forge papermill

Running your Papermill code

Now that you've parameterized your Jupyter notebook papermill.ipynb and built your conda environment, you're ready to go. Unlike typical Jupyter workflows, you don't have to open or even use Jupyter to run Papermill, which might seem a little counterintuitive. This is actually very convenient: you don't need to keep a Jupyter session open for the duration of the run, which can be fragile or tedious.

You can request a compute node either interactively or submit your Papermill workflow via a batch job.

On our interactive node or in our batch script:

module load python
source activate mypapermillenv
python run_papermill.py

This is our Python script run_papermill.py:

import papermill as pm

# Run the notebook once per value of i, writing each result to its own file.
for i in range(10):
    alpha = i
    beta = i + 2
    pm.execute_notebook(
        input_path='papermill.ipynb',
        output_path=f'papermill_output_{i}.ipynb',
        parameters={'alpha': alpha, 'beta': beta},
    )

In our example script we specify several values of alpha and beta with a garden-variety Python loop. The script runs the parameterized notebook papermill.ipynb 10 times and saves 10 new output notebooks, one for each pair of alpha and beta values. More sophisticated orchestration is also possible. Voila!
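As one possible form of more sophisticated orchestration, the sketch below runs the notebook over every combination of two parameter lists and keeps going if a single run fails. The helper names build_jobs and run_jobs, the output file naming pattern, and the parameter values are assumptions made for illustration. The only Papermill calls used are pm.execute_notebook and the pm.PapermillExecutionError exception.

```python
import itertools


def build_jobs(alphas, betas, template="papermill_output_a{alpha}_b{beta}.ipynb"):
    """Return one (output_path, parameters) pair per combination of alpha and beta."""
    jobs = []
    for alpha, beta in itertools.product(alphas, betas):
        params = {"alpha": alpha, "beta": beta}
        jobs.append((template.format(**params), params))
    return jobs


def run_jobs(jobs, input_path="papermill.ipynb"):
    """Execute the notebook once per job and collect the runs that failed."""
    import papermill as pm  # imported here so build_jobs works without Papermill

    failed = []
    for output_path, params in jobs:
        try:
            pm.execute_notebook(input_path, output_path, parameters=params)
        except pm.PapermillExecutionError as err:
            # A cell in the notebook raised an error; record it and move on.
            failed.append((output_path, str(err)))
    return failed


# Hypothetical parameter grid: 3 values of alpha times 2 values of beta.
jobs = build_jobs(alphas=[0, 1, 2], betas=[2, 4])
print(f"{len(jobs)} notebooks to run")

# In your Papermill environment, with papermill.ipynb present, run them with:
# failed = run_jobs(jobs)
```

Separating the job list from the execution step makes it easy to inspect which notebooks will be produced before spending compute time on them.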

You'll see output that looks like this:

(mypapermillenv) elvis@nid00xxxx:~> python run_papermill.py 
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.01s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.61s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.45s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.70s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.46s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.22s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.61s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.90s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.19s/cell]
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00,  2.52s/cell]