Running an application on NERSC resources

Purpose

This playbook will guide you through the steps to run an application on NERSC's Perlmutter supercomputer.

Overview of process

To run a code, we'll need to identify which code we are going to run, and, if applicable, compile it.

We'll need to make sure that the right modules are loaded, that we can find the application, and that its file permissions allow our account to execute it. Similarly, we will identify any input data and ensure that it is accessible.

Next, we will determine how to run the application -- how many nodes, GPUs, MPI processes, OpenMP threads, etc. -- plus the proper input arguments. And we will consider any necessary follow-up to running the code, such as removing interim files, performing data analysis, moving output data to mass storage, etc.

We will also decide whether to run it interactively or through the batch system, and if the latter, develop a batch script to automate the process. Finally, we will submit our job to the Slurm scheduler.

Steps

  1. Identify application to be run
  2. Determine prerequisites for running
  3. Determine post-run procedures
  4. Write a batch script
  5. Submit a job

Detailed instructions

Step 1: Identify application to be run

First, we need to determine what application we are going to run, and how to access it. NERSC does provide a few applications for users; browse the list to determine whether yours is provided. If not, then you may need to compile it yourself.

Step 2: Determine prerequisites for running

To access a NERSC-provided application, or to provide libraries that were linked into an application you compiled, you may need to load modules. Make a note of what modules must be loaded for your application to work. Ensure that file permissions are correct and you are able to access and execute the executable.
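
The permission check described above can be sketched as a small shell function. This is a minimal illustration, not a NERSC-provided tool: the function name check_app and its return codes are assumptions made for this example. On the system itself, the command "module list" shows which modules are currently loaded.

```shell
# check_app: verify that an application can be found and executed.
# Argument: a path to the executable, or a bare command name to look up on PATH.
# (check_app is a hypothetical helper written for this example.)
check_app() {
    app="$1"
    # Resolve a bare command name through PATH; keep explicit paths as given.
    case "$app" in
        */*) resolved="$app" ;;
        *)   resolved="$(command -v "$app" 2>/dev/null)" ;;
    esac
    if [ -z "$resolved" ] || [ ! -f "$resolved" ]; then
        echo "not found: $app"
        return 1
    fi
    if [ ! -x "$resolved" ]; then
        echo "not executable: $resolved (try: chmod u+x $resolved)"
        return 2
    fi
    echo "ok: $resolved"
}
```

Calling check_app with the path to your executable before writing the batch script catches a missing file or a missing execute bit early, rather than after the job has waited in the queue.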

In addition, you may need to provide input data. What type of file(s) is/are required? Where are those files stored? Can they be staged onto the scratch file system for better performance? Should this staging be performed as a separate process before the application runs, to minimize any waste of compute time, or is this not a concern?
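
If staging is appropriate, it might look something like the sketch below. The helper name stage_input and the example paths are hypothetical; the sketch assumes the scratch directory is available through the SCRATCH environment variable, which should be confirmed on the system.

```shell
# stage_input: copy one input file into a staging directory.
# Arguments: source file, destination directory.
# (stage_input is a hypothetical helper written for this example.)
stage_input() {
    src="$1"
    dest_dir="$2"
    if [ ! -r "$src" ]; then
        echo "cannot read input: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # -p preserves the timestamps and permissions of the original file.
    cp -p "$src" "$dest_dir/" || return 1
    # Print the staged path so the caller can pass it to the application.
    echo "$dest_dir/$(basename "$src")"
}

# Example with hypothetical paths: stage into a run directory under $SCRATCH.
# stage_input "$HOME/inputs/case01.dat" "$SCRATCH/run01"
```

Running the copy before the job starts, for example from a login node, keeps the transfer time out of the compute allocation.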

Determine the command-line arguments to the executable. Some applications may use command-line arguments to invoke a particular solution method or set some sort of error tolerance. Ensure that you understand what the command you are invoking means.

Determine how many compute resources, and of which kinds, your problem size requires. For your first run, it may be wise to try a smaller problem that needs few resources, for validation purposes. Either way, be sure that you know whether the run requires GPUs, how much memory and how many nodes it needs, and approximately how long it will take.
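
As a small worked example of the node-count arithmetic: once you know how many MPI tasks the run needs and how many tasks you intend to place on each node, the node count is a ceiling division. The function name nodes_needed and the numbers below are illustrative assumptions, not Perlmutter specifications; check the system documentation for the actual per-node core and GPU counts.

```shell
# nodes_needed: smallest node count that fits a given number of MPI tasks.
# Arguments: total MPI tasks, MPI tasks per node.
# (nodes_needed is a hypothetical helper written for this example.)
nodes_needed() {
    total_tasks="$1"
    tasks_per_node="$2"
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (total_tasks + tasks_per_node - 1) / tasks_per_node ))
}

# Example: 10 tasks at 4 tasks per node need 3 nodes,
# with the last node only partly filled.
nodes_needed 10 4
```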

Step 3: Determine post-run procedures

After the application completes, what is the output, and where is it? How do you evaluate the success of a run?

What should be done with any outputs? Can some files be deleted? Is post-processing required? Should outputs be transferred out of the scratch file system, where they run the risk of being purged according to NERSC policy?

Which actions could be automated, perhaps within the batch script, and which need to be performed by hand?
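
One check that is easy to automate is a basic sanity test after the application exits. The sketch below is only an illustration: the function name run_succeeded is hypothetical, and the criteria (a zero exit code and a non-empty output file) are assumptions that will not suit every application.

```shell
# run_succeeded: basic post-run checks.
# Arguments: exit code of the application, path to its main output file.
# (run_succeeded is a hypothetical helper written for this example.)
run_succeeded() {
    exit_code="$1"
    output="$2"
    # A non-zero exit code usually means the application reported a failure.
    [ "$exit_code" -eq 0 ] || { echo "application exited with code $exit_code"; return 1; }
    # -s is true only if the file exists and is not empty.
    [ -s "$output" ] || { echo "output missing or empty: $output"; return 2; }
    echo "run looks complete: $output"
}
```

Inside a batch script, this could be called right after the application with "$?" as the first argument, so that cleanup or data transfer only happens when the run appears to have finished.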

Step 4: Write a batch script

Next, we will incorporate all of the above considerations about how to run the application into our procedure.

Most applications should be run in batch (asynchronous) mode. The majority of applications do not require any human intervention or feedback, so they can run at any time, including times when you are asleep, spending time with family, or performing another task -- a major win for convenience!

NERSC has extensive documentation on batch scripts, including many example job scripts. Another great resource is the job script generator, which can be used to ensure correct process affinities and other settings.
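
To give a rough idea of the shape of a batch script, the sketch below writes a minimal one to a file so that it can be inspected and edited before submission. Everything specific in it is an assumption made for illustration: the account mXXXX, the executable ./myapp, the input file input.dat, and the resource values are placeholders, and the helper name write_batch_script is hypothetical. The NERSC example job scripts and the job script generator remain the authoritative sources for correct settings.

```shell
# write_batch_script: write a minimal Slurm batch script to the given file.
# (write_batch_script is a hypothetical helper written for this example.)
write_batch_script() {
    out="$1"
    cat > "$out" <<'EOF'
#!/bin/bash
# Placeholder account: replace mXXXX with your project account.
#SBATCH --account=mXXXX
# Node type: cpu here; use gpu for GPU nodes.
#SBATCH --constraint=cpu
# Queue intended for short test runs.
#SBATCH --qos=debug
#SBATCH --nodes=1
# Wall-clock limit in HH:MM:SS.
#SBATCH --time=00:10:00
#SBATCH --job-name=myapp-test

# Load the modules noted in Step 2 here.

# Launch 4 MPI tasks with 2 CPUs each; ./myapp and input.dat are placeholders.
srun --ntasks=4 --cpus-per-task=2 ./myapp input.dat
EOF
}

# Example: write the script to a file in the current directory.
# write_batch_script myjob.sh
```

Generating the file with a quoted here-document keeps the script text literal, so nothing in it is expanded by the shell that writes it.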

Step 5: Submit a job

Submit your script with sbatch scriptname (substituting the name of your script for scriptname).
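
When a submission succeeds, sbatch normally replies with a line of the form "Submitted batch job <id>". The sketch below captures that job ID so the job can be looked up in the queue afterwards. The helper name job_id_from_sbatch is hypothetical, scriptname is the same placeholder as above, and the submission part only runs where sbatch is installed and the script file exists.

```shell
# job_id_from_sbatch: extract the numeric job ID from sbatch's confirmation line.
# Prints nothing if the line does not have the expected form.
# (job_id_from_sbatch is a hypothetical helper written for this example.)
job_id_from_sbatch() {
    echo "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Submit only where Slurm is available; scriptname is a placeholder.
if command -v sbatch >/dev/null 2>&1 && [ -f scriptname ]; then
    reply="$(sbatch scriptname)"
    job_id="$(job_id_from_sbatch "$reply")"
    echo "submitted job $job_id"
    # Show the job's current state in the queue.
    squeue --jobs="$job_id"
fi
```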

Troubleshooting

There are many small things that can cause a batch script to be unsubmittable.
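
As an illustration, the sketch below checks for two such problems that sbatch is known to reject: a first line that is not a "#!" interpreter line, and Windows (DOS) line endings. The function name precheck_script and its return codes are hypothetical, and the list of checks is far from complete.

```shell
# precheck_script: catch two common problems before calling sbatch.
# Argument: path to the batch script.
# (precheck_script is a hypothetical helper written for this example.)
precheck_script() {
    script="$1"
    [ -r "$script" ] || { echo "cannot read: $script"; return 1; }
    # The first line must be an interpreter line such as #!/bin/bash.
    case "$(head -n 1 "$script")" in
        '#!'*) ;;
        *) echo "first line is not a #! interpreter line"; return 2 ;;
    esac
    # Carriage returns indicate Windows (DOS) line endings.
    if grep -q "$(printf '\r')" "$script"; then
        echo "file contains DOS line endings"
        return 3
    fi
    echo "ok"
}
```

Slurm also offers "sbatch --test-only scriptname", which validates the script and reports an estimated start time without actually submitting a job.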