Migrating from Cori to Perlmutter¶
Cori Retirement Plan¶
Cori had its first users in 2015, and since then, NERSC's longest-running system has been a valuable resource for thousands of users and projects. With the complete Perlmutter system expected to be operational during the 2023 allocation year, NERSC plans to decommission Cori at the end of April 2023.
We will begin decommissioning auxiliary systems associated with Cori at the end of March. On March 31 at noon, the Cori large memory nodes will be taken offline in preparation for their migration to Perlmutter, and the Cori GPU nodes will be retired. The Haswell and KNL nodes will continue to operate through the end of April.
Cori Retirement Timeline¶
- Oct 2022: Software freeze (no new user-facing software installed by NERSC)
- Allocation Year 2023: All allocations based on Perlmutter’s capacity only
- Nov 2022 - Jan 2023: Cori to Perlmutter transition training focus & office hours
- Jan 30, 2023: Final decommissioning date T announced (T = end of April 2023)
- Feb - Apr 2023: More Cori to Perlmutter transition training focus & office hours
- Mar 31, 2023, at noon: Retire Cori GPU nodes, and take large memory nodes offline
- T - 1 week: Implement reservation, preventing new jobs from running effective T
- T: Delete all jobs from queue, no new jobs can be submitted; continue to allow login to retrieve files from Cori scratch
- T + 1 week: Close login nodes permanently
- T + 1 month: Disassembly begins
Cori has 1,900 Intel Haswell and 9,300 Intel KNL CPU nodes. Perlmutter has 1,536 Nvidia A100 GPU-accelerated nodes and 3,072 AMD CPU-only nodes.
Detailed system architecture information can be found at Cori Architecture and Perlmutter Architecture. Below is a quick comparison table:
| - | Cori | Perlmutter |
|---|------|------------|
| Peak Performance | ~30 PF | ~120 PF |
| System Memory | >1 PB | >2 PB |
| Node Performance | >3 TF | >70 TF |
| Node Processors | Intel KNL + Intel Haswell | AMD EPYC (Milan) + Nvidia A100 GPUs |
| # of Nodes | 9,300 KNL + 1,900 Haswell | 1,536 GPU-accelerated + 3,072 CPU-only |
| Intra-Node Interconnect | N/A | NVLink across GPUs; PCIe |
| File System | 28 PB, 0.75 TB/s | 35 PB all-flash, >4 TB/s |
Cori / Perlmutter Comparison: Similarities¶
Perlmutter and Cori have similar Cray user environments. There are `PrgEnv-xxx` modules, and compiler wrappers (`ftn`, `cc`, and `CC`) are used to build applications. Here `xxx` can be `gnu`, `nvidia`, or `cray` on Perlmutter.
The batch scheduler on both systems is Slurm, with familiar interactive, debug, regular, premium, shared, and overrun queues, etc.
Both Perlmutter and Cori have CPU nodes with standard CPU architectures that function similarly. Perlmutter CPU nodes use AMD processors while Cori CPU nodes use Intel processors. Perlmutter CPU nodes have a similar clock speed to Cori Haswell nodes, and a similar number of cores per node to Cori KNL nodes.
There are Python, Jupyter, various profiling and debugging tools, workflow tools, science application packages, Data Analytics, and ML/DL packages installed on Perlmutter CPUs, as we have on Cori. The Extreme-scale Scientific Software Stack (E4S) is also available on Perlmutter.
Migrating applications from Cori Haswell to Perlmutter CPU-only nodes is straightforward.
Jobs running on Perlmutter CPU nodes are charged against your CPU allocations, like runs on Cori Haswell or KNL nodes.
Cori / Perlmutter Comparison: Differences¶
Perlmutter uses Lmod modules, which differ slightly from the Tcl modules on Cori. Most module commands are the same, but the way the modules are organized is different -- Lmod is hierarchical. For example, with Lmod, modules may not be initially visible because of dependencies; you should use `module spider` instead of `module avail` to search for modules in this hierarchical organization scheme.
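For example, to locate a package under Lmod's hierarchy (a sketch; the `cray-hdf5` module name and version here are illustrative, and availability on Perlmutter may differ):

```shell
# List only the modules visible in the currently loaded hierarchy
module avail

# Search the full hierarchy, including modules hidden behind
# unloaded dependencies (e.g. a library build tied to a compiler)
module spider cray-hdf5

# Show which modules must be loaded first for a specific version
module spider cray-hdf5/1.12.2
```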
Cori supports Intel (default), CCE, and GCC compilers. Perlmutter supports GCC (default), Nvidia, CCE, and LLVM compilers. Currently, there is no plan to support Intel compilers on Perlmutter.
Perlmutter also has Nvidia GPU nodes, which require substantially different programming models in order to exploit the GPU. User codes may have different GPU-compatible and CPU-only versions. More profiling and debugging tools, science application packages, Data Analytics, and ML/DL packages are installed on Perlmutter for use on GPUs.
Jobs running on Perlmutter GPU nodes are charged against your GPU allocations.
Compiling and Running on CPU Nodes¶
We recommend using `module load cpu` to set a cleaner CPU environment, because the GPU environment is loaded by default (see the GPU section below). In most cases, the GPU settings do not affect CPU applications, so this step may not be necessary. Compiling and running on Perlmutter CPU nodes are very similar to those on Cori Haswell.
To compile on Perlmutter:
- The default compiler is GCC.
- Using the compiler wrappers (`ftn`, `cc`, and `CC`) will link the default `cray-mpich` libraries.
- Use `module load PrgEnv-xxx` to switch to another compiler (where `xxx` can be `gnu`, `nvidia`, or `cray`). There is no need to do `module swap PrgEnv-xxx PrgEnv-yyy` as on Cori.
- To enable OpenMP, use the `-fopenmp` flag for the GCC and CCE compilers, and the `-mp` flag for the Nvidia compiler.
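Putting the steps above together, a typical build session might look like this (illustrative only; the source and executable names are placeholders):

```shell
# Switch from the default GCC environment to the Nvidia compilers
module load PrgEnv-nvidia

# The wrappers link cray-mpich automatically
ftn -mp -o mycode.exe mycode.f90    # Fortran + OpenMP (Nvidia uses -mp)
cc -o mycode_c.exe mycode.c         # C

# Under PrgEnv-gnu, OpenMP uses -fopenmp instead
module load PrgEnv-gnu
CC -fopenmp -o mycode_cxx.exe mycode.cpp
```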
A few quick tips may be helpful for compiling older codes (that worked on Cori) on Perlmutter with the default GCC compiler:
- Fortran: Try `-fallow-argument-mismatch` first, followed by the more extensive flag `-std=legacy` to reduce strictness.
- C/C++: Look for flags that reduce strictness; a flag such as `-Wpedantic` can warn you about lines that break code standards.
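For instance, an older Fortran code that fails under recent GCC with argument-mismatch errors might be retried as follows (the file names are hypothetical):

```shell
ftn -o legacy.exe legacy.f90                             # may fail with "Type mismatch" errors
ftn -fallow-argument-mismatch -o legacy.exe legacy.f90   # demote the mismatch errors to warnings
ftn -std=legacy -o legacy.exe legacy.f90                 # relax standards checking more broadly
```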
Running jobs on Perlmutter CPU nodes is very similar to running on Cori Haswell nodes. One particular thing to point out is how to set the `-c` value on the `srun` line. The table below shows the compute node comparisons and how the `-c` value is calculated for Cori Haswell, Cori KNL, Perlmutter CPU nodes, and the CPU on Perlmutter GPU nodes. (Note: "tpn" = tasks per node)
| - | Cori Haswell | Cori KNL | Perlmutter CPU | CPU on Perlmutter GPU |
|---|---|---|---|---|
| Logical CPUs per physical core | 2 | 4 | 2 | 2 |
| Logical CPUs per node | 64 | 272 | 256 | 128 |
| `-c` value for `srun` | floor(32/tpn)*2 | floor(68/tpn)*4 | floor(128/tpn)*2 | floor(64/tpn)*2 |
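The `-c` formula can be computed in your batch script rather than hard-coded. A minimal sketch for a Perlmutter CPU node (128 physical cores, 2 hyperthreads per core; the variable names are our own):

```shell
tasks_per_node=32
physical_cores=128    # Perlmutter CPU node; use 32 for Cori Haswell, 68 for Cori KNL
threads_per_core=2    # use 4 on Cori KNL

# -c = floor(physical_cores / tpn) * threads_per_core
# (shell integer division already floors the result)
c_value=$(( physical_cores / tasks_per_node * threads_per_core ))
echo "$c_value"   # 8
```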
Below are some sample batch script comparisons:
Cori Haswell pure MPI, 40 nodes, 1280 MPI tasks:

```shell
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=haswell
#SBATCH --time=1:00:00
#SBATCH --nodes=40

export OMP_NUM_THREADS=1
srun -n 1280 -c 2 --cpu-bind=cores ./mycode.exe
```

Perlmutter CPU pure MPI, 10 nodes, 1280 MPI tasks:

```shell
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=cpu
#SBATCH --time=1:00:00
#SBATCH --nodes=10

export OMP_NUM_THREADS=1
srun -n 1280 -c 2 --cpu-bind=cores ./mycode.exe
```

Cori Haswell MPI/OpenMP, 40 nodes, 160 MPI tasks, 8 OpenMP threads per task:

```shell
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=haswell
#SBATCH --time=1:00:00
#SBATCH --nodes=40

export OMP_NUM_THREADS=8
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
srun -n 160 -c 16 --cpu-bind=cores ./mycode.exe
```

Perlmutter CPU MPI/OpenMP, 10 nodes, 160 MPI tasks, 8 OpenMP threads per task:

```shell
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=cpu
#SBATCH --time=1:00:00
#SBATCH --nodes=10

export OMP_NUM_THREADS=8
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
srun -n 160 -c 16 --cpu-bind=cores ./mycode.exe
```
Please see more information at Migrating from Cori to Perlmutter: CPU Codes, where you can find example batch scripts for Perlmutter CPU nodes. The Job Script Generator in Iris can help you create a job script template using the job parameters you choose.
Compiling and Running on GPU Nodes¶
CUDA-aware MPI is enabled by default. The `gpu` module is loaded by default, and it sets `MPICH_GPU_SUPPORT_ENABLED` to 1.
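To confirm the environment on a Perlmutter login node (a sketch; the exact `module list` output depends on the current default module set):

```shell
module list                          # the gpu module appears among the defaults
echo "$MPICH_GPU_SUPPORT_ENABLED"    # prints 1 when the gpu module is loaded
module load cpu                      # swap to the CPU environment for CPU-only builds
```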
The GPU programming models supported on Perlmutter's GPU nodes include Fortran/C/C++, CUDA, OpenACC 2.x, OpenMP 5.x, CUDA Fortran, Kokkos/Raja, MPI, HIP, and DPC++/SYCL (Nvidia, CCE, and GNU are vendor supported, while LLVM is NERSC supported).
And the table below shows the recommended Programming Environment for various Programming Models:
| Programming Model | Programming Environment |
|---|---|
| CUDA | PrgEnv-nvidia or PrgEnv-gnu |
| Kokkos | PrgEnv-nvidia or PrgEnv-gnu |
| OpenMP offload | PrgEnv-nvidia or PrgEnv-gnu |
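For example, building an OpenMP offload code under PrgEnv-nvidia might look like this (illustrative; the source file name is a placeholder, and `-mp=gpu` is the Nvidia compiler's flag for OpenMP target offload):

```shell
module load PrgEnv-nvidia

# -mp=gpu asks the Nvidia compiler to offload OpenMP target regions to the A100
ftn -mp=gpu -o offload.exe offload.f90
```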
Please see more information at Migrating from Cori to Perlmutter: GPU Codes and the Transitioning Applications to Perlmutter webpage for a wealth of useful information on how to transition your applications for Perlmutter GPU.
For running jobs, refer to the Running Jobs on Perlmutter page for many example job scripts on Perlmutter GPU nodes, and more info on GPU affinity. The Job Script Generator in Iris can help you create a job script template with the job parameters you select. Also refer to the "CPU on Perlmutter GPU" column of the compute node comparison table in the CPU section above to determine how the `-c` value is calculated.
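As a rough illustration only (adapted from the CPU examples above; consult the Running Jobs on Perlmutter page for authoritative scripts), a one-node job using all 4 GPUs with one MPI task per GPU might look like:

```shell
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=gpu
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4

export OMP_NUM_THREADS=1
# -c 32 = floor(64/4)*2, per the "CPU on Perlmutter GPU" column above
srun -n 4 -c 32 --cpu-bind=cores --gpus-per-task=1 ./mygpucode.exe
```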
File Systems and Data Considerations¶
The files you see on Cori in your home directory, and in the global common and community file systems can be accessed from Perlmutter in the same way that they are accessed from Cori, so there is no need to do anything special with them before Cori retires.
Your files on Cori scratch are inaccessible directly from Perlmutter, since Perlmutter and Cori have separate scratch file systems. We will retire Cori scratch along with Cori, so be sure to back up Cori scratch files before Cori retires, or migrate Cori scratch data onto CFS or HPSS via Globus or scp first, then access them on Perlmutter.
One mechanism for large file transfer from Cori scratch to Perlmutter scratch is via Globus, using the Globus end point on Cori for Cori scratch, and the Globus end point on the DTNs for Perlmutter scratch. Please see more information at Transferring Data to and from Perlmutter scratch.
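For smaller transfers, a direct copy from a Cori login node also works, since CFS is mounted on both systems (the project name and file paths below are placeholders):

```shell
# On a Cori login node: copy scratch data to the Community File System,
# which is visible from Perlmutter as well
cp -r $SCRATCH/my_results /global/cfs/cdirs/myproject/my_results

# Or archive the data to HPSS with htar
htar -cvf my_results.tar $SCRATCH/my_results
```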
Remove references to the project file system from old scripts
The old symlink `/global/project/projectdirs` to CFS on Cori does not exist on Perlmutter; be sure to replace it with `/global/cfs/cdirs` in any script you are porting to Perlmutter.
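A quick way to update scripts in bulk (a self-contained sketch; the `scripts/` directory and its contents are placeholders, and `sed -i` edits in place, so keep a backup):

```shell
# Demo setup: a script that still references the retired Cori symlink
mkdir -p scripts
echo 'cd /global/project/projectdirs/myproject/run1' > scripts/submit.sh

# Rewrite the retired path to its CFS equivalent in every matching file
grep -rl '/global/project/projectdirs' scripts/ \
  | xargs sed -i 's|/global/project/projectdirs|/global/cfs/cdirs|g'

cat scripts/submit.sh   # cd /global/cfs/cdirs/myproject/run1
```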
Data Analytics on Perlmutter¶
Users with more advanced data workflow needs should refer to the New User Training, Sept 2022 afternoon materials and the Data Day, Oct 2022 talks and tutorials, on topics such as Workflows, Python/Julia, Jupyter, IO, Containers/Shifter, and Deep Learning, and to the abundant Perlmutter user documentation, such as:
- Using Python on Perlmutter
- Preparing Python for Perlmutter GPU
- Workflow Tools
- Machine Learning
Slides and videos are available from the training events below: