Base compilers on NERSC systems

Introduction

There are several options for compilers that can be used on NERSC compute systems. Some of the compilers are open-source products, while others are commercial. These compilers may have different features, optimize some codes better than others, and/or support different architectures or standards. It is up to the user to decide which compiler is best for their particular application.

These base compilers are loaded into the user environment via the programming environment modules. They can then be invoked through compiler wrappers (recommended) or on their own. All compilers on NERSC machines are able to compile codes written in C, C++, or Fortran, and provide support for OpenMP.

On Cori, there are three vendor-provided compiler suites: Intel, GNU, and Cray. In addition, NERSC provides LLVM compilers.

There are several vendor-provided base compilers available on Perlmutter, with varying levels of support for GPU code generation: Cray, GNU, AOCC (AMD Optimizing C/C++ Compiler), and NVIDIA. NERSC plans to provide LLVM compilers on Perlmutter, at a date TBD.

LLVM compilers not compatible with all vendor software

The LLVM compilers are not supported by HPE Cray and therefore are not compatible with all of the same software and libraries that the vendor-provided compiler suites are, but may nevertheless be useful for users who require an open-source LLVM-based compiler toolchain.

Below is a table listing the available compilers on Perlmutter and Cori, with the default compilers indicated.

Compilers   Perlmutter                 Cori
Intel       -                          yes (default)
GNU         yes (default)              yes
Cray        yes                        yes
NVIDIA      yes                        -
AOCC        yes                        -
LLVM        yes (provided by NERSC)    yes (provided by NERSC)

All vendor-supplied compilers are provided via the "programming environments" that are accessed via the module utility. Each programming environment contains the full set of compatible compilers and libraries. To change from one compiler suite to another, you change the programming environment via the module swap command. For example, the following commands change from the GNU programming environment to the Cray environment. Because Perlmutter uses Lmod, simply loading the new environment also works there; no explicit swap is needed.

module swap PrgEnv-gnu PrgEnv-cray      # On Cori and Perlmutter
module load PrgEnv-cray                 # On Perlmutter only
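
As a minimal sketch of the swap-versus-load choice described above, the helper below prints the module command that would switch to a requested programming environment. The function name prgenv_switch_cmd is hypothetical and not a NERSC-provided tool. The sketch assumes a bash or sh shell and that the module system exports the colon-separated LOADEDMODULES variable, as Environment Modules and Lmod normally do. It only prints the command and does not run it.

```shell
# Hypothetical helper (not a NERSC tool): print the module command that would
# switch to PrgEnv-<name>, based on the modules listed in $LOADEDMODULES.
prgenv_switch_cmd() {
  target="PrgEnv-$1"
  current=""
  # LOADEDMODULES is assumed to look like "craype/<ver>:PrgEnv-gnu/<ver>:..."
  for m in $(printf '%s' "${LOADEDMODULES:-}" | tr ':' ' '); do
    case "$m" in
      PrgEnv-*) current="${m%%/*}" ;;   # strip any "/version" suffix
    esac
  done
  if [ -z "$current" ]; then
    echo "module load $target"              # nothing loaded yet: plain load
  elif [ "$current" = "$target" ]; then
    echo "# $target is already loaded"      # nothing to do
  else
    echo "module swap $current $target"     # explicit swap
  fi
}

# Show what would be needed to move to the Cray environment
prgenv_switch_cmd cray
```

On a real login node, the printed line could be reviewed and then run by hand.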

Programming environment for using GPUs on Perlmutter

To compile CUDA source code in any of the supported programming environments, the cudatoolkit module is required to make the CUDA Toolkit accessible. The toolkit includes GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and a runtime library to build and deploy your application. For information about the CUDA Toolkit, see the documentation. Note that this module is not loaded by default.

To set the NVIDIA GPUs as the OpenMP and OpenACC offloading target while using the Cray compiler wrappers, use the compiler flag -target-accel=nvidia80 or set the environment variable CRAY_ACCEL_TARGET to nvidia80. To set the acceleration target to host CPUs instead, use the -target-accel=host flag, set the environment variable to host, or load the craype-accel-host module.
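
The environment-variable route described above might look like the following in a bash or sh session. This is only a sketch: the final echo is there for illustration, to confirm what the wrappers will see.

```shell
# Select the NVIDIA A100 GPUs as the offload target for later cc/CC/ftn calls
export CRAY_ACCEL_TARGET=nvidia80

# To build for the host CPUs instead, the variable would be set like this:
# export CRAY_ACCEL_TARGET=host

# Confirm the value that the compiler wrappers will pick up
echo "CRAY_ACCEL_TARGET=$CRAY_ACCEL_TARGET"
```

Setting the variable once per shell session avoids repeating -target-accel=nvidia80 on every compile line, while the flag is more explicit inside a build script.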

Do not use base compilers' target flag with the Cray compiler wrappers

The base compiler's target flag (e.g., NVIDIA's -target=gpu) will not work with the Cray compiler wrappers.

Using compatible gcc for CUDA compiler drivers with PrgEnv-gnu

When using the PrgEnv-gnu environment in conjunction with the cudatoolkit module (i.e., if compiling any application for both host and device side), one must note that not every version of gcc is compatible with every version of nvcc.

Older versions of the cudatoolkit may not support the default GCC compiler (see the document outlining supported host compilers for each nvcc installation). For older versions, one can either use the cpe-cuda module available on the system to automatically downgrade the gcc version, or manually load a version of GCC that is supported by the older cudatoolkit.

If using the cpe-cuda module, it must be loaded after PrgEnv-gnu:

  module load PrgEnv-gnu
  module load cudatoolkit
  module load cpe-cuda

Compilers

Intel

The Intel compiler suite is available via the PrgEnv-intel module, which will load the intel module for Intel base compilers. This compiler suite is loaded by default on Cori. The base compilers in this suite are:

  • C: icc
  • C++: icpc
  • Fortran: ifort

See the full documentation of the Intel compilers. Additionally, compiler documentation is provided through man pages (e.g., man icpc) and through the -help flag to each compiler (e.g., ifort -help).

OpenMP and OpenACC

To enable OpenMP, use the -qopenmp flag.

The Intel compilers do not support OpenACC.

GNU

The GCC compiler suite is available via the PrgEnv-gnu module, which will load the gcc module for the GNU base compilers. The base compilers in this suite are:

  • C: gcc
  • C++: g++
  • Fortran: gfortran

See the full documentation of the GCC compilers. Additionally, compiler documentation is provided through man pages (e.g., man g++) and through the --help flag to each compiler (e.g., gfortran --help).

OpenMP and OpenACC

To enable OpenMP for CPU code, use the -fopenmp flag.

OpenMP/OpenACC offloading to GPUs not supported yet

Offloading to GPUs with OpenMP/OpenACC is not supported in the PrgEnv-gnu environment on Perlmutter at the moment. The offloading-related information below is for future reference only and may be updated.

GCC has support for OpenMP and OpenACC offloading to GPUs. OpenMP offloading with gcc looks something like:

gcc -fopenmp -foffload=nvptx-none="-Ofast -lm -misa=sm_80" base.c -c

where -misa=sm_80 selects the NVIDIA A100 GPU. The extra flags -Ofast -lm are passed to the offload compiler for building the device binary for that architecture.

Note that, if the Cray compiler wrapper, cc, is used, replace the -foffload option with the -target-accel=nvidia80 flag:

cc -fopenmp -target-accel=nvidia80 base.c -c

OpenMP/OpenACC GPU offload support in GCC is limited

The GCC compiler's offload capabilities for GPU code generation may be limited, in terms of both functionality and performance. Users are advised to also try other compiler suites for C/C++ codes; some of them also include a Fortran compiler with OpenMP offload capability.

Mixture of C/C++/Fortran and CUDA codes

The programming environment supports a mixture of C/C++/Fortran and CUDA codes. CUDA and CPU codes should be in separate files, and Cray compiler wrapper commands must be used at link time:

CC -c main.cxx
nvcc -c cuda_code.cu
CC -o main.ex main.o cuda_code.o

Compatibility between nvcc host compiler and gcc compiler

To make the above work, the GCC version needs to be 9.x due to compatibility issues between the compilers.

Cray

The HPE Cray compiler suite is available via the PrgEnv-cray module, which will load the cce module for the Cray base compilers. The base compilers in this suite are:

  • C: cc
  • C++: CC
  • Fortran: ftn

Full documentation of the Cray compilers is provided in the HPE Cray Clang C and C++ Quick Reference for the C/C++ compilers, and the HPE Cray Fortran Reference Manual for the Fortran compiler. Additionally, compiler documentation is provided through man pages (e.g., man clang or man crayftn) or the help page (cc -help, etc.).

Cray base compilers and Cray compiler wrappers are not the same

It is easy to confuse the Cray base compilers and the compiler wrappers that wrap all compilers, since their names are identical. The underlying compiler that a wrapper invokes is determined by the programming environment that has been loaded; for example, if PrgEnv-gnu has been loaded, then invoking cc ultimately invokes gcc, not the Cray C compiler.

Major changes to Cray compilers starting in version 9.0

Version 8.7.9 of the Cray compiler (CCE) is the last version based on the old compiler environment and default settings. Starting in version 9.0, Cray made major changes to the C/C++ compilers, and smaller changes to the Fortran compiler. In particular:

  • The C/C++ compilers have been replaced with LLVM and clang, with some additional Cray enhancements. This means that nearly all of the compiler flags have changed, and some capabilities available in CCE 8 and previous versions are no longer available in CCE 9. It may also result in performance differences in code generated using CCE 8 vs CCE 9, due to the two versions using different optimizers.
  • OpenMP has been disabled by default in the C, C++, and Fortran compilers. This behavior is more consistent with other compilers. To enable OpenMP, one can use the following flags:
    • C/C++: -fopenmp
    • Fortran: -h omp

Cray provides a migration guide for users switching from CCE 8 to CCE 9.

For users who are unable to migrate their workflows to the clang/LLVM-based CCE 9 C/C++ compilers, Cray has simultaneously released a CCE 9 "classic" version, which continues to use the same compiler technology as CCE 8 and older versions. This version of CCE is available as the module cce/<version>-classic. However, users should be aware that "classic" CCE is now considered "legacy," and that all future versions of CCE are based on clang/LLVM. See the Cray Classic C and C++ Reference Manual.

OpenMP and OpenACC

To enable OpenMP for CPU code, use the -fopenmp flag for C/C++ and the -h omp flag for Fortran.

OpenMP/OpenACC offloading to GPUs not supported yet

Offloading to GPUs with OpenMP/OpenACC is not supported in the PrgEnv-cray environment on Perlmutter at the moment. The offloading-related information below is for future reference only and may be updated.

The Cray compilers have a mature OpenMP offloading implementation.

Compiling codes using OpenMP offload capabilities on Perlmutter requires different flags for C and C++ codes than for Fortran codes. The C and C++ compilers are based on clang, and thus use flags similar to those one would use with clang to generate OpenMP offload code:

cc -fopenmp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.c

CC -fopenmp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.cpp

For Fortran codes, the OpenMP flag is -h omp, and the offload target must be selected either by setting the environment variable CRAY_ACCEL_TARGET to nvidia80 at compile time or by using the -target-accel=nvidia80 compiler flag. Then, build as follows:

ftn -h omp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.f90

Only the Fortran compiler supports OpenACC.

The compiler flag for enabling OpenACC in Fortran codes is -h acc. To offload to GPUs, use the -target-accel=nvidia80 compiler flag, or set the CRAY_ACCEL_TARGET environment variable to nvidia80.

Explicitly set the target to host CPUs when compiling OpenMP/OpenACC code for the host on Perlmutter

Due to an issue with the PrgEnv-cray compiler wrappers, you must add the -target-accel=host compiler option or load the craype-accel-host module in order to successfully compile any OpenMP/OpenACC code for the host.

ftn -h omp -target-accel=host -o my_openmp_code.ex my_openmp_code.f90

Mixture of C/C++/Fortran and CUDA codes

The programming environment allows a mixture of C/C++/Fortran and CUDA codes. In this case, CUDA and CPU codes should be in separate files. Cray compiler wrapper commands must be used at link time, and the CUDA runtime library must be linked explicitly (-lcudart):

CC -c main.cxx
nvcc -c cuda_code.cu
CC -o main.ex main.o cuda_code.o -lcudart

NVIDIA

The NVIDIA compiler suite is available via the PrgEnv-nvidia module, which will load the nvidia module for the NVIDIA base compilers. The base compilers in this suite are:

  • CUDA compiler drivers
    • CUDA C/C++: nvcc
    • CUDA Fortran: nvfortran
  • HPC compilers: for host multithreading and GPU offloading with OpenMP, OpenACC, C++17 Parallel Algorithms and Fortran's DO-CONCURRENT; part of the NVIDIA HPC SDK:
    • C: nvc
    • C++: nvc++
    • Fortran: nvfortran

The CUDA compiler drivers are used to compile CUDA codes. The following example compiles a hello-world CUDA code, helloworld.cu, into an executable named helloworld:

$ cat helloworld.cu
#include <stdio.h>

__global__ void helloworld() {
  printf("Hello, World!\n");
}

int main() {
  helloworld<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}

$ nvcc -o helloworld helloworld.cu

OpenMP, OpenACC and CUDA

If OpenMP and CUDA code coexist in the same program, the OpenMP runtime and the CUDA runtime use the same CUDA context on each GPU. To enable this coexistence, use the compilation and linking option -cuda, as shown below.

$ cat cuda_interop.cpp      # offload code calling a function in a CUDA code
...
#pragma omp target data map(from:array2D[0:M][0:N])
{
  ...
#pragma omp target data use_device_ptr(p)
  {
    add_i_slice(p, i, N);
  }
  ...
}
...

$ cat interop_kernel.cu     # CUDA code where the called function is defined
...
__global__ void add_kernel(int *slice, int t, int n)
{
  ...
}

void add_i_slice(int *slice, int i, int n)
{
  add_kernel<<<n/128, 128>>>(slice, i, n);
}
...

$ nvc++ -Minfo -mp -target=gpu -c cuda_interop.cpp
$ nvcc -c interop_kernel.cu

$ nvc++ -mp -target=gpu -cuda interop_kernel.o cuda_interop.o

where -mp enables OpenMP and -target=gpu offloads the OpenMP constructs to GPUs.

Note that, in the above non-MPI code example, the HPC compiler nvc++ is used, but the Cray compiler wrapper, CC, can be used instead. In that case, drop the -target=gpu flag from the CC commands as the offload target is correctly set by the craype-accel-nvidia80 module. MPI codes must be compiled with the Cray compiler wrapper if Cray MPI is to be used.

The HPC compilers support OpenMP and OpenACC offloading. Invoking OpenACC in the HPC compilers, for example, looks like:

nvfortran -acc=gpu -Minfo=acc -o main.ex main.f90

or

nvfortran -acc -target=gpu -Minfo=acc -o main.ex main.f90

where the -acc flag enables OpenACC for GPU execution only, and -Minfo=acc prints diagnostic information to STDERR regarding whether the compiler was able to produce GPU code successfully.

Note that, when the HPE Cray compiler wrappers are used, replace the -target=gpu flag with -target-accel=nvidia80.
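
As a small sketch of this rule, a build script could pick the GPU target flag from the name of the compiler driver. The helper offload_target_flag below is hypothetical, not part of any NERSC or vendor tool. The last line only prints a compile command, reusing the file names from the example above, and does not run it.

```shell
# Hypothetical helper: print the GPU target flag that matches a compiler driver.
offload_target_flag() {
  case "$1" in
    cc|CC|ftn)            echo "-target-accel=nvidia80" ;;  # HPE Cray compiler wrappers
    nvc|nvc++|nvfortran)  echo "-target=gpu" ;;             # NVIDIA HPC base compilers
    *) echo "offload_target_flag: unsupported driver: $1" >&2; return 1 ;;
  esac
}

# Assemble (but do not run) an OpenACC compile line for the chosen driver
driver=ftn
echo "$driver -acc $(offload_target_flag "$driver") -Minfo=acc -o main.ex main.f90"
```

Keeping the choice in one place makes it harder to pass a base compiler's -target=gpu flag to a Cray wrapper by accident.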

C++17 introduced parallel STL algorithms ("pSTL"), such that standard C++ code can express parallelism when using many of the STL algorithms. The NVIDIA HPC compilers support GPU-accelerated pSTL algorithms, which can be activated by invoking nvc++ with the flag -stdpar=gpu. See the documentation regarding pSTL for the HPC SDK.

GPU acceleration of Fortran's DO CONCURRENT is also enabled with the -stdpar option. If the flag is specified, the compiler parallelizes the DO CONCURRENT loops and offloads them to the GPU. All data movement between host memory and GPU device memory is performed implicitly and automatically under the control of CUDA Unified Memory. It is also possible to target a multi-core CPU with -stdpar=multicore. For more information, see the NVIDIA blog post, Fortran Standard Parallelism.

The NVIDIA HPC SDK provides cuTENSOR extensions so that some Fortran intrinsic math functions can be accelerated on GPUs. Accelerated functions include MATMUL, TRANSPOSE, and several others. The nvfortran compiler provides access to these GPU-accelerated functions via the module cutensorEx. See the documentation about the cutensorEx module in nvfortran.

CUDA Math libraries (cuBLAS, cuFFT, cuFFTW, cuSOLVER, etc.) can be linked easily by specifying the name of the library with the -cudalib flag:

nvfortran -Minfo -mp -target=gpu -cudalib=cublas mp_cublas.f90

Note again that, when the HPE Cray compiler wrapper ftn is used, replace the -target=gpu flag with -target-accel=nvidia80.

Full documentation of the NVIDIA compilers can be found in the NVIDIA HPC Compilers User's Guide and the CUDA C++ Programming Guide.

Please check the NVIDIA HPC SDK - OpenMP Target Offload Training, December 2020 for useful information on the HPC compilers.

AOCC

The AOCC (AMD Optimizing C/C++ Compiler) compiler suite is based on LLVM and includes many optimizations for the AMD processors. It supports Flang as the Fortran front-end compiler. The AOCC suite is available via the PrgEnv-aocc module, which will load the aocc module for the AOCC base compilers. The base compilers in this suite are:

  • C: clang
  • C++: clang++
  • Fortran: flang

Full documentation of the AOCC compilers is provided on the AOCC webpage, where you can find user manuals and a quick reference guide: AOCC User Guide; Clang – the C, C++ Compiler; Flang – the Fortran Compiler; and Compiler Options Reference Guide for AMD EPYC 7xx3 Series Processors.

OpenMP and OpenACC

The compilers can generate OpenMP parallel code for the host CPU only, and do not support offloading to NVIDIA GPUs. To enable OpenMP, add the compiler flag -fopenmp for C and C++ and -mp for Fortran:

clang -fopenmp -o my_openmp_code.ex my_openmp_code.c

clang++ -fopenmp -o my_openmp_code.ex my_openmp_code.cpp

flang -mp -o my_openmp_code.ex my_openmp_code.f90

When using the HPE Cray compiler wrappers, add the target flag -target-accel=nvidia80 for offloading to GPUs.

OpenACC is not supported.

Mixture of C/C++/Fortran and CUDA codes

The programming environment allows a mixture of C/C++/Fortran and CUDA codes. In this case, CUDA and CPU codes should be in separate files. Cray compiler wrapper commands must be used at link time, and the CUDA runtime library must be linked explicitly (-lcudart):

CC -c main.cxx
nvcc -c cuda_code.cu
CC -o main.ex main.o cuda_code.o -lcudart

LLVM

Note

The information below is about the LLVM compilers on Cori. Information for Perlmutter can be found in the NPE and PrgEnv-llvm web page.

The LLVM core libraries and compilers are built locally by NERSC, not by HPE Cray. They are compiled against the GCC compiler suite and thus cannot be used with the Intel or HPE Cray programming environments.

The base compilers in this suite are:

  • C: clang
  • C++: clang++

To enable the clang compiler, first make sure that the GNU compilers are loaded, then load the llvm module:

module load gcc
module load llvm/<version>

where module avail llvm displays which versions are currently installed.

Using the clang++ compiler

The clang++ compiler will fail unless you add a compiler option to use an official C++ standard, e.g., -std=c++11. The issue seems to be related to GPU-offload support for GCC extensions, e.g., __float128 type.

The LLVM/clang compiler is also a valid CUDA compiler. One can replace NVIDIA's nvcc command with clang --cuda-gpu-arch=<arch>, where <arch> on the Cori GPU nodes (NVIDIA V100) is sm_70. If using clang as a CUDA compiler, one usually will also need to add the -I/path/to/cuda/include and -L/path/to/cuda/lib64 flags manually, since nvcc includes them implicitly.

For documentation of the LLVM compilers, see the LLVM, Clang, and Flang websites. Additionally, compiler documentation is provided through man pages (e.g., man clang) and through the -help flag to each compiler (e.g., clang -help).

Common compiler options

The common flags for each of the compilers are listed below.

Overall optimization (replace <n> with 1, 2, 3, etc.):

  • Intel: -O<n>, -Ofast
  • GNU: -O<n>, -Ofast
  • Cray: -O<n>, -Ofast
  • NVIDIA: -O<n>
  • AOCC: -O<n>, -Ofast

Enable OpenMP (OpenMP was enabled by default in the Cray compilers before CCE 9.0):

  • Intel: -qopenmp
  • GNU: -fopenmp
  • Cray: -fopenmp for C/C++ with CCE 9.0 or later; -h omp otherwise
  • NVIDIA: -mp[=multicore*|[no]align] (*: default)
  • AOCC: C/C++: -fopenmp; Fortran: -mp
  • LLVM: -fopenmp

Enable OpenACC (OpenACC is not supported by clang/clang++):

  • Intel: -
  • GNU: -fopenacc
  • Cray: Fortran: -h acc
  • NVIDIA: -acc
  • AOCC: -
  • LLVM: -

Free-form Fortran (also determined by file suffix: .f, .F, .f90, etc.):

  • Intel: -free
  • GNU: -ffree-form
  • Cray: -f free
  • NVIDIA: -Mfree
  • AOCC: -Mfreeform

Fixed-form Fortran (also determined by file suffix: .f, .F, .f90, etc.):

  • Intel: -fixed
  • GNU: -ffixed-form
  • Cray: -f fixed
  • NVIDIA: -Mfixed
  • AOCC: -Mfixed

Debug symbols (debug symbols are enabled by default in Cray):

  • Intel: -g
  • GNU: -g
  • Cray: N/A
  • NVIDIA: HPC compilers: -g, -gopt; CUDA: -g (or --debug) for host code and -G (or --device-debug) for device code
  • AOCC: -g
  • LLVM: -g
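
The "Enable OpenMP" flags above could be encoded in a build script roughly as follows. This is a sketch only: the function name openmp_flag and its argument spellings (intel, gnu, cray, nvidia, aocc, llvm; c, cxx, fortran) are hypothetical, and the Cray entries assume CCE 9.0 or later. The final line prints a compile command without running it.

```shell
# Hypothetical helper: print the OpenMP flag for a compiler suite and language.
# Usage: openmp_flag <intel|gnu|cray|nvidia|aocc|llvm> <c|cxx|fortran>
openmp_flag() {
  case "$1:$2" in
    intel:*)          echo "-qopenmp" ;;
    gnu:*)            echo "-fopenmp" ;;
    cray:fortran)     echo "-h omp" ;;     # Cray Fortran
    cray:*)           echo "-fopenmp" ;;   # Cray C/C++ (assumes CCE 9.0 or later)
    nvidia:*)         echo "-mp" ;;
    aocc:fortran)     echo "-mp" ;;
    aocc:*)           echo "-fopenmp" ;;
    llvm:c|llvm:cxx)  echo "-fopenmp" ;;   # the LLVM suite here has no Fortran compiler
    *) echo "openmp_flag: unsupported combination: $1 $2" >&2; return 1 ;;
  esac
}

# Assemble (but do not run) a compile line for a C++ code in the Cray environment
echo "CC $(openmp_flag cray cxx) -o my_openmp_code.ex my_openmp_code.cpp"
```

When the result is used on a real command line, leave the substitution unquoted so that "-h omp" is split into two arguments.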