Base compilers on NERSC systems¶
Introduction¶
There are several options for compilers that can be used on NERSC compute systems. Some of the compilers are open-source products, while others are commercial. These compilers may have different features, optimize some codes better than others, and/or support different architectures or standards. It is up to the user to decide which compiler is best for their particular application.
These base compilers are loaded into the user environment via the programming environment modules. They can then be invoked through compiler wrappers (recommended) or on their own. All compilers on NERSC machines are able to compile codes written in C, C++, or Fortran, and provide support for OpenMP.
There are several vendor-provided base compilers available on Perlmutter, with varying levels of support for GPU code generation: Cray, GNU, AOCC (AMD Optimizing C/C++ Compiler), and NVIDIA. NERSC also provides LLVM compilers on Perlmutter.
LLVM compilers not compatible with all vendor software
The LLVM compilers are not supported by HPE Cray and therefore are not compatible with all of the same software and libraries that the vendor-provided compiler suites are, but may nevertheless be useful for users who require an open-source LLVM-based compiler toolchain.
Below is a table listing the available compilers on Perlmutter, with the default compilers indicated.
| Compilers | Perlmutter |
|---|---|
| Intel | ✓ |
| GNU | ✓ (default) |
| Cray | ✓ |
| NVIDIA | ✓ |
| AOCC | ✓ |
| LLVM | ✓ (provided by NERSC) |
All vendor-supplied compilers are provided via the "programming environments" that are accessed via the `module` utility. Each programming environment contains the full set of compatible compilers and libraries. To change from one compiler suite to another, change the programming environment via the `module swap` command. For example, the following command changes from the GNU programming environment to the Cray environment. Since Perlmutter uses Lmod, loading the new environment rather than explicitly swapping works as well.

module swap PrgEnv-gnu PrgEnv-cray
module load PrgEnv-cray   # same functionality under Lmod
Programming environment for using GPUs on Perlmutter¶
To compile CUDA source code in any of the supported programming environments, the `cudatoolkit` module is required to make the CUDA Toolkit accessible. The toolkit includes GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and a runtime library to build and deploy your application. For information about the CUDA Toolkit, see the documentation. Note that this module is not loaded by default.
To set the NVIDIA GPUs as the OpenMP and OpenACC offloading target while using the Cray compiler wrappers, use the compiler flag `-target-accel=nvidia80` or set the environment variable `CRAY_ACCEL_TARGET` to `nvidia80`. To set the acceleration target to the host CPUs instead, use the `-target-accel=host` flag, set the environment variable to `host`, or load the `craype-accel-host` module.
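The three equivalent ways to select the GPU offload target can be sketched as follows (the source file name is a placeholder; this assumes a programming environment providing the Cray wrappers is loaded):

```shell
# Option 1: pass the target flag on the wrapper command line
cc -fopenmp -target-accel=nvidia80 -c my_code.c

# Option 2: set the environment variable before compiling
export CRAY_ACCEL_TARGET=nvidia80
cc -fopenmp -c my_code.c

# Option 3: load the accelerator-target module
module load craype-accel-nvidia80
cc -fopenmp -c my_code.c
```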
Do not use base compilers' target flags with the Cray compiler wrappers
The base compiler's target flag (e.g., NVIDIA's `-target=gpu`) will not work with the Cray compiler wrappers.
Using compatible gcc for CUDA compiler drivers with PrgEnv-gnu¶
When using the `PrgEnv-gnu` environment in conjunction with the `cudatoolkit` module (i.e., when compiling any application for both the host and device sides), note that not every version of `gcc` is compatible with every version of `nvcc`.
Older versions of the cudatoolkit may not support the default GCC compiler (see the document outlining the supported host compilers for each `nvcc` installation). For older versions, one can use the `cpe-cuda` module available on the system to automatically downgrade the `gcc` version, or manually load a version of GCC that is supported by the older cudatoolkit.
If using the `cpe-cuda` module, it must be loaded after loading `PrgEnv-gnu`:
module load PrgEnv-gnu
module load cudatoolkit
module load cpe-cuda
Compilers¶
Intel¶
The Intel compiler suite is available via the `PrgEnv-intel` module, which will load the `intel` module for the Intel base compilers. The base compilers in this suite are:
- C: `icc`
- C++: `icpc`
- Fortran: `ifort`
See the full documentation of the Intel compilers. Additionally, compiler documentation is provided through `man` pages (e.g., `man icpc`) and through the `-help` flag to each compiler (e.g., `ifort -help`).
OpenMP and OpenACC¶
To enable OpenMP, use the `-qopenmp` flag.
The Intel compilers do not support OpenACC.
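A minimal sketch (file names are placeholders) of building an OpenMP code with the Intel compilers:

```shell
# Compile an OpenMP program with the Intel C++ compiler directly
icpc -qopenmp -O2 -o my_omp_code.ex my_omp_code.cpp

# Or through the Cray compiler wrapper, with PrgEnv-intel loaded
CC -qopenmp -O2 -o my_omp_code.ex my_omp_code.cpp
```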
GNU¶
The GCC compiler suite is available via the `PrgEnv-gnu` module, which will load the `gcc` module for the GNU base compilers. The base compilers in this suite are:
- C: `gcc`
- C++: `g++`
- Fortran: `gfortran`
See the full documentation of the GCC compilers. Additionally, compiler documentation is provided through `man` pages (e.g., `man g++`) and through the `--help` flag to each compiler (e.g., `gfortran --help`).
Backward Compatibility¶
For backward compatibility, the following tips may be helpful for compiling older codes (that worked on Cori) with the newer GCC compiler versions on Perlmutter:
- Fortran: Try `-fallow-argument-mismatch` first, followed by the more extensive flag `-std=legacy` to reduce strictness.
- C/C++: Look for flags that reduce strictness, such as `-fpermissive`.
- C/C++: `-Wpedantic` can warn you about lines that break code standards.
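A sketch of applying these flags (source file names are placeholders):

```shell
# Legacy Fortran: first try relaxing argument checking only...
gfortran -fallow-argument-mismatch -c old_code.f90
# ...then fall back to full legacy mode if needed
gfortran -std=legacy -c old_code.f90

# C/C++: relax strictness while keeping standards warnings visible
g++ -fpermissive -Wpedantic -c old_code.cpp
```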
OpenMP and OpenACC¶
To enable OpenMP for CPU code, use the `-fopenmp` flag.
OpenMP/OpenACC offloading to GPUs not supported yet
Offloading to GPUs with OpenMP/OpenACC is not supported in the `PrgEnv-gnu` environment on Perlmutter at the moment. The offloading-related information below is provided for future reference only.
GCC has support for OpenMP and OpenACC offloading to GPUs. OpenMP offloading with `gcc` looks something like:
gcc -fopenmp -foffload=nvptx-none="-Ofast -lm -misa=sm_80" base.c -c
where `-misa=sm_80` targets the NVIDIA A100 GPU, and the extra flags `-Ofast -lm` are passed to the offload compiler when building for that architecture.
Note that, if the Cray compiler wrapper, `cc`, is used instead, use the `-target-accel=nvidia80` flag:
cc -fopenmp -target-accel=nvidia80 base.c -c
OpenMP/OpenACC GPU offload support in GCC is limited
The GCC compiler's offload capabilities for GPU code generation may be limited, in terms of both functionality and performance. Users are advised to try other compilers, several of which also provide a Fortran compiler with OpenMP offload capability.
Mixture of C/C++/Fortran and CUDA codes¶
The programming environment supports a mixture of C/C++/Fortran and CUDA codes. CUDA and CPU codes should be in separate files, and Cray compiler wrapper commands must be used at link time:
CC -c main.cxx
nvcc -c cuda_code.cu
CC -o main.ex main.o cuda_code.o
Compatibility between the nvcc host compiler and the gcc compiler
To make the above work, the GCC version needs to be 9.x due to compatibility issues between the compilers.
Cray¶
The HPE Cray compiler suite is available via the `PrgEnv-cray` module, which will load the `cce` module for the Cray base compilers. The base compilers in this suite are:
- C: `cc`
- C++: `CC`
- Fortran: `ftn`
Full documentation of the Cray compilers is provided in the HPE Cray Clang C and C++ Quick Reference for the C/C++ compilers, and the HPE Cray Fortran Reference Manual for the Fortran compiler. Additionally, compiler documentation is provided through `man` pages (e.g., `man clang` or `man crayftn`) or the help page (`cc -help`, etc.), and users may wish to read the online Cray Compiler Environment documentation.
Cray base compilers and Cray compiler wrappers are not the same
It is easy to confuse the Cray base compilers with the compiler wrappers that wrap all compilers, since their names are identical. The underlying compiler is determined by the programming environment that has been loaded; for example, if `PrgEnv-gnu` has been loaded, then invoking `cc` ultimately invokes `gcc`, not the Cray C compiler.
Major changes to Cray compilers starting in version 9.0
Version 8.7.9 of the Cray compiler (CCE) is the last version based on the old compiler environment and default settings. Starting in version 9.0, Cray made major changes to the C/C++ compilers, and smaller changes to the Fortran compiler. In particular:
- The C/C++ compilers have been replaced with LLVM and clang, with some additional Cray enhancements. This means that nearly all of the compiler flags have changed, and some capabilities available in CCE 8 and previous versions are no longer available in CCE 9. It may also result in performance differences in code generated using CCE 8 vs CCE 9, due to the two versions using different optimizers.
- OpenMP has been disabled by default in the C, C++, and Fortran compilers. This behavior is more consistent with other compilers. To enable OpenMP, one can use the following flags:
    - C/C++: `-fopenmp`
    - Fortran: `-h omp`
Cray provides a migration guide for users switching from CCE 8 to CCE 9.
For users who are unable to migrate their workflows to the clang/LLVM-based CCE 9 C/C++ compilers, Cray has simultaneously released a CCE 9 "classic" version, which continues to use the same compiler technology as CCE 8 and older versions. This version of CCE is available as the module `cce/<version>-classic`. However, users should be aware that "classic" CCE is now considered legacy, and that all future versions of CCE are based on clang/LLVM. See the Cray Classic C and C++ Reference Manual.
OpenMP and OpenACC¶
To enable OpenMP for CPU code, use the `-fopenmp` flag.
OpenMP/OpenACC offloading to GPUs not supported yet
Offloading to GPUs with OpenMP/OpenACC is not supported in the `PrgEnv-cray` environment on Perlmutter at the moment. The offloading-related information below is provided for future reference only.
The Cray compilers have a mature OpenMP offloading implementation.
Compiling codes using OpenMP offload capabilities on Perlmutter requires different flags for C and C++ codes than for Fortran codes. The C and C++ compilers are based on clang, and thus use similar flags that one would use for clang to generate OpenMP offload code:
cc -fopenmp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.c
CC -fopenmp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.cpp
For Fortran codes, the flag is different, and the environment variable `CRAY_ACCEL_TARGET` must be set to `nvidia80` at compile time, or the `-target-accel=nvidia80` compiler flag must be used. Then, build as follows:
ftn -h omp -target-accel=nvidia80 -o my_openmp_code.ex my_openmp_code.f90
Only the Fortran compiler supports OpenACC.
The compiler flag for enabling OpenACC in Fortran codes is `-h acc`. To offload to GPUs, use the `-target-accel=nvidia80` compiler flag, or set the `CRAY_ACCEL_TARGET` environment variable to `nvidia80`.
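A minimal sketch (file names are placeholders) of building an OpenACC Fortran code for the GPUs under PrgEnv-cray:

```shell
# OpenACC offload to the A100 GPUs via the compiler flag
ftn -h acc -target-accel=nvidia80 -o my_openacc_code.ex my_openacc_code.f90

# Equivalent, using the environment variable instead
export CRAY_ACCEL_TARGET=nvidia80
ftn -h acc -o my_openacc_code.ex my_openacc_code.f90
```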
Explicitly set the target to host CPUs when compiling OpenMP/OpenACC code for the host on Perlmutter
Due to an issue with the `PrgEnv-cray` compiler wrappers, you must add the `-target-accel=host` compiler option or load the `craype-accel-host` module in order to successfully compile any OpenMP/OpenACC code for the host.
ftn -h omp -target-accel=host -o my_openmp_code.ex my_openmp_code.f90
Mixture of C/C++/Fortran and CUDA codes¶
The programming environment allows a mixture of C/C++/Fortran and CUDA codes. In this case CUDA and CPU codes should be in separate files. Cray compiler wrapper commands must be used at link time, and CUDA runtime must be included:
CC -c main.cxx
nvcc -c cuda_code.cu
CC -o main.ex main.o cuda_code.o -lcudart
NVIDIA¶
The NVIDIA compiler suite is available via the `PrgEnv-nvidia` module, which will load the `nvidia` module for the NVIDIA base compilers. The base compilers in this suite are:
- CUDA compiler drivers:
    - CUDA C/C++: `nvcc`
    - CUDA Fortran: `nvfortran`
- HPC compilers, for host multithreading and GPU offloading with OpenMP, OpenACC, C++17 parallel algorithms, and Fortran's `DO CONCURRENT`; part of the NVIDIA HPC SDK:
    - C: `nvc`
    - C++: `nvc++`
    - Fortran: `nvfortran`
The CUDA compiler drivers are used to compile CUDA codes. Below is an example of compiling a hello-world CUDA code, `helloworld.cu`, to generate an executable `helloworld`:
$ cat helloworld.cu
#include <stdio.h>
__global__ void helloworld() {
printf("Hello, World!\n");
}
int main() {
helloworld<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}
$ nvcc -o helloworld helloworld.cu
Note
If you see a warning message about executable stacks like the one below:

/usr/bin/ld: warning: /tmp/pgcudafatvEZ0-TH7-jvs.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker

you can add the `-Wl,-znoexecstack` or `-Wl,--no-warn-execstack` flag to the link command.

OpenMP, OpenACC and CUDA¶
If OpenMP and CUDA code coexist in the same program, the OpenMP runtime and the CUDA runtime use the same CUDA context on each GPU. To enable this coexistence, use the compilation and linking option `-cuda`, as shown below.
$ cat cuda_interop.cpp # offload code calling a function in a CUDA code
...
#pragma omp target data map(from:array2D[0:M][0:N])
{
...
#pragma omp target data use_device_ptr(p)
{
add_i_slice(p, i, N);
}
...
}
...
$ cat interop_kernel.cu # CUDA code where the called function is defined
...
__global__ void add_kernel(int *slice, int t, int n)
{
...
}
void add_i_slice(int *slice, int i, int n)
{
add_kernel<<<n/128, 128>>>(slice, i, n);
}
...
$ nvc++ -Minfo -mp -target=gpu -c cuda_interop.cpp
$ nvcc -c interop_kernel.cu
$ nvc++ -mp -target=gpu -cuda interop_kernel.o cuda_interop.o
where `-mp` enables OpenMP and `-target=gpu` offloads the OpenMP constructs to GPUs.
Note that, in the above non-MPI code example, the HPC compiler `nvc++` is used, but the Cray compiler wrapper, `CC`, can be used instead. In that case, drop the `-target=gpu` flag from the `CC` commands, as the offload target is correctly set by the `craype-accel-nvidia80` module. MPI codes must be compiled with the Cray compiler wrapper if Cray MPI is to be used.
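As a hedged sketch, the same build through the Cray compiler wrapper (assuming PrgEnv-nvidia and the craype-accel-nvidia80 module are loaded) might look like:

```shell
module load craype-accel-nvidia80

# No -target=gpu here: the offload target comes from the module
CC -Minfo -mp -c cuda_interop.cpp
nvcc -c interop_kernel.cu
CC -mp -cuda interop_kernel.o cuda_interop.o
```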
The HPC compilers support OpenMP and OpenACC offloading. Invoking OpenACC in the HPC compilers, for example, looks like:
nvfortran -acc=gpu -Minfo=acc -o main.ex main.f90
or
nvfortran -acc -target=gpu -Minfo=acc -o main.ex main.f90
where the flag `-acc` enables OpenACC for GPU execution only, and `-Minfo=acc` prints diagnostic information to STDERR regarding whether the compiler was able to produce GPU code successfully.
Note that, when the HPE Cray compiler wrappers are used, replace the `-target=gpu` flag with `-target-accel=nvidia80`.
C++17 introduced parallel STL algorithms ("pSTL"), such that standard C++ code can express parallelism when using many of the STL algorithms. The NVIDIA HPC compilers support GPU-accelerated pSTL algorithms, which can be activated by invoking `nvc++` with the flag `-stdpar=gpu`. See the documentation regarding pSTL for the HPC SDK.
GPU acceleration of Fortran's `DO CONCURRENT` is also enabled with the `-stdpar` option. If the flag is specified, the compiler parallelizes the `DO CONCURRENT` loops and offloads them to the GPU. All data movement between host memory and GPU device memory is performed implicitly and automatically under the control of CUDA Unified Memory. It is also possible to target a multi-core CPU with `-stdpar=multicore`. For more info, check the NVIDIA blog, Fortran Standard Parallelism.
The NVIDIA HPC SDK provides cuTENSOR extensions so that some Fortran intrinsic math functions can be accelerated on GPUs. Accelerated functions include `MATMUL`, `TRANSPOSE`, and several others. The `nvfortran` compiler provides access to these GPU-accelerated functions via the module `cutensorEx`. See the documentation about the `cutensorEx` module in `nvfortran`.
CUDA math libraries (cuBLAS, cuFFT, cuFFTW, cuSOLVER, etc.) can be linked easily by specifying the name of the library with the `-cudalib` flag:
nvfortran -Minfo -mp -target=gpu -cudalib=cublas mp_cublas.f90
Note again that, when the HPE Cray compiler wrapper `ftn` is used, replace the `-target=gpu` flag with `-target-accel=nvidia80`.
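As a hedged sketch, the equivalent build through the wrapper (assuming PrgEnv-nvidia is loaded) might be:

```shell
# Cray wrapper form of the cuBLAS example above
ftn -Minfo -mp -target-accel=nvidia80 -cudalib=cublas mp_cublas.f90
```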
Full documentation of the NVIDIA compilers can be found in the NVIDIA HPC Compilers, User's Guide and the CUDA C++ Programming Guide.
Please check the NVIDIA HPC SDK - OpenMP Target Offload Training, December 2020 for useful information on the HPC compilers.
AOCC¶
The AOCC (AMD Optimizing C/C++ Compiler) compiler suite is based on LLVM and includes many optimizations for AMD processors. It supports Flang as the Fortran front-end compiler. The AOCC suite is available via the `PrgEnv-aocc` module, which will load the `aocc` module for the AOCC base compilers. The base compilers in this suite are:
- C: `clang`
- C++: `clang++`
- Fortran: `flang`
Full documentation of the AOCC compilers is provided at AOCC webpage, where you can find user manuals and a quick reference guide: AOCC User Guide, Clang – the C, C++ Compiler, Flang – the Fortran Compiler and Compiler Options Reference Guide for AMD EPYC 7xx3 Series Processors.
OpenMP and OpenACC¶
The compilers can generate OpenMP parallel code for the host CPU only, and do not support offloading to NVIDIA GPUs. To enable OpenMP, add the compiler flag `-fopenmp` for C and C++ and `-mp` for Fortran:
clang -fopenmp -o my_openmp_code.ex my_openmp_code.c
clang++ -fopenmp -o my_openmp_code.ex my_openmp_code.cpp
flang -mp -o my_openmp_code.ex my_openmp_code.f90
When using the HPE Cray compiler wrappers, add the target flag `-target-accel=nvidia80` for offloading to GPUs.
OpenACC is not supported.
Mixture of C/C++/Fortran and CUDA codes¶
The programming environment allows a mixture of C/C++/Fortran and CUDA codes. In this case CUDA and CPU codes should be in separate files. Cray compiler wrapper commands must be used at link time, and CUDA runtime must be included:
CC -c main.cxx
nvcc -c cuda_code.cu
CC -o main.ex main.o cuda_code.o -lcudart
LLVM¶
The LLVM core libraries and compilers are built locally by NERSC, not provided by HPE Cray. The suite is compiled against the GCC compiler suite and thus cannot be used with the Intel or HPE Cray programming environments.
The LLVM/clang compiler is a valid CUDA compiler. One can replace NVIDIA's `nvcc` command with `clang --cuda-gpu-arch=<arch>`, where `<arch>` is `sm_80` for the A100 GPUs. If using clang as a CUDA compiler, one usually will also need to add the `-I/path/to/cuda/include` and `-L/path/to/cuda/lib64` flags manually, since `nvcc` includes them implicitly.
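A hedged sketch of compiling the earlier hello-world CUDA code with clang (the `CUDA_HOME` path is an assumption; the `cudatoolkit` module typically exposes the CUDA installation location):

```shell
# sm_80 targets the A100 GPUs; add the CUDA include and library
# paths explicitly, since clang does not add them implicitly like nvcc
clang++ --cuda-gpu-arch=sm_80 \
    -I${CUDA_HOME}/include -L${CUDA_HOME}/lib64 -lcudart \
    -o helloworld helloworld.cu
```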
For documentation of the LLVM compilers, see the LLVM, Clang, and Flang websites. Additionally, compiler documentation is provided through man pages (e.g., `man clang`) and through the `-help` flag to each compiler (e.g., `clang -help`).
Common compiler options¶
Below is a table documenting common flags for each of the compilers.
| | Intel | GNU | Cray | NVIDIA | AOCC | LLVM | Comment |
|---|---|---|---|---|---|---|---|
| Overall optimization | `-O<n>`, `-Ofast` | `-O<n>`, `-Ofast` | `-O<n>`, `-Ofast` | `-O<n>` | `-O<n>`, `-Ofast` | `-O<n>`, `-Ofast` | Replace `<n>` with `1`, `2`, `3`, etc. |
| Enable OpenMP | `-qopenmp` | `-fopenmp` | `-fopenmp` for C/C++ with CCE 9.0 or later; `-h omp` otherwise | `-mp` (default `multicore`); `-mp=[no]align` | C/C++: `-fopenmp`; Fortran: `-mp` | `-fopenmp` | OpenMP is disabled by default in Cray CCE 9 and later. |
| Enable OpenMP offload | N/A | N/A | `-fopenmp` for C/C++ with CCE 9.0 or later; `-h omp` otherwise | `-mp=gpu` | N/A | `-fopenmp -fopenmp-targets=nvptx64` | |
| Enable OpenACC | N/A | `-fopenacc` | Fortran: `-h acc` | `-acc` | N/A | N/A | OpenACC is not supported by clang/clang++. |
| Free-form Fortran | `-free` | `-ffree-form` | `-f free` | `-Mfree` | `-Mfreeform` | | Also determined by file suffix (`.f`, `.F`, `.f90`, etc.) |
| Fixed-form Fortran | `-fixed` | `-ffixed-form` | `-f fixed` | `-Mfixed` | `-Mfixed` | | Also determined by file suffix (`.f`, `.F`, `.f90`, etc.) |
| Debug symbols | `-g` | `-g` | N/A | HPC compilers: `-g`, `-gopt`; CUDA: `-g` (or `--debug`) for host code and `-G` (or `--device-debug`) for device code | `-g` | `-g` | Debug symbols are enabled by default in Cray. |