CUDA¶
Warning
This page is currently under active development. Check back soon for more content.
CUDA is a general purpose parallel computing platform and programming model that leverages the parallel compute engine in NVIDIA GPUs to solve many complex computational problems in a more efficient way than on a CPU.
For full documentation, see NVIDIA's CUDA Toolkit documentation.
CUDA C¶
A vector addition example written in CUDA C is provided in this NVIDIA blog and can be compiled with the nvcc compiler provided in the PrgEnv-nvidia environment on Perlmutter.
nvcc -o saxpy.ex saxpy.cu
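For reference, the following is a minimal SAXPY-style sketch of the kind of single-source kernel described in that blog post. It is not the blog's code verbatim: the blog uses explicit cudaMemcpy transfers, while this sketch uses managed memory for brevity, and the problem size is illustrative.

#include <stdio.h>
#include <cuda_runtime.h>

// Each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    int n = 1 << 20;
    float *x, *y;
    // Managed memory keeps the sketch short; the blog version copies data explicitly.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}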
CUDA Fortran¶
A vector addition example written in CUDA Fortran is provided in this NVIDIA blog and can be compiled with the nvfortran compiler provided in the PrgEnv-nvidia environment on Perlmutter.
nvfortran -o saxpy.ex saxpy.cuf
Using CUDA on Perlmutter¶
On Perlmutter, CUDA is available via the cudatoolkit modules. The toolkit modules contain GPU-accelerated libraries, profiling tools (Nsight Compute & Nsight Systems), debugging tools (cuda-gdb & cuda-memcheck), a runtime library, and the nvcc CUDA compiler.
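As a quick sketch of the workflow (assuming the default PrgEnv-nvidia environment is loaded), making the toolkit available and compiling a CUDA source file looks like:

module load cudatoolkit
nvcc -o saxpy.ex saxpy.cu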
NVIDIA maintains extensive documentation for CUDA toolkits.
For more information on CUDA for Perlmutter, please see the Native CUDA C/C++ and Memory Management sections, among others, on the Perlmutter Readiness page.
PrgEnv-nvidia¶
The host compilers nvc / nvc++ (accessible through the cc / CC wrappers) in the NVIDIA HPC SDK have opt-in CUDA support. To compile single-source C / C++ code (host & device code in the same source file) with the Cray wrappers, you must add the -cuda flag to the compilation step, which tells the nvc / nvc++ compiler to accept CUDA runtime APIs. Omitting the -cuda flag will result in your application compiling without any of the CUDA API calls and will produce an executable with undefined behavior.
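For example, a single-source C++ compilation with the wrapper might look like the following (the source file name is illustrative):

CC -cuda -o saxpy.ex saxpy.cpp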
PrgEnv-gnu¶
When using the PrgEnv-gnu environment in conjunction with the cudatoolkit module (i.e., when compiling any application for both host and device), note that not every version of gcc is compatible with every version of nvcc; see the supported host compilers for each nvcc installation.
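A quick way to check which host and device compiler versions you are actually pairing (the reported versions depend on the modules currently loaded) is:

gcc --version
nvcc --version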
Versions¶
NERSC generally aims to make the latest versions of cudatoolkit available. In some cases a specific version other than what is installed is needed. In this situation, first check whether the installed version is compatible with the one you need. CUDA is generally forward compatible; for example, code written for 11.3 should work with 11.7. See the CUDA Compatibility document, which describes the details.
If this is not an option, consider using containers through Shifter to obtain the specific CUDA version you need.
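To list the toolkit versions currently installed, and to switch to one of them (the version number below is illustrative):

module avail cudatoolkit
module load cudatoolkit/12.2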
Tutorials¶
- CUDA Training Series, 2020:
- Part 1: Introduction to CUDA C++, January 15, 2020
- Part 2: CUDA Shared Memory, February 19, 2020
- Part 3: Fundamental CUDA Optimization (Part 1), March 18, 2020
- Part 4: Fundamental CUDA Optimization (Part 2), April 16, 2020
- Part 5: CUDA Atomics, Reductions, and Warp Shuffle, May 13, 2020
- Part 6: Managed Memory, June 18, 2020
- Part 7: CUDA Concurrency, July 21, 2020
- Part 8: GPU Performance Analysis, August 18, 2020
- Part 9: Cooperative Groups, September 17, 2020
- Part 10: CUDA Multithreading with Streams, July 16, 2021
- Part 11: CUDA Multi Process Service, August 17, 2021
- Part 12: CUDA Debugging, September 14, 2021
- Part 13: CUDA Graphs, October 13, 2021
- An Easy Introduction to CUDA C and C++
- An Even Easier Introduction to CUDA