This page is currently under active development. Check back soon for more content.
CUDA is a general purpose parallel computing platform and programming model that leverages the parallel compute engine in NVIDIA GPUs to solve many complex computational problems in a more efficient way than on a CPU.
For full documentation, see the NVIDIA CUDA Toolkit documentation.
A vector addition example written in CUDA C is provided in this NVIDIA blog and can be compiled with the `nvcc` compiler provided in the `PrgEnv-nvidia` environment on Perlmutter:

```shell
nvcc -o saxpy.ex saxpy.cu
```
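For readers who want to see the overall shape of such a program before following the blog link, here is a minimal SAXPY (`y = a*x + y`) sketch. The details are assumptions, not the blog's exact code: this version uses managed (unified) memory for brevity, whereas the blog example may use explicit `cudaMalloc`/`cudaMemcpy`, and production code would also check CUDA API return codes.

```cuda
// saxpy.cu -- a minimal SAXPY sketch (NOT the blog's exact code).
// Assumes managed memory for simplicity; error checking is omitted.
#include <cstdio>
#include <cmath>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // One thread per element; guard against n not being a multiple
    // of the block size.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    float *x, *y;

    // Managed memory is accessible from both host and device.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // 256 threads per block; enough blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    // Every element should now be 2*1 + 2 = 4; report the worst error.
    float maxErr = 0.0f;
    for (int i = 0; i < n; i++)
        maxErr = fmaxf(maxErr, fabsf(y[i] - 4.0f));
    printf("Max error: %f\n", maxErr);

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compiled with the `nvcc` command shown above, the program runs the kernel on the GPU and prints the maximum deviation from the expected result.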
A vector addition example written in CUDA Fortran is provided in this NVIDIA blog and can be compiled with the `nvfortran` compiler provided by any `nvhpc` module on Cori GPU:

```shell
nvfortran -o saxpy.ex saxpy.cuf
```
## Preparing for Perlmutter
For information on using CUDA on Perlmutter, please see the Native CUDA C/C++ and Memory Management sections, among others, of the Perlmutter Readiness page.
CUDA Training Series, 2020–2021:
- Part 1: Introduction to CUDA C++, January 15, 2020
- Part 2: CUDA Shared Memory, February 19, 2020
- Part 3: Fundamental CUDA Optimization (Part 1), March 18, 2020
- Part 4: Fundamental CUDA Optimization (Part 2), April 16, 2020
- Part 5: CUDA Atomics, Reductions, and Warp Shuffle, May 13, 2020
- Part 6: Managed Memory, June 18, 2020
- Part 7: CUDA Concurrency, July 21, 2020
- Part 8: GPU Performance Analysis, August 18, 2020
- Part 9: Cooperative Groups, September 17, 2020
- Part 10: CUDA Multithreading with Streams, July 16, 2021
- Part 11: CUDA Multi-Process Service, August 17, 2021
- Part 12: CUDA Debugging, September 14, 2021
- Part 13: CUDA Graphs, October 13, 2021
- An Easy Introduction to CUDA C and C++
- An Even Easier Introduction to CUDA