# C++
C++ is a high-performance programming language widely used in scientific computing, offering both low-level control and high-level abstractions.
## Compiler Support
Multiple C++ compilers are available on NERSC systems.
| Vendor | PrgEnv | Base Compiler | Wrapper |
|---|---|---|---|
| GNU | PrgEnv-gnu | g++ | CC |
| NVIDIA | PrgEnv-nvidia | nvc++ | CC |
| Cray | PrgEnv-cray | crayCC | CC |
| LLVM | PrgEnv-llvm | clang++ | mpic++ |
| Intel | PrgEnv-intel | icpx | CC |
| AOCC | PrgEnv-aocc | clang++ | CC |
For detailed information about each compiler, see the base compilers and compiler wrappers documentation.
## GPU Programming with C++
Several programming models are available for GPU programming in C++ on Perlmutter.
### CUDA
CUDA is NVIDIA's native GPU programming model. It provides maximum performance on NVIDIA hardware and is the basis for many other GPU programming frameworks.
- Best for: Codes targeting primarily NVIDIA systems, or requiring maximum GPU performance
- Compile with: `nvcc` or `nvc++ -cuda`
- Documentation: CUDA C++ Programming Guide
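As an illustration of the model, the sketch below (not from the original page) implements SAXPY with a CUDA kernel and unified memory; each thread handles one array element. On Perlmutter's A100 GPUs it could be compiled with, e.g., `nvcc -arch=sm_80 saxpy.cu`.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory, visible to host and device
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // one thread per element
    cudaDeviceSynchronize();

    std::printf("y[0] = %g\n", y[0]);  // 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```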
### OpenMP Offload
OpenMP provides directive-based GPU offloading that can be added incrementally to existing code.
- Best for: Incremental porting of existing CPU code to GPUs
- See: Compiling OpenMP Code
- Documentation: OpenMP Specification
### Kokkos
Kokkos is a C++ performance portability framework that provides abstractions for parallel execution and data management across different architectures.
- Best for: Codes requiring portability across NVIDIA, AMD, and Intel GPUs
- Backends on Perlmutter: CUDA, OpenACC
- Architecture flags: `Kokkos_ARCH_AMPERE80=ON`, `Kokkos_ARCH_ZEN3=ON`
- Documentation: Kokkos Documentation
- Tutorials: Kokkos Lecture Series
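To give a flavor of the abstraction, here is a hedged sketch (not from the original page) of a SAXPY in Kokkos: the same `parallel_for` runs on the GPU or CPU depending on which backend the library was built with.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char *argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        // Views allocate in the default execution space's memory
        // (device memory when the CUDA backend is enabled).
        Kokkos::View<double *> x("x", n), y("y", n);
        Kokkos::deep_copy(x, 1.0);
        Kokkos::deep_copy(y, 2.0);

        // Dispatched to whichever backend Kokkos was configured with.
        Kokkos::parallel_for("saxpy", n, KOKKOS_LAMBDA(const int i) {
            y(i) = 3.0 * x(i) + y(i);
        });
        Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
}
```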
### SYCL
SYCL is a cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard C++, with host and kernel code in the same source file.
```shell
module load intel/2024.1.0
icpx -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
     -Xsycl-target-backend '--cuda-gpu-arch=sm_80' -o program.x program.cpp
```
- Best for: Portable GPU code using standard C++ without language extensions
- Compiler: Intel DPC++ (`icpx`) with `-fsycl` flag
- GPU targets: NVIDIA (`nvptx64-nvidia-cuda`), AMD, Intel
- Documentation: SYCL 2020 Specification
- Tutorials: DPC++ Tutorial, Codeplay SYCL Academy
- Book: Data Parallel C++ (free)
- Portal: sycl.tech
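The sketch below (an illustration, not from the original page) shows the single-source style: host and kernel code share one file, and a buffer/accessor pair manages data movement implicitly.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    sycl::queue q;                      // default selector picks a GPU if one is available
    std::vector<float> y(n, 2.0f);

    {
        sycl::buffer<float> buf(y.data(), sycl::range<1>(n));
        q.submit([&](sycl::handler &h) {
            sycl::accessor a(buf, h, sycl::read_write);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                a[i] = 3.0f * 1.0f + a[i];   // SAXPY with x fixed at 1
            });
        });
    }   // buffer destructor waits and copies results back into y

    std::printf("y[0] = %g\n", y[0]);  // 3*1 + 2 = 5
    return 0;
}
```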
### HPX
HPX is a C++ Standard Library for Concurrency and Parallelism. It implements C++11/14/17/20/23 standard concurrency facilities and extends them to distributed systems.
- Best for: Codes that benefit from asynchronous task-based parallelism
- Key features: Futures, async execution, distributed computing
- Build options: CUDA support (`HPX_WITH_CUDA`), MPI parcelport (`HPX_WITH_PARCELPORT_MPI`)
- Documentation: HPX Documentation
- Building: Building HPX
- Projects: Creating HPX Projects
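As a minimal sketch (not from the original page, and header layout varies between HPX releases): `hpx::async` mirrors `std::async` but schedules the work as a lightweight HPX task and returns a future.

```cpp
#include <hpx/hpx_main.hpp>  // runs main() inside the HPX runtime
#include <hpx/hpx.hpp>
#include <cstdio>

int main() {
    // Launch a task on HPX's scheduler and synchronize on its future,
    // just like the std::async/std::future pattern.
    hpx::future<int> f = hpx::async([] { return 6 * 7; });
    std::printf("%d\n", f.get());
    return 0;
}
```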
### HIP
HIP is AMD's GPU portability layer that closely mimics CUDA's programming model. Code written in HIP can be compiled for both NVIDIA and AMD GPUs.
- Best for: Codes targeting both NVIDIA and AMD systems
- Documentation: HIP Documentation
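To illustrate how closely HIP tracks CUDA, here is a hedged sketch (not from the original page): the kernel syntax is identical, and runtime calls simply use the `hip*` prefix, so the same source builds for either vendor's GPUs.

```cpp
#include <hip/hip_runtime.h>

// Identical kernel syntax to CUDA.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    hipMalloc(&x, n * sizeof(float));  // cudaMalloc -> hipMalloc
    hipMalloc(&y, n * sizeof(float));
    // ... initialize x and y on the device (e.g. with hipMemcpy), then launch:
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    hipDeviceSynchronize();
    hipFree(x);
    hipFree(y);
    return 0;
}
```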
### Standard C++ Parallelism (stdpar)

C++17 introduced parallel execution policies (`std::execution::par`, `std::execution::par_unseq`) for standard library algorithms. NVIDIA's `nvc++` compiler can offload these parallel algorithms to GPUs.
```cpp
#include <algorithm>
#include <cmath>
#include <execution>
#include <vector>

std::vector<double> vec(1000, 0.5);
std::transform(std::execution::par, vec.begin(), vec.end(),
               vec.begin(), [](double x){ return std::tan(x); });
```
- Best for: Simple parallelism in standard algorithms without explicit GPU programming
- GPU offload: `nvc++ -stdpar=gpu`
- CPU multicore: `nvc++ -stdpar=multicore` or use GCC/Clang with TBB
- Documentation: NVIDIA C++ Parallel Algorithms
- Benchmark: parSTL GitHub
## Parallel (Distributed)

### MPI
The MPI standard defines C and Fortran interfaces only; the official C++ bindings were deprecated in MPI-2.2 and removed in MPI-3.0 (2012). Several third-party libraries provide modern C++ interfaces:
- mpl - Header-only C++17 library for idiomatic MPI usage
- Boost.MPI - Near one-to-one mapping of MPI-1 with serialization support
- B-MPI3 - MPI-3 wrapper emphasizing const-correctness and iterator-based ranges
- RWTH-MPI - MPI 4.0 bindings with STL container support and automatic type inference
- KaMPIng - Modern C++ bindings with move semantics and memory-safe non-blocking communication
- EMPI - Enhanced MPI with RAII support built on Open MPI
- Kokkos Comm - MPI support for Kokkos Views
- MPI Advance - Lightweight libraries complementing system MPI with newest standard features
For a detailed discussion of design considerations for modern C++ MPI interfaces, see Concepts for Designing Modern C++ Interfaces for MPI.
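For context, here is a sketch (not from the original page) of what the raw C API looks like when called from C++: buffer pointer, element count, and datatype must all be spelled out by hand, which is exactly the boilerplate the wrapper libraries above infer from STL containers.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // The C API requires explicit (pointer, count, datatype) triples;
    // C++ wrappers deduce these from the container's type and size.
    std::vector<double> local(4, rank), sum(4);
    MPI_Allreduce(local.data(), sum.data(), 4, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("sum[0] = %g across %d ranks\n", sum[0], size);

    MPI_Finalize();
    return 0;
}
```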
### UPC++
UPC++ is a C++ library for Partitioned Global Address Space (PGAS) programming. It provides low-overhead, fine-grained communication including Remote Memory Access (RMA) and Remote Procedure Call (RPC), and interoperates with MPI, OpenMP, and CUDA.
```shell
module load contrib upcxx
```
For GPU memory support, use the upcxx-cuda variant. See the UPC++ documentation and Perlmutter-specific guidance.
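A brief sketch of the RPC facility (an illustration, not from the original page): each rank asks its neighbor to run a lambda and waits on the returned future.

```cpp
#include <upcxx/upcxx.hpp>
#include <cstdio>

int main() {
    upcxx::init();

    // Ask the next rank (cyclically) to execute a lambda and
    // return its rank; rpc() yields a future we can wait on.
    int next = (upcxx::rank_me() + 1) % upcxx::rank_n();
    upcxx::future<int> f = upcxx::rpc(next, []() { return upcxx::rank_me(); });
    int got = f.wait();

    std::printf("rank %d got reply from rank %d\n", upcxx::rank_me(), got);
    upcxx::finalize();
    return 0;
}
```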
## See Also
- cppreference.com - Comprehensive C++ language and library reference
- C++ Core Guidelines
- Compiler Wrappers
- Perlmutter Architecture