# C++
C++ is a high-performance programming language widely used in scientific computing, offering both low-level control and high-level abstractions.
## Compiler Support
Multiple C++ compilers are available on NERSC systems.
| Vendor | PrgEnv | Base Compiler | Wrapper |
|---|---|---|---|
| GNU | PrgEnv-gnu | g++ | CC |
| NVIDIA | PrgEnv-nvidia | nvc++ | CC |
| Cray | PrgEnv-cray | crayCC | CC |
| LLVM | PrgEnv-llvm | clang++ | mpic++ |
| Intel | PrgEnv-intel | icpx | CC |
| AOCC | PrgEnv-aocc | clang++ | CC |
For detailed information about each compiler, see the base compilers and compiler wrappers documentation.
## GPU Programming with C++
Several programming models are available for GPU programming in C++ on Perlmutter.
### CUDA
CUDA is NVIDIA's native GPU programming model. It provides maximum performance on NVIDIA hardware and is the basis for many other GPU programming frameworks.
- Best for: Codes targeting primarily NVIDIA systems, or requiring maximum GPU performance
- Compile with: `nvcc` or `nvc++ -cuda`
- Documentation: CUDA C++ Programming Guide
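As an illustration of the model, the sketch below (not from the original page) implements SAXPY with a CUDA kernel and unified memory; each thread handles one array element. On Perlmutter's A100 GPUs it could be compiled with, e.g., `nvcc -arch=sm_80 saxpy.cu`.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory, visible to host and device
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // one thread per element
    cudaDeviceSynchronize();

    std::printf("y[0] = %g\n", y[0]);  // 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```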
### OpenMP Offload
OpenMP provides directive-based GPU offloading that can be added incrementally to existing code.
- Best for: Incremental porting of existing CPU code to GPUs
- See: Compiling OpenMP Code
- Documentation: OpenMP Specification
### Kokkos
Kokkos is a C++ performance portability framework that provides abstractions for parallel execution and data management across different architectures.
- Best for: Codes requiring portability across NVIDIA, AMD, and Intel GPUs
- Backends on Perlmutter: CUDA, OpenACC
- Architecture flags: `Kokkos_ARCH_AMPERE80=ON`, `Kokkos_ARCH_ZEN3=ON`
- Documentation: Kokkos Documentation
- Tutorials: Kokkos Lecture Series
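To give a flavor of the abstraction, here is a hedged sketch (not from the original page) of a SAXPY in Kokkos: the same `parallel_for` runs on the GPU or CPU depending on which backend the library was built with.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char *argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        // Views allocate in the default execution space's memory
        // (device memory when the CUDA backend is enabled).
        Kokkos::View<double *> x("x", n), y("y", n);
        Kokkos::deep_copy(x, 1.0);
        Kokkos::deep_copy(y, 2.0);

        // Dispatched to whichever backend Kokkos was configured with.
        Kokkos::parallel_for("saxpy", n, KOKKOS_LAMBDA(const int i) {
            y(i) = 3.0 * x(i) + y(i);
        });
        Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
}
```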
### SYCL
SYCL is a cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard C++, with host and kernel code in the same source file.
```shell
module load intel/2024.1.0
icpx -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
     -Xsycl-target-backend '--cuda-gpu-arch=sm_80' -o program.x program.cpp
```
- Best for: Portable GPU code using standard C++ without language extensions
- Compiler: Intel DPC++ (`icpx`) with `-fsycl` flag
- GPU targets: NVIDIA (`nvptx64-nvidia-cuda`), AMD, Intel
- Documentation: SYCL 2020 Specification
- Tutorials: DPC++ Tutorial, Codeplay SYCL Academy
- Book: Data Parallel C++ (free)
- Portal: sycl.tech
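The sketch below (an illustration, not from the original page) shows the single-source style: host and kernel code share one file, and a buffer/accessor pair manages data movement implicitly.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    sycl::queue q;                      // default selector picks a GPU if one is available
    std::vector<float> y(n, 2.0f);

    {
        sycl::buffer<float> buf(y.data(), sycl::range<1>(n));
        q.submit([&](sycl::handler &h) {
            sycl::accessor a(buf, h, sycl::read_write);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                a[i] = 3.0f * 1.0f + a[i];   // SAXPY with x fixed at 1
            });
        });
    }   // buffer destructor waits and copies results back into y

    std::printf("y[0] = %g\n", y[0]);  // 3*1 + 2 = 5
    return 0;
}
```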
### HPX
HPX is a C++ Standard Library for Concurrency and Parallelism. It implements C++11/14/17/20/23 standard concurrency facilities and extends them to distributed systems.
- Best for: Codes that benefit from asynchronous task-based parallelism
- Key features: Futures, async execution, distributed computing
- Build options: CUDA support (`HPX_WITH_CUDA`), MPI parcelport (`HPX_WITH_PARCELPORT_MPI`)
- Documentation: HPX Documentation
- Building: Building HPX
- Projects: Creating HPX Projects
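As a minimal sketch (not from the original page, and header layout varies between HPX releases): `hpx::async` mirrors `std::async` but schedules the work as a lightweight HPX task and returns a future.

```cpp
#include <hpx/hpx_main.hpp>  // runs main() inside the HPX runtime
#include <hpx/hpx.hpp>
#include <cstdio>

int main() {
    // Launch a task on HPX's scheduler and synchronize on its future,
    // just like the std::async/std::future pattern.
    hpx::future<int> f = hpx::async([] { return 6 * 7; });
    std::printf("%d\n", f.get());
    return 0;
}
```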
### HIP
HIP is AMD's GPU portability layer that closely mimics CUDA's programming model. Code written in HIP can be compiled for both NVIDIA and AMD GPUs.
- Best for: Codes targeting both NVIDIA and AMD systems
- Documentation: HIP Documentation
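To illustrate how closely HIP tracks CUDA, here is a hedged sketch (not from the original page): the kernel syntax is identical, and runtime calls simply use the `hip*` prefix, so the same source builds for either vendor's GPUs.

```cpp
#include <hip/hip_runtime.h>

// Identical kernel syntax to CUDA.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    hipMalloc(&x, n * sizeof(float));  // cudaMalloc -> hipMalloc
    hipMalloc(&y, n * sizeof(float));
    // ... initialize x and y on the device (e.g. with hipMemcpy), then launch:
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    hipDeviceSynchronize();
    hipFree(x);
    hipFree(y);
    return 0;
}
```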
### Standard C++ Parallelism (stdpar)

C++17 introduced parallel execution policies (`std::execution::par`, `std::execution::par_unseq`) for standard library algorithms. NVIDIA's `nvc++` compiler can offload these parallel algorithms to GPUs.
```cpp
#include <algorithm>
#include <cmath>
#include <execution>
#include <vector>

std::vector<double> vec(1000, 0.5);
std::transform(std::execution::par, vec.begin(), vec.end(),
               vec.begin(), [](double x){ return std::tan(x); });
```
- Best for: Simple parallelism in standard algorithms without explicit GPU programming
- GPU offload: `nvc++ -stdpar=gpu`
- CPU multicore: `nvc++ -stdpar=multicore` or use GCC/Clang with TBB
- Documentation: NVIDIA C++ Parallel Algorithms
- Benchmark: parSTL GitHub
## Parallel (Distributed)

### MPI
The MPI standard defines C and Fortran interfaces only; the official C++ bindings were deprecated in MPI-2.2 and removed in MPI-3.0 (2012). Several third-party libraries provide modern C++ interfaces:
- mpl - Header-only C++17 library for idiomatic MPI usage
- Boost.MPI - Near one-to-one mapping of MPI-1 with serialization support
- B-MPI3 - MPI-3 wrapper emphasizing const-correctness and iterator-based ranges
- RWTH-MPI - MPI 4.0 bindings with STL container support and automatic type inference
- KaMPIng - Modern C++ bindings with move semantics and memory-safe non-blocking communication
- EMPI - Enhanced MPI with RAII support built on Open MPI
- Kokkos Comm - MPI support for Kokkos Views
- MPI Advance - Lightweight libraries complementing system MPI with newest standard features
For a detailed discussion of design considerations for modern C++ MPI interfaces, see Concepts for Designing Modern C++ Interfaces for MPI.
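For context, here is a sketch (not from the original page) of what the raw C API looks like when called from C++: buffer pointer, element count, and datatype must all be spelled out by hand, which is exactly the boilerplate the wrapper libraries above infer from STL containers.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // The C API requires explicit (pointer, count, datatype) triples;
    // C++ wrappers deduce these from the container's type and size.
    std::vector<double> local(4, rank), sum(4);
    MPI_Allreduce(local.data(), sum.data(), 4, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("sum[0] = %g across %d ranks\n", sum[0], size);

    MPI_Finalize();
    return 0;
}
```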
### UPC++
UPC++ is a C++ library for Partitioned Global Address Space (PGAS) programming. It provides low-overhead, fine-grained communication including Remote Memory Access (RMA) and Remote Procedure Call (RPC), and interoperates with MPI, OpenMP, and CUDA.
```shell
module load contrib upcxx
```
For GPU memory support, use the upcxx-cuda variant. See the UPC++ documentation and Perlmutter-specific guidance.
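A brief sketch of the RPC facility (an illustration, not from the original page): each rank asks its neighbor to run a lambda and waits on the returned future.

```cpp
#include <upcxx/upcxx.hpp>
#include <cstdio>

int main() {
    upcxx::init();

    // Ask the next rank (cyclically) to execute a lambda and
    // return its rank; rpc() yields a future we can wait on.
    int next = (upcxx::rank_me() + 1) % upcxx::rank_n();
    upcxx::future<int> f = upcxx::rpc(next, []() { return upcxx::rank_me(); });
    int got = f.wait();

    std::printf("rank %d got reply from rank %d\n", upcxx::rank_me(), got);
    upcxx::finalize();
    return 0;
}
```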
## See Also
- cppreference.com - Comprehensive C++ language and library reference
- C++ Core Guidelines
- Compiler Wrappers
- Perlmutter Architecture