C++

C++ is a high-performance programming language widely used in scientific computing, offering both low-level control and high-level abstractions.

Compiler Support

Multiple C++ compilers are available on NERSC systems.

Vendor   PrgEnv          Base Compiler   Wrapper
GNU      PrgEnv-gnu      g++             CC
NVIDIA   PrgEnv-nvidia   nvc++           CC
Cray     PrgEnv-cray     crayCC          CC
LLVM     PrgEnv-llvm     clang++         mpic++
Intel    PrgEnv-intel    icpx            CC
AOCC     PrgEnv-aocc     clang++         CC

For detailed information about each compiler, see the base compilers and compiler wrappers documentation.
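To confirm which base compiler a wrapper such as CC actually invoked, a small program can inspect the compilers' predefined macros. The sketch below is illustrative only: the helper names base_compiler_name and cxx_standard are hypothetical, and it checks only macros these compilers are known to define (__NVCOMPILER, __INTEL_LLVM_COMPILER, __clang__, __GNUC__). Clang-derived compilers such as AOCC and the Cray C++ compiler will typically be reported as Clang-based.

```cpp
#include <string>

// Hypothetical helper: identify the base compiler from predefined macros.
// Order matters: nvc++ and icpx can also define __GNUC__, and icpx defines
// __clang__, so the more specific checks come first.
std::string base_compiler_name() {
#if defined(__NVCOMPILER)
    return "NVIDIA nvc++";
#elif defined(__INTEL_LLVM_COMPILER)
    return "Intel icpx";
#elif defined(__clang__)
    return "Clang/LLVM-based (clang++, AOCC, Cray, ...)";
#elif defined(__GNUC__)
    return "GNU g++";
#else
    return "unknown";
#endif
}

// Language standard in effect, e.g. 201703L when compiling with -std=c++17.
long cxx_standard() { return __cplusplus; }
```

For example, building this under PrgEnv-gnu with CC and printing base_compiler_name() should report GNU g++, while cxx_standard() reflects whatever -std= option the build used.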

GPU Programming with C++

Several programming models are available for GPU programming in C++ on Perlmutter.

CUDA

CUDA is NVIDIA's native GPU programming model. It provides maximum performance on NVIDIA hardware and is the basis for many other GPU programming frameworks.

  • Best for: Codes targeting primarily NVIDIA systems, or requiring maximum GPU performance
  • Compile with: nvcc or nvc++ -cuda
  • Documentation: CUDA C++ Programming Guide

OpenMP Offload

OpenMP provides directive-based GPU offloading that can be added incrementally to existing code.
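As a minimal sketch of incremental offloading, the loop below is annotated with a combined target teams distribute parallel for construct; the map clauses copy x to the device and copy y to and from it. With NVIDIA's compiler, GPU offload is enabled with nvc++ -mp=gpu; other compilers use their own flags. Built without offload support, the same code simply runs on the host, which is what allows directives to be added one loop at a time. The function name saxpy is illustrative.

```cpp
#include <cstddef>
#include <vector>

// y = a * x + y, offloaded to a GPU when the compiler supports OpenMP target
// offload; otherwise the loop executes on the host.
// Precondition: x.size() >= y.size().
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = y.size();
    const float* xp = x.data();  // raw pointers for the map() array sections
    float* yp = y.data();

#pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Because the mapping is explicit, a real port would typically wrap several such kernels in a target data region so arrays stay resident on the GPU instead of being copied on every call.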

Kokkos

Kokkos is a C++ performance portability framework that provides abstractions for parallel execution and data management across different architectures.

  • Best for: Codes requiring portability across NVIDIA, AMD, and Intel GPUs
  • Backends on Perlmutter: CUDA (GPU), OpenMP (CPU)
  • Architecture flags: Kokkos_ARCH_AMPERE80=ON, Kokkos_ARCH_ZEN3=ON
  • Documentation: Kokkos Documentation
  • Tutorials: Kokkos Lecture Series

SYCL

SYCL is a cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard C++, with host and kernel code in the same source file.

module load intel/2024.1.0
icpx -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
    -Xsycl-target-backend '--cuda-gpu-arch=sm_80' -o program.x program.cpp

HPX

HPX is a C++ Standard Library for Concurrency and Parallelism. It implements C++11/14/17/20/23 standard concurrency facilities and extends them to distributed systems.

  • Best for: Codes that benefit from asynchronous task-based parallelism
  • Key features: Futures, async execution, distributed computing
  • Build options: CUDA support (HPX_WITH_CUDA), MPI parcelport (HPX_WITH_PARCELPORT_MPI)
  • Documentation: HPX Documentation
  • Building: Building HPX
  • Projects: Creating HPX Projects
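HPX's task API mirrors the standard one (for example, hpx::async returning an hpx::future), so the future-based style can be sketched with standard C++ alone. The example below is a plain std::async version of a chunked parallel sum, not HPX code; the function name async_sum and the chunking scheme are illustrative assumptions. Porting it to HPX would, roughly, mean replacing the std:: facilities with their hpx:: counterparts and running inside the HPX runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum a vector by launching one asynchronous task per chunk and combining
// the partial results through futures (standard C++; HPX offers analogous APIs).
double async_sum(const std::vector<double>& v, std::size_t num_tasks) {
    if (num_tasks == 0) num_tasks = 1;
    const std::size_t chunk = (v.size() + num_tasks - 1) / num_tasks;

    std::vector<std::future<double>> parts;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, v.size());
        // std::launch::async runs each chunk on its own thread.
        parts.push_back(std::async(std::launch::async, [&v, begin, end] {
            return std::accumulate(v.begin() + static_cast<std::ptrdiff_t>(begin),
                                   v.begin() + static_cast<std::ptrdiff_t>(end), 0.0);
        }));
    }

    double total = 0.0;
    for (auto& f : parts) total += f.get();  // get() blocks until that task finishes
    return total;
}
```

With GCC on older systems, code using std::async may need to be linked with -pthread.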

HIP

HIP is AMD's GPU portability layer that closely mimics CUDA's programming model. Code written in HIP can be compiled for both NVIDIA and AMD GPUs.

  • Best for: Codes targeting both NVIDIA and AMD systems
  • Documentation: HIP Documentation

Standard C++ Parallelism (stdpar)

C++17 introduced parallel execution policies (std::execution::par, std::execution::par_unseq) for standard library algorithms. NVIDIA's nvc++ compiler can offload these parallel algorithms to GPUs.

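#include <algorithm>
#include <cmath>
#include <execution>
#include <vector>
int main() {
    std::vector<double> vec(1000, 0.5);
    // Apply tan() to every element in place; with nvc++ -stdpar=gpu this can run on the GPU.
    std::transform(std::execution::par, vec.begin(), vec.end(),
        vec.begin(), [](double x){ return std::tan(x); });
}
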
  • Best for: Simple parallelism in standard algorithms without explicit GPU programming
  • GPU offload: nvc++ -stdpar=gpu
  • CPU multicore: nvc++ -stdpar=multicore or use GCC/Clang with TBB
  • Documentation: NVIDIA C++ Parallel Algorithms
  • Benchmark: parSTL GitHub
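Reductions follow the same pattern. The sketch below computes a dot product with std::transform_reduce under the par_unseq policy; compiled with nvc++ -stdpar=gpu, this is the kind of call that can be offloaded, while GCC or Clang run it on the CPU (with TBB providing the threading, as noted above). The function name dot is illustrative. The data lives in std::vector heap storage, which is the safe choice for nvc++ offload; NVIDIA's stdpar documentation describes which memory is GPU-accessible.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Dot product as a parallel map-reduce: multiply element pairs, then sum.
// par_unseq permits both multithreading and vectorization; nvc++ -stdpar=gpu
// can offload algorithms called with par or par_unseq.
// Precondition: b.size() >= a.size().
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    return std::transform_reduce(std::execution::par_unseq,
                                 a.begin(), a.end(), b.begin(), 0.0,
                                 std::plus<>(), std::multiplies<>());
}
```

Because par_unseq allows reordering, floating-point results can differ slightly from a serial loop for inputs whose partial sums are not exactly representable.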

Parallel (Distributed)

MPI

The MPI standard defines C and Fortran interfaces. Its official C++ bindings were deprecated in MPI 2.2 and removed in MPI 3.0 (2012), but several third-party libraries provide modern C++ interfaces:

  • mpl - Header-only C++17 library for idiomatic MPI usage
  • Boost.MPI - Near one-to-one mapping of MPI-1 with serialization support
  • B-MPI3 - MPI-3 wrapper emphasizing const-correctness and iterator-based ranges
  • RWTH-MPI - MPI 4.0 bindings with STL container support and automatic type inference
  • KaMPIng - Modern C++ bindings with move semantics and memory-safe non-blocking communication
  • EMPI - Enhanced MPI with RAII support built on Open MPI
  • Kokkos Comm - MPI support for Kokkos Views
  • MPI Advance - Lightweight libraries complementing system MPI with newest standard features

For a detailed discussion of design considerations for modern C++ MPI interfaces, see Concepts for Designing Modern C++ Interfaces for MPI.

UPC++

UPC++ is a C++ library for Partitioned Global Address Space (PGAS) programming. It provides low-overhead, fine-grained communication including Remote Memory Access (RMA) and Remote Procedure Call (RPC), and interoperates with MPI, OpenMP, and CUDA.

module load contrib upcxx

For GPU memory support, use the upcxx-cuda variant. See the UPC++ documentation and Perlmutter-specific guidance.

See Also