Portability Across DOE Office of Science HPC Facilities¶

As the HPC community prepares for exascale and the semiconductor industry approaches the end of Moore's Law in terms of transistor size, we have entered a period of time of increased diversity in computer architecture for High Performance Computing (HPC) with relatively new designs joining mature processor and memory technologies. These technologies include GPUs, Many-Core Processors, ARM, FPGAs, and ASICs, as well as new memory technologies in the form of High-Bandwidth Memory (HBM) often incorporated on the processor die as well as Non-Volatile memory (NVRAM) and Solid-State Disk (SSD) technology for accelerated I/O.

The DOE Office of Science operates three world-leading HPC facilities located at the Argonne Leadership Computing Facility (ALCF), the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Lab, and the Oak Ridge Leadership Computing Center (OLCF). These facilities field three of the most powerful supercomputers in world. These machines are used by scientists throughout the DOE Office of Science and the world for solving a number of important problems in domains including materials science and chemistry to nuclear, particle, and astrophysics.

These facilities, with their latest systems, have begun the transition for DOE users to energy-efficient HPC architectures. The facilities are currently fielding systems with two-distinct "pre-exascale" and "exascale" architectures that we discuss in detail here.

System	Frontier
Location	OLCF
Architecture	CPU + GPU
CPU	AMD EPYC 3^rd Gen
GPU	AMD MI 250X
Scale	9408 Compute Nodes
Notes	User Guide

System	Perlmutter
Location	NERSC
Architecture	CPU + GPU
CPU	AMD Milan
GPU	NVIDIA A100
Scale	4608 Nodes
Notes	Introduction to Perlmutter

System	Aurora
Location	ALCF
Architecture	CPU + GPU
CPU	Intel Xeon CPU Max Series
GPU	Intel Data Center GPU Max Series
Scale	3624 Nodes
Notes	Aurora webpage

The two processor architectures deployed on these systems are the CPU+GPU hybrid architecture. These two architectures, while seemingly quite different at first, have a number of similarities that we believe represent general trends in exascale-like architectures:

Increase parallelism (Cores, Threads, Warps/SMs/Blocks)
Vectorization (AVX512 8 Wide Vector Units, 32 Wide Warps)
Small Amount High-bandwidth Coupled with Large Amounts of Traditional DDR

While the details of the architectures are distinct, and vendor specific programming libraries/languages (CUDA, AVX512 Intrinsics etc.) exist to address specific architecture features; the commonalities are significant enough that a number of portable programming approaches are emerging for writing code that supports both architectures.

This website is intended to be a living/growing documentation hub and guide for applications teams targeting systems at multiple DOE Office of Science facilities. In these pages, we discuss the differences between the systems, the software environment, and the job-submission process. We discuss how to define and measure performance portability and we provide recommendations based on case studies for the most promising performance-portable programming approaches. A summary of the high level findings and recommendations will be maintained.