Performance of C++ Parallel Programming Models on Perlmutter using Lulesh¶
This study evaluates Lulesh performance on Perlmutter with four C++ parallel programming models: OpenMP, HPX, Kokkos, and NVC++ stdpar. The applications are compiled with several compilers, such as gcc@11.2.0, clang@16.0.0, and nvhpc@22.9.
Lulesh is a widely used benchmark application that assesses the efficiency of parallel computing architectures in solving partial differential equations related to solid mechanics. For further details about Lulesh, please refer to
If you are interested in any C++ parallel algorithm or require a performance report, please feel free to contact us via
Performance results¶
CPU-based Performance¶
Lulesh benchmark with problem size 30
Lulesh benchmark with problem size 60
Lulesh benchmark with problem size 90
GPU-based Performance¶
Lulesh benchmark with nvhpc GPU (the NVC++ -stdpar=gpu version provides no control over the number of threads)
Source code used in this study¶
This study uses the following open-source repositories, each of which includes build instructions in its repository.
Lulesh OpenMP version¶
Lulesh HPX version¶
Lulesh Kokkos version¶
Lulesh NVC++ version¶
- To obtain correct computation results for the NVC++ version, the following changes to the original source code are needed:
- To enable multi-threaded execution for the NVC++ version, an extra C++ flag is needed, for example: --gcc-toolchain=/opt/cray/pe/gcc/11.2.0/bin/gcc. The NVC++ -stdpar=gpu version does not provide control over the number of threads.
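As a rough sketch of how these flags might be assembled for the two NVC++ builds, the snippet below prints one flag set per -stdpar target. The helper name stdpar_flags, the use of -stdpar=multicore for the CPU build, and the reuse of the same toolchain path for the GPU build are assumptions made for illustration. The build instructions in the repository remain the authoritative reference.

```shell
#!/bin/sh
# Hypothetical helper (not part of any repository): print the nvc++ flags
# for a given -stdpar target, either "multicore" or "gpu".
GCC_TOOLCHAIN=/opt/cray/pe/gcc/11.2.0/bin/gcc   # path taken from the note above

stdpar_flags() {
    case "$1" in
        multicore|gpu) ;;
        *) echo "unknown stdpar target: $1" >&2; return 1 ;;
    esac
    echo "-stdpar=$1 --gcc-toolchain=$GCC_TOOLCHAIN"
}

# Print the flag sets; hand them to nvc++ through the project's own build system.
echo "multicore flags: $(stdpar_flags multicore)"
echo "gpu flags:       $(stdpar_flags gpu)"
```

Keeping the toolchain path in one variable makes it easier to switch GCC versions without touching each compile line.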
Example Run Scripts¶
#!/bin/bash
#SBATCH -C gpu
#SBATCH -t 10:00:00
#SBATCH -q regular
#SBATCH --ntasks-per-node=1
#SBATCH -o lulesh.out
#SBATCH -e lulesh.err

# Sweep every problem size, and every thread count for the CPU variants.
for SIZE in 30 60 90
do
    for NUM_THREADS in 1 2 4 8 16 32 64 128
    do
        echo "running ref_gcc_openmp with $SIZE workload and $NUM_THREADS threads"
        OMP_NUM_THREADS=$NUM_THREADS OMP_PROC_BIND=spread OMP_PLACES=threads ./ref_gcc_openmp -s $SIZE
        echo "running ref_clang_openmp with $SIZE workload and $NUM_THREADS threads"
        OMP_NUM_THREADS=$NUM_THREADS OMP_PROC_BIND=spread OMP_PLACES=threads ./ref_clang_openmp -s $SIZE
        echo "running hpx_gcc with $SIZE workload and $NUM_THREADS threads"
        ./hpx_gcc -s $SIZE --hpx:threads=$NUM_THREADS
        echo "running hpx_clang with $SIZE workload and $NUM_THREADS threads"
        ./hpx_clang -s $SIZE --hpx:threads=$NUM_THREADS
        echo "running kokkos_gcc_openmp with $SIZE workload and $NUM_THREADS threads"
        OMP_NUM_THREADS=$NUM_THREADS OMP_PROC_BIND=spread OMP_PLACES=threads ./kokkos_gcc_openmp -s $SIZE
        echo "running kokkos_clang_openmp with $SIZE workload and $NUM_THREADS threads"
        OMP_NUM_THREADS=$NUM_THREADS OMP_PROC_BIND=spread OMP_PLACES=threads ./kokkos_clang_openmp -s $SIZE
        echo "running lulesh nvc++ multicore with $NUM_THREADS threads and workload $SIZE"
        OMP_NUM_THREADS=$NUM_THREADS OMP_PROC_BIND=spread OMP_PLACES=threads ./multicoreLulesh2.0 -s $SIZE
    done
    echo ""
    # NVC++ -stdpar=gpu variant: no thread count applies.
    echo "running lulesh nvc++ gpu with workload $SIZE"
    echo ""
    echo "finished running $SIZE workload size"
done
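The thread sweep above amounts to a strong-scaling study: for a fixed problem size, the speedup on n threads is T(1)/T(n) and the parallel efficiency is that speedup divided by n. A minimal post-processing sketch follows, assuming the elapsed times have already been collected into a CSV with the columns variant,size,threads,seconds. The file name timings.csv, the column layout, the helper name scaling_report, and the sample numbers are all made up for illustration and are not measurements from this study.

```shell
#!/bin/sh
# Hypothetical input: variant,size,threads,seconds.
# The values below are illustrative only, not results from this study.
cat > timings.csv <<'EOF'
ref_gcc_openmp,30,1,40.0
ref_gcc_openmp,30,2,20.0
ref_gcc_openmp,30,4,12.5
EOF

# For each (variant, size) pair, print:
#   variant,size,threads,speedup,efficiency
# where speedup = T(1) / T(n) and efficiency = speedup / n.
# Assumes the 1-thread row of each pair appears before its other rows.
scaling_report() {
    awk -F, '
        $3 == 1 { base[$1 FS $2] = $4 }
        {
            key = $1 FS $2
            if (key in base) {
                s = base[key] / $4
                printf "%s,%s,%d,%.2f,%.2f\n", $1, $2, $3, s, s / $3
            }
        }' "$@"
}

scaling_report timings.csv
```

With the sample data, the 4-thread row is reported as ref_gcc_openmp,30,4,3.20,0.80, meaning a speedup of 3.2 at 80% efficiency.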