Selected Vendor Bug Reports

Updated on May 21, 2025.

Active Bugs

Bug in `MPICH_SMP_SINGLE_COPY_MODE=XPMEM` slows down application codes

Vendor: HPE
Description: With the MPICH_SMP_SINGLE_COPY_MODE environment variable set to XPMEM, the walltime per model time step increases over time for some apps, when it is expected to remain roughly constant. The variable selects the on-node implementation used for large MPI messages; XPMEM, the default, selects the single-copy implementation based on XPMEM.
Status: In progress
Workaround: NERSC temporarily sets MPICH_SMP_SINGLE_COPY_MODE to CMA
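
Because NERSC applies this setting for now, users normally do not need to do anything. A job script that wants to set the value explicitly and record it in the job output could use a sketch like the following; placing it before the launch command is an assumption about a typical batch script, not something the vendor case prescribes.

```shell
# Select the CMA-based single-copy path for large on-node MPI messages
# (the value NERSC currently sets as the workaround).
export MPICH_SMP_SINGLE_COPY_MODE=CMA

# Record the value that job steps launched from this shell will inherit.
echo "MPICH_SMP_SINGLE_COPY_MODE=${MPICH_SMP_SINGLE_COPY_MODE}"
```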

Some apps crash nodes with `MPICH_SMP_SINGLE_COPY_MODE=XPMEM`

Vendor: HPE
Description: With the MPICH_SMP_SINGLE_COPY_MODE environment variable set to XPMEM, some apps crash compute nodes. The variable selects the on-node implementation used for large MPI messages; XPMEM, the default, selects the single-copy implementation based on XPMEM.
Status: In progress
Workaround: NERSC temporarily sets MPICH_SMP_SINGLE_COPY_MODE to CMA to prevent node failures

Floating-point exception with Fortran code using netCDF when built with `-ffpe-trap=invalid`

Vendor: HPE

Description: A code hits a floating-point exception when built with -ffpe-trap=invalid using the GNU compilers.

$ cat nf90_open.f90
program sr
use netcdf
implicit none

character infile*80
integer file_mode, ncid, status

call get_command_argument (1, infile)
print *, 'Open file: ' // trim (infile)

file_mode = nf90_nowrite
status = nf90_open (infile, file_mode, ncid)

print *, ' status = ', status
print *, ' ncid = ', ncid

status = nf90_close (ncid)

end program sr

$ ftn -cpp -g -Wall -fbacktrace -fcheck=bounds,pointer -ffpe-trap=invalid,zero,overflow nf90_open.f90
$ ./a.out
Open file:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7f0dd33d82e2 in ???
#1 0x7f0dd33d7475 in ???
#2 0x7f0dd3098dbf in ???
...
#15 0x7f0dd3789e80 in ???
#16 0x400bf1 in sr
at /...some/path.../nf90_open.f90:12
#17 0x400d42 in main
at /...some/path.../nf90_open.f90:2
Floating exception

Status: Fixed in HDF5 1.14.4.2; the HPE case is still in progress

Error `aspacem Valgrind: FATAL: M_PROCMAP_BUF is too low` with Valgrind4hpc

Vendor: HPE

Description: When a user MPI code is used with Valgrind4hpc, the following error occurs:

$ valgrind4hpc -n 128 -l "--cpu-bind=cores -c 2" --valgrind-args="--quiet --leak-check=yes" ./some_executable ...
...
--807515:0: aspacem Valgrind: FATAL: M_PROCMAP_BUF is too low.
--807515:0: aspacem Increase it and rebuild. Exiting now.
...

Status: Fixed in 25.09
Availability: The fixed version is not available yet

Nvidia Fortran compiler shared library link error

Vendor: HPE

Description: A test code generates an nvlink error in the PrgEnv-nvidia environment:

$ ftn -c -fPIC example_mod.f90 -o example_mod.o
$ ftn -shared -fPIC example_mod.o -o libexample.so
/usr/bin/ld: warning: /tmp/pgcudafat8h91h0VUoppmP.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker

$ ftn -c -fPIC example_prog.f90 -o example_prog.o
$ ftn -fPIC example_prog.o libexample.so -o example_prog.exe
nvlink error : Undefined reference to '_example_mod_21' in 'example_prog.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/tools/nvdd

When the cpu module is loaded instead of the gpu module, the code compiles and runs OK.

Status: In progress

MPI-IO error when using `MPI_Type_indexed`

Vendor: HPE
Description: Using Cray MPI to compile an MPI program that calls MPI_Type_indexed to concatenate multiple subarray MPI data types produces an incorrect result. The concatenated datatype is used to set the MPI fileview. The following error is seen with a test code:
```
$ srun -N1 --ntasks-per-node=4 ./indexed_fsize -f dummy
Error: expecting file size 800, but got 1200
srun: error: nid004481: tasks 0-3: Exited with exit code 1
srun: Terminating StepId=3XXXXXXX.0
```
The same problem happens when calling MPI_Type_create_hindexed.

This problem is seen with PrgEnv-gnu, PrgEnv-cray and PrgEnv-nvidia.

This turns out to be due to a bug in MPICH, on which Cray's MPI-IO implementation is based. The error can be reproduced using MPICH versions 4.0.3 and prior. The user who reported the problem subsequently opened an issue with MPICH.
Status: Fixed in cray-mpich "9.x"
Availability: cray-mpich "9.x" not available yet

`crayftn` error on unlimited polymorphic assumed rank argument

Vendor: HPE

Description: Combining the SELECT TYPE and SELECT RANK constructs to handle an unlimited polymorphic assumed-rank argument produces a compile error. A reproducer is provided.

$ module load PrgEnv-cray
$ ftn combined.f90

module a
       ^
ftn-855 ftn: ERROR A, File = combined.f90, Line = 1, Column = 8
  The compiler has detected errors in module "A".  No module information file will be created for this module.

      select type (x)
                   ^
ftn-1871 ftn: ERROR FOO, File = combined.f90, Line = 7, Column = 20
  The selector in a SELECT TYPE statement must be polymorphic.

        print *, x
             ^
ftn-620 ftn: ERROR FOO, File = combined.f90, Line = 9, Column = 18
  This reference to assumed-rank variable "X" is not valid.

  use a, only: foo
      ^
ftn-894 ftn: ERROR MAIN, File = combined.f90, Line = 15, Column = 7
  Module "A" has compile errors, therefore declarations obtained from the module via the USE statement may be incomplete.

Cray Fortran : Version 15.0.1 (20230120205242_66f7391d6a03cf932f321b9f6b1d8612ef5f362c)
Cray Fortran : Compile time:  0.0027 seconds
Cray Fortran : 17 source lines
Cray Fortran : 4 errors, 0 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.

Status: In progress (Reopened)

CMake config files for `cray-fftw` module

Vendor: HPE
Description: The CMake files in the cray-fftw module (e.g., /opt/cray/pe/fftw/3.3.10.6/x86_milan/lib/cmake/fftw3) have an incorrect path:
```
set (FFTW3f_INCLUDE_DIRS /tmp/tmp.VFDTh6VYy8/rpm/BUILDROOT/opt/cray/pe/fftw/3.3.10.3/x86_genoa/include)
```
This causes a problem with find_package(fftw3 REQUIRED).
Status: Fixed in cray-fftw-3.3.10.10 released in CPE 25.03
Availability: CPE 25.03 not available yet
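
To check whether an installed version is affected, one option is to search the CMake config files for paths that still point into a temporary RPM build root. The sketch below does that; `find_stale_cmake_paths` is a hypothetical helper name, the search pattern is an assumption based on the single path shown above, and the directory is the one quoted in the description.

```shell
# Hypothetical helper: list *.cmake files under a directory that still
# reference a temporary RPM build root such as /tmp/tmp.XXXX/rpm/BUILDROOT/.
find_stale_cmake_paths() {
    # grep -l prints only the names of files that contain a match.
    find "$1" -name '*.cmake' -exec grep -lE '/tmp/[^ )]*/BUILDROOT/' {} + 2>/dev/null
}

# Directory taken from the description; adjust it to the loaded module.
find_stale_cmake_paths /opt/cray/pe/fftw/3.3.10.6/x86_milan/lib/cmake/fftw3 || true
```

Any file name printed is a config file that will hand a nonexistent path to `find_package`.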

`crayftn` passing procedure pointer derived type components to associated function call

Vendor: HPE

Description: An ICE (internal compiler error) is triggered when there is an associated function call whose arguments are two procedure pointer derived type components. A reproducer is available.

$ crayftn --version
Cray Fortran : Version 17.0.0

$ ftn -c example.f90
   Struct_Opr  idx = 17  Cray parcel pointer   rank = 0; line = 49, col = 34
   Left opnd is IR_Tbl_Idx;  line = 49, col = 31
      Dv_Deref_Opr  idx = 81  type(TYPE_T)    typ_idx (587)   dim = 0 rank = 0; line = 49, col = 31
      Left opnd is AT_Tbl_Idx;  line = 49, col = 31
         LHS  idx = 701  derived-type * 587
      Right operand is NO_Tbl_Idx;
   Right operand is AT_Tbl_Idx;  line = 49, col = 35
      TEST_FUNCTION_  idx = 591  Cray parcel pointer * Cray_Parcel_Ptr_8
ftn-1716 ftn: INTERNAL EQUALS, File = example.f90, Line = 49, Column = 34
  A multiparented node was encountered.
ftn-2116 ftn: INTERNAL
  "/opt/cray/pe/cce/17.0.0/cce/x86_64/bin/ftnfe" was terminated due to receipt of signal 06:  Aborted.

Status: Fixed in CCE 19.0.0
Availability: CCE 19.0.0 not available yet

`crayftn` argument that has `pointer` attribute causes a compile-time error about needing `pointer` attribute

Vendor: HPE
Description: A compiler error occurs when a user-defined constructor for a derived type takes a procedure argument with the pointer attribute, and the constructor's interface is in a module while its definition is in a submodule. The error says that the procedure argument needs the pointer attribute, even though it already has it. The error message from a reproducer code:
```
$ crayftn --version
Cray Fortran : Version 17.0.0

$ ftn -c example.f90

submodule(pointer_attribute_bug_m) pointer_attribute_bug_s
          ^
ftn-1800 ftn: ERROR CONSTRUCT, File = example.f90, Line = 32, Column = 11
  Procedure "TEST_FUNCTION" has the INTENT attribute, so it must be a procedure pointer.  Add the POINTER attribute.
```
Status: Fixed in CCE 19.0.0
Availability: CCE 19.0.0 not available yet

Apps instrumented with `perftools-lite-gpu` get an MPI error

Vendor: HPE

Description: When MPI apps that use OpenMP offload are instrumented with perftools-lite-gpu, they get an MPI error:

$ srun -n 4 -c 32 --cpu-bind=cores --gpus-per-task=1 --gpu-bind=none ./a.out
CrayPat/X:  Version 23.12.0 Revision 67ffc52e7 sles15.4_x86_64  11/13/23 21:04:20
(GTL DEBUG: 1) cuPointerGetAttribute: (null), (null), line no 327
MPICH ERROR [Rank 1] [job id 28173016.1] [Mon Jul 15 18:10:36 2024] [nid001161] - Abort(606152194) (rank 1 in comm 0): Fatal error in PMPI_Barrier: Invalid count, error stack:
PMPI_Barrier(280)....................: MPI_Barrier(comm=comm=0x84000001) failed
PMPI_Barrier(265)....................:
MPIR_CRAY_Barrier(124)...............:
MPIDI_Cray_shared_mem_coll_bcast(518):
MPIR_Localcopy(95)...................:
(unknown)(): Invalid count

aborting job:
Fatal error in PMPI_Barrier: Invalid count, error stack:
PMPI_Barrier(280)....................: MPI_Barrier(comm=comm=0x84000001) failed
PMPI_Barrier(265)....................:
MPIR_CRAY_Barrier(124)...............:
MPIDI_Cray_shared_mem_coll_bcast(518):
MPIR_Localcopy(95)...................:
(unknown)(): Invalid count
(GTL DEBUG: 2) cuPointerGetAttribute: (null), (null), line no 327
MPICH ERROR [Rank 2] [job id 28173016.1] [Mon Jul 15 18:10:36 2024] [nid001161] - Abort(606152194) (rank 2 in comm 0): Fatal error in PMPI_Barrier: Invalid count, error stack:
PMPI_Barrier(280)....................: MPI_Barrier(comm=comm=0x84000001) failed
PMPI_Barrier(265)....................:
MPIR_CRAY_Barrier(124)...............:
MPIDI_Cray_shared_mem_coll_bcast(463):
MPIR_Localcopy(95)...................:
(unknown)(): Invalid count
...

Uninstrumented executables run fine.

Status: In progress

`cray-libsci` segfaults when using multiple OpenMP threads

Vendor: HPE

Description: An OpenMP code using cray-libsci segfaults when run with multiple threads on a single node:

$ srun -n 1 -c 16 --cpu_bind=cores -G 1 --gpu-bind=none ./code/STRUMPACK/build/examples/sparse/testPoisson3d 50 --sp_disable_gpu
...
srun: error: nid001036: task 0: Segmentation fault
srun: Terminating StepId=27511778.0

Status: In progress

Valgrind4hpc needs to drop support for exp-sgcheck

Vendor: HPE
Description: Valgrind no longer supports exp-sgcheck, but Valgrind4hpc still lists it as a supported tool.
Status: Fixed in Valgrind4hpc 2.13.4, released in CPE 24.11
Availability: Valgrind4hpc 2.13.4 not available yet

`MPI_Allgatherv` fails for device buffers within a node

Vendor: HPE

Description: A user code fails in the MPI_Allgatherv call when the device buffer returned by the omp_get_mapped_ptr function is used in a single-node job in the PrgEnv-cray and PrgEnv-nvidia environments:

MPICH ERROR [Rank 1] [job id 26614636.0] [Sun Jun  9 07:53:26 2024] [nid001236] - Abort(86580482) (rank 1 in comm 0): Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(491)......................: MPI_Allgatherv(sbuf=MPI_IN_PLACE, scount=0, MPI_DATATYPE_NULL, rbuf=0x7ff2db800000, rcounts=0xe34b3e0, displs=0xe339770, datatype=MPI_DOUBLE_COMPLEX, comm=MPI_COMM_WORLD) failed
MPIR_CRAY_Allgatherv(466).................:
MPIR_Allgatherv_impl(277).................:
MPIR_Allgatherv_intra_auto(191)...........: Failure during collective
MPIR_Allgatherv_intra_auto(186)...........:
MPIR_Allgatherv_intra_ring(166)...........:
MPIC_Sendrecv(338)........................:
MPIC_Wait(71).............................:
MPIR_Wait_impl(41)........................:
MPID_Progress_wait(201)...................:
MPIDI_Progress_test(105)..................:
MPIDI_SHMI_progress(118)..................:
MPIDI_POSIX_progress(412).................:
MPIDI_CRAY_Common_lmt_ctrl_send_rts_cb(64):
MPIDI_CRAY_Common_lmt_handle_recv(44).....:
MPIDI_CRAY_Common_lmt_import_mem(218).....:
(unknown)(): Invalid count
...

Status: In progress

Apps instrumented with `perftools-lite` or `perftools` hang or fail

Vendor: HPE

Description: When apps are instrumented with perftools-lite or perftools, they hang, segfault or fail for an unknown reason:

# Hang
$ ls -lrt | tail -2; sacct -o jobid,jobname,start,end,elapsed,state -j 26530829; date
-rw------- 1 elvis elvis       5819 Jun  6 13:04 rsl.out.0000
-rw------- 1 elvis elvis       5831 Jun  6 13:04 rsl.error.0000
JobID           JobName               Start                 End    Elapsed      State
------------ ---------- ------------------- ------------------- ---------- ----------
...
26530829.0      wrf.exe 2024-06-06T13:04:11             Unknown   00:29:02    RUNNING
Thu 06 Jun 2024 01:33:13 PM PDT

# Segfault
srun: error: nid004203: task 9: Segmentation fault
srun: Terminating StepId=26197475.0

# Fail for an unknown reason
srun: error: nid006953: tasks 0-127: Exited with exit code 255
srun: Terminating StepId=26230127.0

Status: Fixed in perftools 24.11.0
Workaround: Set the PAT_RT_CALLSTACK_MODE environment variable before your srun command:
```
export PAT_RT_CALLSTACK_MODE=frames
```
Availability: perftools 24.11.0 not available yet

Codes fail with '`cxil_map: write error`'

Vendor: HPE

Description: When built with the -O2 or -O3 flag in the PrgEnv-gnu environment, a CPU code called CROCO (Coastal and Regional Ocean Community model) fails with the error:

cxil_map: write error
cxil_map: write error
cxil_map: write error
...
cxil_map: write error
MPICH ERROR [Rank 132] [job id 26513451.0] [Thu Jun  6 04:27:18 2024] [nid005207] - Abort(538553615) (rank 132 in comm 0): Fatal error in PMPI_Irecv: Other MPI error, error stack:
PMPI_Irecv(166)........: MPI_Irecv(buf=0x7ffda83b8580, count=630, MPI_DOUBLE_PRECISION, src=115, tag=6, MPI_COMM_WORLD, request=0x7ffda83b84d0) failed
MPID_Irecv(529)........:
MPIDI_irecv_unsafe(163):
MPIDI_OFI_do_irecv(356): OFI tagged recv failed (ofi_recv.h:356:MPIDI_OFI_do_irecv:Bad address)

aborting job:
Fatal error in PMPI_Irecv: Other MPI error, error stack:
PMPI_Irecv(166)........: MPI_Irecv(buf=0x7ffda83b8580, count=630, MPI_DOUBLE_PRECISION, src=115, tag=6, MPI_COMM_WORLD, request=0x7ffda83b84d0) failed
MPID_Irecv(529)........:
MPIDI_irecv_unsafe(163):
MPIDI_OFI_do_irecv(356): OFI tagged recv failed (ofi_recv.h:356:MPIDI_OFI_do_irecv:Bad address)
MPICH ERROR [Rank 137] [job id 26513451.0] [Thu Jun  6 04:27:18 2024] [nid005207] - Abort(941206799) (rank 137 in comm 0): Fatal error in PMPI_Irecv: Other MPI error, error stack:
PMPI_Irecv(166)........: MPI_Irecv(buf=0x7ffdc3523ea0, count=630, MPI_DOUBLE_PRECISION, src=122, tag=8, MPI_COMM_WORLD, request=0x7ffdc35215f8) failed
MPID_Irecv(529)........:
MPIDI_irecv_unsafe(163):
MPIDI_OFI_do_irecv(356): OFI tagged recv failed (ofi_recv.h:356:MPIDI_OFI_do_irecv:Bad address)
...

With a lower optimization level, the code runs fine. A similar error is observed with an old version of WRF.

Status: In progress. The vendor reports that one reported app runs successfully with cray-mpich/8.1.30 and SHS 11.1, but a different app still fails in the same environment.

`cray-netcdf` and `cray-parallel-netcdf` module issues with wrong lib directories

Vendor: HPE
Description: Many model build systems rely on nc-config and pnetcdf-config to get the correct compiler flags and link libraries. However, the modules cray-parallel-netcdf/1.12.2.1, cray-netcdf/4.8.1.1, and cray-netcdf-hdf5parallel/4.8.1.1 all give incorrect results. For example,
```
$ pnetcdf-config --libdir
/opt/cray/pe/parallel-netcdf/1.12.2.1/gnu/8.2/lib
```
when the correct path is /opt/cray/pe/parallel-netcdf/1.12.2.1/INTEL/19.1/lib. With cray-netcdf-hdf5parallel/4.8.1.1, nc-config gives several wrong results:
```
--cflags    -> -DpgiFortran
--cxx4flags -> -DpgiFortran
--libdir    -> /opt/cray/pe/netcdf-hdf5parallel/4.8.1.1/gnu/8.2/lib
```
The same problem is observed in later versions.
Status: In progress
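
A build script can guard against this by checking that the reported library directory names the programming environment actually loaded. The sketch below is one way to do that; `libdir_matches_env` is a hypothetical helper, and it assumes that the `PE_ENV` variable set by the PrgEnv modules (e.g., GNU or INTEL) appears, in either case, as a path component of the correct directory, as in the paths shown above.

```shell
# Hypothetical helper: succeed if the environment name (e.g. INTEL)
# appears as a path component of libdir, compared case-insensitively.
libdir_matches_env() {
    lower_dir=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    lower_env=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$lower_dir" in
        */"$lower_env"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only run the check where the tool and PE_ENV are both available.
if command -v pnetcdf-config >/dev/null 2>&1 && [ -n "${PE_ENV:-}" ]; then
    reported=$(pnetcdf-config --libdir)
    if libdir_matches_env "$reported" "$PE_ENV"; then
        echo "libdir matches $PE_ENV: $reported"
    else
        echo "libdir does not match $PE_ENV: $reported"
    fi
fi
```

When the check reports a mismatch, the build can fall back to a library path chosen by hand instead of the one reported by the config tool.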

`crayftn` ICE on `WHERE` statement with defined assignment

Vendor: HPE

Description: The Cray Fortran compiler hits an ICE with a test code when it encounters a WHERE statement that uses defined assignment.

...
Creating internal compiler error backtrace (please wait):
[0x000000012d4e69] linux_backtrace /home/jenkins/crayftn/pdgcs/v_util.c:186
[0x000000012d53a1] pdgcs_internal_error(char const*, char const*, int) /home/jenkins/crayftn/pdgcs/v_util.c:663
[0x00000001ddcbc5] verify_binary_args(EXP_OP, EXP_INFO, EXP_INFO, TYPE, char const*, EXP_T_TYPE, bool, bool, bool) [clone .constprop.0] /home/jenkins/crayftn/pdgcs/v_expr_tbl.c:413
...
[0x000000008a9fc9] _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120
ftn-7991 ftn: INTERNAL EXAMPLE, File = example.f90, Line = 34
  INTERNAL COMPILER ERROR:  "Array syntax flags do not match" (/home/jenkins/crayftn/pdgcs/v_expr_tbl.c, line 413, version b59b7a8e9169719529cf5ab440f3c301e515d047)
ftn-2116 ftn: INTERNAL
  "/opt/cray/pe/cce/17.0.0/cce/x86_64/bin/optcg" was terminated due to receipt of signal 06:  Aborted.
...

Status: In progress

Code fails with '`MPIDI_OFI_send_normal:Resource temporarily unavailable)`'

Vendor: HPE

Description: A user app with GPU-aware MPI gets the following error with cray-mpich/8.1.25 in multi-node jobs when GDRCopy is used:

...
MPICH ERROR [Rank 10] [job id 24282816.0] [Thu Apr 11 15:49:07 2024] [nid008436] - Abort(739891471) (rank 10 in comm 0): Fatal error in PMPI_Send: Other MPI error, error stack:
PMPI_Send(163)............: MPI_Send(buf=0x7fc1755b4f90, count=6300, MPI_DOUBLE, dest=12, tag=1, MPI_COMM_WORLD) failed
MPID_Send(499)............:
MPIDI_send_unsafe(58).....:
MPIDI_OFI_send_normal(368): OFI tagged senddata failed (ofi_send.h:368:MPIDI_OFI_send_normal:Resource temporarily unavailable)
...

The user was using PrgEnv-gnu.

Status: In progress

`sanitizers4hpc` with Compute Sanitizer's memcheck produces output that is not aggregated

Vendor: HPE
Description: Compute Sanitizer output aggregation needs improvement.
Status: Fixed in PE 24.07; closed
Availability: PE 24.07 not available yet

No source line number displayed when run with MemorySanitizer in `PrgEnv-cray`

Vendor: HPE

Description: MemorySanitizer does not report where in the source code an error occurs or where the memory was allocated.

==1068200==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x3007bc  (/pscratch/sd/e/elvis/a.out+0x3007bc)
    #1 0x7fc5c262e24c  (/lib64/libc.so.6+0x3524c) (BuildId: ddc393ac74ed8f90d4fdfff796432fbafd281e1b)
    #2 0x26a849  (/pscratch/sd/e/elvis/a.out+0x26a849)
...

The same problem was later found with PrgEnv-intel.

Status: Fixed in CCE 18.0.0 for PrgEnv-cray; the PrgEnv-intel problem won't be fixed
Availability: CCE 18.0.0 not available yet

Workaround: Set the environment variable:

export MSAN_OPTIONS="allow_addr2line=true"

`disable_sanitizer_instrumentation` attribute doesn't work with `PrgEnv-aocc`

Vendor: HPE
Description: The __attribute__((disable_sanitizer_instrumentation)) attribute doesn't disable sanitizer instrumentation in the AOCC compilers although the compilers are Clang-based.
Status: In progress

CCE 17.0.0 Fortran compiler fails four Smart-Pointers tests

Vendor: HPE
Description: The Cray Fortran compiler fails four tests in the Smart-Pointers test suite.
Status: In progress

`crayftn` runtime error with user defined operator on associate name

Vendor: HPE
Description: A segmentation fault occurs in a code when calling a user defined operator on a name associated with a function/expression result.
```
lib-4968 : WARNING
  An unallocated allocatable array 'STRING_' is referenced at
  at line 20 in file 'example.f90'.
Segmentation fault
```
Status: Fixed in CCE 20.0.0 and 19.0.0
Availability: CCE 20.0.0 and 19.0.0 not available yet

Valid coarray code rejected by `crayftn`

Vendor: HPE

Description: A coarray code is incorrectly rejected with errors by the Cray Fortran compiler.

$ ftn coarrays.f90 -o coarrays.exe
       call assign_and_synchronize(lhs=u_half, rhs=u + (dt/2)*(nu*d_dx2(u,dx) - d_dx(half_uu,dx)))
                                                                         ^                          
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 15, Column = 74 
  Coarray t$14 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.
                                                                         ^                          
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 15, Column = 74 
  Coarray t$14 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.
                                                                                      ^             
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 15, Column = 87 
  Coarray t$19 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.
                                                                                      ^             
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 15, Column = 87 
  Coarray t$19 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.

        call assign_and_synchronize(lhs=u, rhs=u + dt*(nu*d_dx2(u_half,dx) - d_dx(half_uu,dx)))
                                                                ^                               
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 17, Column = 65 
  Coarray t$35 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.
                                                                ^                               
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 17, Column = 65 
  Coarray t$35 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.
                                                                                  ^             
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 17, Column = 83 
  Coarray t$40 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.
                                                                                  ^             
ftn-1587 ftn: ERROR COARRAY_BURGERS_SOLVER, File = coarrays.f90, Line = 17, Column = 83 
  Coarray t$40 must have the ALLOCATABLE attribute in order to have a deferred shape in the coarray dimensions.

Cray Fortran : Version 17.0.0 (20231107223020_b59b7a8e9169719529cf5ab440f3c301e515d047)
Cray Fortran : Compile time:  0.1774 seconds
Cray Fortran : 135 source lines
Cray Fortran : 8 errors, 0 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message.

Status: Fixed in CCE 19.0.0 released in CPE 25.03
Availability: CCE 19.0.0 not available yet

Incorrect results and poor performance with `do concurrent` reduction

Vendor: HPE
Description: A code that does a do concurrent reduce operation gives incorrect results when built with the Cray Fortran compiler. When compiled with the -h thread_do_concurrent flag, the code shows poor performance.
Status: In progress

TCP BTL fails to collect all interface addresses (when interfaces are on different subnets)

Vendor: Open MPI

Description: Multi-node Open MPI point-to-point communications using the tcp BTL component fail because, although one NIC on a Perlmutter node has two IP interfaces for different subnets (one private and one public), only one IP is used per peer kernel interface.

Open MPI detected an inbound MPI TCP connection request from a peer
that appears to be part of this MPI job (i.e., it identified itself as
part of this Open MPI job), but it is from an IP address that is
unexpected.  This is highly unusual.

The inbound connection has been dropped, and the peer should simply
try again with a different IP interface (i.e., the job should
hopefully be able to continue).

  Local host:          nid002292
  Local PID:           1273838
  Peer hostname:       nid002293 ([[9279,0],1])
  Source IP of socket: 10.249.13.210
  Known IPs of peer:
    10.100.20.22
    128.55.69.127
    10.249.13.209
    10.249.36.5
    10.249.34.5

Status: In progress

CP2K container builds with Open MPI with networking bug

Vendor: Nvidia
Description: The CP2K container images on NGC that NERSC recommends were built with old Open MPI versions (4.x), whose bugs contribute to multi-node job failures. New images with Open MPI 5.x built against libfabric have been requested.
Status: In progress

`PrgEnv-nvhpc` conflicts with `cudatoolkit` module

Vendor: HPE

Description: The PrgEnv-nvhpc environment loads the nvhpc module which is listed as a conflict for the cudatoolkit module.

$ ml PrgEnv-nvhpc
$ ml -t
...
cpe/23.12
cudatoolkit/12.2
craype-accel-nvidia80
gpu/1.0
nvhpc/23.9
...
PrgEnv-nvhpc/8.5.0

$ ml rm cudatoolkit
$ ml cudatoolkit
Lmod has detected the following error:  Cannot load module "cudatoolkit/12.2" because these module(s) are
loaded:
   nvhpc

While processing the following module(s):
    Module fullname   Module Filename
    ---------------   ---------------
    cudatoolkit/12.2  /opt/cray/pe/lmod/modulefiles/core/cudatoolkit/12.2.lua

$ ml -t        # cudatoolkit not in the list
...
cpe/23.12
craype-accel-nvidia80
gpu/1.0
nvhpc/23.9
...
PrgEnv-nvhpc/8.5.0

Status: The nvhpc and PrgEnv-nvhpc modules will be removed in CPE 24.11, in favor of nvidia and PrgEnv-nvidia; Fixed in CPE 24.11
Workaround: Use the nvidia and PrgEnv-nvidia modules instead
Availability: CPE 24.11 not available yet

Regression in device memory growth issue with GPU-Aware MPI for XGC code

Vendor: HPE
Description: The code runs into a problem of device memory growth when GPU-Aware MPI is enabled.
Status: In progress
Workaround: Disable the memory registration (MR) cache:
```
export FI_MR_CACHE_MAX_COUNT=0
```

OpenACC reduction with worker gives wrong answers

Vendor: Nvidia
Description: A procedure declared with an OpenACC routine worker directive returns wrong reduction values in the PrgEnv-nvidia environment when called from within a loop where num_workers and vector_length are set to 32.
Status: In progress

Performance issue with `fi_write()` to GPU memory on Perlmutter

Vendor: HPE
Description: The GASNet-EX networking library implements RMA APIs with the vendor-provided libfabric and its cxi provider. RMA Put operations between two GPU nodes perform unexpectedly far worse than MPI when the destination address is in remote GPU memory. For other source/destination memory and Put/Get mode combinations, the GASNet-EX and MPI benchmarks show similar performance or GASNet-EX performs better.
Status: In progress

RMA performance problems on Perlmutter with GASNet Codes

Vendor: HPE
Description: With the GASNet-EX networking library implementing RMA (Remote Memory Access) APIs with fi_read() and fi_write() functions of the vendor-provided libfabric and its cxi provider, it is observed that RMA operations perform very well under ideal conditions. When conditions are not ideal, the performance decreases significantly for both host and GPU memory.
Status: In progress

Internal Compiler Error

Vendor: HPE
Description: An internal compiler error occurs when compiling the E3SM code with the AMD compilers.
Status: Fixed in CCE 19.0.0
Availability: CCE 19.0.0 not available yet

Apprentice2's Mosaic report shows 'No point-to-point data found' for shmem code

Vendor: HPE
Description: Apprentice2, a tool in perftools, displays the 'No point-to-point data found!' error when the Mosaic report is selected.
Status: Fixed in Apprentice3 in perftools-25.03.0
Availability: perftools-25.03.0 not available yet

Code hangs when run on multiple nodes, sometimes showing the '`xpmem_attach error: : Cannot allocate memory`' message

Vendor: HPE
Description: A code that runs fine with 128 MPI tasks on a single CPU node hangs when run on multiple nodes, sometimes (but not always) printing the following message:
```
xpmem_attach error: : Cannot allocate memory
```
Status: In progress
Workaround: Set the FI_MR_CACHE_MONITOR environment variable as follows:
```
export FI_MR_CACHE_MONITOR=kdreg2
```

`cray-mpich` with GTL not recognising pointer to device memory returned by OpenCL `clSVMAlloc`

Vendor: HPE
Description: When a pointer returned by the OpenCL clSVMAlloc function is used in one-sided MPI communication, the communication does not get the correct data. Wrapping the MPI RMA exposure epoch in clEnqueueSVMMap/clEnqueueSVMUnmap works around this but moves a large amount of data unnecessarily between host and device memory. Advice on using OpenCL with MPICH_GPU_SUPPORT_ENABLED has been requested.
Status: In progress

Resolved Bugs

Segfaulting when running Linaro Performance Report or MAP tool on Python code

Vendor: Linaro
Description: When the Linaro Performance Report or MAP tool is used with a Python code, a segmentation fault happens.
```
$ perf-report srun -n 8 python testcode.py
...
srun: error: nid005372: task 0: Segmentation fault
...
```
This probably happens because of a conflict between Linaro Forge and the locally launched LDMS (Lightweight Distributed Metric Service).
Status: Resolved

`gcc-native` module does not seem to affect version of `gcc`

Vendor: HPE

Description: When a non-default gcc-native version is loaded, the Cray compiler wrapper uses the loaded version while gcc, mpicc and nvcc use the default gcc version:

$ module -t list
...
PrgEnv-gnu/8.5.0
...
gcc-native/13.2      # the default gcc-native version
...
$ module load gcc-native/12.3
$ module -t list
...
gcc-native/12.3
...

$ cc hello.c
$ strings -a a.out | grep GCC
...
GCC: (SUSE Linux) 12.3.0

$ gcc hello.c
$ strings -a a.out | grep GCC
...
GCC: (SUSE Linux) 13.2.1 20240206 [revision 67ac78caf31f7cb3202177e6428a46d829b70f23]

$ mpicc hello.c
$ strings -a a.out | grep GCC
...
GCC: (SUSE Linux) 13.2.1 20240206 [revision 67ac78caf31f7cb3202177e6428a46d829b70f23]

$ nvcc hello.c
...
GCC: (SUSE Linux) 13.2.1 20240206 [revision 67ac78caf31f7cb3202177e6428a46d829b70f23]
...

Status: Fixed in CPE 24.11; a fix provided
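
The comparison above can be scripted to confirm that every compiler driver picks up the loaded gcc-native version. The sketch below is one way to do it; `embedded_gcc_versions` is a hypothetical helper, it uses `grep -a -o` in place of `strings`, and the driver list and `hello.c` are taken from the example above.

```shell
# Hypothetical helper: print the distinct "GCC: (...) <version>" strings
# embedded in a file, the same strings that `strings -a a.out | grep GCC` shows.
embedded_gcc_versions() {
    # -a: treat a binary file as text, -o: print only the matching part
    grep -a -o 'GCC: ([^)]*) [0-9][0-9.]*' "$1" | sort -u
}

# Build the same source with each driver and report what was embedded.
for driver in cc gcc mpicc; do
    if command -v "$driver" >/dev/null 2>&1 && [ -f hello.c ]; then
        "$driver" hello.c -o "hello.$driver" &&
            echo "$driver: $(embedded_gcc_versions "hello.$driver")"
    fi
done
```

If the drivers agree, every line reports the same version string.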

NCCL hangs on Perlmutter in `nccl-tests` with OFI plugin and Slingshot

Vendor: HPE
Description: The nccl-tests all_reduce benchmark on Perlmutter with NCCL and the OFI plugin consistently gets stuck somewhere around the 4-16k message size and then all GPUs are pinned at 100% utilization until timeout. This occurs with
- cpe/23.12, cray-mpich/8.1.28
- NCCL 2.19 or 2.21
- cudatoolkit/12.2
- aws-ofi-nccl 1.6.0
Status: Closed

Workaround: Set the following environment variables:

export FI_CXI_RDZV_GET_MIN=0
export FI_CXI_SAFE_DEVMEM_COPY_THRESHOLD=16777216

Apps instrumented with `perftools-lite-gpu` get an MPI error

Vendor: HPE

Description: WRF fails when instrumented with perftools-lite or perftools:

...
srun: error: nid004366: tasks 0-63: Exited with exit code 255
srun: Terminating StepId=25322400.0
...

Status: Closed as this is a duplicate of the active case 'Apps instrumented with perftools-lite or perftools hang or fail'

Cray Fortran 17.0.0 dummy argument type not recognized for module procedure in same module

Vendor: HPE

Description: A valid Fortran code produces the error shown below when compiled with the Cray Fortran compiler.

$ ftn -c unimported-dummy-arg-type.f90
module foo_m
^
ftn-855 ftn: ERROR FOO_M, File = unimported-dummy-arg-type.f90, Line = 1, Column = 8
The compiler has detected errors in module "FOO_M". No module information file will be created for this module.

module function construct(bar) result(foo)
^
ftn-1279 ftn: ERROR CONSTRUCT, File = unimported-dummy-arg-type.f90, Line = 11, Column = 31
Procedure "CONSTRUCT" is defined at line 19 (unimported-dummy-arg-type.f90). The type of this argument does not agree with dummy argument "BAR".
^
ftn-287 ftn: WARNING CONSTRUCT, File = unimported-dummy-arg-type.f90, Line = 11, Column = 43
The result of function name "FOO" in the function subprogram is not defined.

Cray Fortran : Version 17.0.0 (20231107223020_b59b7a8e9169719529cf5ab440f3c301e515d047)
Cray Fortran : Compile time: 0.0039 seconds
Cray Fortran : 21 source lines
Cray Fortran : 2 errors, 1 warnings, 0 other messages, 0 ansi
Cray Fortran : "explain ftn-message number" gives more information about each message

Status: Fixed in CCE 18.0.1
Availability: CCE 18.0.1 is not available yet

`crayftn` ICE when trying to build dftd4¶

Vendor: HPE

Description: An ICE occurs when building dftd4 with the Cray compiler

ftn-1795 ftn: INTERNAL MULTICHARGE_MODEL, File = /global/homes/e/elvis/Repositories/multicharge/src/multicharge/model.F90, Line = 30, Column = 4
  FORTRAN FE ASSERT: "new_attr_idx" failed. ( /home/jenkins/crayftn/fe90/sources/module.c at line 1224).
ftn-2116 ftn: INTERNAL
  "/opt/cray/pe/cce/17.0.0/cce/x86_64/bin/ftnfe" was terminated due to receipt of signal 06:  Aborted.

Status: Fixed in CCE 18.0.0

`crayftn` ICE on `MERGE` function call when the `MASK` argument is a derived type component¶

Vendor: HPE

Description: An ICE occurs when the intrinsic MERGE is called with a MASK argument that is a component of the derived type argument to a type-bound procedure, and the result of the MERGE call is then passed to the intrinsic TRIM. The error message for such a Fortran code is as follows:

Creating internal compiler error backtrace (please wait):
[0x000000012d4e69] linux_backtrace /home/jenkins/crayftn/pdgcs/v_util.c:186
[0x000000012d53a1] pdgcs_internal_error(char const*, char const*, int) /home/jenkins/crayftn/pdgcs/v_util.c:663
[0x000000015e3c52] llvm_cg::get_string_address(EXP_INFO) /home/jenkins/crayftn/pdgcs/llvm-substr.c:493
...
[0x007f35e3e3e24c] ?? ??:0
[0x000000008a9fc9] _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120
ftn-7991 ftn: INTERNAL MYPROC, File = example.f90, Line = 15 
  INTERNAL COMPILER ERROR:  "get_string_address - the string is not a lsbstr" (/home/jenkins/crayftn/pdgcs/llvm-substr.c, line 493, version b59b7a8e9169719529cf5ab440f3c301e515d047)
ftn-2116 ftn: INTERNAL  
  "/opt/cray/pe/cce/17.0.0/cce/x86_64/bin/optcg" was terminated due to receipt of signal 06:  Aborted.

Status: Fixed in CCE 18.0.1

`sanitizers4hpc`'s output aggregation with ThreadSanitizer¶

Vendor: HPE
Description: Aggregation of ThreadSanitizer output by sanitizers4hpc needs improvement.
Status: Fixed in sanitizers4hpc 1.1.3

`sanitizers4hpc` produces stack traces for '`Program hit CUDA_ERROR_INVALID_VALUE error`'¶

Vendor: HPE

Description: The following error message appears when using sanitizers4hpc with Compute Sanitizer's Memcheck although the desired output is produced:

Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to "invalid argument" on CUDA API call to cuPointerGetAttribute.
Saved host backtrace up to driver entry point at error
    #0 0x2eae6f in /usr/local/cuda-12.2/compat/libcuda.so.1
    #1 0xda19 in /home/jenkins/src/gtlt/cuda/gtlt_cuda_query.c:344:gtlt_cuda_pointer_type /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0
    #2 0x4bd9 in /home/jenkins/src/comx/gtlx_query.c:25:mpix_gtl_pointer_type /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0
    #3 0x1fa2475 in MPIR_Cray_Memcpy_wrapper /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #4 0x1841ee9 in MPIDIG_handle_unexp_mrecv /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #5 0x18ac030 in MPIC_Sendrecv /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #6 0x17d6bcf in MPIR_Barrier_intra_dissemination /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #7 0x232900 in MPIR_Barrier_intra_auto /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #8 0x232ab5 in MPIR_Barrier_impl /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #9 0x1a1f15d in MPIR_CRAY_Barrier /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #10 0x1a15772 in MPIDI_Cray_shared_mem_coll_opt_cleanup /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #11 0x18cc799 in MPIDI_Cray_coll_finalize /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #12 0x1b7a65f in MPID_Finalize /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #13 0x605ab4 in MPI_Finalize /opt/cray/pe/lib64/libmpi_gnu_123.so.12
    #14 0xf94 in /pscratch/sd/e/elvis/Memcheck/main.cc:14:main /pscratch/sd/e/elvis/Memcheck/./a.out
    #15 0x3524d in __libc_start_main /lib64/libc.so.6
    #16 0xe9a in ../sysdeps/x86_64/start.S:122:_start /pscratch/sd/e/elvis/Memcheck/./a.out

This error doesn't occur when the code is run without sanitizers4hpc.

Status: Fixed in sanitizers4hpc/1.1.3

Segfaulting with calls to `MPI_Win_allocate_shared` function on multiple CPU nodes¶

Vendor: HPE
Description: The FHI-aims code uses the MPI-3 Shared Memory model. When CPU nodes (as opposed to GPU nodes) are used and the number of MPI tasks per node exceeds a certain threshold (46 in the case tested), a multi-node run segfaults at a call to MPI_Win_allocate_shared.
```
...
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

#0  0x14e493c23372 in ???
#1  0x14e493c22505 in ???
#2  0x14e493253dbf in ???
...
#11  0x14e49424a3d4 in ???
#12  0x2121b80 in __mpi_shm_MOD_allocate_shm_arr_1dr_diml
        at /pscratch/sd/e/elvis/FHIaims/src/mpi_shm.f90:152
...
```
A different task-per-node count triggers a segfault at a different MPI_Win_allocate_shared call in the code.
Status: Fixed in cray-mpich/8.1.29

Error occurs when MPI window object is not freed¶

Vendor: HPE

Description: Messages about a fatal MPI finalize error are generated when an MPI window object is not freed before MPI_Finalize.

MPICH ERROR [Rank 0] [job id 23533657.21] [Tue Mar 26 16:56:36 2024] [nid006635] - Abort(806971663) (rank 0 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:
PMPI_Finalize(214)...............: MPI_Finalize failed
PMPI_Finalize(161)...............:
MPID_Finalize(710)...............:
MPIDI_OFI_mpi_finalize_hook(1046): OFI endpoint close failed (ofi_init.c:1046:MPIDI_OFI_mpi_finalize_hook:Device or resource busy)
...

Status: Fixed in cray-mpich/8.1.29.34

`cray-mpich` module does not set `LD_LIBRARY_PATH`¶

Vendor: HPE

Description: Loading a cray-mpich module doesn't update the LD_LIBRARY_PATH environment variable, so this has to be done manually.

$ export MPICH_VERSION_DISPLAY=1  # Print the MPI version number that is being used

$ ml -t
...
cray-mpich/8.1.25
...
$ echo $CRAY_MPICH_VERSION
8.1.25

$ ml cray-mpich/8.1.27            # Load a different version

$ echo $CRAY_MPICH_VERSION
8.1.27

$ srun -n 1 ./a.out               # Still using the previous version
MPI VERSION    : CRAY MPICH version 8.1.25.17 (ANL base 3.4a2)
...

$ export LD_LIBRARY_PATH=${CRAY_MPICH_DIR}/lib:$LD_LIBRARY_PATH

$ srun -n 1 ./a.out               # Now using the intended version
MPI VERSION    : CRAY MPICH version 8.1.27.26 (ANL base 3.4a2)
...

Status: Fixed in cray-mpich/8.1.30
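
On affected versions, the manual `export` shown above adds a duplicate entry each time it is repeated, for example when it is placed in a script that is sourced more than once. The following is a minimal sketch of a guard against that; the function name `prepend_ld_library_path` is hypothetical and not part of any module or vendor tool.

```shell
# Hypothetical helper: prepend directory $1 to LD_LIBRARY_PATH unless that
# exact directory is already listed somewhere in the variable.
prepend_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already listed: leave the variable unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example use after loading the desired cray-mpich module:
#   prepend_ld_library_path "${CRAY_MPICH_DIR}/lib"
```

Note that the guard only prevents duplicates. If the directory is already listed but sits behind another directory that provides a different MPI library, that earlier directory still takes precedence.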

`crayftn` overloaded constructor with polymorphic argument in array constructor¶

Vendor: HPE

Description: The Cray Fortran compiler generates an internal compiler error for a code that passes a child type to an overloaded structure constructor within an array constructor, where the parent type has a deferred procedure.

Creating internal compiler error backtrace (please wait):
[0x00000000c75a43] linux_backtrace ??:?
[0x00000000c76931] pdgcs_internal_error(char const*, char const*, int) ??:?
[0x0000000125c2d0] _expr_type(EXP_INFO) ??:?
...
[0x007f86c32e129c] ?? ??:0
[0x00000000729d09] _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Note:  This is a non-debug compiler.  Technical support should
       continue problem isolation using a compiler built for
       debugging.

ftn-7991 ftn: INTERNAL EXAMPLE, File = example.f90, Line = 63
  INTERNAL COMPILER ERROR:  "_expr_type: Invalid table type" (/home/jenkins/crayftn/pdgcs/v_expr_utl.c, line 7360, version 66f7391d6a03cf932f321b9f6b1d8612ef5f362c)

Status: Fixed in CCE 18.0.0
