gdb4hpc and CCDB
Note
gdb4hpc may not work properly unless the cray-cti module is also loaded. Until the system loads this module automatically, please load it yourself (module load cray-cti).
Parallel Debugging with gdb4hpc
gdb4hpc is a GDB-based parallel debugger developed by HPE Cray. It lets programmers debug parallel code from the command line, either by launching an application or by attaching to an already-running application that was launched with srun.
Below is an example of running gdb4hpc for a parallel application:
$ salloc -A <allocation_account> -N 1 -C cpu -t 30:00 -q debug
...
$ module load gdb4hpc
$ gdb4hpc
...
dbg all> launch $pset{8} ./hello_mpi # Launch 'hello_mpi' with 8 tasks, naming the PE set '$pset'
dbg all> viewset $pset # Display the PE set thus defined
Name Procs
pset pset{0..7}
dbg all> bt # Show where I am - the backtrace
pset{0..7}: #0 0x00000000200009c5 in main at /global/cscratch/sd/elvis/hello_mpi.c:8
dbg all> list # List the code
pset{0..7}: 8 MPI_Init(&argc,&argv);
pset{0..7}: 9 MPI_Comm_size(MPI_COMM_WORLD, &nproc);
pset{0..7}: 10 MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
pset{0..7}: 11 printf("Hello world from %d\n", myRank);
pset{0..7}: 12 MPI_Finalize();
pset{0..7}: 13 return 0;
pset{0..7}: 14 }
dbg all> break hello_mpi.c:11 # Set a breakpoint at line 11 of hello_mpi.c
dbg all> continue # Run
dbg all> print myRank # Print the value of 'myRank' for all processes
pset[0]: 0
...
pset[7]: 7
dbg all> print $pset{3}::myRank # Print the value of 'myRank' for rank 3 only
pset[3]: 3
Note that .., as in pset{0..7}, denotes an inclusive range of numbers (here, ranks 0 through 7).
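To make the inclusive-range convention concrete, the sketch below parses a range written as "lo..hi" and counts the ranks it names. The functions parse_rank_range and rank_count are hypothetical illustrations, not part of gdb4hpc, and they handle only a single "lo..hi" range.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of gdb4hpc): parse a rank range such as
 * "0..7" (as in pset{0..7}) into its bounds. Returns 0 on success, -1 if
 * the string is not a well-formed, non-empty "lo..hi" range. */
int parse_rank_range(const char *s, long *lo, long *hi)
{
    char *end;
    *lo = strtol(s, &end, 10);
    if (end == s || end[0] != '.' || end[1] != '.')
        return -1;                      /* missing lower bound or ".." */
    const char *rest = end + 2;
    *hi = strtol(rest, &end, 10);
    if (end == rest || *end != '\0' || *hi < *lo)
        return -1;                      /* bad upper bound or empty range */
    return 0;
}

/* Both ends are inclusive, so "0..7" names 8 ranks. */
long rank_count(long lo, long hi)
{
    assert(hi >= lo);
    return hi - lo + 1;
}
```

For example, parsing "0..7" gives lo = 0 and hi = 7, and rank_count(0, 7) returns 8, which matches the 8 tasks launched with $pset{8} above.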
Comparative Debugging
What makes gdb4hpc (and CCDB) unique is its comparative debugging technology, which lets programmers run two applications side by side and compare data structures between them. A typical use is to run two versions of the same application simultaneously, one known to produce correct results and one that produces incorrect results, to find the location where the two codes start to deviate from each other.
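Conceptually, comparing a data structure between the two runs at a given code location comes down to an element-by-element check. The sketch below is a plain C illustration of that idea and is not how gdb4hpc is implemented. The function name first_deviation and the use of an absolute tolerance are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch (not gdb4hpc code): return the index of the first
 * element where the "broken" result differs from the "working" result by
 * more than tol, or -1 if all n elements agree. Writing the test as
 * !(d <= tol) also flags NaN values, since any comparison with NaN is false. */
long first_deviation(const double *working, const double *broken,
                     size_t n, double tol)
{
    assert(tol >= 0.0);
    for (size_t i = 0; i < n; i++) {
        double d = working[i] - broken[i];
        if (d < 0.0)
            d = -d;                     /* absolute difference */
        if (!(d <= tol))
            return (long)i;
    }
    return -1;
}
```

A tolerance is used instead of exact equality because two builds of the same code can legitimately differ by round-off.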
CCDB is a GUI tool for comparative debugging that runs gdb4hpc underneath. Its interface makes it easier to interact with gdb4hpc, so users are advised to use CCDB rather than gdb4hpc directly.
To compare a variable between two applications, gdb4hpc and CCDB need to know the variable's name, the location where the comparison is to be made, and how the data are distributed over MPI processes. For this, gdb4hpc and CCDB use three entities:
- PE set: A set of MPI processes
- Decomposition: How a variable is distributed over the MPI processes in a PE set
- Assertion script: A collection of mathematical relationships (e.g., equality of the value of a variable in two codes) to be tested
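A decomposition tells the debugger which MPI rank holds which part of a distributed variable. The sketch below shows what such a mapping means for one common layout: a 1-D array of n elements split into contiguous blocks over p ranks, where the first n % p ranks hold one extra element. This layout and the function block_owner are assumptions for illustration only. The sketch does not show gdb4hpc's decomposition syntax, which is documented in man gdb4hpc.

```c
#include <assert.h>
#include <stddef.h>

/* Where one element of a distributed array lives. */
struct rank_owner {
    int rank;       /* owning MPI rank */
    size_t local;   /* index within that rank's local array */
};

/* Hypothetical block decomposition: map a global index to its owner. */
struct rank_owner block_owner(size_t global, size_t n, int p)
{
    assert(p > 0 && global < n);
    size_t base = n / (size_t)p;        /* elements every rank gets */
    size_t extra = n % (size_t)p;       /* ranks 0..extra-1 get one more */
    size_t big = extra * (base + 1);    /* elements held by the larger blocks */
    struct rank_owner o;
    if (global < big) {
        o.rank = (int)(global / (base + 1));
        o.local = global % (base + 1);
    } else {                            /* base > 0 here, since global < n */
        o.rank = (int)(extra + (global - big) / base);
        o.local = (global - big) % base;
    }
    return o;
}
```

With n = 10 and p = 4, ranks 0 and 1 hold 3 elements each and ranks 2 and 3 hold 2 each, so global index 5 maps to rank 1, local index 2. Knowing this kind of mapping lets the debugger reassemble the same global variable from both applications before comparing it.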
See the man page (man gdb4hpc) for usage information about gdb4hpc's comparative debugging feature. Cray's 'XC Series Programming Environment User Guide' also explains how to use the tool. Its tutorial uses example codes that are provided in the gdb4hpc distribution package. You can build the executables with the provided script as follows:
module load gdb4hpc
cp -R $GDB4HPC_DIR/demos/hpcc_demo . # copy the entire directory to the current directory
cd hpcc_demo
module swap PrgEnv-intel PrgEnv-cray # its Makefile uses the Cray compiler
./build_demo.sh
This builds two binaries, hpcc_working and hpcc_broken.
CCDB Example
To use CCDB:
$ salloc -A <allocation_account> -N 2 -C cpu -t 30:00 -q debug # request enough nodes for launching two applications
...
$ module load cray-ccdb
$ ccdb
Then launch the two applications (for example, hpcc_working and hpcc_broken) from the CCDB window.
Below is an assertion script that tests whether 6 variables have the same values in both applications at line 418 of HPL_pdtest.c. It shows that resid0 and XnormI have different values in the two applications; because the assertion failed, both applications have stopped at line 418.