
Valgrind and Valgrind4hpc

Description

The Valgrind tool suite provides a number of debugging and profiling tools that help you make your programs faster and more correct. The most popular of these tools is Memcheck, which can detect many memory-related errors and memory leaks. Other supported tools include Cachegrind (a profiler based on the number of instructions executed), Callgrind (similar to Cachegrind, but also records the call history among functions), Helgrind (a pthreads error detector), DRD (another pthreads error detector), Massif (a heap profiler), and DHAT (a dynamic heap usage analysis tool).

Valgrind4hpc is an HPE tool that aggregates Valgrind messages across MPI processes to make them easier to understand.

Using Valgrind

Prepare Your Program

Compile your program with -g to include debugging information so that Memcheck's error messages include exact line numbers. Using -O0 is also a good idea, if you can tolerate the slowdown. With -O1, line numbers in error messages can be inaccurate, although running Memcheck on code compiled at -O1 generally works fairly well, and the speed improvement over -O0 is quite significant. Use of -O2 and above is not recommended, as Memcheck occasionally reports uninitialized-value errors that don't really exist.
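
For example, with the GNU compiler (illustrative; substitute your site's compiler or compiler wrapper as appropriate):

gcc -g -O0 -o myprog myprog.c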

All other tools are unaffected by optimization level, and for profiling tools like Cachegrind it is better to compile your program at its normal optimization level.

Running with Valgrind

To use a Valgrind tool on a program, run as follows:

module load valgrind
valgrind --tool=<tool-name> <valgrind-options> <prog> <prog-options>
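
For example, a hypothetical Cachegrind run and report might look like this (Cachegrind writes its profile to cachegrind.out.<pid>, which cg_annotate turns into a text report):

valgrind --tool=cachegrind ./myprog arg1 arg2
cg_annotate cachegrind.out.<pid>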

Memcheck

Memcheck is the most popular Valgrind tool. It detects various memory errors in your code:

  • Out-of-bounds accesses to heap memory
  • Use of uninitialized memory
  • Incorrect freeing of heap memory (e.g., double-freeing heap memory, mismatched use of malloc/new/new[] vs. free/delete/delete[])
  • Overlapping src and dst pointers in memcpy and related functions
  • Misaligned memory allocation
  • Memory leaks
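
As an illustrative sketch (not taken from the Valgrind manual), the following program triggers two of these error classes, a double free and overlapping memcpy buffers:

#include <stdlib.h>
#include <string.h>

int main(void)
{
   int* p = malloc(4 * sizeof(int));
   free(p);
   free(p);                  // incorrect freeing: double free

   char buf[16] = "abcdefgh";
   memcpy(buf + 1, buf, 8);  // overlapping src and dst in memcpy
   return 0;
}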

Running Serial Programs

If you normally run your program like this:

./myprog arg1 arg2

Use this command line:

valgrind --leak-check=yes ./myprog arg1 arg2

Memcheck is the default tool and, therefore, the tool name is omitted above. Equivalently, you can use the following command:

valgrind --tool=memcheck --leak-check=yes ./myprog arg1 arg2

When a program dynamically allocates a block of memory but doesn't free it after use, the block cannot be reused, reducing the memory available to the program. The --leak-check option turns on the detailed memory leak detector.

Your program will run much slower than normal (e.g., 20 to 30 times slower) and use a lot more memory. Memcheck will issue messages about the memory errors and leaks that it detects.

With the example code provided in the Valgrind manual:

#include <stdlib.h>

void f(void)
{
   int* x = malloc(10 * sizeof(int));
   x[10] = 0;        // problem 1: heap block overrun
}                    // problem 2: memory leak -- x not freed

int main(void)
{
   f();
   return 0;
}

we can get the following report about the memory error and the memory leak in the code:

...
==218857== Invalid write of size 4
==218857==    at 0x400534: f (a.c:6)
==218857==    by 0x400545: main (a.c:11)
==218857==  Address 0x4a69068 is 0 bytes after a block of size 40 alloc'd
==218857==    at 0x48386EB: malloc (vg_replace_malloc.c:393)
==218857==    by 0x400527: f (a.c:5)
==218857==    by 0x400545: main (a.c:11)
==218857==
==218857==
==218857== HEAP SUMMARY:
==218857==     in use at exit: 40 bytes in 1 blocks
==218857==   total heap usage: 1 allocs, 0 frees, 40 bytes allocated
==218857==
==218857== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==218857==    at 0x48386EB: malloc (vg_replace_malloc.c:393)
==218857==    by 0x400527: f (a.c:5)
==218857==    by 0x400545: main (a.c:11)
==218857==
==218857== LEAK SUMMARY:
==218857==    definitely lost: 40 bytes in 1 blocks
==218857==    indirectly lost: 0 bytes in 0 blocks
==218857==      possibly lost: 0 bytes in 0 blocks
==218857==    still reachable: 0 bytes in 0 blocks
==218857==         suppressed: 0 bytes in 0 blocks
==218857==
==218857== For lists of detected and suppressed errors, rerun with: -s
==218857== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Running Parallel Programs

For the sake of example, let's use a simple-minded MPI version of the same code:

#include <stdlib.h>
#include <mpi.h>

void f(void)
{
   int* x = malloc(10 * sizeof(int));
   x[10] = 0;        // problem 1: heap block overrun
}                    // problem 2: memory leak -- x not freed

int main(int argc, char **argv)
{
   int nproc, me;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &nproc);
   MPI_Comm_rank(MPI_COMM_WORLD, &me);
   f();
   MPI_Finalize();
   return 0;
}

In your batch script, simply (1) load the module and (2) add valgrind in front of your command. For example, your srun line is replaced by the following:

module load valgrind
srun -n 8 valgrind --leak-check=yes ./a.out

Alternatively, you can direct output to a separate file for each MPI task using the --log-file=... flag. Below, %q{SLURM_JOB_ID} is replaced with the environment variable's value (that is, the Slurm job ID), and %q{SLURM_PROCID} with the MPI rank.

$ srun -n 8 valgrind --leak-check=yes --log-file=mc_%q{SLURM_JOB_ID}.%q{SLURM_PROCID}.out ./a.out
$ ls -l
...
-rw-------  1 elvis elvis  5481 Jun 23 08:56 mc_27100535.1.out
-rw-------  1 elvis elvis  5481 Jun 23 08:56 mc_27100535.0.out
-rw-------  1 elvis elvis  5481 Jun 23 08:56 mc_27100535.7.out
-rw-------  1 elvis elvis  5481 Jun 23 08:56 mc_27100535.6.out
-rw-------  1 elvis elvis  5481 Jun 23 08:56 mc_27100535.5.out
...

$ cat mc_27100535.0.out
...
==979438== Conditional jump or move depends on uninitialised value(s)
==979438==    at 0x48B2BB1: xpmem_make (in /opt/cray/xpmem/2.6.2-2.5_2.38__gd067c3f.shasta/lib64/libxpmem.so.0.0.0)
==979438==    by 0x6977B51: MPIDI_CRAY_XPMEM_mpi_init_hook (in /opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi_gnu_123.so.12.0.0)
==979438==    by 0x696F48B: MPIDI_SHMI_mpi_init_hook (in /opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi_gnu_123.so.12.0.0)
==979438==    by 0x657990F: MPID_Init (in /opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi_gnu_123.so.12.0.0)
==979438==    by 0x5004FDD: MPIR_Init_thread (in /opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi_gnu_123.so.12.0.0)
==979438==    by 0x5004DB5: PMPI_Init (in /opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi_gnu_123.so.12.0.0)
==979438==    by 0x40074E: main (a_mpi.c:13)
==979438==
...
==979438== Invalid write of size 4
==979438==    at 0x400724: f (a_mpi.c:7)
==979438==    by 0x400775: main (a_mpi.c:16)
...
==979438== LEAK SUMMARY:
==979438==    definitely lost: 40 bytes in 1 blocks
==979438==    indirectly lost: 0 bytes in 0 blocks
==979438==      possibly lost: 0 bytes in 0 blocks
==979438==    still reachable: 95,957 bytes in 601 blocks
==979438==         suppressed: 0 bytes in 0 blocks
==979438== Reachable blocks (those to which a pointer was found) are not shown.
==979438== To see them, rerun with: --leak-check=full --show-leak-kinds=all
...

As you can see, the output files also contain error messages arising from system libraries.

Suppressing Errors

Memcheck occasionally produces false positives, and there is a mechanism for suppressing these. The same mechanism is useful because Memcheck also reports errors in library code that you cannot change or do not care about. The default suppression set hides a lot of these, but you may come across more.

To make it easier to write suppressions, you can use the --gen-suppressions=yes or --gen-suppressions=all option. This tells Valgrind to print out a suppression for each reported error, and you can selectively copy those for the errors that you want to suppress into a suppression file. To use a suppression file, specify it with the --suppressions=<filename> flag. Valgrind always uses the default suppression file $PREFIX/lib/valgrind/default.supp, where $PREFIX is the Valgrind installation directory.
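
For instance, a suppression entry for the xpmem_make warning shown earlier might look roughly like the following (a sketch; the entry name is arbitrary, and --gen-suppressions prints the exact frames to copy):

{
   ignore_xpmem_uninit_cond
   Memcheck:Cond
   fun:xpmem_make
   ...
}

You would then run, for example, valgrind --leak-check=yes --suppressions=my.supp ./myprog arg1 arg2, where my.supp is a hypothetical file containing such entries.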

Unrecognized Instructions

When using Valgrind to debug your code, you may occasionally encounter error messages of the form:

valgrind: Unrecognised instruction at address 0x6b2f2b

accompanied by your program raising SIGILL and exiting. While this may be a bug in your program (which caused it to jump to a non-code location), it may also be an instruction that is not correctly handled by Valgrind.

There are a couple of ways to work around issues related to unrecognized instructions. The simplest is often to make sure that the code you are debugging is compiled with the minimum level of optimization necessary to reproduce the bug you are investigating. This is in general good practice, and it avoids the use of more obscure (typically SIMD) instructions which are more likely to be unhandled.

If you find that this does not work, you may wish to try a different compiler; this can affect both the nature of the optimizations performed on your code and the libraries to which your code is linked. In one such case, involving an unhandled __intel_sse4_strtok instruction, switching to the GNU programming environment and recompiling the code being debugged remedied the situation.

The above content is largely based on the Valgrind Quick Start page. For more information about Valgrind, please refer to http://valgrind.org/, especially the Valgrind User Manual.

Using Valgrind4hpc

Valgrind4hpc can be used with an MPI code; it aggregates duplicate Valgrind messages across MPI processes to help provide an understandable picture of program behavior. The tool works with Memcheck, Helgrind, and DRD only.

Running Programs

In your batch script, load the valgrind4hpc module, and then launch a parallel application using valgrind4hpc, as shown below.

module load valgrind4hpc
valgrind4hpc -n 8 --valgrind-args="--leak-check=yes" ./a.out

The -n flag specifies the number of MPI tasks. Other srun flags can be provided with the --launcher-args="<arguments>" option. Valgrind arguments such as --leak-check=yes need to be passed with --valgrind-args=..., as shown above.
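
For example (the specific flag values are illustrative):

valgrind4hpc -n 8 --launcher-args="--ntasks-per-node=4" --valgrind-args="--leak-check=yes --track-origins=yes" ./a.out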

Output from the command is much simpler, making it easier to manage:

RANKS: <0..7>

Invalid write of size 4
  at f (in a_mpi.c:7)
  by main (in a_mpi.c:16)
Address is 0 bytes after a block of size 40 alloc'd
  at malloc (in vg_replace_malloc.c:393)
  by f (in a_mpi.c:6)
  by main (in a_mpi.c:16)


RANKS: <0..7>

40 bytes in 1 blocks are definitely lost
  at malloc (in vg_replace_malloc.c:393)
  by f (in a_mpi.c:6)
  by main (in a_mpi.c:16)


RANKS: <0..7>

HEAP SUMMARY:
  in use at exit: 40 bytes in 1 blocks

LEAK SUMMARY:
   definitely lost: 40 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 0 bytes in 0 blocks

ERROR SUMMARY: 1 errors from 1 contexts (suppressed 601)

Note that the tool comes with suppression files for error messages associated with HPE software: known.supp, libmpich_cray.supp, libpmi.supp, and misc.supp in the $VALGRIND4HPC_BASEDIR/share/suppressions directory. In the example above, they reduced the output by suppressing 601 errors.

For more information on how to use the tool, please load the module and read the man page.

Heap Usage and Memory Leaks Data from Execution Trees

An execution tree ("xtree") is made up of a set of stack traces, each associated with some resource consumption or event counts. Depending on the xtree, different event counts or resource consumptions can be recorded.

A typical usage for an xtree is to show a graphical or textual representation of the heap usage of a program.

Memory-use xtrees can be recorded with Valgrind's Memcheck, Helgrind, and Massif tools, while Memcheck can also output leak-search results in an xtree.

An xtree heap memory report is produced at the end of execution when using the --xtree-memory=full option, which collects data for the current allocated bytes (curB), current allocated blocks (curBk), total allocated bytes (totB), total allocated blocks (totBk), total freed bytes (totFdB), and total freed blocks (totFdBk).

Xtrees can be saved in two file formats, the Callgrind format and the Massif format. The commands below record memory-use xtrees in the Callgrind format (output file names ending in .kcg) and generate a text report for MPI rank 0 using the callgrind_annotate tool.

$ module rm darshan
$ srun -n 8 valgrind --xtree-memory=full --xtree-memory-file=xtmemory.%q{SLURM_PROCID}.kcg ./memoryleak_mpi
$ callgrind_annotate --auto=yes --inclusive=yes --sort=curB:100,curBk:100,totB:100,totBk:100,totFdB:100,totFdBk:100 xtmemory.0.kcg
...
--------------------------------------------------------------------------------
curB             curBk        totB               totBk        totFdB             totFdBk
--------------------------------------------------------------------------------
195,957 (100.0%) 602 (100.0%) 1,932,992 (100.0%) 789 (100.0%) 1,737,035 (100.0%) 187 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
curB             curBk        totB               totBk        totFdB             totFdBk       file:function
--------------------------------------------------------------------------------
195,957 (100.0%) 602 (100.0%) 1,911,132 (98.87%) 758 (96.07%) 1,719,355 (98.98%) 163 (87.17%)  memoryleak_mpi.c:main
100,000 (51.03%)   1 ( 0.17%)   100,000 ( 5.17%)   1 ( 0.13%)         0            0           memoryleak_mpi.c:f
 95,957 (48.97%) 601 (99.83%) 1,815,312 (93.91%) 762 (96.58%)    42,727 ( 2.46%)  95 (50.80%)  UnknownFile???:MPIR_Init_thread
 95,957 (48.97%) 601 (99.83%) 1,815,292 (93.91%) 761 (96.45%)    42,727 ( 2.46%)  95 (50.80%)  UnknownFile???:PMPI_Init
 95,109 (48.54%) 597 (99.17%)   123,965 ( 6.41%) 629 (79.72%)    28,856 ( 1.66%)  32 (17.11%)  UnknownFile???:MPIR_T_env_init
...
--------------------------------------------------------------------------------
-- Auto-annotated source: memoryleak_mpi.c
--------------------------------------------------------------------------------
curB             curBk        totB               totBk        totFdB             totFdBk
...<snipped>...
      .            .                  .            .                  .           .           void f(void)
      .            .                  .            .                  .           .           {
100,000 (51.03%)   1 ( 0.17%)   100,000 ( 5.17%)   1 ( 0.13%)         0           0              int* x = malloc(25000 * sizeof(int));
      .            .                  .            .                  .           .              x[25000] = 0;     // problem 1: heap block overrun
      .            .                  .            .                  .           .           }                    // problem 2: memory leak -- x not freed
      .            .                  .            .                  .           .
      .            .                  .            .                  .           .           int main(int argc, char **argv)
      .            .                  .            .                  .           .           {
      .            .                  .            .                  .           .              int nproc, me;
 95,957 (48.97%) 601 (99.83%) 1,811,132 (93.70%) 757 (95.94%)    42,727 ( 2.46%) 93 (49.73%)     MPI_Init(&argc, &argv);
      .            .                  .            .                  .           .              MPI_Comm_size(MPI_COMM_WORLD, &nproc);
      .            .                  .            .                  .           .              MPI_Comm_rank(MPI_COMM_WORLD, &me);
100,000 (51.03%)   1 ( 0.17%)   100,000 ( 5.17%)   1 ( 0.13%)         0           0              f();
      0            0                  0            0          1,676,628 (96.52%) 70 (37.43%)     MPI_Finalize();
      .            .                  .            .                  .           .              return 0;
      .            .                  .            .                  .           .           }

Similarly, to get a profiling result for memory leaks, you can use the --xtree-leak=yes and --xtree-leak-file=... flags:

srun -n 8 valgrind --xtree-leak=yes --xtree-leak-file=xtleak.%q{SLURM_JOB_ID}.%q{SLURM_PROCID}.kcg ./memoryleak_mpi
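
The resulting .kcg files are again in the Callgrind format, so they can be inspected the same way, for example (with the job ID and rank taken from your own run):

callgrind_annotate xtleak.<jobid>.<rank>.kcg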

Training & Tutorials