MAP¶
MAP, part of the Linaro Forge (previously known as Arm Forge or Allinea Forge) tool suite, is a source-level parallel profiler with a simple graphical user interface.
Note that the performance of the X Window-based MAP graphical user interface can be greatly improved by using it with the free NoMachine (NX) software.
Introduction¶
MAP is a parallel profiler with a simple graphical user interface. It can profile serial, OpenMP, CUDA and MPI codes (up to 2048 tasks).
The Forge User Guide, available from the official web page or at $ALLINEA_TOOLS_DOCDIR/userguide-forge.pdf, is a good resource for learning more about the advanced MAP features. The variable ALLINEA_TOOLS_DOCDIR is defined by the forge module.
Loading the Forge Module¶
To use MAP, first load the forge module to set up the correct environment:
module load forge
Compiling Code to Run with MAP¶
Dynamic linking is the default mode of linking on Perlmutter. To build a dynamically-linked executable, you don't have to explicitly build MAP libraries. Generally speaking, build your executable as you normally would, but add the -g compile flag to keep debugging symbols, together with the optimization flags that you would normally use:
ftn -c -g -O3 ... testMAP.f
ftn -o testMAP_ex testMAP.o
The recommended set of compilation flags is:
- CPU code (or host code)
  - PrgEnv-gnu: -g1 -O3 -fno-inline -fno-optimize-sibling-calls
  - PrgEnv-nvidia: -g -O3 -Meh_frame -Mnoautoinline
  - PrgEnv-cray
    - C/C++: -g1 -O3 -fno-inline -fno-optimize-sibling-calls
    - Fortran: -G2 -O3 -h ipa0
- C/C++ with nvcc for CUDA kernels: -g -lineinfo -O3
Do not generate debug information for device code with the -G or --device-debug flag, as it can significantly slow down the code. Use -lineinfo instead.
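The flag recommendations above can be captured in a small lookup helper. This is only a sketch: map_compile_flags is a hypothetical function name, and the use of the PE_ENV variable (assumed here to be set by the loaded PrgEnv-* module to values such as GNU, NVIDIA or CRAY) is an assumption that should be checked on the system.

```shell
# Print the compile flags recommended above for a given environment.
# Arguments: $1 = gnu | nvidia | cray | nvcc
#            $2 = c | fortran (only matters for cray)
# map_compile_flags is a hypothetical helper, not part of Forge.
map_compile_flags() {
    case "$1" in
        gnu)    echo "-g1 -O3 -fno-inline -fno-optimize-sibling-calls" ;;
        nvidia) echo "-g -O3 -Meh_frame -Mnoautoinline" ;;
        cray)
            case "$2" in
                fortran) echo "-G2 -O3 -h ipa0" ;;
                *)       echo "-g1 -O3 -fno-inline -fno-optimize-sibling-calls" ;;
            esac ;;
        nvcc)   echo "-g -lineinfo -O3" ;;
        *)      echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# Assumption: PE_ENV names the active programming environment; fall back to GNU.
env_name=$(printf '%s' "${PE_ENV:-GNU}" | tr '[:upper:]' '[:lower:]')

# Show the compile command that would be used for the Fortran example above.
echo "ftn -c $(map_compile_flags "$env_name" fortran) testMAP.f"
```

The final line only prints the command, so the flags can be reviewed before being copied into a makefile or build script.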
For more info, please check the user guide.
Static linking is not supported on Perlmutter.
Starting a Job with MAP¶
Running an X window GUI application can be painfully slow when it is launched from a remote system over the internet. NERSC recommends using the free NX software, which can greatly improve the performance of the X Window-based MAP GUI. Another option is the Forge remote client, which is discussed in the 'Reverse Connect Using Remote Client' section below.
Be sure to log in with X window forwarding enabled. This could mean using the -X or -Y option to ssh. The -Y option often works better for macOS.
ssh -Y username@perlmutter.nersc.gov
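Once logged in, a quick way to see whether forwarding took effect is to check the DISPLAY variable, which ssh normally sets on the remote side when X forwarding is active. A minimal sketch, where have_x_display is a hypothetical helper name:

```shell
# Return success if an X display appears to be available in this session.
have_x_display() {
    [ -n "${DISPLAY:-}" ]
}

if have_x_display; then
    echo "X forwarding looks active (DISPLAY=$DISPLAY)"
else
    echo "DISPLAY is not set; reconnect with ssh -X or -Y, or use NX or the remote client" >&2
fi
```

A set DISPLAY does not guarantee that the GUI will be responsive; it only confirms that the session has a display to draw on.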
After loading the forge module and compiling with the -g option, request an interactive session:
salloc -A <project> -C cpu -N <numNodes> -q interactive -t 30:00 # Perlmutter CPU
Load the forge module if you haven't loaded it yet:
module load forge
Then launch the profiler with
map ./testMAP_ex
where ./testMAP_ex is the name of the program to profile.
The Forge GUI will pop up, showing a start up menu for you to select what to do. For profiling choose the option 'PROFILE' with the MAP tool. You can also choose to 'LOAD PROFILE DATA FILE' to view profiling results saved in a file created in a previous MAP run.
A submission window will then appear with a prefilled path to the executable to profile. Select the number of processes to run and click 'Run'. To pass command line arguments to the program, enter them in the 'srun arguments' box.
MAP will start your program and collect performance data from all processes.
By default, MAP lets your program run to completion and will display data for the entire run. You can also use the 'Stop and Analyze' button and the menu beneath it to control how long to profile your program.
Reverse Connect Using Remote Client¶
If you want to use the NoMachine (NX) tool instead of the remote client, you can skip this section.
Forge remote clients are provided for Windows, macOS and Linux. They run on your local desktop and connect via SSH to NERSC systems, so that you can debug, profile, edit and compile files directly on the remote NERSC machine. You can download the clients from the Forge download page and install them on your laptop or desktop.
Please note that the client version must be the same as the Forge version that you're going to use on the NERSC machines.
Instructions for configuring the client are provided on the DDT web page. If you have already configured the client for using DDT on a NERSC machine, the same configuration will be used for running MAP.
To start a MAP session after the configuration step, select the configuration for the machine that you want to use from the 'Remote Launch' menu.
You'll be prompted to authenticate with your password plus an MFA (Multi-Factor Authentication) OTP (one-time password).
If you have set up ssh to use the ssh keys generated by sshproxy as shown in MFA page's 'Ssh Configuration File Options' section and the keys have not expired, the remote client will connect to the desired machine without you entering password and OTP.
You can use the Reverse Connection method with the remote client. To do this, put aside the remote client window that you have been working with, and log in to the corresponding machine from a window on your local machine, as you would normally do.
ssh perlmutter.nersc.gov # Perlmutter
Then, start an interactive batch session there. For example,
salloc -N 2 -G 8 -t 30:00 -q debug -C gpu -A ... # Perlmutter GPU
and run MAP with the --connect option as follows:
module load forge
map --connect srun -n 32 -c 8 --cpu-bind=cores ./jacobi_mpi
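The -c value in the srun line above is not explained on this page. One way it can be derived is sketched below, assuming that a Perlmutter GPU node exposes 64 physical cores with 2 hardware threads each; that figure does not come from this page and should be checked against the current system documentation. cpus_per_task is a hypothetical helper name.

```shell
# cpus_per_task NODES NTASKS CORES_PER_NODE [THREADS_PER_CORE]
# Prints a value for srun's -c option that gives each task an equal
# share of whole physical cores on a node.
cpus_per_task() {
    nodes=$1; ntasks=$2; cores=$3; threads=${4:-2}
    # Tasks per node, rounded up so an uneven split is sized for the fullest node.
    per_node=$(( (ntasks + nodes - 1) / nodes ))
    # Whole cores per task (integer division), counted in hardware threads.
    echo $(( threads * (cores / per_node) ))
}

# 2 nodes, 32 tasks, 64 cores per node (assumed): 16 tasks per node.
cpus_per_task 2 32 64   # prints 8, matching the -c 8 used above
```

Counting in hardware threads while allocating whole cores keeps two tasks from sharing a physical core, which pairs naturally with --cpu-bind=cores.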
The remote client will ask you whether to accept a Reverse Connect request. Click 'Accept'.
The usual Run window, described in 'Starting a Job with MAP' above, will appear, where you can change or set the run configuration and profiling options. Click 'Run'.
Now, your program will start under MAP and profiling results are displayed in the remote client.
Profiling Results¶
After the run completes, MAP displays the collected performance data in the GUI.
For info on how to interpret the results, please see the Forge User Guide.
MAP saves profiling results in a file named executablename_#p_#n_yyyy-mm-dd_HH-MM.map, where the # in #p is the process count, the # in #n is the node count, and yyyy-mm-dd_HH-MM is the time stamp.
$ ls -l
-rw------- 1 elvis elvis 621583 Mar 16 21:31 jacobi_mpi_32p_1n_2023-03-16_21-30.map
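When many result files accumulate, it can help to split the file name back into its parts. The sketch below infers the layout (executable, process count, node count, date, time) from the example listing above rather than from a documented format, so treat the pattern as an assumption; parse_map_name is a hypothetical helper name.

```shell
# Split a MAP result file name of the assumed form
#   <executable>_<P>p_<N>n_<yyyy-mm-dd>_<HH-MM>.map
# into labelled fields. Prints nothing if the name does not match.
parse_map_name() {
    base=$(basename "$1" .map)
    # The pattern is anchored at the end, so underscores inside the
    # executable name (e.g. jacobi_mpi) stay part of the first field.
    printf '%s\n' "$base" | sed -n -E \
        's/^(.*)_([0-9]+)p_([0-9]+)n_([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2}-[0-9]{2})$/exe=\1 procs=\2 nodes=\3 date=\4 time=\5/p'
}

parse_map_name jacobi_mpi_32p_1n_2023-03-16_21-30.map
# prints: exe=jacobi_mpi procs=32 nodes=1 date=2023-03-16 time=21-30
```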
CUDA Code Profiling¶
To enable CUDA analysis mode, click the checkboxes for 'Kernel analysis (CUDA only)' and 'Memory transfers (CUDA only)' under the 'GPU' menu of the Run window.
MAP will display data for lines inside CUDA kernels and memory transfers. CPU time spent waiting for CUDA kernels to complete is shown in purple. For the performance metrics, you can select 'Preset: Nvidia' which will show the 'GPU utilization' and 'GPU memory usage' time-series data.
Note that MAP uses timings from the perspective of the host, so the time spent in a non-blocking kernel is attributed to the next synchronous API call (e.g., cudaMemcpy), not to the kernel itself. This is also what the 'Functions' tab shows.
To see the actual kernel runtime, click the 'GPU Kernels' tab.
Running in Command Line Mode¶
MAP can be run from the command line without the GUI by using the --profile option. You can submit a batch job as follows:
$ cat runit
#!/bin/bash
#SBATCH -A <project>
#SBATCH -C cpu
#SBATCH -N 1
#SBATCH -q debug
#SBATCH -t 10:00
module load forge
map --profile --np=32 ./jacobi_mpi
$ sbatch runit
Submitted batch job 6130079
$ cat slurm-6130079.out
Linaro Forge 23.0 - Linaro MAP
Profiling : /pscratch/sd/e/elvis/jacobi_mpi
Allinea sampler : preload
MPI implementation : Auto-Detect (SLURM (MPMD))
* number of processes : 32
* number of nodes : 1
* Allinea MPI wrapper : preload (JIT compiled)
MAP analysing program...
MAP gathering samples...
MAP generated /pscratch/sd/e/elvis/jacobi_mpi_32p_1n_2023-03-16_21-36.map
1 85.3816681
...
10 16.8724918
...
$ ls -l
...
-rw------- 1 elvis elvis 654668 Mar 16 21:37 jacobi_mpi_32p_1n_2023-03-16_21-36.map
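After a batch run like the one above, the newest result file is usually the one of interest. A minimal sketch for locating it, where latest_map_file is a hypothetical helper name and file names are assumed to contain no newlines:

```shell
# Print the most recently modified .map file in a directory
# (default: the current directory). Prints nothing if there is none.
latest_map_file() {
    dir=${1:-.}
    # ls -t sorts newest first; stderr is hidden for the no-match case.
    ls -t "$dir"/*.map 2>/dev/null | head -n 1
}

latest_map_file .
```

The printed path can then be opened through the 'LOAD PROFILE DATA FILE' option described earlier, or, assuming your Forge version accepts a .map file as an argument, passed to map directly.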
Troubleshooting¶
If you are having trouble launching MAP, try these steps.
Make sure you have the most recent version of the system.config configuration file. The first time you run Forge (DDT or MAP), you pick up a master template which is then stored locally in your home directory in ~/.allinea/${NERSC_HOST}/system.config, where ${NERSC_HOST} is the machine name. If you are having problems launching MAP, you could be using an older version of the system.config file, and you may want to remove the entire directory:
rm -rf ~/.allinea/${NERSC_HOST}
Remove any stale temporary files that may have been left behind by previous DDT or MAP processes.
rm -rf $TMPDIR/allinea-$USER
In case of a font problem where every character is displayed as a square, please delete the .fontconfig directory in your home directory and restart MAP.
rm -rf ~/.fontconfig
Make sure you are requesting an interactive batch session. NERSC has configured Forge to run from interactive batch jobs.
salloc -q interactive -N <numNodes> -A <project> ...
Finally, make sure you have compiled your code with -g. If none of these tips help, please contact the consultants via https://help.nersc.gov.
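Before running the rm -rf commands from the steps above, it can be useful to see which of the paths actually exist. The sketch below only lists them and deletes nothing. forge_cleanup_candidates is a hypothetical helper name, and the fallbacks used when NERSC_HOST, TMPDIR or USER are unset are assumptions added for illustration.

```shell
# List the Forge-related paths named in the troubleshooting steps
# that currently exist, one per line. Nothing is removed.
forge_cleanup_candidates() {
    for path in \
        "$HOME/.allinea/${NERSC_HOST:-unknown}" \
        "${TMPDIR:-/tmp}/allinea-${USER:-$(id -un)}" \
        "$HOME/.fontconfig"
    do
        [ -e "$path" ] && echo "$path"
    done
    return 0
}

forge_cleanup_candidates
```

Reviewing the list first avoids deleting a configuration directory that belongs to a different machine or a session that is still running.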