VTune

Using VTune

See the Intel VTune Amplifier documentation for general usage.

VTune command name changes in version 2020

Version 2020 of VTune includes several significant upgrades in functionality, along with some command name changes. The command-line interface has changed from amplxe-cl to simply vtune, and the GUI has changed from amplxe-gui to vtune-gui. Intel provides symbolic links so that the old amplxe-cl and amplxe-gui commands continue to work, but those symbolic links may be removed in a future version.
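
For example, a collection that was previously launched with amplxe-cl uses identical syntax under the new command name (the hotspots analysis type and result path below are only illustrative):

# VTune 2019 and earlier
amplxe-cl -collect hotspots -r my-result -- ./my-app

# VTune 2020 and later
vtune -collect hotspots -r my-result -- ./my-app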

VTune is available on all NERSC production systems by loading the VTune module.

module load vtune
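
Several versions may be installed; to list the available VTune modules:

module avail vtune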

Recommended compiler flags for VTune performance collection

Intel provides a page documenting their recommended compiler flags for compiling applications when collecting performance data with VTune. Users will generally get the best results when compiling with the Intel compilers, although the CCE and GCC compilers can also produce applications suitable for analysis with VTune.
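
The central recommendation is to compile with optimization while retaining debug symbols, so that VTune can map samples back to source lines. A minimal sketch using the Cray compiler wrapper (the flags shown follow general VTune guidance and are illustrative, not an exhaustive list):

# -g keeps source-level debug information; -O2 keeps performance realistic
cc -g -O2 -o myapp myapp.c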

When collecting performance data with VTune, it is strongly recommended to add the Slurm flag --perf=vtune or --perf=<vtune_module_version> to your job allocation, where <vtune_module_version> is the full module name of the VTune version you want to use.
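
For example, either form can be added to a batch script (the version string vtune/2020 below is hypothetical; use a module name reported by module avail vtune):

#SBATCH --perf=vtune

# or, pinned to a specific module version (hypothetical name):
#SBATCH --perf=vtune/2020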

Warning

Certain VTune collections can function without the #SBATCH --perf=vtune flag, but many others will fail.

Defer finalization

It is generally recommended to defer finalization when running on KNL. Finalization is an inherently serial process, and the performance of an individual KNL core is poor, so finalizing on a KNL compute node can take far longer than on a login node. Thus, when running VTune on KNL, add the parameter -finalization-mode=deferred:

#!/bin/bash
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --perf=vtune
# ... additional sbatch parameters ...

module load vtune

vtune -finalization-mode=deferred -collect ... -r <result-dir> -- <command-to-profile>

# in some cases, one might want to copy over the libraries needed to finalize
vtune -archive -r <result-dir>

and then finalize on a login node:

vtune -finalize -result-dir <PATH>
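
Once finalized, the result can be opened in the VTune GUI, for example from a login node with X forwarding (a sketch; <result-dir> is the same directory passed to -r above):

vtune-gui <result-dir>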

Using VTune with Shifter

VTune can be attached to a process running inside a Shifter container by launching the process in the background and then attaching VTune to it via its PID (process identifier).

Cannot directly run collection on containers

The following will not work:

vtune -collect ... -- shifter <command-to-execute-in-container>

The recommended method is as follows:

#!/bin/bash
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --perf=vtune
#SBATCH --image=<username/some-image>
# ... additional sbatch parameters ...

module load vtune

PID_FILE=$(mktemp pid.XXXXXXX)

# the first "&" causes the command to execute in the background
# "echo $!" prints the PID
# "&> ${PID_FILE}" writes the PID to the temporary file
shifter <command-to-execute-in-container> & echo $! &> ${PID_FILE}

# read the PID from the file
TARGET_PID=$(cat ${PID_FILE})

# attach VTune to the process
vtune -collect <collection-mode> --target-pid=${TARGET_PID} ...
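
By default the collection runs until the target process exits. If it should instead be capped at a fixed time, VTune's duration option can be added (the 300-second value is illustrative):

vtune -collect <collection-mode> --target-pid=${TARGET_PID} -duration 300 ...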

VTune finalization with Shifter

In the Defer finalization section above, it was recommended to defer finalization when running on KNL. When using containers, however, deferring finalization creates a problem: the binaries needed for finalization exist only within the container. For this reason, it is recommended not to defer finalization when using containers.
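
If a result was nonetheless collected with deferred finalization, one possible workaround (a sketch, assuming the required binaries have first been copied out of the container to a host directory) is to point finalization at that copy with -search-dir:

vtune -finalize -search-dir /path/to/container/binaries -result-dir <PATH>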

VTune + Shifter Example

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=knl
#SBATCH --nodes=1
#SBATCH --time=03:00:00
#SBATCH --job-name=tomopy_gridrec
#SBATCH --output=out_tomopy_%j.log
#SBATCH --image=jrmadsen/tomopy-reference:gcc
#SBATCH --perf=vtune

set -o errexit

# ensure VTune module is loaded
module load vtune

# this format of assignment only sets the variable to the specified value
# if not already set in the environment
: ${OMP_NUM_THREADS:=1}
: ${NUMEXPR_MAX_THREADS:=$(nproc)}
: ${VTUNE_COLLECTION_MODE:="advanced-hotspots"}
: ${VTUNE_SAMPLING_INTERVAL:=25}
: ${VTUNE_RESULTS_DIR:=$(mktemp -d ${PWD}/run-${VTUNE_COLLECTION_MODE}-XXXXX)}

export OMP_NUM_THREADS
export NUMEXPR_MAX_THREADS
export VTUNE_COLLECTION_MODE
export VTUNE_SAMPLING_INTERVAL
export VTUNE_RESULTS_DIR

# remove the directory created by mktemp so that vtune can create it itself
rm -rf ${VTUNE_RESULTS_DIR}

# use mktemp to guard against multiple jobs running in the same directory
PID_FILE=$(mktemp pid.XXXXXX)

echo -e "\n### Submitting shifter job into background and storing PID in file: ${PID_FILE} ###\n"
shifter /opt/conda/bin/python ./run_tomopy.py -a gridrec -n 256 -s 512 -f jpeg -S 1 -c 8 -p shepp3d -i 5 & echo $! &> ${PID_FILE}

echo -e "\n### Reading PID file: ${PID_FILE} ###\n"
TARGET_PID=$(cat ${PID_FILE})

# echo the ps for debugging
echo -e "\n### Target PID: ${TARGET_PID} ###\n"
ps

# echo the environment for reference
echo -e "\n### Environment ###\n"
env

echo -e "\n### Attaching VTune process to PID ${TARGET_PID} ###\n"
vtune \
    -collect ${VTUNE_COLLECTION_MODE} \
    -knob collection-detail=hotspots-sampling \
    -knob event-mode=all \
    -knob analyze-openmp=true \
    -knob sampling-interval=${VTUNE_SAMPLING_INTERVAL} \
    -data-limit=0 \
    --target-pid=${TARGET_PID} \
    -r ${VTUNE_RESULTS_DIR}

echo -e "\nCompleted\n"
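
After the job completes, a quick command-line summary of the result can be generated on a login node (available report types can vary between VTune versions):

vtune -report summary -r ${VTUNE_RESULTS_DIR}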