NERSC provides many popular profiling tools. Some of them are general-purpose tools and others are geared toward more specific tasks.
A quick guideline for performance analysis tools is as follows:
- Codee: Codee, previously known as Parallelware Analyzer, is a suite of command-line tools for C/C++/Fortran code. Codee scans the source code without executing it and produces a performance optimization report with actions that identify where and how to fix performance issues in the code using OpenMP/OpenACC parallelization on CPUs and GPUs.
- CrayPat (also called Perftools): CrayPat is a suite of HPE/Cray profiling tools for detailed analysis. It can show routine-based hardware counter data, MPI message statistics, I/O statistics, etc. In addition to collecting performance data by sampling, it can trace selected routines (including library routines) to give a better understanding of the performance statistics associated with them.
- Darshan: Darshan is a lightweight I/O profiling tool capable of profiling POSIX I/O, MPI I/O and HDF5 I/O.
- Darshan DXT: Darshan Extended Tracing collects I/O traces from POSIX I/O and MPI I/O.
- Drishti: Drishti analyzes Darshan log files and recommends I/O optimizations.
- MAP: MAP is a parallel sampling profiler with a GUI. It displays a time series of the data collected over the entire run of the code, and annotates the source code lines with performance metrics.
- Performance Reports: Performance Reports is a low-overhead tool that produces one-page text and HTML reports summarizing and characterizing both scalar and MPI application performance.
- Reveal: Utilizing the HPE Cray CCE program library for source code analysis and performance data collected from CrayPat, Reveal helps to identify top time-consuming loops and provides compiler directive suggestions for inserting OpenMP parallelism.
- Roofline Performance Model: The Roofline performance model offers an intuitive and insightful way to compare application performance against machine capabilities, track progress towards optimality, and identify bottlenecks, inefficiencies, and limitations in software implementations and architecture designs.
- NVIDIA profiling tools: NVIDIA performance analysis tools are available on Perlmutter. These let you get an overview of your application's performance (using Nsight Systems) or take a more detailed look at an individual kernel (using Nsight Compute). They can be used to profile both CPU and GPU applications.
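
As a rough illustration of the Roofline performance model mentioned in the list above, the sketch below computes the attainable performance bound as the minimum of the compute peak and the product of arithmetic intensity and memory bandwidth. The function names and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical values chosen for illustration only; they do not describe any NERSC system.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    peak_gflops:          peak compute rate of the machine (GFLOP/s)
    peak_bandwidth_gbs:   peak memory bandwidth of the machine (GB/s)
    """
    if arithmetic_intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    # A kernel is limited either by compute or by how fast data arrives.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine, for illustration only.
    PEAK_GFLOPS = 1000.0
    PEAK_BW_GBS = 100.0

    ridge = ridge_point(PEAK_GFLOPS, PEAK_BW_GBS)
    for ai in (0.25, 20.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BW_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI = {ai} FLOP/byte: at most {bound} GFLOP/s ({regime})")
```

Comparing a kernel's measured GFLOP/s against this bound indicates how far it is from the machine limit, and the ridge point shows whether raising arithmetic intensity or improving compute efficiency is the more promising direction.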