Performance Tools

NERSC provides many popular profiling tools. Some of them are general-purpose tools and others are geared toward more specific tasks.

A quick guide to the available performance analysis tools:

  • Codee: Codee is an "all-in-one" tool for improving the correctness, modernization, portability and optimization of Fortran and C/C++ applications. Its static analyzer generates detailed reports on the source code without executing it. Codee flags code quality issues, which makes optimization easier, and provides actionable recommendations, named Autofixes, with OpenMP and OpenACC directives for both CPU and GPU parallelization.
  • CrayPat (also called Perftools): CrayPat is a suite of HPE/Cray profiling tools for detailed analysis. It can show routine-based hardware counter data, MPI message statistics, I/O statistics, etc. In addition to collecting performance data by sampling, it can trace selected routines (including library routines) to give a better understanding of the performance statistics associated with them.
  • Darshan: Darshan is a lightweight I/O profiling tool capable of profiling POSIX I/O, MPI I/O and HDF5 I/O.
  • Darshan DXT: Darshan Extended Tracing (DXT) collects I/O traces from POSIX I/O and MPI I/O.
  • Drishti: Drishti analyzes Darshan log files and recommends I/O optimizations.
  • MAP: MAP is a parallel sampling tool with a GUI for performance metrics. It displays time series of the collected data for the entire run of the code graphically and annotates the source code lines with performance metrics.
  • Performance Reports: Performance Reports is a low-overhead tool that produces one-page text and HTML reports summarizing and characterizing both scalar and MPI application performance.
  • Reveal: Utilizing the HPE Cray CCE program library for source code analysis and performance data collected from CrayPat, Reveal helps to identify top time-consuming loops and provides compiler directive suggestions for inserting OpenMP parallelism.
  • Roofline Performance Model: The Roofline performance model offers an intuitive and insightful way to compare application performance against machine capabilities, track progress towards optimality, and identify bottlenecks, inefficiencies, and limitations in software implementations and architecture designs.
  • NVIDIA profiling tools: NVIDIA performance analysis tools are available on Perlmutter. They let you get an overview of your application's performance (using Nsight Systems) or take a more detailed look at an individual kernel (using Nsight Compute). They can be used to profile both CPU and GPU applications.
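
To make the Roofline entry above more concrete, here is a minimal sketch of the basic Roofline bound: attainable performance is the smaller of the machine's peak compute rate and the product of a kernel's arithmetic intensity (FLOPs per byte moved) and the peak memory bandwidth. The function names and the machine numbers below are hypothetical and chosen only for illustration; they are not measured values for any NERSC system.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Basic Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte of memory traffic.
    peak_gflops:          compute ceiling of the machine (GFLOP/s).
    peak_bandwidth_gbs:   memory bandwidth ceiling (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # Memory ceiling grows linearly with intensity; compute ceiling is flat.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which the two ceilings meet (FLOPs/byte)."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine, for illustration only (not a real system's specs).
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    for ai in (0.5, 4.0, 50.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI = {ai:5.1f} FLOPs/byte: bound = {bound:7.1f} GFLOP/s ({regime})")
```

Comparing a kernel's measured performance against this bound indicates how far it is from the relevant ceiling. Kernels to the left of the ridge point are limited by memory bandwidth, so reducing data movement is likely to help more than adding vectorization or parallelism.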