Darshan I/O Profiler¶
Darshan is an open-source, lightweight I/O profiler developed by ANL. It collects I/O statistics from several widely used HPC I/O frameworks, such as MPI-IO, HDF5, and PNetCDF, as well as from standard POSIX calls. We use Darshan at NERSC to examine file system utilization and to advise users on improving the I/O performance of their applications.
Darshan is available as a module on Perlmutter for all users, and is included at link time into users' applications via the Cray compiler wrappers (cc, CC, ftn) (see the related page in the docs for more details on compilers at NERSC). To use Darshan on Perlmutter, run module load darshan before compiling.
Tip
To make sure Darshan is dynamically linked into your application, compile it with the Cray compiler wrappers (cc, CC, and ftn).
Darshan starts automatically when an MPI session is initiated and creates a log file in a defined log directory. This log can be used to extract usage metrics for the different file systems. Read on to learn how to enable Darshan for non-MPI applications, or how to use Darshan log files to study the I/O behavior of your application.
To check whether your dynamically linked application has been built to collect data with Darshan at runtime, run ldd on it and look for darshan among the results:
$ ldd your-application | grep darshan
libdarshan.so => /path/to/darshan/x.y.z/lib/libdarshan.so
For statically built applications, list the symbols contained in your executable with nm and look for Darshan symbols.
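As a rough sketch, the ldd and nm checks can be wrapped in one helper. The function name uses_darshan is made up for this example, and the nm check assumes the executable has not been stripped of its symbol table.

```shell
# Return 0 if the given executable appears to reference Darshan, non-zero otherwise.
# uses_darshan is a hypothetical helper name, not part of the darshan module.
uses_darshan() {
    bin=$1
    # Dynamically linked: libdarshan shows up among the shared-library dependencies.
    if ldd "$bin" 2>/dev/null | grep -q 'libdarshan'; then
        return 0
    fi
    # Statically linked: look for Darshan symbols in the symbol table instead.
    nm "$bin" 2>/dev/null | grep -qi 'darshan'
}

# Example:
#   if uses_darshan your-application; then echo "Darshan found"; fi
```

A stripped static executable would be reported as not instrumented, so treat a negative result as a hint rather than proof.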
Tip
The darshan/3.4.4 module only instruments STDIO, POSIX and MPI-IO calls.
Opting Out of Darshan¶
Should Darshan cause you any issues, you can disable it by unloading the module with module unload darshan and rebuilding your application. We believe Darshan to be stable for most applications at NERSC, but we invite users who experience any problems to contact us via the online help desk.
Injecting Darshan¶
If you're not using the Cray compiler wrappers, or you want to compile a statically linked non-MPI application, refer to the official Darshan documentation for instructions on how to generate Darshan-capable compiler wrappers.
For all other cases, using cc, CC or ftn should work out of the box.
Enabling Darshan at Runtime¶
Darshan is automatically injected into users' applications when they are built with the Cray compiler wrappers, but it can also be enabled at runtime for dynamically linked executables. Examples include applications built before Darshan went into production, applications built without the Cray compiler wrappers, and applications written in interpreted languages (e.g. Python). This may also be useful for applications not built on Perlmutter, like executables on CVMFS or other pre-compiled binaries.
You can enable Darshan by setting the LD_PRELOAD variable for your application, for example:
LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" your-application-here
Do not export LD_PRELOAD globally
Exporting LD_PRELOAD in your session will instrument every application you execute, which may impact your workflow and also the file system where the Darshan logs are stored.
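One way to keep the preload scoped to a single command is a small wrapper function. This is only a sketch: with_darshan is a hypothetical helper name, and it assumes DARSHAN_BASE_DIR has been set by loading the darshan module.

```shell
# Run one command with Darshan preloaded, leaving the calling shell untouched.
# with_darshan is a hypothetical helper, not part of the darshan module.
with_darshan() {
    if [ -z "${DARSHAN_BASE_DIR:-}" ]; then
        echo "DARSHAN_BASE_DIR is not set; load the darshan module first" >&2
        return 1
    fi
    # env sets LD_PRELOAD only in the environment of the command it launches.
    env LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" "$@"
}

# Example:
#   with_darshan your-application-here
```

Because the variable is never exported, other commands started from the same session run without instrumentation.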
To instrument a code you execute through srun, export the LD_PRELOAD variable only to the application being launched by srun, to avoid instrumenting srun's internal calls:
srun --export=ALL,LD_PRELOAD=$DARSHAN_BASE_DIR/lib/libdarshan.so your-application-here
Warning
The ALL token in srun --export=ALL,LD_PRELOAD=... is required to instruct Slurm to add LD_PRELOAD to the existing environment variables. Without ALL, your application will not see the current environment and may crash because some required environment variables are missing. See man srun for more information and details.
Warning
Darshan doesn't interact correctly with multiple Python processes spawned via multiprocessing, due to how the Python internals operate to clone processes.
Related bug tracker.
Instrumenting non-MPI Code¶
Darshan can also be used to instrument non-MPI code. To enable this feature, set the environment variable DARSHAN_ENABLE_NONMPI to any value, e.g.:
DARSHAN_ENABLE_NONMPI=1 LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" your-application-here
Note that Darshan can instrument non-MPI applications only if they are dynamically linked.
Producing Reports¶
The Darshan modules save the data they collect to a shared directory, organized by date, username, application name, etc., according to:
${DARSHAN_LOGS}/${YEAR}/${MONTH}/${DAY}/${USER}_${APPLICATION}_${JOB}_${TIME}.darshan
This means you can find the logs of your applications by searching for the day your application was running and filtering on your NERSC username.
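The lookup described above could be sketched as a small helper. The function name list_darshan_logs is hypothetical, and the log root is passed as an argument because this page does not say whether DARSHAN_LOGS is set in your environment. Pass the month and day in whatever form the directories on the system use; whether they are zero-padded is not stated here.

```shell
# List the Darshan logs written for one user on one day.
# Usage: list_darshan_logs LOG_ROOT YEAR MONTH DAY [USERNAME]
# list_darshan_logs is a hypothetical helper name for this example.
list_darshan_logs() {
    root=$1 year=$2 month=$3 day=$4 user=${5:-$USER}
    for log in "$root/$year/$month/$day/${user}_"*.darshan; do
        # An unmatched glob is left as a literal string; skip it.
        if [ -e "$log" ]; then
            printf '%s\n' "$log"
        fi
    done
}

# Example (the log root is an assumption about your environment):
#   list_darshan_logs "$DARSHAN_LOGS" 2024 3 7
```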
Darshan log files can be processed to produce a plain text or PDF report with insights into your application's I/O. For example, given an environment variable $LOGFILE that points to a compressed Darshan log file, you can parse it with darshan-parser, a command available in the darshan module:
darshan-parser $LOGFILE
The output can be quite long if the application accessed many files during a long run. Redirect it to a file (e.g. > $PARSED_LOGFILE) or pipe it to another command for easier reading (e.g. | less).
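Piping also makes it easy to pull a single counter out of a long dump. The sketch below assumes the tab-separated column layout that darshan-parser describes in its header comments (module, rank, record id, counter, value, file name, mount point, file system type); check the header of your own output before relying on it. The function name darshan_counter is made up for this example.

```shell
# Print "<value><TAB><file name>" for every record of one counter.
# Reads darshan-parser output on stdin; darshan_counter is a hypothetical helper.
darshan_counter() {
    awk -F '\t' -v counter="$1" '
        /^#/ { next }                       # skip header and comment lines
        $4 == counter { print $5 "\t" $6 }  # column 4: counter, 5: value, 6: file name
    '
}

# Example ($LOGFILE as in the text above):
#   darshan-parser "$LOGFILE" | darshan_counter POSIX_BYTES_WRITTEN
```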
Excessive computing on login nodes harms other users
Please submit a batch job or use the interactive QOS if you plan to parse several Darshan log files, because parsing them on login nodes may impact other users' experience and workflows.
To produce a PDF report, first load the texlive module, then use darshan-job-summary.pl, as in the following:
module load texlive
darshan-job-summary.pl $LOGFILE
You can choose where the output file is stored with --output /path/to/output.pdf. Otherwise the output is saved in the current directory, in a file named like the input Darshan log file with the suffix .pdf.
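To keep reports out of the current directory while preserving that default naming, the output path can be built from the log file name. This is a sketch only: report_path is a hypothetical helper, and it assumes the default name is the log file name with .pdf appended, as described above.

```shell
# Build an output path of the form <dir>/<log file name>.pdf.
# Usage: report_path LOGFILE [OUTPUT_DIR]   (OUTPUT_DIR defaults to ".")
# report_path is a hypothetical helper name for this example.
report_path() {
    printf '%s/%s.pdf\n' "${2:-.}" "$(basename "$1")"
}

# Example (the reports directory is an assumption):
#   darshan-job-summary.pl --output "$(report_path "$LOGFILE" "$HOME/reports")" "$LOGFILE"
```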
Here's an example of a report produced by darshan when executing an MPI application: you can extract many details on how your application accesses and uses the file system.
Known Issues¶
- Darshan aggregates the data collected during an MPI run only when MPI_Finalize() is called inside the application. Applications that never call MPI_Finalize(), or that crash during execution, will not have their data collected.
- Applications built with Darshan are usually less portable than those built without, because the library loader will try to load libdarshan.so at every execution. You can opt out of Darshan to make your application more portable.