# Darshan I/O profiler
Darshan is an open-source, lightweight I/O profiler developed by ANL, which collects I/O statistics from several widely used HPC I/O frameworks such as MPI-IO, HDF5, PnetCDF, and standard POSIX calls. We use Darshan at NERSC to examine file system utilization and to provide advice on improving the performance of users' applications.
Darshan is automatically loaded as a module on Cori for all users, and is included at link time in users' applications via the Cray compiler wrappers (`cc`, `CC`, `ftn`) (see the related page in the docs for more details on compilers on Cori). Darshan is not loaded by default on Perlmutter: if you wish to use darshan on Perlmutter, run `module load darshan` before compiling.
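For example, a minimal sketch of building an instrumented application on Perlmutter (the source and binary names are hypothetical):

```shell
module load darshan
cc -o my_app my_app.c   # the Cray wrapper links darshan in at link time
```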
Darshan starts automatically when an MPI session is initiated, and creates a log file in a defined log directory, which is then used by NERSC staff to extract usage metrics for the different file systems. Read on to learn how to enable darshan for non-MPI applications, or how you can use darshan log files to study the I/O behavior of your application.
To check whether your dynamically linked application has been compiled to instrument data with darshan at runtime, use `ldd` and look for darshan among the results:

```console
$ ldd your-application | grep darshan
        libdarshan.so => /path/to/darshan/x.y.z/lib/libdarshan.so
```

For statically built applications you can list the symbols contained in your executable with `nm`.
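For example, a minimal sketch of such a check (any symbol containing "darshan" indicates instrumentation):

```console
$ nm your-application | grep -i darshan
```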
!!! tip
    The default `darshan/3.4.0` module that is loaded automatically in the users' environment only instruments POSIX and MPI-IO calls, but we also provide `darshan/3.4.0-hdf5`, which can be used to instrument applications using HDF5 and can be swapped in with:

    ```shell
    module swap darshan/3.4.0-hdf5
    ```
## Opting out of darshan
Should darshan cause you any issues, you can disable it by unloading the `darshan` module and rebuilding your application. We believe darshan to be stable for most applications at NERSC, but we invite users who experience problems to contact us via the online help desk.
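A minimal sketch of opting out (the rebuild command is hypothetical and depends on your application):

```shell
module unload darshan
cc -o my_app my_app.c   # rebuild so libdarshan is no longer linked in
```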
## Injecting darshan into your application
If you're not using the Cray compiler wrappers, or want to compile a statically linked non-MPI application, refer to the official Darshan documentation for instructions on how to generate Darshan-capable compiler wrappers. For all other cases, using `cc`, `CC`, or `ftn` should work out of the box.
## Enabling darshan at runtime
Darshan is automatically injected into users' applications at compile time, but it can also be enabled at runtime for dynamically linked executables: for example, applications built before darshan went into production, applications built without the Cray compiler wrappers (e.g. NVIDIA compilers for external Cori architectures), or interpreted-language applications (e.g. Python). This may also be useful for applications not built on Cori, such as executables on CVMFS or other pre-compiled binaries.
You can enable darshan by setting the `LD_PRELOAD` variable for your application, for example:

```shell
LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" your-application-here
```
!!! warning "Do not export `LD_PRELOAD` globally"
    `export`-ing `LD_PRELOAD` in your session will instrument every application you execute, which may impact your workflow and also the file system where the darshan logs are stored.
To instrument a code you execute through `srun`, export the `LD_PRELOAD` variable only to the application being launched by `srun`, to avoid instrumenting srun's internal calls:

```shell
srun --export=ALL,LD_PRELOAD=$DARSHAN_BASE_DIR/lib/libdarshan.so your-application-here
```
!!! warning
    The `ALL` token in `srun --export=ALL,LD_PRELOAD=...` is required to instruct Slurm to add `LD_PRELOAD` to the existing environment variables; without `ALL`, your application will ignore the current environment and may crash because required environment variables are missing. See `man srun` for more information and details.
!!! warning
    Darshan doesn't interact correctly with multiple Python processes spawned via `multiprocessing`, due to how the Python internals clone processes. See the related bug tracker.
## Instrumenting non-MPI code
Darshan can also be used to instrument non-MPI code. To enable this feature, set the environment variable `DARSHAN_ENABLE_NONMPI` to any value, e.g.:

```shell
DARSHAN_ENABLE_NONMPI=1 LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" your-application-here
```
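For instance, a sketch of profiling a plain (non-MPI) POSIX copy; the file names are hypothetical:

```shell
DARSHAN_ENABLE_NONMPI=1 \
LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" \
cp input.dat output.dat
```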
## Producing reports
The darshan modules save the data they collect to a shared directory, organized by date, username, application name, etc., according to the following "mask" on Cori:

```
/global/cscratch1/sd/darshanlogs/${YEAR}/${MONTH}/${DAY}/${USER}_${APPLICATION-NAME}_${JOB-ID}_${TIME}.darshan
```

And on Perlmutter:

```
/pscratch/darshanlogs/${YEAR}/${MONTH}/${DAY}/${USER}_${APPLICATION-NAME}_${JOB-ID}_${TIME}.darshan
```

This means you can find the logs of your applications by searching for the day your application was running and filtering on your NERSC username.
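For example, a sketch of searching for your own logs on Perlmutter (the year is an example; the directory layout follows the mask above):

```shell
find /pscratch/darshanlogs/2024 -name "${USER}_*.darshan" 2>/dev/null
```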
Darshan log files can be processed to produce a plain-text or PDF report containing relevant insights about your application.

For example, given `$LOGFILE`, an environment variable storing the path to a compressed darshan log file, you can parse it with `darshan-parser`, a command available in the `darshan` module loaded by default:

```shell
darshan-parser $LOGFILE
```

The output can be quite long if the application accessed several files during a long run: redirect the output to a file (e.g. `> $PARSED_LOGFILE`) or pipe it to other commands for easier reading (e.g. `| less`).
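For instance:

```shell
darshan-parser $LOGFILE > $PARSED_LOGFILE   # save the full text output to a file
darshan-parser $LOGFILE | less              # page through it interactively
```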
!!! warning "Excessive computing on login nodes harms other users"
    Please submit a job or use the interactive queue if you plan to parse several log files, as heavy parsing may impact other users' experience and workflows on the login nodes.
To produce a PDF report you need to first load the `texlive` module, then use `darshan-job-summary.pl`, as follows:

```shell
module load texlive
darshan-job-summary.pl $LOGFILE
```

You can control the output file name and location with `--output /path/to/output.pdf`; otherwise the output defaults to a file named after the input darshan log file with the suffix `.pdf`, saved in the current directory.
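For example (the output path is hypothetical):

```shell
darshan-job-summary.pl --output $HOME/my_app_report.pdf $LOGFILE
```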
Here's an example of a report produced by darshan when executing an MPI application: it shows many details of how your application accesses and uses the file system, along with several summary plots.
!!! note "Difference between `darshan-parser` text output and the PDF report"
    The PDF report does not contain everything that can be extracted with the `darshan-parser` tool, but new darshan releases may improve the PDF report produced, see e.g. this thread.
## Build options
To build darshan 3.4.0 on Cori and Perlmutter, these scripts were used. In particular, the `PrgEnv-gnu` and `craype-haswell` modules are used because the GNU compiler produces a more "compact" darshan library with fewer dependencies, which can be used to instrument applications built against many combinations of compilers and MPI frameworks.

The MPI framework used to build darshan is the Cray-optimized MPICH, automatically provided by the `cc` compiler wrapper: all users' applications built against MPICH or MVAPICH should work fine. Users building their applications against Open MPI or derivatives (Intel MPI, Spectrum MPI, etc.) may need to disable darshan, or build their own version.
Darshan is also able to instrument PnetCDF I/O calls; this mode can be enabled by adding `--enable-pnetcdf-mod=${PNETCDF_DIR}` at configure time, after loading one of the `cray-parallel-netcdf` pnetcdf modules available on Cori.
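A minimal sketch of such a configure invocation; the prefix and log path are hypothetical, and the `--with-log-path`/`--with-jobid-env` flags are assumptions based on a typical darshan-runtime build:

```shell
module load cray-parallel-netcdf
cd darshan-3.4.0/darshan-runtime
./configure --prefix=$HOME/darshan-pnetcdf \
            --with-log-path=$SCRATCH/darshan-logs \
            --with-jobid-env=SLURM_JOB_ID \
            --enable-pnetcdf-mod=${PNETCDF_DIR} \
            CC=cc
```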
### HDF5-aware darshan build
The default `cray-hdf5-parallel` version `1.12.1.1` was used to build the HDF5-aware darshan on both Cori and Perlmutter. HDF5 1.10 introduced ABI changes that are incompatible with HDF5 1.8 and lower and cause darshan to break applications; only HDF5 1.10 or higher is currently available on Cori, so if you only use NERSC-provided HDF5 modules you should not experience issues.

If your application was built against HDF5 1.8 or lower, you cannot rebuild it against a newer HDF5 release, and you still want to instrument your code with darshan, you need to build your own darshan against the HDF5 release you're using: feel free to use the scripts above or contact us for support.

A caveat of building an application with the HDF5-capable darshan is that the HDF5 library will always appear in the application's list of dependencies, even when your code contains no HDF5 calls; this means the library will always be loaded by the operating system at execution time, but apart from a minor slowdown while loading the library, your application should work normally.
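You can confirm the extra dependency with `ldd`, for example:

```console
$ ldd your-application | grep hdf5
```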
## Known issues
- If you build an application with gcc and HDF5, and you load the darshan built with HDF5, the linker may complain with the following message:

    ```
    /usr/bin/ld: warning: libhdf5_parallel_gnu_82.so.103, needed by $DARSHAN_BASE_DIR/lib/libdarshan.so, may conflict with libhdf5_parallel_gnu_82.so.200
    ```

    This is caused by a mismatch between the HDF5 provided by the loaded `cray-hdf5-parallel` module and the one used to build darshan. The warning has no impact on your application and can be ignored, since the ABI used by darshan to instrument I/O calls should be the same for all the cray-hdf5 modules available on Cori.
- While the HDF5-aware darshan module usually works fine for compiled applications, it may produce incompatibility warnings when used with interpreted programs, for example with Python environments using conda to provide an external HDF5 build. This is probably caused by `h5py` trying to load the HDF5 dependency it was built against directly, instead of using the one provided in the `LD_PRELOAD` variable. This causes the following warning messages to appear:

    ```console
    $ module load python
    $ conda create -y -n testdarshan python=3.8 h5py hdf5=1.10.6
    $ python -c 'import h5py; print(h5py.version.hdf5_version)'
    1.10.6
    $ module swap darshan/3.4.0-hdf5
    $ LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" python -c 'import h5py; print(h5py.version.hdf5_version)'
    testdarshan/lib/python3.8/site-packages/h5py/__init__.py:37: UserWarning: h5py is running against HDF5 1.10.5 when it was built against 1.10.6, this may cause problems
    Warning! ***HDF5 library version mismatched error***
    The HDF5 header files used to compile this application do not match
    the version used by the HDF5 library to which this application is linked.
    Data corruption or segmentation faults may occur if the application continues.
    This can happen when an application was compiled by one version of HDF5 but
    linked with a different version of static or shared HDF5 library.
    You should recompile the application or check your shared library related
    settings such as 'LD_LIBRARY_PATH'.
    You can, at your own risk, disable this warning by setting the environment
    variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
    Setting it to 2 or higher will suppress the warning messages totally.
    Headers are 1.10.6, library is 1.10.5
    $ HDF5_DISABLE_VERSION_CHECK=2 LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so" python -c 'import h5py; print(h5py.version.hdf5_version)'
    1.10.5
    ```

    Setting the variable `HDF5_DISABLE_VERSION_CHECK` to 1 or higher will suppress the warning, but this seems to cause h5py to use the HDF5 library darshan was compiled with, instead of the HDF5 library installed with conda.

    Please refer to the Build options section above to build your own darshan release on top of the HDF5 library you installed with conda. In particular, you need to specify the HDF5 path during the configure step, which is the prefix you used when you installed the conda environment; in the example above it would be:

    ```
    --enable-hdf5-mod="/path/to/conda/env/testdarshan/"
    ```

    You can then use your own darshan to instrument your Python code.
- When instrumenting interpreted languages (e.g. Python), you may get errors like `undefined symbol: H5get_libversion`. Explicitly adding the HDF5 library in `LD_PRELOAD` after the darshan library fixes this error, for example:

    ```shell
    LD_PRELOAD="$DARSHAN_BASE_DIR/lib/libdarshan.so:/path/to/your/libhdf5.so" your-application-here
    ```

    And similarly for variables exported to srun.
- The HDF5-aware darshan library provided was built with the MPICH provided by the Cray compiler wrapper, and may cause some applications to break with the following message:

    ```
    Attempting to use an MPI routine before initializing MPICH
    ```

    If you're interested in tracing your non-MPI application, consider building your own version of darshan as shown in the section above, adding `--without-mpi` at configure time, and setting the `DARSHAN_ENABLE_NONMPI` environment variable to enable darshan for non-MPI applications. If you're not interested in darshan, you can opt out of it and just rebuild your application.
- Darshan aggregates the data collected during an MPI run only when `MPI_Finalize()` is called inside the application; this means that data won't be collected for applications lacking the finalize call, and likewise for applications that crash during execution. A fix for this issue is currently being developed.
- Applications built with darshan are usually less portable than those built without, because the library loader will try to load `libdarshan.so` at every execution. You can opt out of darshan to make your application more portable.