HDF5¶
Hierarchical Data Format version 5 (HDF5) is a set of file formats, libraries, and tools for storing and managing large scientific datasets. Originally developed at the National Center for Supercomputing Applications, it is currently supported by the non-profit HDF Group.
HDF5 is a different product from previous versions of software named HDF, representing a complete redesign of the format and library. It also includes improved support for parallel I/O. The HDF5 file format is not compatible with HDF 4.x versions.
Documentation and presentations¶
For an introduction to HDF5, refer to the official HDF5 documentation.
Quincey Koziol from NERSC gave a talk about HDF5 at the Argonne Training Program on Extreme-Scale Computing (ATPESC) 2019, covering the basics of the library and the format, examples of parallel HDF5 access, optimizations, and future developments.
The slides and code examples of Quincey's presentation at ATPESC 2020 are available for download here.
Using HDF5 at NERSC¶
Cray provides native HDF5 libraries for each of the three PrgEnvs. The module cray-hdf5 provides a serial HDF5 I/O library:
module load cray-hdf5
ftn my_serial_hdf5_code.f90
while cray-hdf5-parallel provides a parallel HDF5 implementation:
module load cray-hdf5-parallel
ftn my_parallel_hdf5_code.f90
After loading one of those modules, one can continue to use the Cray compiler wrappers cc, CC, and ftn to compile HDF5 applications without any additional compiler flags.
Other HDF5 tools at NERSC¶
NERSC provides additional tools which allow users to interact with HDF5 data.
H5py¶
The H5py package is a Pythonic interface to the HDF5 library.
H5py provides an easy-to-use high level interface, which allows an application to store huge amounts of numerical data, and easily manipulate that data from NumPy. H5py uses straightforward Python and NumPy metaphors, like dictionaries and NumPy arrays. For example, you can iterate over datasets in a file, or check the .shape or .dtype attributes of datasets. You don't need to know anything special about HDF5 to get started. H5py rests on an object-oriented Cython wrapping of the HDF5 C API. Almost anything you can do in HDF5 from C, you can do with h5py from Python.
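As a rough illustration of the dictionary and NumPy metaphors described above, the following minimal sketch writes two small datasets to a temporary file, then reopens it to iterate over the datasets and inspect their .shape and .dtype attributes. The file name and the dataset names (temperature, counts) are hypothetical and chosen only for this example; it assumes h5py and NumPy are available in the active Python environment.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write two datasets; NumPy arrays are stored directly.
with h5py.File(path, "w") as f:
    f.create_dataset("temperature", data=np.arange(12, dtype="f8").reshape(3, 4))
    f.create_dataset("counts", data=np.array([1, 2, 3], dtype="i4"))

# Reopen read-only and iterate over the file like a dictionary.
summary = {}
with h5py.File(path, "r") as f:
    for name, dset in f.items():
        # Each dataset exposes NumPy-style shape and dtype attributes.
        summary[name] = (dset.shape, str(dset.dtype))
    # Slicing a dataset reads only that selection and returns a NumPy array.
    first_row = f["temperature"][0, :]

print(summary)
print(first_row)
```

Slicing the dataset, rather than loading it whole, reads only the requested selection from disk, which matters once datasets grow beyond available memory.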
For information about using H5py at NERSC, please see our parallel Python page.
A common error encountered by h5py users is documented on our Python FAQ and troubleshooting page.
Further information about HDF5¶
- HDF5 Performance Tuning: Best practices for tuning HDF5 applications' I/O performance
- ExaHDF5: Research and development for HDF5 for exascale systems
- The HDF Group: Documentation and support from the official HDF Group