Lustre File Striping

Perlmutter uses Lustre as its $SCRATCH file system. For many applications, a technique called file striping will increase I/O performance. File striping will primarily improve performance for codes doing serial I/O from a single node or parallel I/O from multiple nodes writing to a single shared file as with MPI-I/O, parallel HDF5 or parallel NetCDF.

The Lustre file system is made up of an underlying set of I/O servers and disks called Object Storage Targets (OSTs). A file is said to be striped when its data is spread across multiple OSTs. Read and write operations on a striped file access multiple OSTs concurrently, which increases the available I/O bandwidth. Selecting the best striping can be complicated: striping a file over too few OSTs will not take advantage of the system's available bandwidth, while striping over too many causes unnecessary overhead and a loss in performance. The default stripe count on Perlmutter's $SCRATCH is 1, meaning that each file is written to a single OST by default.

Custom Lustre Striping

Each file and directory can have a separate striping pattern. To set striping for a file or directory, use the command lfs setstripe. Striping must be set on a file before it is written. The lfs setstripe syntax is:

$ lfs setstripe \
    --stripe-size [stripe-size] \
    --stripe-index [OST-start-index] \
    --stripe-count [stripe-count] \
    filename

| Option | Description | Default |
|---|---|---|
| stripe-size | Number of bytes written to one OST before cycling to the next. Use multiples of 1MB. The default has performed well for a wide variety of codes. | 1MB |
| stripe-count | Number of OSTs a file is striped across. | 1 on Perlmutter |
| OST-start-index | Starting OST (default highly recommended). | -1 (the system follows a round-robin procedure to optimize the creation of files by all users) |
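
To make the stripe-size and stripe-count semantics concrete, the sketch below computes which of a file's stripes a given byte offset falls on, assuming the plain round-robin (raid0) layout described above. The function name stripe_index_for_offset is hypothetical and not part of Lustre. The result is an index into the file's own list of OSTs (0 to stripe-count minus 1), not a system-wide OST number.

```shell
# Hypothetical helper (not a Lustre command): print the 0-based stripe
# that holds a given byte offset, assuming round-robin (raid0) striping.
# Usage: stripe_index_for_offset <byte-offset> <stripe-size-bytes> <stripe-count>
stripe_index_for_offset() {
    local offset=$1 stripe_size=$2 stripe_count=$3
    # Data is written in stripe_size chunks, cycling over stripe_count OSTs.
    echo $(( (offset / stripe_size) % stripe_count ))
}

# With 1MB stripes over 4 OSTs, the chunk starting at 5MB lands on stripe 1.
stripe_index_for_offset 5242880 1048576 4   # prints 1
```

With a stripe count of 1, every offset maps to stripe 0, which is why a single large shared file cannot use more than one OST's bandwidth under the default setting.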

Warning

Avoid using a stripe count larger than 128 OSTs. This can reduce I/O performance for some scenarios and can negatively impact other users of the Lustre file system.

NERSC File Striping Recommendations

  • Shared file I/O: Either one process performs all the I/O for a simulation serially, or multiple processes write to a single shared file, e.g. with MPI-IO, parallel HDF5 or parallel NetCDF.
  • File per process: Each process writes to its own file, resulting in as many files as there are processes.
  • Write/read-intensive: The code spends a significant portion of its time writing or reading data.

| Workload | Nodes | Single Shared-File | File per Process |
|---|---|---|---|
| write-intensive | <= 16 | keep default striping | keep default striping |
| write-intensive | > 16 | set stripe count equal to number of compute nodes | keep default striping |
| read-intensive | any | set stripe count equal to number of compute nodes | set stripe count equal to number of compute nodes |

Warning

Do not use a stripe count larger than 128 OSTs. This will result in poor performance and can adversely affect the entire file system.

For example, with 16 compute nodes, one could create an empty file and set its striping appropriately with the command:

lfs setstripe -c 16 output_file

This has to be done before running a job that will populate the file. The striping of a file cannot be changed once the file has been written to. The file must be rewritten (usually using cp) to pick up any new striping settings.
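
The recommendations table and the 128-OST cap can be folded into a small shell helper, sketched below. The function name recommend_stripe_count and its argument keywords (write/read, shared/fpp) are hypothetical, chosen only for illustration. "Keep default striping" is taken to mean a stripe count of 1, the Perlmutter $SCRATCH default stated earlier.

```shell
# Hypothetical helper: print a stripe count following the table above.
# Usage: recommend_stripe_count <write|read> <nodes> <shared|fpp>
recommend_stripe_count() {
    local workload=$1 nodes=$2 pattern=$3
    local default_count=1   # assumed default on Perlmutter $SCRATCH
    local max_count=128     # cap from the warning above
    local count=$default_count

    case "$workload" in
        read)
            # read-intensive: stripe count equals the number of compute nodes
            count=$nodes
            ;;
        write)
            # write-intensive: only shared files on more than 16 nodes are striped
            if [ "$nodes" -gt 16 ] && [ "$pattern" = "shared" ]; then
                count=$nodes
            fi
            ;;
        *)
            echo "unknown workload: $workload" >&2
            return 1
            ;;
    esac

    if [ "$count" -gt "$max_count" ]; then
        count=$max_count
    fi
    echo "$count"
}
```

On the system, the result could then be passed to the command shown above, for example lfs setstripe -c "$(recommend_stripe_count write 64 shared)" output_file.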

Files inherit the striping configuration of the directory in which they are created. The desired striping must be set on the directory before creating the files and later changes of the directory striping are not inherited. When copying an existing striped file into a striped directory, the new copy will inherit the directory's striping configuration. This provides another approach to changing the striping of an existing file.

Inheritance of striping provides a convenient way to set the striping on multiple output files at once. For example, if a job will produce multiple output files in a known output directory, the striping of the latter can be configured before job submission:

mkdir output_directory
lfs setstripe -c 16 output_directory
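
The two commands above can be wrapped in a function so that a submission script always stripes the output directory before any file is created in it. This is a minimal sketch: the function name prepare_striped_dir and the LFS override variable are hypothetical conveniences, the latter added only so the function can be dry-run on a machine without Lustre.

```shell
# Hypothetical helper: create a directory and set its default striping
# before any files are written into it.
# Usage: prepare_striped_dir <directory> <stripe-count>
# LFS may be overridden (e.g. LFS="echo lfs") for a dry run off-cluster.
prepare_striped_dir() {
    local dir=$1 count=$2
    mkdir -p "$dir" || return 1
    # Files created in $dir afterwards inherit this stripe count.
    ${LFS:-lfs} setstripe -c "$count" "$dir"
}
```

Calling prepare_striped_dir output_directory 16 before submitting the job has the same effect as the two commands above. Files that already exist in the directory keep their old striping.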

Restriping an Existing Directory or File

A directory's striping setting can be set or changed by issuing the lfs setstripe command. However, as noted above, this will not apply to files that already exist in that directory. The files need to be copied elsewhere and then copied back to the directory in order to inherit the new settings.

To restripe an existing file you can make a copy of it:

lfs setstripe -c 32 tmp_my_big_file
cp my_big_file tmp_my_big_file
mv tmp_my_big_file my_big_file

If there are multiple files, you could create a directory with the desired striping and copy the files into it, to avoid repeating the above procedure for each file.
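
The directory-based approach for multiple files might look like the sketch below. The function name restripe_into_dir and the LFS override variable are hypothetical. The function only copies; removing the originals or moving the copies back is left to the user once the copies have been verified.

```shell
# Hypothetical helper: copy files into a directory that carries the desired
# striping, so that each copy inherits the directory's layout.
# Usage: restripe_into_dir <stripe-count> <dest-dir> <file>...
# LFS may be overridden (e.g. LFS=true) to exercise the logic off-cluster.
restripe_into_dir() {
    if [ "$#" -lt 3 ]; then
        echo "usage: restripe_into_dir <stripe-count> <dest-dir> <file>..." >&2
        return 2
    fi
    local count=$1 dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    # Striping must be set on the directory before the copies are created.
    ${LFS:-lfs} setstripe -c "$count" "$dest" || return 1
    # -p preserves timestamps and permissions of the originals.
    cp -p -- "$@" "$dest"/
}
```

Because the copies are new files, they pick up the destination directory's striping at creation time, which is the inheritance behavior described earlier.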

The alternative is to use lfs_migrate, and let Lustre manage the migration:

lfs_migrate -c $STRIPE_COUNT -S 1M my_big_file

Here, $STRIPE_COUNT is a sensible number of OSTs according to the table above.

Check Striping of Files and Directories

To obtain the number of OSTs a file or directory is striped on, you can use lfs getstripe, which works similarly to ls:

$ mkdir $SCRATCH/test-dir
$ lfs setstripe -c 24 -S 1M $SCRATCH/test-dir
$ echo > $SCRATCH/test-dir/test-file.txt

$ lfs getstripe $SCRATCH/test-dir

/pscratch/sd/a/adele/test-dir
stripe_count:  24 stripe_size:   1048576 pattern:       raid0 stripe_offset: -1

/pscratch/sd/a/adele/test-dir/test-file.txt
lmm_stripe_count:  24
lmm_stripe_size:   1048576
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 138
    obdidx           objid          objid            group
       138        15345604       0xea27c4     0x19c0000403
       ...             ...            ...              ...

If you only want to see the details of the directory itself and not its content, use lfs getstripe -d $directory.
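
When checking many files, it can help to reduce the lfs getstripe output to one line per path. The sketch below is a hypothetical filter, stripe_counts, that reads output in the format shown above from standard input and prints each path with its stripe count. It assumes the field names shown in the example (stripe_count and lmm_stripe_count) and paths without spaces. The output format may differ between Lustre versions.

```shell
# Hypothetical filter: summarize `lfs getstripe` output as "<path> <stripe-count>".
# Assumes each entry starts with a line holding an absolute path, followed by a
# line containing "stripe_count:" (directories) or "lmm_stripe_count:" (files).
stripe_counts() {
    awk '
        /^\//           { path = $1 }        # remember the most recent path line
        /stripe_count:/ { print path, $2 }   # the count is the second field
    '
}
```

On the system this would be used as lfs getstripe $SCRATCH/test-dir | stripe_counts.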