hsi¶
hsi is a flexible and powerful command-line utility for accessing the NERSC HPSS storage system. You can use it to store and retrieve files, and it has a large set of commands for listing your files and directories, creating directories, changing file permissions, etc. The command set has a UNIX look and feel, so that moving through your HPSS directory tree is close to what you would find on a UNIX file system. hsi can be used either interactively or in batch scripts. hsi doesn't offer compression options, but the HPSS tape system uses hardware compression, which is as effective as software compression.
The hsi utility is available on all NERSC production computer systems, and it has been configured on these systems to use high-bandwidth parallel transfers.
hsi Commands¶
All of the NERSC computational systems available to users have the hsi client already installed. To access the Archive (HPSS) storage system, you can type hsi with no arguments. This starts an interactive command shell and places you in your home directory on the Archive system. From this shell, you can run the ls command to see your files, cd into storage system subdirectories, put files into the storage system, and get files from it.
Most of the standard Linux commands work in hsi (cd, ls, rm, chmod, mkdir, rmdir, etc.). There are a few commands that are unique to hsi:
Command | Function |
---|---|
put | Archive one or more local files into HPSS, overwriting the destination file, if it exists |
get | Download one or more HPSS files to local storage, overwriting the destination file, if it exists |
cput | Conditional put - archive a file if it does not already exist on HPSS or the local file is newer than an existing HPSS file |
cget | Conditional get - get the file only if a local copy does not already exist or the HPSS file is newer than an existing local file |
mget / mput | Interactive get/put - prompts for user confirmation before copying each file |
hsi also has a series of "local" commands that act on the local (non-HPSS) side:
Command | Function |
---|---|
lcd | Change local directory |
lls | List local directory |
lmkdir | Make a local directory |
lpwd | Print current local directory |
!<command> | Issue shell command |
hsi Syntax¶
The hsi utility uses a special syntax to specify local and HPSS file names when using the put and get commands. A colon with a space on each side ( : ) separates the local and HPSS paths; the local file name is always on the left of the colon, and the HPSS file name is always on the right.
You don't need to provide the separator at all if you want the destination file to use the same name as the source file. You can also combine this with a cd command, e.g. hsi "cd my_hpss_dir/; put my_local_file; get my_hpss_file"
Here are some usage examples:
- Show the content of your HPSS home directory (not to be confused with /global/homes):
hsi ls
- Show the content of a specific directory:
hsi -q ls /path/to/hpss/dir/ 2>&1
(-q suppresses the extra output that appears at each connection to HPSS, and 2>&1 redirects stderr to stdout)
- Create a remote directory in your home:
hsi mkdir new_dir_123
- Store a single file from your local home into your HPSS home:
hsi put my_local_file : my_hpss_file
- Store a single file into HPSS without renaming:
hsi put my_local_file
- Store a directory tree, creating sub-dirs when needed:
hsi put -R my_local_dir/
- Fetch a single file from HPSS, from a specific directory:
hsi get /path/to/my_local_file : /path/to/my_hpss_file
- Fetch a single file from HPSS into the local directory without renaming:
hsi get /path/to/my_hpss_file
- Delete a file from HPSS:
hsi rm /path/to/my_hpss_file
(use hsi rm -i if you want to confirm the deletion of each file)
- Recursively remove a directory and all of its contained subdirectories and files:
hsi rm -R /path/to/my_hpss_dir/
- Delete an empty directory:
hsi rmdir /path/to/my_hpss_dir/
Make sure to escape bash expansions. For example, quote or escape * to prevent bash from replacing it with the names of the files in your local directory: hsi rm -i "*" or hsi rm -i \*
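The effect of quoting can be checked locally without touching HPSS. The sketch below uses echo as a stand-in for hsi (an assumption made purely for illustration) in a scratch directory containing two hypothetical files, a.dat and b.dat, to show what the shell actually hands to the command in each case:

```shell
# Scratch directory with two hypothetical files; nothing here contacts HPSS.
scratch=$(mktemp -d)
touch "$scratch/a.dat" "$scratch/b.dat"

# echo stands in for hsi so we can see the arguments after shell expansion.
unquoted=$(cd "$scratch" && echo rm -i *)
quoted=$(cd "$scratch" && echo rm -i "*")
escaped=$(cd "$scratch" && echo rm -i \*)

echo "unquoted: $unquoted"   # the shell replaced * with the local file names
echo "quoted:   $quoted"     # the literal * is passed through
echo "escaped:  $escaped"    # same result as quoting

rm -r "$scratch"
```

In the unquoted case the command would receive the local file names rather than a pattern to be matched on the HPSS side.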
Besides the interactive shell, you can run hsi commands in several different ways:
- Single-line execution, e.g. to create a new dir and copy a file into it:
hsi "mkdir my_hpss_dir; cd my_hpss_dir; put bigdata.123"
- Read commands from a file:
hsi "in command_file"
- Read commands from standard input:
hsi < command_file
- Read commands from a pipe:
cat command_file | hsi
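As a minimal sketch of the command-file approach, the script below writes the commands from the single-line example into a file, one per line, and hands that file to hsi. The file name hsi_commands.txt is hypothetical, and the guard around the hsi call is only there so the script can be tried on a machine without the client installed:

```shell
# Hypothetical command file name; one hsi command per line.
cmd_file=hsi_commands.txt

cat > "$cmd_file" <<'EOF'
mkdir my_hpss_dir
cd my_hpss_dir
put bigdata.123
EOF

# Run the commands only where the hsi client is available.
if command -v hsi >/dev/null 2>&1; then
    hsi "in $cmd_file"
else
    echo "hsi not found; would run: hsi \"in $cmd_file\""
fi
```

Keeping the commands in a file makes it easy to review or regenerate them before anything is transferred.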
hsi Storage Verification¶
HPSS provides a built-in checksum mechanism to verify data integrity while archiving to HPSS, but you can also calculate checksums for files already stored in HPSS. All checksums are stored separately from the files. The checksum algorithm used is MD5.
Performance degradation
Checksum generation is very CPU-intensive and can significantly impact file transfer performance. As much as 80% degradation in transfer rates has been observed during testing of this feature. Checksum verification also takes time, which is proportional to the size of the file to be hashed.
Some examples:
- To calculate and store the checksum of a file during a transfer to HPSS, use
hsi put -c on local.file : hpss.file
(specifying a destination file is optional, see the examples section above).
- To show the stored hash of a file, use
hsi hashlist
(add -R to recurse into directories).
- To calculate the hash of a file already stored in HPSS, use
hsi hashcreate hpss.file
To calculate hashes of all files in an HPSS directory recursively, use hsi hashcreate -R
- Similarly, to verify that a file on HPSS still matches its hash, use
hsi hashverify
(add -R for directories).
The easiest way to verify the integrity of a file in HPSS is to create the checksum during the transfer; it can then be used to verify that the data on tape matches the content on disk. The recommended approach is therefore to use hsi put -c on to store data, and hsi hashverify before deleting the source files from local storage.
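The recommended sequence could be wrapped in a small helper along the following lines. This is a sketch under stated assumptions, not an official NERSC script: the function name archive_and_verify and the HSI variable are hypothetical, and it assumes that hsi exits with a non-zero status when a command fails, which you should confirm on your system before relying on it to guard a deletion.

```shell
# Assumption: hsi returns a non-zero exit status when put or hashverify fails.
# HSI can be overridden (e.g. HSI=echo) to preview the commands without contacting HPSS.
HSI=${HSI:-hsi}

# archive_and_verify LOCAL_FILE HPSS_FILE   (hypothetical helper)
# Stores LOCAL_FILE with a checksum, verifies the stored copy against it,
# and removes the local file only if both steps succeeded.
archive_and_verify() {
    local_file=$1
    hpss_file=$2

    "$HSI" "put -c on $local_file : $hpss_file" || return 1
    "$HSI" "hashverify $hpss_file" || return 1

    rm -- "$local_file"
}
```

A call would look like archive_and_verify my_local_file my_hpss_file. Setting HSI=echo beforehand prints the two hsi commands instead of running them, but note that the local file is still removed in that mode, so only try it on a throwaway file.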
Sort Your Files for Large Hash Calculations
If you are calculating hashes for a large number of files (more than a few tens) already on HPSS, please sort the files in tape order first. This avoids unnecessary mounting and unmounting of tapes and reduces the time the calculation takes. For this task you can use our file sorting script.
Removing Older Files¶
You can find and remove older files in HPSS using the hsi find command. This may be useful if you're doing periodic backups of directories (not recommended for software version control; use a versioning system like git instead) and want to delete older backups. Since you can't use a Linux pipe (|) inside hsi, you need a multi-step process.
The example below will find files older than 10 days and delete them from HPSS:
hsi -q "find . -ctime 10" > temp.txt 2>&1
cat temp.txt | awk '{print "rm -R",$0}' > temp1.txt
hsi in temp1.txt
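Because the generated file deletes data, it can be worth inspecting it before handing it to hsi. The sketch below repeats the awk step on a made-up listing (the file names and the paths in sample_listing.txt are hypothetical and stand in for the output of hsi find), skips blank lines as a precaution, and prints the resulting commands for review instead of running them:

```shell
# Hypothetical stand-in for the output of: hsi -q "find . -ctime 10"
cat > sample_listing.txt <<'EOF'
./backups/2023-01-01.tar

./backups/2023-01-02.tar
EOF

# Prefix each non-blank line with "rm -R" to build the hsi command file.
awk 'NF {print "rm -R", $0}' sample_listing.txt > sample_rm_commands.txt

# Review the commands first; run them afterwards with: hsi in sample_rm_commands.txt
cat sample_rm_commands.txt
```

If the real listing contains lines that are not paths, such as messages captured from stderr, they would show up here as malformed rm commands and can be removed before the file is used.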