The HPSS Archive System¶
Introduction¶
The High Performance Storage System (HPSS) is a modern, flexible, performance-oriented mass storage system. HPSS is intended for long-term storage of data that is not frequently accessed.
Storing data in such a system requires a little more effort than storing on a local disk or a typical mounted file system. Files should be stored in appropriately sized chunks. In particular, storing many small files in HPSS is very inefficient, whereas extremely large files can be unwieldy, so users should aim for sizes between 100 GB and 2 TB. The HPSS command-line tools hsi and htar allow you to move files in and out and address the need for grouping. The commands tar and split can be helpful for bundling small files and breaking up large ones.
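As a rough sketch of how tar and split fit together, the snippet below bundles a directory of small files, breaks the bundle into fixed-size pieces, and then reassembles it. Everything here is illustrative: the scratch directory, the file names, and the 4096-byte piece size are assumptions chosen so the demo runs in a moment, not recommended values. For real data you would pick a piece size that lands in the 100 GB to 2 TB range.

```shell
# Work in a scratch directory so nothing real is touched (demo assumption).
workdir=$(mktemp -d)
mkdir "$workdir/dataset"

# Stand-in for a real data set: 50 small files.
i=1
while [ "$i" -le 50 ]; do
    printf 'sample record %s\n' "$i" > "$workdir/dataset/file_$i.txt"
    i=$((i + 1))
done

# Bundle the small files into a single archive.
tar -cf "$workdir/dataset.tar" -C "$workdir" dataset

# Break the archive into fixed-size pieces (tiny here, purely for the demo).
split -b 4096 "$workdir/dataset.tar" "$workdir/dataset.tar.part_"
nparts=$(ls "$workdir"/dataset.tar.part_* | wc -l | tr -d ' ')

# After retrieving every piece, concatenate them in name order to restore the archive.
cat "$workdir"/dataset.tar.part_* > "$workdir/restored.tar"

# Check that the restored archive is byte-identical to the original.
if cmp -s "$workdir/dataset.tar" "$workdir/restored.tar"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "pieces: $nparts, round trip: $roundtrip"
```

The pieces produced by split sort alphabetically, so a shell glob returns them in the right order for reassembly.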
Warning
Storing a large number of smaller files without bundling them is likely to cause performance issues in the system and may cause NERSC to temporarily disable the user's access to HPSS without prior notice. For a discussion of why, see the best practices page.
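Before archiving a directory, it can help to check how many files it holds and how large it is in total. The helper below is a hypothetical sketch, not a NERSC-provided tool: the function name bundle_advice is invented for illustration, and the thresholds approximate the 100 GB and 2 TB guidance using KiB units as reported by du -sk.

```shell
# Hypothetical helper: report whether a directory's total size falls in the
# suggested bundle range. Thresholds are approximations expressed in KiB.
bundle_advice() {
    dir=$1
    min_kib=$((100 * 1024 * 1024))         # roughly 100 GB
    max_kib=$((2 * 1024 * 1024 * 1024))    # roughly 2 TB

    total_kib=$(du -sk "$dir" | awk '{print $1}')
    nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')

    if [ "$total_kib" -lt "$min_kib" ]; then
        echo "$nfiles files, $total_kib KiB: below the suggested range; consider combining with other data"
    elif [ "$total_kib" -gt "$max_kib" ]; then
        echo "$nfiles files, $total_kib KiB: above the suggested range; consider several bundles"
    else
        echo "$nfiles files, $total_kib KiB: within the suggested range"
    fi
}
```

Calling bundle_advice my_local_dir prints one line of advice. A directory with many files and a small total size is a strong candidate for bundling before it goes into HPSS.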
Retrieving files requires a little thought as well. If you're retrieving many files, you should order the retrievals so that the system can pull the data efficiently. NERSC provides scripts for retrieving files in order. Taking a little time to learn how to use these HPSS utilities will save you headaches.
By default, every user has an HPSS account. Usage charges are based on settings in Iris. See HPSS Usage Charging and Data Sharing for details.
A Beginner's Guide to HPSS¶
This section contains a few quick instructions to get you started using HPSS. We recommend you also review the best practices and read about HPSS usage charging and data sharing. For more in-depth information about HPSS commands, see the pages about hsi and htar.
You can access NERSC's HPSS in a variety of different ways. hsi and htar are command line tools that offer the best ways to transfer data in and out of HPSS within NERSC. hsi is used to put individual files or directories into HPSS. htar is used to put bundles of files into HPSS, similar to how the tar utility works.
Storing and Fetching Files with hsi¶
You can log on to HPSS by using hsi:
nersc$ hsi
Typing just hsi alone transfers you to an HPSS command shell, which looks very similar to a regular login environment. It has a directory structure you can navigate through, and most regular Linux commands will work (like ls, cd, etc.). However, commands like ls will only show files and directories stored in the HPSS archive, and only hsi commands will work. It's effectively like using ssh to log in to another system called hpss. To exit from the HPSS command shell, use exit.
You can run hsi commands from any Perlmutter login node or Data Transfer Node in two ways: type hsi <command> to run a single command directly, or type hsi alone to enter the HPSS command shell and then issue commands there.
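A minimal sketch of the one-line form follows. It passes several commands to hsi in a single quoted string, separated by semicolons. The directory and file names are placeholders, and the command -v guard is an addition for illustration so the snippet only prints the command on machines that do not have the HPSS client installed.

```shell
# Placeholder names; substitute your own directory and file.
remote_dir=new_dir_123
local_file=my_local_file

# Several hsi commands in one quoted string, separated by semicolons.
hsi_cmds="mkdir $remote_dir; cd $remote_dir; put $local_file; ls -l"

if command -v hsi >/dev/null 2>&1; then
    # HPSS client available: run the commands in one session.
    hsi "$hsi_cmds"
else
    # No HPSS client on this machine: just show what would be run.
    echo "would run: hsi \"$hsi_cmds\""
fi
```

Grouping commands this way runs them in a single hsi invocation rather than one invocation per command.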
Here's a list of some common hsi commands. The commands below are written assuming you are running from a login node (i.e., you haven't first invoked an HPSS command shell):
- Show the content of your HPSS home directory:
hsi ls
- Create a remote directory in your home:
hsi mkdir new_dir_123
- Store a single file into HPSS without renaming:
hsi put my_local_file
- Store a directory tree, creating sub-dirs when needed:
hsi put -R my_local_dir
- Fetch a single file from HPSS into the local directory without renaming:
hsi get /path/to/my_hpss_file
- Delete a file from HPSS:
hsi rm /path/to/my_hpss_file
- To recursively remove a directory and all of its contained sub-directories and files:
hsi rm -R /path/to/my_hpss_dir/
- Delete an empty directory:
hsi rmdir /path/to/my_hpss_dir/
The example below finds files that are more than 20 days old and redirects the output to the file temp.txt:
hsi -q "find . -ctime 20" > temp.txt 2>&1
For more details on using hsi, refer to the hsi page.
Storing groups of files in HPSS with htar¶
It's generally recommended that you group your files together into bundles whenever possible. htar is an HPSS application that will create a bundle of files and store it directly in HPSS. The next example shows how to create a bundle with the contents of the directory nova and the file simulator:
nersc$ htar -cvf nova.tar nova simulator
Listing the contents of a tar file:
nersc$ htar -tf nova.tar
To extract a specific file simulator from an htar file nova.tar:
nersc$ htar -xvf nova.tar simulator
For more details on using htar, refer to the htar page.
Token Generation¶
The first time you try to connect from a NERSC system (Perlmutter, DTNs, etc.) using a NERSC-provided client like hsi or htar, you will be prompted for your NERSC password + one-time password, which will generate a token stored in $HOME/.netrc. After completing this step, you will be able to connect to HPSS without typing a password.
nersc$ hsi
Generating .netrc entry...
Password + OTP:
Sometimes the .netrc file can become out of date or otherwise corrupted. This generates errors that look like this:
nersc$ hsi
result = -11000, errno = 29
Unable to authenticate user with HPSS.
result = -11000, errno = 9
Unable to setup communication to HPSS...
*** HSI: error opening logging
Error - authentication/initialization failed
If this error occurs, try moving your $HOME/.netrc file to $HOME/.netrc_temp. Then connect to the HPSS system again and enter your NERSC password + one-time password when prompted. A new $HOME/.netrc file will be generated with a new token. Alternatively, you can generate the token manually. If the problem persists, contact account support.
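The move-aside step can be sketched as a small shell function. The function name set_aside_netrc and its optional path argument are hypothetical, added only so the sketch can be tried on a scratch file. With no argument it acts on $HOME/.netrc.

```shell
# Hypothetical helper: move a possibly stale token file aside so that the
# next hsi or htar run prompts for credentials and writes a fresh one.
set_aside_netrc() {
    netrc=${1:-$HOME/.netrc}
    if [ -f "$netrc" ]; then
        mv "$netrc" "${netrc}_temp" && echo "moved $netrc to ${netrc}_temp"
    else
        echo "no file at $netrc; nothing to move"
    fi
}
```

After calling set_aside_netrc, run hsi and enter your NERSC password + one-time password when prompted. Once the new token works, the _temp copy can be deleted.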
Manual Token Generation¶
You can manually generate a token for accessing HPSS by going to Iris and selecting the blue "Storage" tab. Scroll down to the section labeled "HPSS Tokens", where you will see buttons to generate a token from within NERSC. (See Iris for users for more about Iris.) Paste the generated token into a file named .netrc in your home directory, using the following format:
machine archive.nersc.gov
login <your NERSC user name>
password <token generated by Iris>
The .netrc file should be readable only by its owner (for example, mode 600). If it is group or world readable, HPSS access will fail.
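One way to create the file with restrictive permissions from the start is sketched below. The function name write_netrc and its arguments are hypothetical. The user name and token passed to it are whatever you obtained from Iris, and the file contents follow the three-line format shown above.

```shell
# Hypothetical helper: write a .netrc entry for HPSS with owner-only permissions.
# Arguments: target file, NERSC user name, token generated by Iris.
write_netrc() {
    target=$1
    user=$2
    token=$3

    # The subshell keeps the restrictive umask from leaking into the caller.
    (
        umask 077
        printf 'machine archive.nersc.gov\nlogin %s\npassword %s\n' \
            "$user" "$token" > "$target"
    )

    # Tighten permissions explicitly in case the file already existed.
    chmod 600 "$target"
}
```

For example, write_netrc "$HOME/.netrc" your_user_name your_token overwrites any existing file at that path, so move the old one aside first if you want to keep it. Running ls -l on the result should show -rw------- as the permission string.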
Session Limits¶
Users are limited to 15 concurrent sessions. This number can be temporarily reduced if a user is impacting system usability for others.
Transfers Between HPSS and Facilities Outside NERSC¶
NERSC's HPSS system can be accessed from outside the center by using Globus. For more information about using Globus, please see our Globus page. The NERSC HPSS endpoint is called "NERSC HPSS". You can use the command line or the web interface to transfer HPSS files. Unfortunately, the web interface does not order file retrievals by tape.
Caution
If you're retrieving a large data set from HPSS with Globus, please see our Globus page for instructions on how to best retrieve files in correct tape order using the command line interface for Globus.
About HPSS Hardware and Software¶
HPSS is Hierarchical Storage Management (HSM) software developed by a collaboration of DOE labs and IBM. NERSC is a participant in that collaboration. The software has been used at NERSC for archival storage since 1998. Our HPSS system is a tape system that uses HSM software to ingest data onto a high-performance disk cache and automatically migrate it to a very large enterprise tape subsystem for long-term retention. The disk cache in HPSS is designed to retain many days' worth of new data, and the tape subsystem is designed to provide the most cost-effective long-term scalable data storage available.