htar

htar is a command-line utility that is ideal for storing groups of files in the HPSS archive. It generally works like regular tar, except the tar file is created directly in HPSS. This means you don't need to use local space to temporarily store the tar file when writing or reading data. htar preserves the directory structure of stored files. htar doesn't have options for compression, but the HPSS tape system uses hardware compression, which is as effective as software compression. htar creates an index file that (by default) is stored along with the archive in HPSS. This allows you to list the contents of an archive without retrieving it from tape first. The index file is only created if the htar bundle is successfully stored in the archive.

htar is installed and maintained on all NERSC production systems. If you need to access the member files of an htar archive from a system that does not have the htar utility installed, you can retrieve the tar file to a local file system with Globus and extract the member files using the local tar utility.

If you have a collection of files and store them individually with hsi, the files will likely be distributed across several tapes, requiring long delays (due to multiple tape mounts) when fetching them from HPSS. Instead, group these files in an htar archive file, which will likely be stored on a single tape, requiring only a single tape mount when it comes time to retrieve the data.

The basic syntax of htar is similar to the standard tar utility:

htar -{c|K|t|x|X} -f tarfile [directories] [files]

As with the standard Unix tar utility, the -c, -x, and -t options create, extract, and list tar archive files. The -K option verifies an existing tarfile in HPSS, and the -X option re-creates the index file for an existing archive.

You cannot add or append files to an existing htar file.

htar Usage Examples

Create an archive with directory nova and file simulator:

nersc$ htar -cvf nova.tar nova simulator
HTAR: a   nova/
HTAR: a   nova/sn1987a
HTAR: a   nova/sn1993j
HTAR: a   nova/sn2005e
HTAR: a   simulator
HTAR: a   /scratch/scratchdirs/elvis/HTAR_CF_CHK_61406_1285375012
HTAR Create complete for nova.tar. 28,396,544 bytes written for 4 member files, max threads: 4 Transfer time: 0.420 seconds (67.534 MB/s)
HTAR: HTAR SUCCESSFUL

Now list the contents:

nersc$ htar -tf nova.tar
HTAR: drwx------  elvis/elvis          0 2010-09-24 14:24  nova/
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn2005e
HTAR: -rwx------  elvis/elvis     398552 2010-09-24 17:35  simulator
HTAR: -rw-------  elvis/elvis        256 2010-09-24 17:36  /scratch/scratchdirs/elvis/HTAR_CF_CHK_61406_1285375012
HTAR: HTAR SUCCESSFUL

Here is how we extract the contents of the htar file:

nersc$ htar -xvf nova.tar

Here is how we extract a single file from the htar file:

nersc$ htar -xvf nova.tar simulator

If an htar archive is 100 GB or larger and you only want to extract one or two small member files, retrieval may be faster if you add the -Hnostage option to your htar command, which skips staging the archive to the HPSS disk cache.
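As a sketch, the option can be placed ahead of the usual extract flags. The wrapper name extract_nostage and the HTAR override variable below are hypothetical conveniences for this example, not part of htar itself:

```shell
# HTAR names the htar executable. Overriding it (for example HTAR=echo)
# is an assumption of this sketch that lets you preview the command.
HTAR="${HTAR:-htar}"

# extract_nostage ARCHIVE [MEMBER...]: extract the named member files
# without first staging the archive to the HPSS disk cache.
extract_nostage() {
    archive="$1"
    shift
    "$HTAR" -Hnostage -xvf "$archive" "$@"
}
```

With these definitions, extract_nostage nova.tar simulator runs htar -Hnostage -xvf nova.tar simulator.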

Here is how we can rebuild an index file if it is accidentally deleted. As an example, use hsi to remove the nova.tar.idx index file from HPSS (note: you generally do not want to do this):

nersc$ hsi "rm nova.tar.idx"
rm: /home/e/elvis/nova.tar.idx (2010/09/24 17:36:53 3360 bytes)

Now try to list the archive contents without the index file:

nersc$ htar -tf nova.tar
ERROR: No such file: nova.tar.idx
ERROR: Fatal error opening index file: nova.tar.idx
HTAR: HTAR FAILED

Now rebuild the index file using htar -X:

nersc$ htar -Xvf nova.tar
HTAR: i nova
HTAR: i nova/sn1987a
HTAR: i nova/sn1993j
HTAR: i nova/sn2005e
HTAR: i simulator
HTAR: i /scratch/scratchdirs/elvis/HTAR_CF_CHK_61406_1285375012
HTAR: Build Index complete for nova.tar, 5 files 6 total objects, size=28,396,544 bytes
HTAR: HTAR SUCCESSFUL

nersc$ htar -tf nova.tar
HTAR: drwx------  elvis/elvis          0 2010-09-24 14:24  nova/
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn2005e
HTAR: -rwx------  elvis/elvis     398552 2010-09-24 17:35  simulator
HTAR: -rw-------  elvis/elvis        256 2010-09-24 17:36  /scratch/scratchdirs/elvis/HTAR_CF_CHK_61406_1285375012
HTAR: HTAR SUCCESSFUL

Using ListFiles to Create an htar Archive

Rather than specifying the list of files and directories on the command line when creating an htar archive, you can place the file and directory pathnames into a ListFile and use the -L option. The ListFile must contain exactly one pathname per line. Let's assume that we want to archive only files starting with sn19 in the directory nova:

nersc$ find nova -name 'sn19*' -print > novalist
nersc$ cat novalist
nova/sn1987a
nova/sn1993j

Now create an archive containing only these files:

nersc$ htar -cvf nova19.tar -L novalist
HTAR: a   nova/sn1987a
HTAR: a   nova/sn1993j
nersc$ htar -tf nova19.tar
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1993j
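Because the ListFile must hold exactly one pathname per line, it can be worth checking it before handing it to htar. The following is a minimal sketch of such a pre-flight check. The function name check_listfile is hypothetical, and the requirement that every path exists locally is an extra sanity check added for this example, not a rule stated by htar:

```shell
# check_listfile FILE: succeed only if every line of FILE is a non-empty
# pathname that exists on the local file system. Problems go to stderr.
check_listfile() {
    listfile="$1"
    rc=0
    lineno=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r path || [ -n "$path" ]; do
        lineno=$((lineno + 1))
        if [ -z "$path" ]; then
            echo "line $lineno: empty line" >&2
            rc=1
        elif [ ! -e "$path" ]; then
            echo "line $lineno: no such file or directory: $path" >&2
            rc=1
        fi
    done < "$listfile"
    return "$rc"
}
```

A typical use would be check_listfile novalist && htar -cvf nova19.tar -L novalist, so that the archive is only created when the list looks sane.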

Soft Delete and Undelete

The -D option can be used to "soft delete" one or more member files or directories from an htar archive. The files are not really deleted, but simply marked in the index file as deleted. A file that is soft-deleted will not be retrieved from the archive during an extract operation.

Soft-delete the file nova/sn1993j from the archive:

nersc$ htar -Df nova.tar nova/sn1993j
HTAR: d  nova/sn1993j
HTAR: HTAR SUCCESSFUL

If you list the contents of the archive, soft-deleted files have a D character after the mode bits in the listing. Thus, if we list the files, we see that sn1993j is marked as deleted:

nersc$ htar -tf nova.tar
HTAR: drwx------   elvis/elvis          0 2010-09-24 14:24  nova/
HTAR: -rwx------   elvis/elvis    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------ D elvis/elvis    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------   elvis/elvis    9331200 2010-09-24 14:24  nova/sn2005e

To undelete the file, use the -U option:

nersc$ htar -Uf nova.tar nova/sn1993j
HTAR: u  nova/sn1993j
HTAR: HTAR SUCCESSFUL

List the file and note that the D flag is no longer shown:

nersc$ htar -tf nova.tar nova/sn1993j
HTAR: -rwx------  elvis/elvis    9331200 2010-09-24 14:24  nova/sn1993j
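For larger archives it can help to pull the soft-deleted members out of a listing automatically. The sketch below relies only on the listing layout shown above (a D field directly after the mode bits). The function name list_soft_deleted is hypothetical, and the filter assumes member paths contain no whitespace:

```shell
# list_soft_deleted: read `htar -tf` output on stdin and print the path
# of every member whose line carries a D flag right after the mode bits.
# Assumes member paths contain no whitespace (the path is the last field).
list_soft_deleted() {
    awk '$1 == "HTAR:" && $3 == "D" { print $NF }'
}

# Example (requires htar and assumes the listing is written to stdout):
#   htar -tf nova.tar | list_soft_deleted
```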

htar Archive Verification

Performance degradation

As with hsi, calculating checksums for htar archives reduces file transfer speed. Calculating and verifying checksums takes time proportional to the size of the files being hashed.

You can request that htar compute and save checksum values for each member file during archive creation. The checksums are saved in the corresponding htar index file. You can then further request that htar compute checksums of the files as you extract them from the archive and compare the values to what it has stored in the index file.

nersc$ htar -Hcrc -cvf nova.tar nova
HTAR: a   nova/
HTAR: a   nova/sn1987a
HTAR: a   nova/sn1993j
HTAR: a   nova/sn2005e

Now, in another directory, extract the files and request verification:

nersc$ htar -Hverify=crc -xvf nova.tar
HTAR: x nova/
HTAR: x nova/sn1987a, 9331200 bytes, 18226 media blocks
HTAR: x nova/sn1993j, 9331200 bytes, 18226 media blocks
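In a script, you may want to confirm that a checksummed create or a verified extract really finished. One possible approach, sketched here, is to look for the HTAR: HTAR SUCCESSFUL line that htar prints in the examples on this page. The wrapper name run_htar and the HTAR override variable are hypothetical names for this example:

```shell
# HTAR names the htar executable; the override is an assumption of this
# sketch so the wrapper can be tried against a stand-in command.
HTAR="${HTAR:-htar}"

# run_htar ARGS...: run htar with ARGS, echo everything it printed, and
# succeed only if the output contains "HTAR: HTAR SUCCESSFUL".
run_htar() {
    output=$("$HTAR" "$@" 2>&1)
    printf '%s\n' "$output"
    case "$output" in
        *'HTAR: HTAR SUCCESSFUL'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (requires htar):
#   run_htar -Hcrc -cvf nova.tar nova && echo "archive stored with checksums"
#   run_htar -Hverify=crc -xvf nova.tar || echo "extract or verify failed" >&2
```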

htar Limitations

htar has several limitations to be aware of:

  • Member File Path Length: File path names within an htar archive of the form prefix/name are limited to 154 characters for the prefix and 99 characters for the file name. Link names cannot exceed 99 characters.
  • Archive and Member File Size: The maximum archive file size the NERSC HPSS system will support is approximately 20 TB. However, we recommend you aim for htar archive sizes between 100 GB and 2 TB. Member files within an htar archive are limited to approximately 68 GB.
  • Member File Limit: htar archives have a default soft limit of 1,000,000 (1 million) member files. Users can increase this limit to a maximum hard limit of 5,000,000 member files.
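The path-length and member-size limits can be checked locally before an archive is created. The following is a rough sketch using find and awk. The function names and the MAX_MEMBER_BYTES variable are hypothetical, the 68,000,000,000-byte threshold is only an approximation of the "approximately 68 GB" figure, and the path check treats the last path component as the name, which is a conservative reading that may flag some paths htar could still store:

```shell
# Approximate per-member size limit in bytes (an assumption of this sketch).
MAX_MEMBER_BYTES="${MAX_MEMBER_BYTES:-68000000000}"

# check_htar_paths PATH...: print every path whose final component is
# longer than 99 characters or whose leading directory part is longer
# than 154 characters. Exit status is non-zero if any path is printed.
check_htar_paths() {
    find "$@" -print | awk '
        {
            n = split($0, parts, "/")
            name = parts[n]
            prefix = substr($0, 1, length($0) - length(name) - 1)
            if (length(name) > 99 || length(prefix) > 154) {
                print
                bad = 1
            }
        }
        END { exit bad }'
}

# find_oversized_members PATH...: print regular files larger than
# MAX_MEMBER_BYTES bytes.
find_oversized_members() {
    find "$@" -type f -size +"${MAX_MEMBER_BYTES}c" -print
}
```

Running check_htar_paths nova simulator and find_oversized_members nova simulator before htar -cvf gives an early warning about members that may not fit.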

You can work around the above limitations by using tar and then hsi put to put the tar file into HPSS. If the tarballs will be very large, you can split them up by following the instructions found in the "Avoid Very Large Files" section.
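As an illustration of that workaround, the sketch below bundles a directory with ordinary tar and splits the stream into fixed-size pieces that can then be stored with hsi put. The function name tar_split, the CHUNK variable, its 500G default, and the piece naming are all assumptions of this example rather than NERSC recommendations:

```shell
# Size of each piece, passed to split -b (an assumption of this sketch).
CHUNK="${CHUNK:-500G}"

# tar_split SRC OUT: write SRC as a tar stream split into pieces named
# OUT.aa, OUT.ab, ... in the current directory.
tar_split() {
    src="$1"
    out="$2"
    tar -cf - "$src" | split -b "$CHUNK" - "$out."
}

# Store the pieces in HPSS (requires hsi):
#   tar_split nova nova.tar
#   for piece in nova.tar.*; do hsi put "$piece"; done
# After retrieving the pieces later, reassemble and unpack them:
#   cat nova.tar.* | tar -xf -
```

Unlike htar, this approach needs enough local space to hold the pieces before they are transferred.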

  • Not available on compute nodes: htar requires the node you're on to accept incoming connections from its movers. This is not possible from a compute node at NERSC, so htar transfers started there will fail. Instead, we recommend you use our special xfer QOS for data transfers.