File System overview¶
Storage System Usage and Characteristics¶
File systems are configured for different purposes. Each machine has access to at least three different file systems with different levels of performance, data persistence and available capacity, and each file system is designed to be accessed and used either by a user individually or by their project, as reported in the "Access" column.
See quotas for detailed information about inode and space quotas, as well as file system purge policies.
Files in the Community and Common File Systems are charged to the project quota, while files on the Home and Scratch File Systems are charged to the file owner's quota: your files in another user's directory are still charged to your quota. If shared access is needed on the Home or Scratch file systems, consider using a Collaboration account.
Directories on the Common and Community File Systems (CFS) are designed to be used by all members of a project and have the setgid bit set by default. This makes all directories and files inherit the group ID, allowing other members of the same group to read and write those files. Some groups may prefer to disable this behavior, which can be done by removing the setgid bit on the desired directory.
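As a minimal sketch (the directory name is illustrative), the setgid bit can be toggled with chmod:

```shell
# Create a directory and toggle the setgid bit on it (name is illustrative).
mkdir -p shared_dir
chmod g+s shared_dir   # setgid on: new files and subdirectories inherit the group
chmod g-s shared_dir   # setgid off: new files get the creator's default group
```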
If desired, multiple top-level directories can be created on CFS for each project. Each top-level CFS directory comes with its own quota (drawn out of the total quota allocated for the entire project) and can be set to have different group ownership. For instance, if a project m9999 at NERSC wanted to have separate directories for their alpha and beta groups, they could request two directories (e.g. m9999_alpha and m9999_beta). The quotas for each top-level directory can be allocated in Iris by the project's PI. Additionally, if the PI wanted to limit access to these directories to only subsets of their users, they could also adjust the owning groups (e.g. m9999_alpha is owned by group alpha, etc.). The PI for m9999 could then add users in their project to the appropriate groups to allow them access to each directory as desired.
Permanent, relatively small storage for data like source code and shell scripts that you want to keep. This file system is not tuned for high performance on parallel jobs: use the more optimized Common file system to store applications that need to be sourced by more than a dozen nodes at a time, or applications composed of several packages and many small files, such as conda environments.
Referenced by the environment variable
A performant platform to install software stacks and compile code. Mounted read-only on Cori compute nodes.
Large, permanent, medium-performance file system. Community directories are intended for sharing data within a group of researchers and for storing data that will be accessed in the medium term (i.e. 1-2 years).
The PI toolbox can help PIs and PI Proxies fix permissions in their Community project directories.
Cori and Perlmutter have a dedicated, large, local, parallel scratch file system based on Lustre. The scratch file system is intended for temporary uses such as storage of checkpoints or application input and output during jobs. We have more details on Cori's scratch and Perlmutter's scratch on their respective pages.
A high capacity tape archive intended for long term storage of inactive and important data. Accessible from all systems at NERSC. Space quotas are allocation dependent.
The High Performance Storage System (HPSS) is a modern, flexible, performance-oriented mass storage system. It has been used at NERSC for archival storage since 1998. HPSS is intended for long term storage of data that is not frequently accessed.
The following file systems provide high I/O performance but generally don't preserve data across jobs, so they are meant to be used as scratch space, and any data produced must be staged out before the end of the computation.
Access is always per-user: these file systems are either mounted on the compute nodes on demand (Burst Buffer) or accessible only within the same SLURM job (XFS and in-RAM file systems), and SLURM purges their contents afterwards.
Cori's Burst Buffer provides very high performance I/O on a per-job or short-term basis. It is particularly useful for codes that are I/O-bound, for example, codes that produce large checkpoint files, or that have small or random I/O reads/writes.
Support for the Burst Buffer has been reduced, and its usage is therefore discouraged: please check the current limitations.
Temporary per-node Shifter file system¶
Shifter users can access a fast, per-node XFS file system to improve I/O.
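As a hedged sketch of how a batch script might request such a mount (the image name, backing path, and cache size are illustrative assumptions; check the Shifter documentation for the exact syntax supported on your system):

```shell
#!/bin/bash
#SBATCH --image=docker:myimage:latest
# Mount a per-node temporary XFS file at /tmp inside the container
# (path and size below are illustrative).
#SBATCH --volume="/global/cscratch1/sd/myuser/tmpfiles:/tmp:perNodeCache=size=100G"
srun shifter ./my_io_heavy_app
```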
Local temporary file system¶
Compute nodes have a small amount of temporary local storage that can be used to improve I/O.
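A minimal sketch of the usual pattern, assuming you write intermediate data to node-local /tmp and stage results out before the job ends (file names are illustrative):

```shell
# Create a private working directory in node-local /tmp.
workdir=$(mktemp -d /tmp/job_scratch.XXXXXX)
# Fast local writes during the computation.
echo "checkpoint data" > "$workdir/ckpt.dat"
# Stage results out before the job ends; local storage is purged afterwards.
cp "$workdir/ckpt.dat" ./ckpt.dat
rm -rf "$workdir"
```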
Sharing data with other users must be done carefully. Permissions should be set to the minimum necessary to achieve the desired access. For instance, consider carefully whether write permission is really necessary before granting it: often read permission is enough. Be sure to have archived backups of any critical shared data. It is also important to ensure that private login secrets (like SSH private keys or Apache htaccess files) are not shared with other users, either intentionally or accidentally. Good practice is to keep files like these in a separate directory that is as locked down as possible (e.g. by removing group and other permissions with chmod g-rwx,o-rwx <directory>; please see our permissions page for a full discussion).
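For example, a minimal sketch of such a locked-down directory (the directory name is illustrative):

```shell
# Create a directory only the owner can read, write, or enter.
mkdir -p "$HOME/private"
chmod u+rwx,g-rwx,o-rwx "$HOME/private"
```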
Also take a look at the NERSC Data Management policy.
Sharing Data Inside NERSC¶
Sharing Data Within Your Project¶
The easiest way to share data within your project at NERSC is to use the Community File System (CFS). Permissions on CFS directories are set up to be group readable and writable by default, and any permissions drift can be corrected by the PIs using the PI toolbox.
PIs can also request an HPSS Project Directory to share HPSS data within their project.
Sharing Data Outside Your Project¶
If you want to share just a few files a single time, you can use NERSC's give/take utility.
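A hedged sketch of the typical workflow (the usernames and file name are illustrative, and the exact options are an assumption; check the give/take documentation on a NERSC system for the supported flags):

```shell
# On the sender's side: offer a file to user "alice" (names illustrative).
give -u alice results.tar.gz
# On the receiver's side: collect the files user "bob" has given you.
take -u bob
```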
If you want to share with multiple users, you might want to consider setting the Linux permissions such that the files are accessible to those users.
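As a hedged sketch (SHARED_GROUP is a placeholder for a group your collaborators belong to; here it defaults to your primary group purely so the example is self-contained):

```shell
# SHARED_GROUP would be a group your collaborators belong to (placeholder).
SHARED_GROUP=${SHARED_GROUP:-$(id -gn)}
mkdir -p results
chgrp -R "$SHARED_GROUP" results
# Group members may read and traverse; everyone else is locked out.
chmod -R u+rwx,g+rX,o-rwx results
```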
If you have a large volume of data you'd like to share with several NERSC users outside your project, you may want to consider creating a dedicated top-level CFS directory that's shared between projects. Project PIs can request a new CFS directory and can also request that the directory be owned by a Linux group made up of users from different projects.
Sharing Data Outside of NERSC¶
Data on the Community File System can also be shared with users outside of NERSC through Globus Guest Collections.
Data can also be shared via Science Gateways.