Environment
Note
This page outlines tips on managing the user shell environment and startup scripts for NERSC systems. Please see the shell startup page for a detailed explanation of shells, startup shell files, and different shell modes.
NERSC User Environment
NERSC Shell Support Policy
By default, the login shell for a user account is bash. Users also have the option to change their default shell to csh, tcsh, zsh, or ksh. Only bash is fully supported by NERSC at this time; all other shells are supported on a basic level.
For fully supported shells (bash):
- NERSC will test tools, modules, and scripts upon system updates, and fix errors with high priority.
- We reserve the right to take unscheduled outages to fix critical issues.
For basic supported shells:
- NERSC will deploy the version supplied by the underlying OS and a basic skeleton configuration.
- We do not guarantee that tools and scripts deployed by NERSC staff will work with these shells, and fixes for reported issues will be treated as lower priority.
Dotfiles and Customizing Your Environment
NERSC does not populate shell initialization files (also known as "dotfiles") in users' home directories. The same home file system is mounted on all NERSC resources, meaning that the same dotfile is used across all compute systems.
NERSC provides template dotfiles at https://software.nersc.gov/NERSC/dotfiles. Before copying the NERSC dotfiles, we recommend backing up your existing dotfiles. Then copy the contents of the template dotfiles into your ~/.bashrc or ~/.tcshrc.
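The backup step can be sketched as follows. This is a minimal example rather than a NERSC-provided tool: backup_dotfile is a hypothetical helper name, and the demonstration runs against a temporary directory so that it does not touch your real home directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of the NERSC templates): copy a dotfile to a
# timestamped backup next to the original and print the backup's path.
backup_dotfile() {
    local file="$1"
    local backup
    backup="${file}.bak.$(date +%Y%m%d-%H%M%S)"
    if [ -f "$file" ]; then
        cp -p "$file" "$backup"
        echo "$backup"
    fi
}

# Demonstration in a scratch directory; for real use, pass "$HOME/.bashrc".
demo_dir="$(mktemp -d)"
echo 'export EDITOR=vim' > "$demo_dir/.bashrc"
backup_path="$(backup_dotfile "$demo_dir/.bashrc")"
echo "backup written to: $backup_path"
```

Keeping the backup beside the original makes it easy to restore with a single cp if the new configuration misbehaves.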
Note
Home directories are shared between Perlmutter and DTN, which means your dotfiles must be compatible with both systems; otherwise, you will run into errors.
You can create your own dotfiles instead of using our template files. We recommend testing your changes by starting a new shell and checking that the configuration matches your expectations.
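Before starting a new shell, a quick syntax check can catch simple mistakes. The sketch below assumes bash; check_dotfile is a hypothetical helper, and the two files are stand-ins created in a temporary directory.

```shell
#!/bin/bash
# check_dotfile is a hypothetical helper: "bash -n" parses the file without
# executing it, so it catches syntax errors but not runtime problems.
check_dotfile() {
    bash -n "$1"
}

demo_dir="$(mktemp -d)"
printf 'export MYVARIABLE="value"\n' > "$demo_dir/good.bashrc"
printf 'if [ -z "$MYVARIABLE" ]; then\n' > "$demo_dir/bad.bashrc"   # "fi" is missing

if check_dotfile "$demo_dir/good.bashrc"; then good_result="ok"; else good_result="syntax error"; fi
if check_dotfile "$demo_dir/bad.bashrc" 2>/dev/null; then bad_result="ok"; else bad_result="syntax error"; fi
echo "good.bashrc: $good_result"
echo "bad.bashrc:  $bad_result"
```

A clean syntax check does not prove the file behaves as intended, so it complements, rather than replaces, opening a fresh shell and inspecting the result.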
No more .ext dotfiles at NERSC since February 21, 2020.
NERSC used to reserve the standard dotfiles (~/.bashrc, ~/.bash_profile, ~/.cshrc, ~/.login, etc.) for system use, and users put their shell modifications into the corresponding .ext files (e.g., ~/.bashrc.ext, ~/.bash_profile.ext, etc.). This is not the case any more! You can now modify the standard dotfiles for your personal use.
The actual dotfile transition occurred during the center maintenance on February 21-25, 2020. To mitigate any interruptions to existing workloads, we preserved shell environments by replacing dotfiles with template dotfiles that source .ext files. For example, we changed the ~/.bashrc file to look like this:
# begin .bashrc
if [ -z "$SHIFTER_RUNTIME" ]
then
. $HOME/.bashrc.ext
fi
# end .bashrc
We recommend that users whose accounts were created before February 2020 move the contents of their ~/.bashrc.ext file into their ~/.bashrc file (and remove the .ext files afterwards).
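One way to carry out this migration is sketched below. It assumes that ~/.bashrc still contains only the transition template shown above; migrate_ext and the .pre-migration suffix are hypothetical names, and the demonstration uses a temporary directory instead of your home directory.

```shell
#!/bin/bash
# Hypothetical helper: replace the transition template in DIR/.bashrc with the
# contents of DIR/.bashrc.ext. ASSUMPTION: .bashrc still holds only the
# template, so nothing of yours is lost by overwriting it.
migrate_ext() {
    local dir="$1"
    if [ ! -f "$dir/.bashrc.ext" ]; then
        return 0                                            # nothing to migrate
    fi
    if [ -f "$dir/.bashrc" ]; then
        cp -p "$dir/.bashrc" "$dir/.bashrc.pre-migration"   # keep a backup
    fi
    cp -p "$dir/.bashrc.ext" "$dir/.bashrc"
    rm "$dir/.bashrc.ext"
}

# Demonstration in a scratch directory; for real use, pass "$HOME".
demo_dir="$(mktemp -d)"
printf '. $HOME/.bashrc.ext\n' > "$demo_dir/.bashrc"
printf 'export MYVARIABLE="value"\n' > "$demo_dir/.bashrc.ext"
migrate_ext "$demo_dir"
migrated_contents="$(cat "$demo_dir/.bashrc")"
echo "$migrated_contents"
```

Overwriting the template also drops its SHIFTER_RUNTIME check. If you have edited ~/.bashrc since the transition, or want to keep that check, merge the files by hand instead.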
Changing Your Default Login Shell
Use Iris to change your default login shell. Log in, then under the "Profile" tab look for the "Server Logins" section. Click on "Edit" under the "Actions" column.
Customizing Your Shell Environment
bash users can add startup configurations, e.g., environment variables, aliases, and functions, to the ~/.bashrc file to make them accessible in subshells. In standard bash, ~/.bashrc is read by interactive non-login shells; a non-interactive invocation (for example, running a shell script) does not read it unless the BASH_ENV variable points to it. csh users should specify their configuration in ~/.tcshrc, which will be available in interactive login and interactive non-login mode.
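As an illustration, a ~/.bashrc might contain entries like the following. Every name here (MY_PROJECT_DIR, ll, mkcd) is a placeholder chosen for this sketch, not a NERSC default.

```shell
# Sketch of ~/.bashrc contents; every name below is a placeholder.

# Environment variable, exported so that child processes inherit it
export MY_PROJECT_DIR="$HOME/projects"

# Alias for a frequently used command
alias ll='ls -l'

# Function: create a directory and change into it
mkcd() {
    mkdir -p "$1" && cd "$1"
}
```

Exported variables are inherited by child processes, whereas aliases and functions exist only in shells that read the file themselves.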
System-specific Customizations
All NERSC systems share the same global home file system; a user's $HOME environment variable points to the same directory on every NERSC platform. To make system-specific customizations, use the pre-defined environment variable NERSC_HOST.
Don't set NERSC_HOST
Some older dotfiles set NERSC_HOST without checking whether this variable is already set. You should generally not need to do this, so we advise against setting NERSC_HOST in your dotfiles. If you must set NERSC_HOST for some reason, check whether it is already set before overwriting it. In bash you can do this with if [ -z "$var" ]; then var="mysettings"; fi.
Example
case $NERSC_HOST in
"perlmutter")
: # settings for Perlmutter
export MYVARIABLE="value-for-perlmutter"
;;
"datatran")
: # settings for DTN nodes
export MYVARIABLE="value-for-dtn"
;;
*)
: # default value for other nodes
export MYVARIABLE="default-value"
;;
esac
Shifter
If you run shifter applications, you may want to skip the dotfiles. You can use the following if block in your dotfiles:
if [ -z "$SHIFTER_RUNTIME" ]; then
: # Settings for when *not* in shifter
fi
Missing NERSC Variables
If any NERSC-defined environment variables, such as $SCRATCH, are missing in your shell invocations, you can add them in your ~/.bashrc file as follows:
if [ -z "$SCRATCH" ]; then
export SCRATCH=/global/pscratch/sd/${USER:0:1}/$USER
fi
scrontab
Crontab functionality is provided on NERSC HPC systems via scrontab. If you run bash scripts with scrontab, you may want to invoke a login shell (#!/bin/bash -l) in order to get the NERSC-defined environment variables, such as NERSC_HOST, SCRATCH, and PSCRATCH, and to get the module command defined.
For more information about using scrontab at NERSC, please see our scrontab documentation page.
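The sketch below shows a script of the kind you might launch from scrontab to confirm that the login-shell environment is in place. env_report is a hypothetical helper; the scrontab entry itself is not shown because its options depend on your job.

```shell
#!/bin/bash -l
# Sketch of a script meant to be launched from scrontab. The "-l" on the
# shebang line requests a login shell so that site startup files are read.
# env_report is a hypothetical helper that records what the job actually sees.
env_report() {
    echo "NERSC_HOST=${NERSC_HOST:-<unset>}"
    echo "SCRATCH=${SCRATCH:-<unset>}"
    echo "PSCRATCH=${PSCRATCH:-<unset>}"
    if type module >/dev/null 2>&1; then
        echo "module command: defined"
    else
        echo "module command: missing"
    fi
}

report="$(env_report)"
echo "$report"
```

Writing this report to a log file on the first run makes it easier to tell an environment problem apart from a problem in the job itself.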
Older Environments
We generally recommend that you use the most recent programming environment installed on Perlmutter. However, sometimes it is convenient to have access to previous programming environments to check things like compile options and libraries. You can use module load cpe/YY.XX to load the previous programming environment from year YY and month XX. We will remove cpe modules for environments that no longer work on our system due to changes in underlying dependencies like network libraries.
Please keep in mind that these cpe modules are offered as a convenience. If you require reproducibility across environments, we encourage you to investigate container-based options like Shifter.
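A cautious way to try an older environment is sketched below. CPE_VERSION is a hypothetical variable that you would set yourself in YY.XX form; the sketch does not assume that any particular version is installed, and it does nothing in a shell where the module command is unavailable.

```shell
#!/bin/bash
# Sketch: list the cpe modules and optionally load one. CPE_VERSION is a
# hypothetical variable you set yourself (format YY.XX); no version is assumed.
if type module >/dev/null 2>&1; then
    module avail cpe                      # show which cpe/YY.XX modules exist
    if [ -n "${CPE_VERSION:-}" ]; then
        module load "cpe/${CPE_VERSION}"
        module list                       # confirm what is now loaded
    fi
    cpe_status="module command available"
else
    cpe_status="module command not available in this shell"
fi
echo "$cpe_status"
```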
Troubleshooting User Environment Issues
If you are facing issues with your user environment, we have some recommendations to help you diagnose the problem.
First, we recommend you check the shell startup files used by your shell type (bash, sh, csh, zsh, tcsh). Most user environment issues can be resolved by reviewing the content of your user startup files. For bash users, check your $HOME/.bashrc file to see if an environment issue is caused by this file. For csh, check $HOME/.cshrc, and for zsh, check $HOME/.zshrc. If you update your startup files, you can source the files to apply the changes to the current shell (source $HOME/.bashrc) or log out and log back in.
If you want to know where environment variables are set, you will need to understand the shell startup files. When you ssh into NERSC systems you are in an interactive login shell. bash users will want to look at the table outlined in bash startup files. The /etc/profile script, which is typically sourced during shell login, is available on any Linux distribution, but its contents may vary by distribution. During shell initialization, the shell will source files in /etc/profile.d/* -- startup files added by the site administrator to provide system-wide defaults to all users. We encourage you to review the content of each file if you need to troubleshoot your environment. Note that /etc/profile and files in /etc/profile.d/* are owned by the root user, so you cannot edit them, but it's good to check these files when tracing issues related to the startup environment.
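Searching the startup files for a variable name is often the quickest way to see where it might be set. The following is a minimal sketch: find_var_mentions is a hypothetical helper, and the two files are stand-ins created in a temporary directory.

```shell
#!/bin/bash
# Hypothetical helper: print "file:line:text" for every line mentioning NAME
# in the given startup files. Unreadable or missing files are skipped quietly.
find_var_mentions() {
    local name="$1"
    shift
    grep -Hn -- "$name" "$@" 2>/dev/null
}

# Demonstration on scratch files standing in for real startup files.
demo_dir="$(mktemp -d)"
printf 'export MYVARIABLE="site-default"\n' > "$demo_dir/site.sh"
printf 'alias ll="ls -l"\nexport MYVARIABLE="user-override"\n' > "$demo_dir/user.bashrc"
matches="$(find_var_mentions MYVARIABLE "$demo_dir/site.sh" "$demo_dir/user.bashrc")"
echo "$matches"
```

On a real system you might call it as find_var_mentions SCRATCH /etc/profile /etc/profile.d/* ~/.bash_profile ~/.bashrc. A match shows only that the name appears in a file; the value your shell ends up with depends on the order in which the files are read.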
Second, you can review the modules loaded at startup. All user environments are initially loaded with a pre-determined set of modulefiles selected by the site administrators. You should review the content of your active modules by running module list, then analyze the content of each modulefile by running module show <modulefile>. Many users include module load statements in their ~/.bashrc to customize their startup modules, but this can cause unexpected side-effects when loading other modules.
Here are some additional tips to help you troubleshoot environment issues:
- Check for environment variables such as PATH and LD_LIBRARY_PATH in startup scripts such as ~/.bashrc that may cause issues. A common mistake is to reset one of these environment variables instead of prepending or appending additional paths. Setting export PATH=/path/to/dir will break your shell, because standard commands can no longer be found; instead set export PATH=/path/to/dir:$PATH, which prepends the directory to $PATH.
- Check all environment variables set in your terminal via env or printenv. If you are looking for a particular pattern, you can grep for it within the long output, e.g., printenv | grep -i petsc (the -i ignores capitalization).
- Always check the path to the binary that is being run. For instance, if you want to run a python script, double-check the path to the python wrapper by invoking which python and see whether the path makes sense.
- Make sure you are on the right machine! The environment variable NERSC_HOST shows which machine you are logged in to. On Perlmutter, the expected value is:
elvis@perlmutter> echo $NERSC_HOST
perlmutter
- Check whether you are on a login or compute node by invoking hostname. If the output starts with nid, you are most likely on a compute node.
- If your shell prompt gets clobbered, try running reset, which will reset your terminal settings.
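Two of the tips above can be sketched in bash as follows. Both path_prepend and node_kind are hypothetical helpers written for this example, and /path/to/dir is the same placeholder directory used in the list.

```shell
#!/bin/bash
# Hypothetical helper: prepend DIR to PATH unless it is already listed,
# keeping every existing entry.
path_prepend() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;                    # already present: leave PATH unchanged
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

# Hypothetical helper: classify a hostname using the "nid" prefix.
node_kind() {
    case "$1" in
        nid*) echo "compute" ;;
        *)    echo "not a compute node (by name)" ;;
    esac
}

path_prepend "/path/to/dir"
path_prepend "/path/to/dir"               # second call is a no-op
echo "$PATH"
node_kind "$(hostname)"
```

Guarding against duplicates matters when a startup file is sourced more than once, since PATH would otherwise grow with every new shell.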
Troubleshooting Shell Scripts
Running Shell Scripts
You can run a shell script with your preferred shell (e.g., bash script.sh, csh script.sh, sh script.sh), or you can specify a full or relative path to the script. A shell script must be executable in order to be run by its path. In the example below there is a permission error, since the file doesn't have execute permission (x). You can fix this by running chmod +x script.sh.
elvis@login24> ./script.sh
bash: ./script.sh: Permission denied
elvis@login24> ls -l script.sh
-rw-rw---- 1 elvis elvis 126 Apr 1 08:43 script.sh
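The error and its fix can be reproduced safely in a temporary directory. This is a sketch assuming bash; the script body is a stand-in.

```shell
#!/bin/bash
# Sketch: reproduce the permission error in a scratch directory, then fix it.
demo_dir="$(mktemp -d)"
script="$demo_dir/script.sh"
printf '#!/bin/bash\necho "hello from script"\n' > "$script"

chmod 600 "$script"                  # readable and writable, but not executable
"$script" 2>/dev/null
status_before=$?                     # bash reports 126 for "found but not executable"

chmod +x "$script"                   # add execute permission
"$script"
status_after=$?
echo "before chmod: $status_before, after chmod: $status_after"
```

Running bash script.sh would also work without the execute bit, because in that case bash reads the file as input rather than executing it directly.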
Using Strict Running Modes
Running a script in a stricter mode can help in the debugging process. For example, the default behavior of the bash shell is to run a script to completion regardless of the success of any commands within the script. Using set -e makes the script terminate immediately when a simple command exits with a non-zero exit status (effectively, upon encountering an error).
The set command is a shell builtin that changes shell behavior in bash and sh.
Note
In csh, the set command is used for setting variables (set FOO=BAR). This is very different from how set works in bash or sh: in these shells' syntax, set changes the behavior of the current shell.
In the following example, bash stops execution after attempting to run XYZ (which is an invalid command). The command whoami is not run because the script terminates immediately after the invalid command. Note the non-zero script exit code, retrieved by $?.
elvis@login24> cat script.sh
#!/bin/bash
set -e
hostname
# invalid command. Bash will terminate immediately
XYZ
# This command won't be executed
whoami
elvis@login24> bash script.sh
login24
script.sh: line 6: XYZ: command not found
elvis@login24> echo $?
127
The shebang is a character sequence (#!) at the beginning of a script used to indicate which shell interpreter to use when processing the script. You can also pass shell options on the shebang line. In the previous example, we specified set -e within the script to modify the behavior of the bash shell. This option can instead be passed on the shebang line, #!/bin/bash -e, which is equivalent to invoking the script with /bin/bash -e <script>.sh. Likewise, to enable strict mode for the csh/tcsh shell, you can use #!/bin/csh -e or #!/bin/tcsh -e.
If we were to source this script, the setting would be applied to the current shell. When set -e is enabled in the current shell, whether directly or as a result of sourcing a script, an invalid command (even a typo!) will terminate your shell. Watch out for this behavior if you source any script that enables set -e.
elvis@login24> source script.sh
login24
XYZ: command not found
Connection to perlmutter.nersc.gov closed.
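If you need to source a script that enables set -e, one option is to do so inside a subshell. The sketch below assumes bash and uses a stand-in script in a temporary directory.

```shell
#!/bin/bash
# Sketch: source a strict-mode script inside a subshell so that "set -e"
# cannot terminate the shell you are working in.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/strict.sh" <<'EOF'
set -e
false                      # stands in for the failing command
echo "not reached"
EOF

( source "$demo_dir/strict.sh" )      # the parentheses start a subshell
subshell_status=$?

# The parent shell is still running, and its own options are unchanged.
case "$-" in
    *e*) parent_errexit="on" ;;
    *)   parent_errexit="off" ;;
esac
echo "subshell exit status: $subshell_status, parent errexit: $parent_errexit"
```

The trade-off is that variables and functions defined by the sourced script disappear when the subshell exits, so this approach suits checking a script more than loading settings from it.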
Running a script in a mode that terminates on the first non-zero exit status can help you determine what went wrong. You can check the exit code of your last command as follows:
# bash, sh, zsh
echo $?
# csh, tcsh
echo $status
For complicated commands, set -e may not be sufficient to determine whether there was an error. For example, in bash the exit code of a pipeline (|) is the exit code of the last command in the pipeline. Below we show two pipelines that contain a failing command. The command grep123 is a typo -- we meant grep. In the first example we see a non-zero exit code; in the second example, however, we see an exit code of 0 because wc -l, the last command, exited successfully:
elvis@login24> ls -ld | grep123 $user
grep123: command not found
elvis@login24> echo $?
127
elvis@login24> ls -ld | grep123 $user | wc -l
grep123: command not found
0
elvis@login24> echo $?
0
If you want bash to report the piped command as a failure, consider also running set -o pipefail. If we add this setting and rerun the same example, we now see the exit code is 127 instead of 0.
elvis@login24> set -o pipefail
elvis@login24> ls -ld | grep123 $user | wc -l
grep123: command not found
0
elvis@login24> echo $?
127
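The same behavior can be reproduced with the standard commands false and true, which makes it easy to experiment without typos. The sketch also reads bash's PIPESTATUS array, which is not covered above; it holds one exit status per pipeline stage and must be read immediately after the pipeline runs.

```shell
#!/bin/bash
# Sketch: three ways to look at a pipeline whose first stage fails.
# "false" stands in for the mistyped command; "true" stands in for wc -l.

set +o pipefail
false | true
default_status=$?                 # status of the last stage only

false | true
stage_statuses="${PIPESTATUS[*]}" # one status per stage, read immediately

set -o pipefail
false | true
pipefail_status=$?                # rightmost non-zero status in the pipeline
set +o pipefail

echo "default: $default_status, per stage: $stage_statuses, pipefail: $pipefail_status"
```

pipefail tells you that some stage failed, while PIPESTATUS tells you which one.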