Opening an SSH connection to NERSC systems results in a connection to a login node. Systems typically have multiple login nodes behind a load balancer, and new connections are assigned to a random node. If an account has connected recently, the load balancer will attempt to route the new connection to the same login node as the previous one.
Do not run compute- or memory-intensive applications on login nodes. These nodes are a shared resource. NERSC may terminate processes which are having negative impacts on other users or the systems.
On login nodes, typical user tasks include:

* Compiling codes (but please limit the number of threads, e.g., `make -j 8`)
* Editing files
* Submitting jobs
Some workflows require interactive use of applications such as IDL, MATLAB, NCL, python, and ROOT. For small datasets and short runtimes it is acceptable to run these on login nodes. For extended runtimes or large datasets these should be run in the batch queues.
An interactive qos is available on Cori for compute- and memory-intensive interactive work.
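For example, a short interactive allocation might be requested with `salloc` as sketched below; the node count, constraint, and walltime shown are illustrative placeholders, so check the current NERSC queue documentation for actual limits.

```shell
# Request one Haswell node for 30 minutes in the interactive qos.
# Flag values are illustrative; adjust constraint, node count, and
# walltime to match your work and current queue policies.
salloc --nodes=1 --qos=interactive --constraint=haswell --time=30:00
```

Once the allocation is granted, commands run in that shell execute on the compute node rather than the shared login node.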
NERSC has implemented usage limits on Cori login nodes via Linux cgroup limits. These usage limits prevent inadvertent overuse of resources and ensure a better interactive experience for all NERSC users.
The following memory and CPU limits have been put in place on a per-user basis (i.e., all processes combined from each user) on Cori.
Memory limits: On login nodes and workflow nodes, the limit is 128 GB (25% of the available memory). On the Jupyter nodes, the limit is 42 GB.
CPU limits: On login nodes, workflow nodes, and Jupyter nodes, the CPU limit is 50% of usage. Processes exceeding this limit will be throttled to ensure fair access to login node resources for all users.
If your login node processes exceed the above limits, they may be terminated with a message like "Out of memory". To help identify processes that make heavy use of resources, you can run `top -u $USER`, or run a specific command under `/usr/bin/time -v ./my_command`. For Jupyter, a widget in the lower left-hand corner of the JupyterLab UI shows aggregate memory usage.
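As another quick check (a sketch assuming a standard Linux `ps`), the following lists your most memory-hungry processes, which can help spot what is approaching the cgroup limit:

```shell
# Show your five most memory-hungry processes, sorted by resident
# set size (RSS, reported in kilobytes). Falls back to `id -un`
# if $USER is unset.
ps -u "${USER:-$(id -un)}" -o pid,rss,comm --sort=-rss | head -n 6
```

The first output line is the column header; the rows below it are your processes, largest memory consumer first.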
Additional tips for using login nodes:

* Avoid the `watch` command, which has a default interval of 2 seconds. If you must use `watch`, please use a much longer interval, such as 5 minutes (300 seconds), e.g., `watch -n 300 <your_command>`.
* Avoid long-running commands in general. Instead, run them on the compute nodes through the interactive qos, or submit a batch job to the regular or shared qos.
* If you need to do a large number of data transfers, use the dedicated xfer queue.
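A transfer job might look something like the sketch below. The `xfer` qos name follows the NERSC convention mentioned above, but the walltime, job name, and the `hsi` invocation are illustrative assumptions; adapt them to your own workflow.

```shell
#!/bin/bash
# Illustrative xfer job script; all values below are placeholders.
#SBATCH --qos=xfer
#SBATCH --time=12:00:00
#SBATCH --job-name=my_transfer

# Example only: archive a results file to HPSS with hsi.
hsi put my_results.tar.gz
```

Running transfers this way keeps long file movements off the shared login nodes and lets them proceed unattended.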