Cori for JGI¶
A subset of nodes on Cori, the flagship supercomputer at NERSC, is reserved for the exclusive use of JGI users. All features available on Cori Haswell nodes are also available under the JGI-specific "quality of service" (QOS).
Access¶
All JGI staff and collaborators can submit a request to JGI management to be given access to Cori Genepool, the JGI reserved fraction of Cori compute capacity. This service first became available in January 2018.
JGI staff and affiliates can use their access to Cori Genepool by passing QOS arguments to Slurm job submissions.
- All JGI users must specify the Slurm account under which the job will run (with `-A <youraccount>`). Unlike other NERSC users, JGI users accessing the Genepool QOS do not have a default account.
- For jobs requiring one or more whole nodes, use `--qos=genepool`.
- For jobs which can share a node with other jobs, use `--qos=genepool_shared`.
- Each of the following items first requires `module load esslurm`:
    - For large memory batch jobs, use `--qos=jgi_exvivo`.
    - For large memory shared batch jobs, use `--qos=jgi_shared`.
    - For large memory interactive jobs, use `--qos=jgi_interactive`.
    - For transfer jobs which write to the Data and Archive file system, use `--qos=xfer_dna`.
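The QOS choices above can be summarized in a small dry-run helper. This is only a sketch: `jgi_submit_cmd` and its job-kind keywords (`exclusive`, `shared`, `bigmem`, `bigmem_shared`, `transfer`) are hypothetical names made up for illustration, not part of Slurm or any NERSC tooling. It prints the commands rather than running them, and it leaves out `jgi_interactive`, which is meant for interactive sessions rather than batch submission.

```shell
#!/bin/bash
# Hypothetical helper (not a NERSC tool): print the commands needed to submit
# a batch job under a JGI QOS, without actually submitting anything.
# Usage: jgi_submit_cmd <kind> <account> <script>
jgi_submit_cmd() {
    local kind="$1" account="$2" script="$3" qos needs_esslurm=no
    case "$kind" in
        exclusive)     qos=genepool ;;
        shared)        qos=genepool_shared ;;
        bigmem)        qos=jgi_exvivo; needs_esslurm=yes ;;
        bigmem_shared) qos=jgi_shared; needs_esslurm=yes ;;
        transfer)      qos=xfer_dna;   needs_esslurm=yes ;;
        *) echo "unknown job kind: $kind" >&2; return 1 ;;
    esac
    # The large memory and transfer QOSes require the esslurm module first.
    if [ "$needs_esslurm" = yes ]; then
        echo "module load esslurm"
    fi
    echo "sbatch --qos=$qos -A $account $script"
}
```

For example, `jgi_submit_cmd shared myproject yourscript.sh` prints `sbatch --qos=genepool_shared -A myproject yourscript.sh`, where `myproject` stands in for one of your own Slurm accounts.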
Note
Jobs run under the Cori `genepool`, `genepool_shared`, `jgi_exvivo`, `jgi_shared`, `jgi_interactive`, and `xfer_dna` QOSes are not charged. Resources are scheduled to the best of our ability, but interference with other users' workloads can still occur. Please be a good citizen to your fellow researchers. Users violating the spirit of this policy will find themselves less able to do so.
Note
The JGI's Cori capacity is entirely housed on standard Haswell nodes: 32 physical cores, each with 2 hyperthreads, no local hard drives, and 128 GB of memory. It is not necessary to request `-C Haswell` via Slurm when using a JGI QOS. KNL nodes are NOT available via a JGI QOS. To use KNL nodes, submit to one of Cori's standard QOSes (such as `regular`) and use the "m342" account. Be aware that jobs run with "m342" will charge NERSC allocation hours to JGI.
Example
For a single-core shared job, the minimal submission is:
sbatch --qos=genepool_shared -A <youraccount> yourscript.sh
To request an interactive session on a single node with all CPUs and memory:
salloc --qos=genepool -A <youraccount>
Note that if the Cori Genepool QOS is full, the previous command can take a long time to allocate a node.
In the earlier examples, `youraccount` is the project name you submit to, not your username or file group name. If you don't know which accounts you belong to, you can check with:
sacctmgr show associations where user=$USER
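The same submission options can also be placed in the job script itself as `#SBATCH` directives, so that the command line reduces to `sbatch yourscript.sh`. The following is a minimal sketch, assuming a one-task shared job; the account name `myproject`, the one-hour time limit, and the script body are placeholders rather than recommendations.

```shell
#!/bin/bash
# Placeholder values: replace "myproject" with one of your own Slurm accounts.
#SBATCH --qos=genepool_shared
#SBATCH --account=myproject
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Slurm sets SLURM_JOB_ID inside a job; fall back to a marker when run by hand.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_id} running on $(hostname)"
```

Because `#SBATCH` lines are ordinary comments to the shell, the script can also be run directly for a quick check before it is submitted.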
Cori Features and Other Things to Know¶
Cori offers additional features and capabilities that can be of use to JGI researchers:
Slurm¶
Cori uses the Slurm job scheduler. Documentation and examples for using Slurm at NERSC can be found in the NERSC documentation.
Cori Scratch¶
Cori scratch is per-user storage space on a Lustre file system, accessible from Cori and Cori ExVivo. Your directory can be found at `/global/cscratch1/sd/$USER` or through the `$CSCRATCH` environment variable. Cori scratch is purged periodically; backing up data stored there is your responsibility. The HPSS Tape Data Archive or the JGI JAMO system can be used for this purpose. See the NERSC Data Management Policy for more information on topics such as automatic file backups and scratch directory purge frequency.
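As a rough illustration, a job might create a per-run working directory under `$CSCRATCH` and archive the results afterwards. This is a sketch under assumptions: the `myrun_` directory naming is made up, the fallback to a temporary directory exists only so the snippet also runs outside Cori, and the `htar` call (the HPSS archiving client) is left commented out because the archive name is a placeholder.

```shell
#!/bin/bash
# Use Cori scratch when available; fall back to a temp dir elsewhere
# (the fallback is an assumption added for portability).
scratch_root="${CSCRATCH:-${TMPDIR:-/tmp}}"

# Hypothetical layout: one directory per run, tagged by job ID or shell PID.
run_dir="${scratch_root}/myrun_${SLURM_JOB_ID:-$$}"
mkdir -p "$run_dir"
echo "example output" > "${run_dir}/result.txt"

# Scratch is purged periodically, so copy anything worth keeping elsewhere.
# For example, to bundle the run directory into HPSS:
#   (cd "$scratch_root" && htar -cvf myrun.tar "$(basename "$run_dir")")
echo "Results written to ${run_dir}"
```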
Note
The performance of the different file systems varies significantly depending on what your application is doing. It's worth experimenting with your data in different locations to see what gives the best results.
JGI Partition Configuration¶
Setting | Value |
---|---|
Job limits | 5000 exclusive jobs, or 10000 shared jobs |
Run time limits | 72 h |
Partition size | 192 nodes |
Node configuration | 32-core Haswell CPUs (64 hyperthreads), 128GB memory |
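
For a sense of scale, the nominal totals implied by the table can be worked out with simple shell arithmetic. This is only a back-of-the-envelope sketch: it uses the figures quoted on this page (192 nodes, 32 cores with 2 hyperthreads each, 128 GB per node), and the memory actually usable by jobs is likely to be somewhat lower than the nominal value.

```shell
#!/bin/bash
# Nominal figures quoted on this page.
nodes=192
cores_per_node=32
hyperthreads_per_core=2
mem_gb_per_node=128

total_cores=$(( nodes * cores_per_node ))
total_hyperthreads=$(( total_cores * hyperthreads_per_core ))
total_mem_gb=$(( nodes * mem_gb_per_node ))
mem_gb_per_core=$(( mem_gb_per_node / cores_per_node ))

echo "cores=${total_cores} hyperthreads=${total_hyperthreads}"
echo "memory=${total_mem_gb} GB total, ${mem_gb_per_core} GB per physical core"
```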