Cori Large Memory software¶
Many software packages available from Cori and other external compute nodes (e.g., Cori GPUs) work on the large memory nodes (aka cmem
nodes), too, since they are built with x86 and x64 instruction sets. However, some packages, meant to be used with CUDA on GPUs or to be for a different interconnect fabric, don't work. In that case, we have built separate cmem-specific versions, and differentiate the module names from those for the existing ones, by appending -cmem
to the regular names.
If you see module file names for a package, some with the trailing -cmem
and others without, you should use one that ends with -cmem
. For example, we show below all the llvm
versions available, but only the one for llvm-cmem/10.0.1
works on the large memory nodes.
To use modules for the large memory nodes, you can module load cmem
.
cmem$ module avail llvm
--------- /global/common/software/nersc/osuse15_cmem/extra_modulefiles ---------
llvm-cmem/10.0.1(default)
------------- /global/common/software/nersc/cle7/extra_modulefiles -------------
llvm/8.0.1
llvm/9.0.0-git_20190220_cuda_10.1.168(default)
llvm/9.0.1
llvm/10.0.0
llvm/10.0.0-git_20190828
llvm/11.0.0-git_20200409
llvm_openmp_debug/9.0.0-git_20190220_cuda_10.1.168(default)
llvm_openmp_debug/10.0.0-git_20190828
llvm_openmp_debug/11.0.0-git_20200409
Compilers¶
There are several base compilers available.
CCE (Cray Compiler)¶
cmem$ module load PrgEnv-cray
cmem$ module rm darshan # requires incompatible cray-mpich
cmem$ export CRAY_CPU_TARGET=x86-64 # set the CPU target to x86-64
GCC¶
cmem$ module load gcc
It is suggested to add the processor specific compiler option -march=znver1
for better performance codes.
Intel¶
cmem$ module load intel
For better performance codes, you can make use of the processor specific compiler option for better performances, by adding -xHOST
, and make sure that you compile on a cmem node directly.
LLVM¶
cmem$ module load cmem
cmem$ module load llvm-cmem
PGI¶
cmem$ module load pgi
It is suggested to use the processor specific compiler option by adding -tp=zen
.
MPI¶
Open MPI¶
Open MPI is provided for the GCC, HPC SDK (formerly PGI), Intel, and CCE compilers, and is provided as the openmpi-cmem/4.0.3
and openmpi-cmem/4.0.5
modules. Users must use this particular version of the openmpi-cmem
module.
One must first load a compiler module before loading a openmpi-cmem
module, e.g.,
cmem$ module load cmem
cmem$ module load gcc
cmem$ module load openmpi-cmem
After the openmpi-cmem
module is loaded, the MPI compiler wrappers will be available as mpicc
, mpic++
, and mpif90
.
Python¶
Python use on the largemem nodes is largely the same as on Cori. You can find general information about using Python at NERSC at our Python page.
Two main differences Python users should be aware of on the largemem nodes are mpi4py (documented below) and performance issues in libraries that use Intel's MKL like NumPy, SciPy, and scikit-learn. You can find more information about improving MKL performance on AMD CPUs at this Intel MKL on AMD Zen blog post. Alternatively, in NumPy for example, you can use OpenBLAS instead of MKL by installing NumPy from the conda-forge
channel (more info at the NumPy installation page). You can also use the nomkl
package documented on the Anaconda MKL optimization page.
Using Python mpi4py¶
Using Python's mpi4py on the Large Memory nodes requires an mpi4py built with Open MPI. This means that the mpi4py in our default Python module will not work on these nodes. It also means that any custom conda environments built with Cray MPICH (following our standard build recipe) will also not work on the Large Memory nodes.
We provide two options for users:
1) Build mpi4py against Open MPI in your own conda environment:
module load cmem
module load python
module swap PrgEnv-intel PrgEnv-gnu
module load openmpi-cmem
conda create -n mylargemem python=3.8 -y
source activate mylargemem
cd $HOME
wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-3.0.3.tar.gz
tar zxvf mpi4py-3.0.3.tar.gz
cd mpi4py-3.0.3
python setup.py build
python setup.py install
OR
2) Start with our pre-built mpi4py for the Large Memory nodes by cloning an environment:
module load python
conda create -n mylargemem --clone lazy-mpi4py-amd
source activate mylargemem
To run with Slurm:
srun -n 2 python hello_world.py
To run with Open MPI's mpirun:
module load cmem
module load openmpi-cmem
mpirun -n 2 python hello_world.py
Q-Chem¶
cmem$ module load qchem-cmem
VASP¶
cmem$ module load vasp-cmem