VASP

VASP is a package for performing ab initio quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane wave basis set. The approach implemented in VASP is based on a finite-temperature local-density approximation (with the free energy as variational quantity) and an exact evaluation of the instantaneous electronic ground state at each MD step using efficient matrix diagonalization schemes and an efficient Pulay mixing scheme.

Availability and Supported Architectures at NERSC

VASP is available at NERSC as a provided support level package for users with an active VASP license.

Gaining Access to VASP Binaries

To gain access to the VASP binaries at NERSC through an existing VASP license, please fill out the VASP License Confirmation Request. You can also access this form at NERSC Help Desk (Open Request -> VASP License Confirmation Request).

Note

If your VASP license was purchased from VASP Software GmbH, the license owner (usually your PI) must register you under their license at the VASP Portal before you fill out the form.

It may take several business days from when the form is submitted to when access to NERSC-provided VASP binaries is granted.

When your VASP license is confirmed, NERSC will add you to a unix file group: vasp5 for VASP 5, and vasp6 for VASP 6. You can check if you have VASP access at NERSC via the groups command. If you are in the vasp5 file group, then you can access VASP 5 binaries provided at NERSC. If you are in the vasp6 file group, then you can access VASP 6 binaries.
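
For example, a quick check from a login node might look like the following; the group names in the sample output are illustrative and will differ for your account:

perlmutter$ groups
elvis m1111 vasp6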

VASP 6 supports GPU execution.

Versions Supported

Perlmutter GPU    Perlmutter CPU
6.x               5.4, 6.x

Use the module avail vasp command to see a full list of available sub-versions.

Application Information, Documentation, and Support

See the developers page for information about VASP, including links to documentation, workshops, tutorials, and other information. Instructions for building the code and preparing input files can be found in the VASP Online Manual. For troubleshooting, see our troubleshooting guide for frequently asked questions and additional links to support pages.

Using VASP at NERSC

We provide multiple VASP builds for users. Use the module avail vasp command to see which versions are available and module load vasp/<version> to load the environment. For example, these are the available modules (as of 04/11/2024),

perlmutter$ module avail vasp

------------------------------------------- /opt/nersc/pe/modulefiles -------------------------------------------
   mvasp/5.4.4-cpu           vasp-tpc/6.3.2-cpu    vasp/5.4.4-cpu (D)    vasp/6.3.2-gpu
   vasp-tpc/5.4.4-cpu (D)    vasp-tpc/6.3.2-gpu    vasp/6.2.1-gpu        vasp/6.4.1-cpu
   vasp-tpc/6.2.1-gpu        vasp/5.4.1-cpu        vasp/6.3.2-cpu        vasp/6.4.1-gpu

  Where:
   D:  Default Module

The modules with "6.x.y" in their version strings are official releases of hybrid MPI+OpenMP VASP and are available to users with VASP 6 licenses. The "vasp-tpc" modules (tpc stands for "Third Party Codes") are custom builds that incorporate commonly used third-party contributed codes, which may include Wannier90, DFTD4, LIBXC, BEEF, VTST (from the University of Texas at Austin), and VASPsol. On Perlmutter, the "cpu" and "gpu" version strings indicate builds that target Perlmutter's CPU and GPU nodes, respectively. The current default on Perlmutter is vasp/5.4.4-cpu (VASP 5.4.4 with the latest patches), which you can load with

perlmutter$ module load vasp

To use a non-default module, provide the full module name,

perlmutter$ module load mvasp/5.4.4-cpu

The module show command shows the effect VASP modules have on your environment, e.g.

perlmutter$ module show mvasp/5.4.4-cpu
----------------------------------------------------------------------------------------------------
   /global/common/software/nersc/pm-2022.12.0/extra_modulefiles/mvasp/5.4.4-cpu.lua:
----------------------------------------------------------------------------------------------------
help([[This is an MPI wrapper program for VASP 5.4.4.pl2 (enabled Wannier90 1.2) 
to run multiple VASP jobs with a single srun. 

VASP modules are available only for the NERSC users who already have an existing VASP license. 
In order to gain access to the VASP binaries at NERSC through an existing VASP license, 
please fill out the VASP License Confirmation Request form at https://help.nersc.gov
(Open Request -> VASP License Confirmation Request).
]])
whatis("Name: VASP")
whatis("Version: 5.4.4")
whatis("URL: https://docs.nersc.gov/applications/vasp/")
whatis("Description: MPI wrapper for running many VASP jobs with a single srun")
setenv("PSEUDOPOTENTIAL_DIR","/global/common/software/nersc/pm-stable/sw/vasp/pseudopotentials")
setenv("VDW_KERNAL_DIR","/global/common/software/nersc/pm-stable/sw/vasp/vdw_kernal")
setenv("NO_STOP_MESSAGE","1")
setenv("MPICH_NO_BUFFER_ALIAS_CHECK","1")
prepend_path("LD_LIBRARY_PATH","/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers/extras/qd/lib")
prepend_path("LD_LIBRARY_PATH","/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/compilers/lib")
prepend_path("LD_LIBRARY_PATH","/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/math_libs/11.5/lib64")
prepend_path("LD_LIBRARY_PATH","/global/common/software/nersc/pm-stable/sw/vasp/5.4.4-mpi-wrapper/milan/nvidia-22.5/lib")
prepend_path("PATH","/global/common/software/nersc/pm-stable/sw/vasp/vtstscripts/3.1")
prepend_path("PATH","/global/common/software/nersc/pm-stable/sw/vasp/5.4.4-mpi-wrapper/milan/nvidia-22.5/bin")

This VASP module adds the path to the VASP binaries to your search path and sets a few environment variables: PSEUDOPOTENTIAL_DIR and VDW_KERNAL_DIR point to the locations of the pseudopotential files and of the vdw_kernel.bindat file used in dispersion calculations, and the OpenMP and MKL environment variables are set for optimal performance.
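
For example, these variables can be used to browse the pseudopotential sets or to copy the vdW kernel file into a run directory before a dispersion (vdW-DF) calculation; a minimal sketch:

perlmutter$ module load vasp
perlmutter$ ls $PSEUDOPOTENTIAL_DIR
perlmutter$ cp $VDW_KERNAL_DIR/vdw_kernel.bindat .    # only needed for vdW-DF (dispersion) calculations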

VASP binaries

Each VASP module provides three different binaries:

  • vasp_gam - Gamma-point-only build
  • vasp_ncl - non-collinear spin build
  • vasp_std - the standard k-point build

Choose the binary appropriate for your calculation; for example, a calculation that samples only the Gamma point can use vasp_gam, while noncollinear-spin or spin-orbit-coupling calculations require vasp_ncl.
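
After loading a module, you can quickly confirm that all three binaries are on your search path, e.g.

perlmutter$ module load vasp/6.4.1-cpu
perlmutter$ which vasp_std vasp_gam vasp_ncl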

Sample Job Scripts

To run batch jobs, prepare a job script (see samples below), and submit it to the batch system with the sbatch command, e.g. for job script named run.slurm,

nersc$ sbatch run.slurm

Please check the Queue Policy page for the available QOS settings and their resource limits.

Perlmutter GPUs

Sample job script for running VASP 6 on Perlmutter GPU nodes
#!/bin/bash
#SBATCH -J myjob
#SBATCH -A <your account name>  # e.g., m1111
#SBATCH -q regular
#SBATCH -t 6:00:00        
#SBATCH -N 2           
#SBATCH -C gpu
#SBATCH -G 8 
#SBATCH --exclusive
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err

module load vasp/6.4.1-gpu

export OMP_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

srun -n 8 -c 32 --cpu-bind=cores --gpu-bind=none -G 8 vasp_std

Perlmutter CPUs

Sample job script for running VASP 5 on Perlmutter CPU nodes
#!/bin/bash
#SBATCH -N 2
#SBATCH -C cpu
#SBATCH -q regular
#SBATCH -t 01:00:00
#SBATCH -J vasp_job
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err

# Default version loaded: vasp/5.4.4-cpu
module load vasp

# Run with (-n) 256 total MPI ranks
#  128-MPI-ranks-per-node is maximum on Perlmutter CPU
# Set -c ("--cpus-per-task") = 2 
#  to space processes two "logical cores" apart
srun -n 256 -c 2 --cpu-bind=cores vasp_std
Sample job script for running VASP 6 on Perlmutter CPU nodes
#!/bin/bash
#SBATCH -N 2
#SBATCH -C cpu
#SBATCH -q regular
#SBATCH -t 01:00:00
#SBATCH -J vasp_job
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err

module load vasp/6.4.1-cpu

# Always provide OpenMP settings when running VASP 6
export OMP_NUM_THREADS=2
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# Run with (-n) 128 total MPI ranks:
#  64 MPI-ranks-per-node
#   2 OpenMP threads-per-MPI-rank
#  Set -c ("--cpus-per-task") = 2 x (OMP_NUM_THREADS) = 4
#   to space processes two "logical cores" apart
srun -n 128 -c 4 --cpu-bind=cores vasp_std

Running interactively

To run VASP interactively, request a batch session using salloc.
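
For example, a minimal interactive session on a single GPU node might look like the following; the account name, module version, and time limit are placeholders to adapt:

perlmutter$ salloc -N 1 -C gpu -G 4 -q interactive -t 1:00:00 -A <your account name>

# then, inside the allocation:
module load vasp/6.4.1-gpu
export OMP_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
srun -n 4 -c 32 --cpu-bind=cores --gpu-bind=none -G 4 vasp_std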

Tips

  1. The interactive QOS allocates the requested nodes immediately; if no nodes are available, the request is cancelled after about 5 minutes. See the Queue Policy page for more info.
  2. Test your job using the interactive QOS before submitting a long running job.

Long running VASP jobs

For long VASP jobs (e.g., > 48 hours), you can use the variable-time job script, which allows you to run jobs of any length. Variable-time jobs split a long-running job into multiple chunks, so the application must be able to restart from where it left off. Note that not all VASP computations are restartable (e.g., RPA calculations); long-running atomic relaxations and MD simulations are good use cases for the variable-time job script.

Sample variable-time job script
#!/bin/bash
#SBATCH -N 1
#SBATCH -C cpu
#SBATCH -J vasp_job
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
#SBATCH --qos=debug_preempt
#SBATCH --comment=0:45:00
#SBATCH --time=0:05:00
#SBATCH --time-min=0:05:00
#SBATCH --signal=B:USR1@60
#SBATCH --requeue
#SBATCH --open-mode=append

## Notes on parameters above:
##
## '--qos=XX'      can be set to any of several QoS to which the user has access, which
##                 may include: regular, debug, shared, preempt, debug-preempt, premium,
##                 overrun, or shared-overrun.
## '--comment=XX'  is the total time that SUM of restarts can run (can be VERY LARGE).
## '--time=XX'     is the maximum time that individual restart can run. This MUST fit
##                 inside the time limit for the QOS that you want to use
##                 (see https://docs.nersc.gov/jobs/policy/ for details).
## '--time-min=XX' is the minimum time that job can run before being preempted (using this
##                 option can make it easier for Slurm to fit into the queue, possibly allowing
##                 job to start sooner). Omit this parameter unless running in either
##                 --qos=preempt or --qos=debug_preempt.
## '--signal=B:USR1@60' sends signal to begin checkpointing @XX seconds before end-of-job
##                 (set this large enough to have enough time to write checkpoint file(s)
##                  before time limit is reached; 60 seconds is usually enough).
## '--requeue'     specifies that the job is eligible for requeue in case of preemption.
## '--open-mode=append' appends output to the end of the standard output and standard
##                 error files across successive requeues.

# Remove STOPCAR file so job isn't blocked
if [ -f "STOPCAR" ]; then
rm STOPCAR
fi

# Select VASP module of choice
module load vasp/5.4.4-cpu

# srun must execute in background and catch signal on wait command
# so ampersand ('&') is REQUIRED here
srun -n 128 -c 2 --cpu_bind=cores vasp_std &

# Put any commands that need to run to continue the next job (fragment) here
ckpt_vasp() {
set -x
restarts=`squeue -h -O restartcnt -j $SLURM_JOB_ID`
echo checkpointing the ${restarts}-th job

# Trim space from restarts variable for inclusion into filenames
restarts_num=$(echo $restarts | sed -e 's/^[ \t]*//')
echo "Restart number: ==${restarts_num}=="

# Terminate VASP at the next electronic step
echo LABORT = .TRUE. >STOPCAR

# Wait until VASP completes current step, then write WAVECAR file and quit
srun_pid=$(ps -fle | grep srun | head -1 | awk '{print $4}')
echo srun pid is $srun_pid
wait $srun_pid

# Copy CONTCAR to POSCAR and back up data from current run in each folder
folder="checkpt-$SLURM_JOB_ID-${restarts_num}"
mkdir $folder
echo "In directory $folder"

cp -p CONTCAR POSCAR
cp -p CONTCAR "$folder/POSCAR"
echo "CONTCAR copied."

cp -p OUTCAR "$folder/OUTCAR-${restarts_num}"
echo "OUTCAR copied."

cp -p OSZICAR "$folder/OSZICAR-${restarts_num}"
echo "OSZICAR copied."

# Back up the vasprun.xml file in the parent folder
cp -p vasprun.xml "vasprun-${restarts_num}.xml"
echo "vasprun.xml copied."

set +x
}

ckpt_command=ckpt_vasp

# The 'max_timelimit' is max time per individual job, in seconds
#     This line MUST be included !!!
#     This MUST match the value set for '#SBATCH --time=' above !!!
max_timelimit=300

# The 'ckpt_overhead' is the time reserved to perform the checkpoint step, in seconds
#     This MUST fit within the max_timelimit !!!
#     This should match the value set for '--signal=B:USR1@XX' above
ckpt_overhead=60

# Requeue the job if remaining time > 0
.  /global/common/sw/cray/cnl7/haswell/nersc_cr/19.10/etc/env_setup.sh
requeue_job func_trap USR1

wait

Running multiple VASP jobs simultaneously

For running many similar VASP jobs, it may be beneficial to bundle them inside a single job script, as described in Running Jobs.

However, the maximum number of jobs you should bundle in a job script is limited, ideally not exceeding ten. This is because Slurm (as currently implemented) is serving tens of thousands of other jobs on the system at the same time as yours, and many concurrent srun commands can consume a great deal of Slurm's resources.
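
For example, here is a sketch of a job script that bundles two independent single-node VASP runs; the directories job1 and job2 are hypothetical and must each contain a complete set of input files:

#!/bin/bash
#SBATCH -N 2
#SBATCH -C cpu
#SBATCH -q regular
#SBATCH -t 01:00:00

module load vasp

# Launch each calculation from its own directory, in the background
( cd job1 && srun -N 1 -n 128 -c 2 --cpu-bind=cores vasp_std ) &
( cd job2 && srun -N 1 -n 128 -c 2 --cpu-bind=cores vasp_std ) &

# Wait for both backgrounded srun commands to finish before the job exits
wait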

If you want to run many more similar VASP jobs simultaneously, we recommend using the MPI wrapper for VASP provided by NERSC, which enables you to run as many VASP jobs as you wish under a single srun invocation. The MPI wrapper is available via the mvasp module. mVASP is best suited for collections of jobs that have approximately the same runtime and that are each small enough to exploit fast intra-node MPI communication.

For example, consider running 2 VASP jobs simultaneously, each on a single Perlmutter CPU node. Suppose you have prepared 2 sets of input files, each residing in its own directory under a common parent directory. From the parent directory, create a job script like the one below,

run_mvasp.slurm run 2 VASP jobs simultaneously on Perlmutter CPU
#!/bin/bash
#SBATCH -C cpu
#SBATCH --qos=debug
#SBATCH --time=0:30:00
#SBATCH --nodes=2
#SBATCH --error=mvasp-%j.err
#SBATCH --output=mvasp-%j.out

module load mvasp/5.4.4-cpu

sbcast --compress=lz4 `which mvasp_std` mvasp_std
srun -n128 -c4 --cpu-bind=cores mvasp_std

then generate a file named joblist.in, which contains the number of jobs to run and the VASP run directories (one directory per line).

Sample joblist.in file
2
test1
test2

Alternatively, you can create the joblist.in file with the gen_joblist.sh script that is available via the mvasp modules:

module load mvasp
gen_joblist.sh

Then, submit the job via sbatch:

sbatch run_mvasp.slurm 

Note

  • Be aware that running too many VASP jobs at once may overwhelm the file system where your job is running. Please do not run jobs in your global home directory.
  • In the sample job script above, to reduce the job startup time for large jobs the executable was copied to the /tmp file system (memory) of the compute nodes using the sbcast command prior to execution.

Troubleshooting VASP

Solutions to frequently encountered issues

General Tips
  • Use the most recent version of VASP allowed by your license.

  • Check your job script, INCAR, POSCAR, and KPOINTS files for typos or other input mistakes before running.

  • Check the OUTCAR, standard out, and standard error files to verify that inputs were processed correctly and for error messages.
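
For example, a quick scan of a run directory for error messages might look like this (the output file names follow the sample job scripts above):

perlmutter$ grep -i error OUTCAR *.out *.err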

My VASP job just sits in the queue and does not start.
  • Nodes may be unavailable due to a scheduled maintenance or to an unexpected outage or to a large node reservation. In this case the squeue command will typically display a message similar to the following:

    elvis@perlmutter:login28:~> squeue -u elvis
    
       JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    24224393  gpu_ss11 vasp_job    elvis PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:nid[001140-008864])
    
    Your job will be held in the queue until the nodes become available. See the NERSC MOTD page for details.

  • Consider whether the amount of resources (nodes, wall time) that you requested are appropriate for your job, and decrease the size of the request if possible. As a general rule, the most reliable way to accelerate your job's priority in the queue is to reduce the time limit as this gives Slurm the flexibility to fit your job in gaps between other jobs.

My VASP job immediately crashes with the message: <vasp_executable>: No such file or directory:

Example:

vasp not found

VASP binaries are permission-locked to file groups and are not visible to users outside of the file groups. This message appears when you attempt to run a VASP executable without being in the corresponding file group. Take the following steps:

  • Once logged in to NERSC, execute the groups command to show the file groups to which you belong. You must be in the vasp5 group to access VASP 5 binaries and in the vasp6 group to access VASP 6 binaries.
  • Fill out the VASP License Confirmation Request form to request access. Once your VASP license has been successfully confirmed, NERSC will add you to the file groups corresponding to your license.
My VASP job crashes with an error when trying to read {POSCAR|INCAR|KPOINTS}, but the file appears to be correct.

Example:

ERROR: there must be 1 or 3 items on line 2 of POSCAR 

This error can occur when reading input files generated by an external program or by a text editor on a non-unix operating system. Here, the input file contains special (non-human-readable) characters that cause VASP to read the file incorrectly. To test if this is the case:

  1. Back up the input file which causes the error, e.g.

    perlmutter$ mv POSCAR POSCAR_backup
    
  2. Print the contents of the input file to the screen:

    perlmutter$ cat POSCAR_backup
    
  3. Create a new input file (in this example, POSCAR) using a unix-based text editor (e.g. vim or nano), paste the text from the preceding step into the new file, and save it.

  4. Run your VASP job using the new input file.

If the preceding steps solve the issue, then possible production solutions include:

  • Check to see if your external input generating program can export files to a unix-compatible format.

  • Use a tool to convert the files to unix format. For example, one can convert DOS files via the dos2unix utility, e.g.

    perlmutter$ dos2unix POSCAR
    
My VASP job immediately crashes with an error about not being compiled for noncollinear calculations
    ERROR: noncollinear calculations require that VASP is compiled without the flag -DNGXhalf and -DNGZhalf

This error occurs when one attempts to run a noncollinear calculation using the vasp_std or vasp_gam executable. To solve, use the vasp_ncl executable instead.

My VASP job runs but produces no output or incomplete output

This type of error can occur when one incorrectly configures how a job interacts with the file system. Here are some items to check:

  • Always run jobs in your $SCRATCH directory, not in $HOME. VASP generates large files which, if run in $HOME, may cause your $HOME directory to exceed the quota. After the quota has been reached, no new files may be written, and your output will be incomplete or missing.

  • When running multiple VASP calculations inside of a single job script, perform each calculation in a separate directory. VASP generates identically-named files (OUTCAR, OSZICAR, ...) for each calculation, and these files will overwrite one another if run in the same directory.

My VASP job hangs or stops making progress

There are various reasons that a VASP job may hang, including:

  • VASP jobs on GPUs may hang due to incorrect binding. Examples of this include:

    • VASP uses NCCL to transfer data between GPUs attached to different MPI ranks. If the Slurm setting --gpu-bind=single:1 is used it will cause only one GPU to be visible to each MPI rank; this inhibits NCCL and can cause GPU jobs to hang. Set --gpu-bind=none instead.

    • VASP only supports 1:1 MPI-rank-to-GPU mapping. This means that one cannot use multiple MPI ranks per GPU, nor can one use multiple GPUs per MPI rank. Make sure to select no more than four MPI ranks per node and that the number of GPUs is set equal to the number of MPI ranks in your job script.

  • Running on hyperthreads, or running multiple processes per physical core, can cause VASP jobs to hang:

    • On Perlmutter GPU, each node contains 64 physical cores but 128 "logical cores" with hyperthreading. Make sure that you do not run more than four MPI ranks per node, and, if using OpenMP, make sure that the product of MPI ranks per node and OpenMP threads-per-MPI-rank does not exceed 64. The value for the Slurm parameter -c (--cpus-per-task) should not be set less than 32.

    • On Perlmutter CPU, each node contains 128 physical cores but 256 "logical cores" with hyperthreading. Make sure that you do not run more than 128 MPI ranks per node (for a "pure" MPI calculation), or, if using OpenMP, make sure that the product of MPI ranks per node and OpenMP threads per MPI rank does not exceed 128. The value for the Slurm parameter -c (--cpus-per-task) should never be set less than 2.

    • When running VASP 6 executables with OpenMP enabled, set the OpenMP environment variables after loading the VASP module but before invoking srun to ensure that the correct OpenMP settings are selected. We recommend always setting the OpenMP variables as follows, even when not using OpenMP explicitly:

      export OMP_NUM_THREADS=1    # Set >1 if using OpenMP
      export OMP_PLACES=threads
      export OMP_PROC_BIND=spread
      
  • Some plugins/third-party codes do not run on GPUs and may hang when run with the GPU version of VASP. If you have an issue running a third party code with a VASP GPU module, try running using the corresponding CPU module.

  • Calculations on very small systems may hang if run with too many MPI ranks.

My VASP job runs out of memory

VASP jobs containing many atoms, many k-points, or large FFT grids may fail due to lack of memory resources. Please see the VASP Wiki Not Enough Memory page for an explanation of VASP memory usage. The general strategy to solve out-of-memory errors is to increase the number of nodes or GPUs requested as this increases the pool of available memory.

Sample out-of-memory failure when exceeding host (CPU) memory:

CPU OOM

Sample out-of-memory failure when exceeding device (GPU) memory:

GPU OOM

To solve out-of-memory errors, try the following steps:

  • Check the memory cost table (in OUTCAR) to make sure that the cost is what you expect. Below is a sample table (a quick way to extract the total from OUTCAR is sketched after this list):
 total amount of memory used by VASP MPI-rank0 42743434. kBytes
=======================================================================

   base      :      30000. kBytes                    
   nonlr-proj:      97297. kBytes                  
   fftplans  :      63529. kBytes                    
   grid      :     228114. kBytes                    
   one-center:       9953. kBytes                  
   wavefun   :   42314541. kBytes
  • Make a "test" run of your job with a short time limit, and examine how the memory cost table changes as you increase the number of nodes. Try increasing the number of nodes while leaving the number of MPI ranks-per-node constant. Alternatively, you may find it helpful to increase the number of nodes while leaving the total number of MPI ranks constant. Increase the node count until the value for "total amount of memory used by VASP MPI-rank0" multiplied by the number of ranks-per-node of your job fits inside the node's memory limit (512 GB and 256 GB for Perlmutter CPU and GPU nodes, respectively).

  • For GPU jobs, one may also be limited by memory on the GPU card itself. The Slurm setting #SBATCH -C gpu may request either 40 GB or 80 GB A100 GPUs. One can explicitly request only 80 GB A100 GPU nodes by setting #SBATCH -C gpu&hbm80g instead. Please keep in mind that the 80 GB nodes are a limited commodity and that your queue time may increase significantly with this setting.

  • The choice of VASP executable also influences the memory cost, with vasp_ncl > vasp_std > vasp_gam. Check if your calculation can be completed with an executable which uses less memory.

  • K-point parallelization improves performance but also increases memory usage. You can reduce the memory cost by decreasing the value of KPAR in your INCAR file, with KPAR=1 giving the lowest memory usage.
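
As noted above, a quick way to extract the reported memory estimate is to grep the OUTCAR file (the value shown is the one from the sample table):

perlmutter$ grep "total amount of memory" OUTCAR
 total amount of memory used by VASP MPI-rank0 42743434. kBytes

Multiply this value by the number of MPI ranks per node and compare against the per-node memory limits quoted above.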

My VASP job crashes with a '!BUG!' message in my OUTCAR file

Example:

 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: mkpoints_change.F  at line: 185                      |
|                                                                             |
|     internal ERROR in RE_READ_KPOINTS_RD: the new k-point set for the       |
|     reduced symmetry case                                                   |
|      does not contain all original k-points. Try to switch off symmetry     |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------

Take the following steps:

  • Try the work-around suggestions provided by the 'BUG!' message, if any.

  • Check the VASP Known Issues page and search the VASP User Forum to see if this issue has been previously reported.

  • If this issue has not been reported, file a bug report with the VASP developers via the VASP Portal. Be sure to provide the VASP developers with all of the input files needed to reproduce your issue (INCAR, POSCAR, KPOINTS files, along with your job script).

  • If the VASP developers provide a patch or recommend a change in the build, contact the NERSC Help Desk for further assistance, and provide NERSC with a link to your post on the VASP User Forum.

My VASP jobs are running but there are problems with convergence in the SCF or relaxation

This type of issue has two common causes:

  • The positions of the atoms are bad, e.g. two atoms are too close to one another (possibly as a consequence of cell volume or symmetry). This usually occurs at the start of the calculation but it can also result from a geometry update step.

  • The algorithm used for the SCF is a poor match to the electronic structure of the molecule or material system.

For issues with electronic relaxation, see the VASP Developers' documentation for the ALGO parameter. For issues with geometry relaxation, see the VASP Developers' page for the IBRION parameter.

My VASP jobs crash with an error in a library routine

Example:

Error EDDDAV: Call to ZHEGV failed. Returncode = <value>

This message may be preceded by one or more warnings of the type:

WARNING: Sub-Space-Matrix is not hermitian in DAV

See above issue: "My VASP jobs are running but there are problems with convergence in the SCF or relaxation".

For additional help with troubleshooting

If you encounter an issue that is not described above or need further assistance, see the VASP users forum for technical help and support-related questions; see also the list of known issues. For help with issues specific to the NERSC module, please file a support ticket. Below are some general guidelines for where to seek assistance in various cases:

When to file a ticket with the VASP developers:

Examples of issues that should be directed to the VASP developers at the VASP Portal include:

  • You encounter a !BUG! message while running.

  • Your job fails due to algorithmic issues, such as SCF convergence or relaxation.

  • You would like advice related to choosing pseudopotentials, INCAR parameters, or program options.

When to file a ticket with the NERSC Help Desk:

Common examples of issues that should be directed to the NERSC Help Desk include:

  • Your job fails with Slurm or MPICH errors.

  • You have a job which previously ran successfully on Perlmutter using a NERSC-provided module but suddenly begins failing.

  • You need to create a job script for a specific purpose, such as a checkpoint/restart run or an automated workflow.

  • You would like advice specific to compiling or running on Perlmutter.

Advice for filing VASP support tickets

When you file a VASP-related ticket at the NERSC Help Desk, your issue will initially be handled by the Consultant on Duty (CoD), who may or may not be a VASP expert. The CoD will either answer the ticket themselves or route it to a specialist. In general, the more information you provide in the initial ticket, the easier it is for the CoD to direct your ticket to the NERSC staff best equipped to answer your question. We recommend designating your issue as one of three categories:

  • Runtime issue: this includes problems that you may have when running an individual VASP calculation, such as Slurm errors, segmentation faults, MPICH errors, and CUDA errors.

  • Workflow issue: this includes most problems that do not appear in a single job but which occur when multiple VASP jobs are involved, such as:

    • Using an automated environment to run a collection of VASP jobs.

    • Running many VASP jobs in parallel using a single job script.

    • Checkpointing and restarting VASP.

    • Setting up a workflow containing VASP jobs which depend on other jobs.

    • Setting up jobs with unusual resource requirements, e.g. very long running jobs or jobs requiring a large fraction of NERSC resources.

  • Build issue: this includes problems that you may face if you compile your own version of VASP, such as compiler or linker errors, or if your VASP build has runtime issues that do not appear in the NERSC-provided modules.

For runtime issues, answer the following questions when submitting your request:

  • What incorrect behavior do you observe, and what error messages do you receive (if any)?

  • What is the path to the location of your job on Perlmutter?

  • What is your job script?

Answers to the following questions may also help NERSC diagnose your issue:

  • When did the error first appear?

  • Is the error reproducible when run on different nodes?

  • Did you try a different VASP binary or a different module?

  • Is this error specific to CPU or GPU runs, or does it appear in both?

For workflow issues, please answer the following:

  • What is the "big picture" of what you would like to accomplish with the workflow?

  • At which step in the workflow are you experiencing the issue?

  • What is your job script?

For build issues, please answer the following:

  • Do you intend to run your build on CPUs or GPUs?

  • What is the makefile.include that you used to build?

  • Which programming environment do you have loaded (e.g. PrgEnv-gnu)?

  • Are there third party codes or other VASP plugins that you included?

  • Did you modify the VASP source code prior to building?

Performance Guidance

To run VASP efficiently on Perlmutter, we recommend referring to the paper. The VASP developers' parallelization page also provides a detailed explanation of how VASP parallelization works along with advice for optimizing parallel performance.

The following tips may also help improve the performance of VASP jobs:

  • Select the GPU builds whenever possible as GPU performance is superior to CPU performance for most calculation types.

  • GPU builds for VASP 6.4.1 and later include OpenMP parallelization, so note that:

    • Without using OpenMP, one can only use up to 4 MPI ranks per node (corresponding to 1:1 MPI-to-GPU mapping), leaving 60 CPU cores per node idle. With OpenMP, one can exploit these CPU cores by setting export OMP_NUM_THREADS=<value>, with <value> as large as 16 (see the sketch after this list).

    • Creating OpenMP threads requires some overhead which ideally should be amortized by the work done by those threads. Most of the computational workload is already handled by the GPUs, so there may or may not be enough work left over for the CPUs to justify creating the OpenMP threads. The optimal number of OpenMP threads for a calculation depends both on the system size and on details of the algorithms used in the calculation.

  • For VASP 5 CPU builds, one can improve parallel performance by setting the NCORE parameter in the INCAR file. The value of NCORE should evenly divide the number of MPI ranks per node and should ideally be close to the square root of the number of ranks per node. On Perlmutter CPU with 128 ranks per node, NCORE=8 is usually a reasonable setting.

  • For VASP 6 CPU builds, one can improve parallel performance either by setting NCORE or by setting the number of OpenMP threads greater than 1, but not both (i.e. setting OMP_NUM_THREADS>1 forces NCORE to be set to 1).

  • For all VASP builds, one can turn on parallelization over k-points by setting KPAR>1 in the INCAR file. Choose a value for KPAR that evenly divides both the total number of MPI ranks and the number of irreducible k-points.

    Note

    K-point parallelization can significantly improve performance but increases the memory cost of the calculation.

  • Users who run large numbers of small jobs may benefit from using either mvasp or one of the workflow tools supported at NERSC.
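
As referenced above, here is a sketch of a single-GPU-node run that uses OpenMP threads to occupy otherwise idle CPU cores; the thread count of 8 is an illustrative starting point, not a tuned value:

module load vasp/6.4.1-gpu

export OMP_NUM_THREADS=8
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# 4 MPI ranks (one per GPU) with 8 OpenMP threads each; -c 32 spaces the ranks
# evenly across the node's CPU cores
srun -n 4 -c 32 --cpu-bind=cores --gpu-bind=none -G 4 vasp_std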

We strongly encourage users to perform benchmarking tests with various OpenMP settings and with different values of NCORE and KPAR, as applicable, before beginning production calculations.
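
For example, here is a minimal sketch of an NCORE sweep on a single Perlmutter CPU node, run inside a batch or interactive allocation. It assumes a test case with INCAR, KPOINTS, POSCAR, and POTCAR in the current directory and that the test INCAR does not already set NCORE; the directory names and NCORE values are illustrative:

module load vasp/6.4.1-cpu

for nc in 4 8 16; do
    mkdir -p ncore_$nc
    cp INCAR KPOINTS POSCAR POTCAR ncore_$nc/
    echo "NCORE = $nc" >> ncore_$nc/INCAR    # append the value under test
    ( cd ncore_$nc && srun -N 1 -n 128 -c 2 --cpu-bind=cores vasp_std )
done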

Building VASP from Source

Some users may be interested in building VASP themselves. As an example, we outline the process for building the VASP 6.3.2 binaries. First, download vasp6.3.2.tgz from the VASP Portal to Perlmutter and run,

tar -zxvf vasp6.3.2.tgz
cd vasp.6.3.2

e.g. in your home directory, to unpack the archive and navigate into the VASP main directory.

One needs a makefile.include file to build the code; samples are available in the arch directory in the unpacked archive and are also provided in the installation directories of the NERSC modules. Additional makefile.include examples are available from the VASP developers at the wiki page.

For example, the makefile.include file that we used to build the vasp/6.3.2-gpu module is located at,

/global/common/software/nersc/pm-stable/sw/vaspPT/vasp.6.3.2-std/

Execute module show <a vasp module> to find the installation directory of the NERSC module. Copy the sample makefile.include file from the NERSC installation directory to the root directory of your local VASP 6.3.2 build.
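
For example, assuming the archive was unpacked in your home directory as described above and that the sample file in the directory listed above is named makefile.include, the copy step might look like:

perlmutter$ cd ~/vasp.6.3.2
perlmutter$ cp /global/common/software/nersc/pm-stable/sw/vaspPT/vasp.6.3.2-std/makefile.include .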

Note

makefile.include files are compiler-specific and must be used with the corresponding programming environment at NERSC. Existing NERSC builds use makefile.include files that are only compatible with the PrgEnv-nvidia environment. Other makefile.include files may require loading a different programming environment.

A user may be interested in augmenting the functionality of VASP by activating certain plugins. See the developer's list of plugin options for instructions on how to modify makefile.include for common supported features.

Note

Plugins must be built using the same compilers used to build VASP, or errors and other unexpected behavior may occur at link time or at run time.

The next step is to prepare your environment for building VASP. The NERSC-provided VASP modules were built after preloading the following modules:

Setting up the GPU build environment
module reset
module load gpu
module load PrgEnv-nvidia
module load cray-hdf5 cray-fftw
module load nccl

Note

As of 04/11/2024, the current default version of nccl, nccl/2.19.4 does not work with VASP. We recommend loading nccl/2.18.3-cu12 instead.

Note

VASP versions (6.3.0-) which use the OpenACC GPU port require a patch to build with Cray MPICH. See this post for details.

Setting up the CPU build environment
module reset
module load cpu
module load PrgEnv-nvidia
module load cray-hdf5 cray-fftw

Once the module environment is ready, run

make std ncl gam

to build all binaries: vasp_std, vasp_gam, and vasp_ncl.

Instructions specific to mvasp builds

For mvasp builds, execute the following commands immediately after unpacking the archive to apply the latest patch:

cd vasp.5.4.4.pl2
git clone https://github.com/zhengjizhao/mpi_wrapper.git
patch -p0 < mpi_wrapper/patch_vasp.5.4.4.pl2_mpi_wrapper.diff

Then proceed with the build instructions described in the previous section.

User Contributed Information

Please help us improve this page

Users are invited to contribute helpful information and corrections through our GitLab repository.