Build WRF
Required modules
The majority of the WRF model code is written in Fortran, but some parts and ancillary programs are written in C (WRF UG ch.2). In most cases, we run WRF with 1) shared-memory parallelism through the OpenMP application programming interface, 2) distributed-memory message passing (MPI) parallelism across nodes, or 3) a hybrid of both. Therefore, we need Fortran and C compilers that support OpenMP, along with an MPI library. To compile such a complex program, NERSC provides compiler wrappers that combine the compilers with the libraries (including MPI) needed to run shared- and distributed-memory programs on NERSC systems.
In addition, WRF requires the netCDF library for input and output. WRF can use the parallel netCDF library to read and write netCDF files through multiple MPI tasks simultaneously, taking advantage of the Lustre file system of the Perlmutter scratch space. WRF can also use the file compression functionality of netCDF 4.0 or later, which depends on the HDF5 library. See Balle & Johnsen (2016) for WRF I/O options and their performance.
The netCDF4 library (with the underlying HDF5 library) also provides parallel read/write functionality, which is currently available as one of the I/O options in WRF (README.netcdf4par). However, experiments by a WRF-SIG member found that netCDF4 parallel I/O is significantly slower than I/O through the parallel netCDF library.
Another useful fact about the netCDF library is the limit on the size of a variable in a file, which depends on the netCDF data format (CDF1/classic: 2 GB; CDF2/64-bit offset: 4 GB; netCDF4 and CDF5: unlimited). See the table "Large File Support" in the NetCDF Users Guide. The current WRF code supports serial I/O of CDF1, CDF2, and netCDF4, while WRF's interface to the parallel netCDF library supports CDF1 and CDF2.
If a user runs a high-resolution, large-domain simulation with more than roughly 1500 x 1500 columns, a single 3D variable can exceed 4 GB (the exact threshold also depends on the number of vertical levels), and WRF's I/O source code must be modified to use the CDF5 format.
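To check whether a given domain approaches these limits, the size of one time record of a 3D variable can be estimated as nx × ny × nz × 4 bytes for single-precision output. The sketch below is a hypothetical helper (not part of WRF) that does this arithmetic in bash and maps the result onto the approximate per-variable limits listed above. Here nx, ny, and nz stand for the grid dimensions (e.g., e_we, e_sn, and e_vert in namelist.input), and grid staggering is ignored.

```bash
#!/bin/bash
# Hypothetical helper: rough size of one time record of a 3D WRF variable,
# assuming 4-byte (single-precision) values and ignoring grid staggering.
var_bytes() {
    local nx=$1 ny=$2 nz=$3
    echo $(( nx * ny * nz * 4 ))
}

# Map the estimate onto the approximate per-variable limits quoted above:
# CDF1 ~2 GiB, CDF2 ~4 GiB, CDF5 (or netCDF4) effectively unlimited.
suggest_format() {
    local bytes
    bytes=$(var_bytes "$1" "$2" "$3")
    if (( bytes < 2 * 1024**3 )); then
        echo "CDF1"
    elif (( bytes < 4 * 1024**3 )); then
        echo "CDF2"
    else
        echo "CDF5"
    fi
}

# Example: replace with your own e_we, e_sn, and e_vert
suggest_format 1500 1500 500   # 4.5e9 bytes per record -> CDF5
```

The estimate covers a single variable and time record only. Whole-file size and double-precision variables need separate arithmetic.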
Our experience shows that the netCDF and parallel netCDF libraries provide flexible and much faster I/O on the scratch file system. Serial or parallel netCDF I/O can be selected in the run-time WRF namelist, and parallel netCDF I/O is 10-20 times faster than serial I/O. Therefore, we recommend building WRF with the netCDF (cray-netcdf module) and parallel netCDF (cray-parallel-netcdf module) libraries. This I/O choice is activated by loading these modules and setting a few environment variables when compiling WRF:

```bash
module load cray-hdf5 #the netcdf library depends on hdf5
module load cray-netcdf
module load cray-parallel-netcdf
export NETCDF_classic=1 #use classic (CDF1) as default
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 #use 64-bit offset format (CDF2) of netcdf files
#netcdf4 compression (serial) with the hdf5 module can be very slow
```

and then, for high-resolution simulations, write CDF2 files with parallel netCDF I/O by setting the namelist variables io_form_history, io_form_restart, etc., to 11 at run time (instead of 2 for standard serial netCDF).
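WRF's configure script finds the I/O libraries through the NETCDF and PNETCDF environment variables (and HDF5, if set). A quick check before running configure can catch a missing or empty variable. The hypothetical helper below does that. It assumes each install directory has a lib/ subdirectory, which matches a standard install layout but is not something WRF itself verifies.

```bash
#!/bin/bash
# Hypothetical pre-flight check: confirm that the I/O-related variables used
# by WRF's build system are set and point to directories with a lib/ folder.
check_io_env() {
    local var dir status=0
    for var in NETCDF PNETCDF HDF5; do
        dir=${!var:-}   # indirect expansion: value of the variable named $var
        if [ -z "$dir" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "$dir/lib" ]; then
            echo "$var=$dir has no lib/ subdirectory" >&2
            status=1
        else
            echo "$var=$dir"
        fi
    done
    return $status
}

# Usage, after loading the modules and exporting the variables:
#   check_io_env || exit 1
```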
[A discussion in the WRF user forum](https://forum.mmm.ucar.edu/threads/solved-netcdf-error-when-attempting-to-using-pnetcdf-quilting-with-wrf.9015/) suggests that not only wrfinput but also wrfbdy data can be read through the parallel netCDF library if WPS is compiled appropriately with parallel netCDF.
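The run-time switch to parallel netCDF is made in the &time_control section of namelist.input. The hypothetical helper below is a sketch that assumes GNU sed. It saves a backup and then sets io_form_history and io_form_restart to 11 in place. It leaves io_form_input and io_form_boundary alone, because reading wrfinput and wrfbdy through parallel netCDF depends on how WPS was built.

```bash
#!/bin/bash
# Hypothetical helper: switch WRF history and restart output from serial
# netCDF (io_form = 2) to parallel netCDF (io_form = 11) in a namelist.
# Assumes GNU sed (for -i and -E); a copy is kept as <namelist>.bak.
use_pnetcdf_output() {
    local nml=${1:-namelist.input}
    cp "$nml" "${nml}.bak"
    # On lines that assign io_form_history or io_form_restart, replace the
    # first "= <number>" with "= 11"; all other lines are left untouched.
    sed -i -E '/^[[:space:]]*io_form_(history|restart)[[:space:]]*=/ s/=[[:space:]]*[0-9]+/= 11/' "$nml"
}

# Usage (in the run directory): use_pnetcdf_output namelist.input
```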
Build WRF on Perlmutter
WRF's build process starts with running the "configure" csh script that comes with the WRF source code package. This script automatically checks the computing platform and asks the user for input about the parallel job configuration.
On Perlmutter, we have tested the default gnu environment. Tested inputs to the "configure" csh script are gnu (dm+sm) and basic nesting:
```text
Please select from among the following Linux x86_64 options:
32. (serial) 33. (smpar) 34. (dmpar) 35. (dm+sm) GNU (gfortran/gcc)
Enter selection [1-75] : 35
Compile for nesting? (0=no nesting, 1=basic,...) [default 0] : 1
```
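configure reads its answers from standard input, so the two selections should also be accepted non-interactively. That is convenient inside a build script. This is a sketch. The helper name is hypothetical, and the option numbers (35 for GNU dm+sm, 1 for basic nesting) come from the menu above. They can change between WRF versions, so check the menu before relying on them.

```bash
#!/bin/bash
# Hypothetical helper: print the answers to WRF's two configure prompts,
# one per line (platform/parallelism option first, then nesting option).
configure_answers() {
    local option=${1:-35} nesting=${2:-1}
    printf '%s\n%s\n' "$option" "$nesting"
}

# Feed the answers to configure only when run from the top WRF directory.
if [ -x ./configure ]; then
    configure_answers 35 1 | ./configure
fi
```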
For real cases (not idealized cases like the 2D squall line), we generally recommend option 35 (dm+sm), because in our experience 4 threads per MPI rank (dm+sm) perform better than pure MPI (dm) on the same number of nodes. However, performance is sensitive to grid resolution, grid nesting, and other aspects of the model configuration, so users need to experiment with each case to find the optimal parallel configuration.
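When experimenting, it helps to list layouts that keep every physical core busy. The sketch below assumes a Perlmutter CPU node has 128 physical cores with 2 hardware threads each (256 logical CPUs). For a given node count it prints the MPI ranks per node, the total ranks, and the value for srun's -c option (logical CPUs per rank) for several OpenMP thread counts, one thread per physical core. For 8 nodes and 4 threads it gives 256 MPI ranks, the layout of the 8-node benchmark described in the Run WRF section.

```bash
#!/bin/bash
# Assumed Perlmutter CPU node: 128 physical cores, 2 hardware threads each.
cores_per_node=128
cpus_per_node=256

# Print hybrid MPI x OpenMP layouts that use every physical core once.
layouts() {
    local nodes=$1 threads ranks_per_node
    for threads in 1 2 4 8; do
        ranks_per_node=$(( cores_per_node / threads ))
        printf 'threads=%d ranks/node=%d total_ranks=%d srun_c=%d\n' \
            "$threads" "$ranks_per_node" "$(( nodes * ranks_per_node ))" \
            "$(( cpus_per_node / ranks_per_node ))"
    done
}

layouts 8
```

The example sbatch script in the Run WRF section takes a different approach. It places 4 threads on -c 4 logical CPUs, which means two threads per physical core through hyperthreading. That layout is also worth benchmarking.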
After running the configure program, we run the "compile" csh script in the top directory of the WRF source code to compile.
We can execute all these steps in a bash script like the following:
Example WRF build script for Perlmutter
```bash
#!/bin/bash
set -e
set -o pipefail

#change the following boolean variables to run/skip certain compiling steps
doclean=false     #true if WRF source code is modified since the last compilation
doclean_all=false #true if previously compiled with different configure options
runconf=true      #run WRF's configure script; should do this first before compiling
docompile=true    #run WRF's compile script; should do this after configure
debug=false       #true to compile WRF with debug flag (no optimizations, -g flag for debugger, etc.)

imach="pm" #target system name. "pm" for Perlmutter.

# set the top directory of the WRF source code as an environmental variable
export WRF_DIR="PATH_to_your_WRF_source" #user needs to change this

#Modules --------------------------------------------------------------------
#general modules
module load cpu
module load PrgEnv-gnu

#module for WRF file I/O
#order of loading matters!
module load cray-hdf5 #required to load netcdf library
module load cray-netcdf
module load cray-parallel-netcdf

module list #check what modules are loaded

#set environmental variables used by WRF build system,
#using the environmental variables set by the modules
#use classic (CDF1) as default
export NETCDF_classic=1
#use 64-bit offset format (CDF2) of netcdf files
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
#do not use netcdf4 compression (serial), need hdf5 module
export NETCDF=$NETCDF_DIR
export HDF5=$HDF5_DIR
export HDF5_LIB="$HDF5_DIR/lib"
export HDF5_BIN="$HDF5_DIR/bin"

#create PNETCDF environment variable to use the parallel netcdf library
export PNETCDF=$PNETCDF_DIR
export LD_LIBRARY_PATH="/usr/lib64":${LD_LIBRARY_PATH}

#other special flags
export PNETCDF_QUILT="0" #Quilt output is not stable, better not use it

#check environment variables
echo "PATH: $PATH"
echo "NETCDF is $NETCDF"
echo "HDF5 is $HDF5"
echo "HDF5_LIB is $HDF5_LIB"

##capture the date and time for log file name
idate=$(date "+%Y-%m-%d-%H_%M")
configfile="${WRF_DIR}/configure.wrf"
bldlog="${WRF_DIR}/compile_${idate}_${imach}.log" #build log file

##run WRF build scripts located in the top WRF directory
cd "${WRF_DIR}"

if [ "$doclean_all" = true ]; then
    ./clean -a
    #"The './clean -a' command is required if you have edited the configure.wrf
    #or any of the Registry files.", but this deletes configure.wrf....
fi

if [ "$doclean" = true ]; then
    ./clean
fi

#echo "running configure"
if [ "$runconf" = true ]; then
    if [ "$debug" = true ]; then
        echo "configure debug mode"
        ./configure -d
    else
        ./configure
    fi
    ##configure options selected are:
    # 32. (serial) 33. (smpar) 34. (dmpar) 35. (dm+sm) GNU (gfortran/gcc)
    # choose 35 for real (not idealized) cases

    #the sed commands below will change the following lines in configure.wrf
    #--- original
    #SFC = gfortran
    #SCC = gcc
    #CCOMP = gcc
    #DM_FC = mpif90
    #DM_CC = mpicc
    #--- edited (FC and CC with MPI)
    #SFC = gfortran
    #SCC = gcc
    #CCOMP = gcc
    #DM_FC = ftn
    #DM_CC = cc
    if [ -f "$configfile" ]; then
        echo "editing configure.wrf"
        #need to remove -cc=$(SCC) in DM_CC
        sed -i 's/-cc=\$(SCC)/ /' "${configfile}"
        sed -i 's/mpif90/ftn/' "${configfile}"
        sed -i 's/mpicc/cc/' "${configfile}"
        #also user can remove the flag -DWRF_USE_CLM
        #from ARCH_LOCAL if not planning to
        #use the CLM4 land model to speed up compilation
        #sed -i 's/-DWRF_USE_CLM/ /' ${configfile}
    fi
fi

if [ "$docompile" = true ]; then
    export J="-j 4" #build in parallel
    echo "J = $J"
    echo "compile log file is ${bldlog}"
    #run the compile script
    ./compile em_real &> "${bldlog}"

    #check if there is an error in the compile log
    #grep command exits the script in case of nomatch
    #after the 2022-12 maintenance
    set +e #release the exit flag before grep
    grep "Problems building executables" "${bldlog}"
    RESULT=$? #0 if the error message was found
    #set the exit flag again
    set -e

    if [ $RESULT -eq 0 ]; then
        echo "compile failed, check ${bldlog}"
    else
        echo "compile success"
        #sometimes renaming executable with descriptive information is useful
        #cp $WRF_DIR/main/ideal.exe $WRF_DIR/main/ideal_${idate}_${imach}.exe
        #cp $WRF_DIR/main/real.exe $WRF_DIR/main/real_${idate}_${imach}.exe
        #cp $WRF_DIR/main/wrf.exe $WRF_DIR/main/wrf_${idate}_${imach}.exe
        #cp $WRF_DIR/main/ndown.exe $WRF_DIR/main/ndown_${idate}_${imach}.exe
    fi
fi
```
As seen in the example script, the compiler names need to be edited in the configure.wrf file (we use the sed command to do this in the script). Specifically, we need to change the compiler names for MPI applications to compiler wrappers.
- change "mpif90" to "ftn" for the DM_FC flag
- change "mpicc" to "cc" for the DM_CC flag
- keep SFC and SCC as the base compilers (gfortran and gcc)
These edits allow us to compile WRF on the login node.
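To see exactly what the three sed edits do before touching a real configure.wrf, they can be wrapped in a function and applied to a copy. This is a sketch and the function name is hypothetical. The sample lines are a simplified, assumed excerpt of a GNU dm+sm configure.wrf: the real file has many more settings, and its DM_CC line may or may not contain -cc=$(SCC). GNU sed is assumed for -i.

```bash
#!/bin/bash
# Hypothetical wrapper around the sed edits from the build script:
# point the MPI compiler variables at the Cray wrappers ftn and cc,
# leaving SFC, SCC, and CCOMP as the base GNU compilers.
patch_configure_wrf() {
    local configfile=$1
    cp "$configfile" "${configfile}.orig"    # keep the unedited file
    sed -i 's/-cc=\$(SCC)/ /' "$configfile"  # drop -cc=$(SCC) from DM_CC
    sed -i 's/mpif90/ftn/' "$configfile"     # DM_FC: mpif90 -> ftn
    sed -i 's/mpicc/cc/' "$configfile"       # DM_CC: mpicc -> cc
}

# Dry run on a simplified, assumed excerpt of configure.wrf
demo=$(mktemp)
cat > "$demo" <<'EOF'
SFC             =       gfortran
SCC             =       gcc
CCOMP           =       gcc
DM_FC           =       mpif90
DM_CC           =       mpicc -cc=$(SCC)
EOF
patch_configure_wrf "$demo"
diff "${demo}.orig" "$demo" || true   # diff exits 1 when the files differ
```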
Also note that in the example bash script above, we control which steps to run with the logical variables doclean, doclean_all, runconf, docompile, and debug.
For example, when we download the source code and compile it out of the box for the first time, we would set doclean=false, doclean_all=false, runconf=true, and docompile=true. The example shell script then runs the WRF configure script and the WRF compile script. The compile script performs several checks and invokes the make command, among other things.
We may make simple changes to some WRF source code files and want to build new executables from the modified source code (with the same model configuration). In such a case, we set doclean=true, doclean_all=false, runconf=false, and docompile=true. The bash script then runs WRF's clean script to remove all object and executable files and recompiles the code.
If we want to change the model configuration (e.g., the nesting option or dynamical core), add new variables to the WRF registry, or turn the chemistry option on or off, we set doclean_all=true, runconf=true, and docompile=true. The bash script then runs WRF's clean script with the "-a" option to remove all files created during and after configuration and compilation, and then runs the configure and compile scripts.
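Besides searching the compile log for "Problems building executables", one can confirm that the expected executables were actually produced. The sketch below assumes that a successful ./compile em_real leaves non-empty wrf.exe, real.exe, ndown.exe, and tc.exe in the main/ directory. The helper name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: report which em_real executables exist (non-empty)
# under <WRF top directory>/main; return non-zero if any is missing.
check_wrf_exes() {
    local wrf_dir=$1 exe status=0
    for exe in wrf.exe real.exe ndown.exe tc.exe; do
        if [ -s "$wrf_dir/main/$exe" ]; then
            echo "found   $wrf_dir/main/$exe"
        else
            echo "missing $wrf_dir/main/$exe" >&2
            status=1
        fi
    done
    return $status
}

# Usage after compiling: check_wrf_exes "$WRF_DIR" || exit 1
```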
Run WRF
The experience of WRF-SIG members and NERSC best practices suggest the following:
- Use the scratch space for model execution and set an appropriate file stripe on the execution directory.
- Use the parallel netCDF library for I/O. Time spent writing history and restart files is reduced by at least 30%, often to 1/10 of that with serial netCDF I/O (an appropriate stripe size must be set for the output directory).
- For simple use cases such as a single domain (no nesting), use four OpenMP threads per MPI rank instead of using all the available physical cores for MPI tasks. On Perlmutter, for the CONUS 2.5km benchmark case, an 8-node job with 4 OpenMP threads and 256 MPI ranks runs almost as fast as a 16-node job with 2048 MPI ranks and no OpenMP threads. The latter (16 nodes) is twice as expensive as the former (8 nodes); note that the charge to a project allocation depends on the number of nodes used and the wall-clock hours, among other factors (Compute Usage Charging). However, a WRF-SIG member found no computational advantage of OpenMP threads for a nested high-resolution (LES) case.
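On a Lustre file system such as Perlmutter's scratch, striping is set per directory with the standard lfs setstripe command, and files created there afterwards inherit that layout. The sketch below wraps the command in a hypothetical helper. The stripe count (8) and stripe size (1M) are placeholder defaults chosen for illustration, not tuned recommendations. Consult NERSC's file-system guidance and benchmark with your own output sizes. The helper only stripes when lfs is available.

```bash
#!/bin/bash
# Hypothetical helper: create a run directory and set its Lustre stripe
# layout so that new files (history, restart) inherit it.
# Defaults (8 stripes of 1 MiB) are placeholders, not recommendations.
stripe_run_dir() {
    local dir=$1 count=${2:-8} size=${3:-1M}
    mkdir -p "$dir"
    if ! command -v lfs >/dev/null 2>&1; then
        echo "lfs not found; $dir left with default layout" >&2
        return 1
    fi
    lfs setstripe -c "$count" -S "$size" "$dir"
    lfs getstripe -d "$dir"   # show the directory's default layout
}

# Usage (before the first run): stripe_run_dir "$rundir" 8 1M
```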
Other useful resources:
Example WRF sbatch script for Perlmutter
```bash
#!/bin/bash
#SBATCH -q debug
#SBATCH -t 00:30:00
#SBATCH -J test
#SBATCH -A <account> #user needs to change this
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<email address> #and this
#SBATCH -L scratch,cfs
#SBATCH -C cpu
#SBATCH --ntasks-per-node=64 #user needs to experiment with this value

ntile=4 #number of OpenMP threads per MPI task
#need to set the "numtiles" variable in the wrf namelist (namelist.input) to be the same
wrfexe="PATH_to_your_wrf.exe" #recommend saving the executable in global common or scratch space
rundir="/pscratch/sd/e/elvis/simulation/WRF/run" #where to run WRF; user needs to change this

#Modules --------------------------------------------------------------------
#general modules
module load cpu
module load PrgEnv-gnu
#module for WRF file I/O
#order of loading matters!
module load cray-hdf5 #required to load netcdf library
module load cray-netcdf
module load cray-parallel-netcdf

#OpenMP settings:
export OMP_NUM_THREADS=$ntile
export OMP_PLACES=threads
export OMP_PROC_BIND=spread #"true" when not using multiple OpenMP threads (i.e., ntile=1)
export OMP_STACKSIZE=64M #increase memory segment to store local variables, needed by each thread

cd "$rundir"

#run simulation
srun -n 64 -c 4 --cpu-bind=cores ${wrfexe}

#rename and save the process 0 out and err files
cp rsl.error.0000 rsl.error_0_$SLURM_JOBID
cp rsl.out.0000 rsl.out_0_$SLURM_JOBID
```
To set appropriate options for the srun command (by considering process and thread affinity), users are encouraged to use the jobscript generator.
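A successful WRF run ends with the message "SUCCESS COMPLETE WRF" in the rsl files. A job script can test for it before post-processing or resubmitting. The helper below is a hypothetical sketch of such a check. By default it looks at rsl.out.0000 in the current directory.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the given rsl file contains WRF's
# normal-termination message.
check_wrf_run() {
    local rsl=${1:-rsl.out.0000}
    if grep -q "SUCCESS COMPLETE WRF" "$rsl" 2>/dev/null; then
        echo "WRF completed: $rsl"
    else
        echo "WRF did not complete (see $rsl and rsl.error.0000)" >&2
        return 1
    fi
}

# Usage, e.g. right after the srun line: check_wrf_run rsl.out.0000 || exit 1
```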