
WRF

Build WRF

Required modules

The majority of the WRF model code is written in Fortran, but some parts and ancillary programs are written in C (WRF UG ch.2). In most cases, we run WRF with shared-memory parallelism through the OpenMP application programming interface, distributed-memory message passing (MPI) parallelism across nodes, or a hybrid of the two. We therefore need Fortran and C compilers that support OpenMP, together with an MPI library. For compiling such a complex program, NERSC provides compiler wrappers that combine the compilers with the various libraries (including MPI) necessary to run shared- and distributed-memory programs on NERSC systems.

In addition, WRF requires the netCDF library for input and output. WRF can use the parallel netCDF library to read/write netCDF files through multiple MPI tasks simultaneously, taking advantage of the Lustre file system of the Perlmutter scratch space. WRF can also use the file compression functionality introduced in netCDF 4.0, which depends on the HDF5 library. See Balle & Johnsen (2016) for WRF I/O options and their performance.

Note

The netCDF-4 (and underlying HDF5) library provides parallel read/write functionality, which is currently available as one of the I/O options in WRF (README.netcdf4par). However, experiments by a WRF-SIG member found that netCDF-4 parallel I/O is significantly slower than I/O through the parallel netCDF library.

Note

Another useful thing to know about the netCDF library is the limit on the size of a variable in a file, which depends on the netCDF data format (CDF1 = classic: 2 GB; CDF2 = 64-bit offset: 4 GB; netCDF-4 and CDF5: unlimited). See the table "Large File Support" in the NetCDF Users Guide. The current WRF code supports serial I/O for CDF1, CDF2, and netCDF-4; WRF's interface to the parallel netCDF library supports CDF1 and CDF2.
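If you are unsure which of these formats an existing WRF file uses, the ncdump utility (available once the netCDF module is loaded) can report it; the file name below is only an example:

#print the on-disk format ("kind") of a netCDF file
#typical outputs include: classic (CDF1), 64-bit offset (CDF2), netCDF-4, cdf5
ncdump -k wrfout_d01_2023-01-01_00:00:00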

Note

If a user runs a high-resolution, large-domain simulation with more than roughly 1500 x 1500 columns, a 3D variable becomes larger than 4 GB and it is necessary to modify WRF's I/O source code to use the CDF5 format.

Our experience shows that using the netCDF and parallel netCDF libraries provides flexible I/O (serial or parallel netCDF can be selected in the run-time WRF namelist) and much faster I/O on the scratch file system (parallel netCDF I/O is 10--20 times faster than serial I/O). We therefore recommend building WRF with the netCDF (cray-netcdf module) and parallel netCDF (cray-parallel-netcdf module) libraries. This I/O choice is activated by setting a few environment variables when compiling WRF, after loading the netCDF and parallel netCDF modules. With these two modules loaded, we set the following environment variables when compiling WRF:

...
module load cray-hdf5   #the netcdf library depends on hdf5
module load cray-netcdf
module load cray-parallel-netcdf
...

export NETCDF_classic=1               #use classic (CDF1) as default
#use 64-bit offset format (CDF2) of netcdf files
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 
#netcdf4 compression (serial) with the hdf5 module can be very slow
export USE_NETCDF4_FEATURES=0

and then, for high-resolution simulations, select the CDF2 format at run time by setting the namelist variables io_form_history, io_form_restart, etc., to 11 for parallel netCDF I/O (instead of 2 for standard serial netCDF).
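For example, the relevant lines in the &time_control section of namelist.input could look like the following sketch, where 11 selects parallel netCDF and 2 standard serial netCDF; the values shown for the input and boundary streams are only an illustration (see also the note below about reading them through parallel netCDF):

&time_control
 io_form_history  = 11,
 io_form_restart  = 11,
 io_form_input    = 2,
 io_form_boundary = 2,
/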

Note

[A discussion in the WRF user forum](https://forum.mmm.ucar.edu/threads/solved-netcdf-error-when-attempting-to-using-pnetcdf-quilting-with-wrf.9015/) suggests that not only wrfinput but also wrfbdy data can be read through the parallel netCDF library if WPS is compiled appropriately with parallel netCDF support.

Example module loading script

#!/bin/bash
set -e
machname="Perlmutter"
scname=$BASH_SOURCE  #name of this script

echo "loading modules using ${scname}"
echo "target system: ${machname} "

#users may want to unload unnecessary/conflicting modules loaded in .bash_profile. 
#e.g., hugepages. But keep other modules loaded automatically by the system. 
#each user has to edit here:
# module unload craype-hugepages64M

#general modules
module load cpu  
module load PrgEnv-gnu 

#module for WRF file I/O
#order of loading matters!
module load cray-hdf5  #required to load netcdf library
module load cray-netcdf 
module load cray-parallel-netcdf

module list

Build WRF on Perlmutter

WRF's build process starts with running the "configure" csh script that comes with the WRF source code package. This script automatically checks the computing platform and asks for user input about the parallel configuration.

On Perlmutter, we have tested the default gnu environment. Tested inputs to the "configure" csh script are gnu (dm+sm) and basic nesting:

Please select from among the following Linux x86_64 options:
....
32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)
...
Enter selection [1-75] :
Compile for nesting? (0=no nesting, 1=basic,...) [default 0] :

For real cases (not idealized cases like the 2D squall line), we recommend option 35 (dm+sm), based on our experience that 4 threads per MPI rank (dm+sm) performs better than pure MPI (dm) on the same number of nodes. We will update the WRF performance evaluation and scaling on Perlmutter in late 2023.
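If you prefer to drive configure non-interactively from a build script, the two answers can be piped to its standard input. This is only a convenience sketch; verify the option number printed by your WRF version, since the menu can change between releases:

#answer "35" (GNU dm+sm) and "1" (basic nesting) without interactive prompts
printf '35\n1\n' | ./configure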

After running the configure program, we run the "compile" csh script in the top directory of the WRF source code. In the example bash script below, each build step is toggled by a boolean variable; for example, to run only the compile step (after configure has already been run), set

doclean_all=false #true if previously compiled with different configure options
doclean=false
runconf=false
docompile=true

The compile script does several checks and invokes the make command, among other things.

Example WRF build script for Perlmutter

#!/bin/bash -l
set -e
set -o pipefail 

imach="pm"  #target system name. "pm" for Perlmutter.

#change the following boolean variables to run/skip certain compiling steps
doclean_all=true #true if previously compiled with different configure options

doclean=false

runconf=true    #run WRF's configure script; should do this first before compiling

docompile=false  #run WRF's compile script; should do this after configure

debug=false  #true to compile WRF with debug flag (no optimizations, -g flag for debugger, etc.)

# WRF directories
mversion="v4.4"
#WRF-SIG project directory as example; accessible only by WRF-SIG members
wrfroot="/global/cfs/cdirs/m4232/model"  
script_dir="/global/cfs/cdirs/m4232/scripts/build"

export WRF_DIR=${wrfroot}/${imach}/${mversion}/WRF

#Modules --------------------------------------------------------------------
modversion="2023-09"  #year-month of the system software update that introduced the default modules (INC0182147)
loading_script="${script_dir}/load_modules_${modversion}_wrfsig.sh"
source ${loading_script}

#set environment variables used by the WRF build system, reusing environment variables
#set by the modules

export NETCDF_classic=1               #use classic (CDF1) as default
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 #use 64-bit offset format (CDF2) of netcdf files
export USE_NETCDF4_FEATURES=0         #do not use netcdf4 compression (serial), need hdf5 module
#note: configure reports that WRF will not use netcdf4 compression, yet configure.wrf may still
#contain NETCDF4_IO_OPTS = -DUSE_NETCDF4_FEATURES..., which is confusing


export HDF5=$HDF5_DIR
export HDF5_LIB="$HDF5_DIR/lib"
export HDF5_BIN="$HDF5_DIR/bin"

export NETCDF=$NETCDF_DIR
export NETCDF_BIN="$NETCDF_DIR/bin"
export NETCDF_LIB="$NETCDF_DIR/lib"

#create PNETCDF environment variable to use the parallel netcdf library
export PNETCDF=$PNETCDF_DIR  

export LD_LIBRARY_PATH="/usr/lib64":${LD_LIBRARY_PATH}
#export PATH=${NETCDF_BIN}:${PATH}
export PATH=${NETCDF_BIN}:${HDF5_BIN}:${PATH}
export LD_LIBRARY_PATH=${NETCDF_LIB}:${LD_LIBRARY_PATH}


#other special flags to test
export PNETCDF_QUILT="0"  #Quilt output is not stable, better not use it

#check environment variables
echo "LD_LIBRARY_PATH: "$LD_LIBRARY_PATH
echo "PATH: "$PATH
echo "MANPATH: "$MANPATH

echo "NETCDF is $NETCDF"
echo "NETCDF_LIB is $NETCDF_LIB"

echo "HDF5 is $HDF5"
echo "HDF5_LIB is $HDF5_LIB"

echo "PNETCDF: ${PNETCDF}"
echo "PNETCDF_QUILT: ${PNETCDF_QUILT}"

##capture starting time for log file name
idate=$(date "+%Y-%m-%d-%H_%M")
#
##run make in the top directory
cd $WRF_DIR

if [ "$doclean_all" = true ]; then
    ./clean -a
    #"The './clean –a' command is required if you have edited the configure.wrf 
    #or any of the Registry files.", but this deletes configure.wrf....

fi

if [ "$doclean" = true ]; then
    ./clean
fi

#echo "running configure"
if [ "$runconf" = true ]; then

    if [ "$debug" = true ]; then
        echo "configure debug mode"
        ./configure -d
    else
        ./configure
    fi

   ##configure options selected are:
   # 32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)
   # choose 35 for real (not idealized) cases

    configfile="${WRF_DIR}/configure.wrf"

    #the sed commands below will change the following lines in configure.wrf
    #--- original
    #SFC             =       gfortran
    #SCC             =       gcc
    #CCOMP           =       gcc
    #DM_FC           =       mpif90
    #DM_CC           =       mpicc

    #--- edited (FC and CC with MPI)
    #SFC             =       gfortran
    #SCC             =       gcc
    #CCOMP           =       cc
    #DM_FC           =       ftn
    #DM_CC           =       cc

    if [ -f "$configfile" ]; then
        echo "editing configure.wrf"
        #need to remove -cc=$(SCC) in DM_CC
        sed -i 's/-cc=\$(SCC)/ /' ${configfile}
        sed -i 's/mpif90/ftn/' ${configfile}
        sed -i 's/mpicc/cc/' ${configfile}

        #also user can remove the flag -DWRF_USE_CLM from ARCH_LOCAL if not planning to 
        #use the CLM4 land model to speed up compilation
        #sed -i 's/-DWRF_USE_CLM/ /' ${configfile} 

    fi

fi

if [ "$docompile" = true ]; then
    export J="-j 4"  #build in parallel
    echo "J = $J"

    bldlog=${script_dir}/compile_em_${idate}_${imach}.log
    echo  "compile log file is ${bldlog}"

    #run the compile script 
    ./compile em_real &> ${bldlog}

    #check if there is an error in the compile log
    #with "set -e", a grep with no match would exit the script (behavior after the 2022-12 maintenance)
    set +e #release the exit flag before grep

    grep "Problems building executables" compile_em_real_${idate}_${imach}.log     
    RESULT=$?

    #set the exit flag again
    set -e  

    if [ $RESULT -eq 0 ]; then
        echo "compile failed, check ${bldlog}"      
    else
        echo "compile success"
        #sometimes renaming executable with descriptive information is useful
        #cp $WRF_DIR/main/ideal.exe $WRF_DIR/main/ideal_${idate}_${imach}.exe
        #cp $WRF_DIR/main/real.exe $WRF_DIR/main/real_${idate}_${imach}.exe
        #cp $WRF_DIR/main/wrf.exe $WRF_DIR/main/wrf_${idate}_${imach}.exe
        #cp $WRF_DIR/main/ndown.exe $WRF_DIR/main/ndown_${idate}_${imach}.exe
    fi

fi

As seen in the example script, the compiler names need to be edited in the configure.wrf file. Specifically, we need to change the compiler names for MPI applications to compiler wrappers.

  • change "mpif90" to "ftn" for the DM_FC flag
  • change "mpicc" to "cc" for the DM_FC flag
  • keep SFC and SCC to be the base compiler (gfortran and gcc)

These edits allow us to compile WRF on the login node.
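Before compiling, it is worth confirming that the wrappers resolve to the GNU compilers assumed by the configure option above; a quick check from the login node (with PrgEnv-gnu loaded, both should report GNU versions):

ftn --version
cc --version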

Run WRF

Experience of WRF-SIG members and NERSC best practices recommend the following:

  1. Use the scratch space for model execution and set appropriate file striping on the execution directory (see the striping example after this list).

  2. Use the parallel netCDF library for I/O. Time spent writing history and restart files is reduced by at least 30%, and often to 1/10 of serial netCDF I/O (an appropriate stripe setting on the output directory is needed).

  3. For simple use cases such as a single domain (no nesting), use four OpenMP threads instead of using all the available physical cores for MPI tasks. On Perlmutter, for the CONUS 2.5 km benchmark case, an 8-node job with 4 OpenMP threads and 256 MPI ranks runs almost as fast as a 16-node job with 2048 MPI ranks and no OpenMP threads. The latter (16 nodes) is twice as expensive as the former (8 nodes); note that the charge to a project allocation depends on the number of nodes used and the wall-clock hours, among other factors (see Compute Usage Charging). However, a WRF-SIG member found no computational advantage from OpenMP threads for a nested high-resolution (LES) case.
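As a minimal sketch for the striping in item 1, the Lustre settings of the run/output directory can be adjusted with lfs setstripe before the job starts; the stripe count of 8 and the directory path below are placeholders, and users should follow the NERSC Lustre striping recommendations for their file sizes:

#newly created files in the run directory inherit this stripe count
lfs setstripe -c 8 $SCRATCH/WRF/run
#verify the directory striping
lfs getstripe -d $SCRATCH/WRF/run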

Example WRF sbatch script for Perlmutter

#!/bin/bash 
#SBATCH -N 1
#SBATCH -q debug
#SBATCH -t 00:30:00
#SBATCH -J test
#SBATCH -A <account>   #user needs to change this
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<email address>  #and this
#SBATCH -L scratch,cfs
#SBATCH -C cpu
#SBATCH --tasks-per-node=64

pwd
ntile=4  #number of OpenMP threads per MPI task
#need to set the "numtiles" variable in the wrf namelist (namelist.input) to be the same 

#example using the WRF-SIG project CFS directories;
#files only accessible by WRF-SIG members

bindir="/global/common/software/m4232/bin"
binname="wrf_2023-01-09-15_55_pm.exe"
rundir="/pscratch/sd/e/elvis/simulation/WRF/run" #user needs to change this

#Modules --------------------------------------------------------------------
#use the example module-loading script given above; accessible for WRF-SIG members
modversion="2023-09" 
loading_script="/global/cfs/cdirs/m4232/scripts/build/load_modules_${modversion}_wrfsig.sh"
source ${loading_script}

#OpenMP settings:
export OMP_NUM_THREADS=$ntile
export OMP_PLACES=threads  #"true" when not using multiple OpenMP threads (i.e., ntile=1)
export OMP_PROC_BIND=spread

cd $rundir

#run simulation
srun -n 64 -c 4 --cpu_bind=cores ${bindir}/${binname}

#rename and save the process 0 out and err files
cp rsl.error.0000 rsl.error_0_$SLURM_JOBID
cp rsl.out.0000 rsl.out_0_$SLURM_JOBID

To set appropriate options for the srun command (by considering process and thread affinity), users are encouraged to use the jobscript generator.
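One way to sanity-check the srun geometry used above: Perlmutter CPU nodes have 128 physical cores (256 logical CPUs with two hardware threads per core), so a common rule of thumb is to reserve 2 x 128 / tasks-per-node logical CPUs per task. A small sketch of that arithmetic, assuming the 64-tasks-per-node layout of the script:

#Perlmutter CPU node: 128 physical cores, 256 logical CPUs
tasks_per_node=64                               #must match #SBATCH --tasks-per-node
cpus_per_task=$(( 2 * 128 / tasks_per_node ))   #gives 4, matching "-c 4" in the srun line above
echo "use srun -c ${cpus_per_task} --cpu_bind=cores"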

Reference

Balle, T., & Johnsen, P. (2016). Improving I / O Performance of the Weather Research and Forecast ( WRF ) Model. Cray User Group.