
WRF

Build WRF

Required modules

The majority of the WRF model code is written in Fortran, but some parts and ancillary programs are written in C (WRF UG ch.2). In most cases, we run WRF with shared-memory parallelism through the OpenMP application programming interface, distributed-memory message passing (MPI) parallelism across nodes, or a hybrid of the two. We therefore need Fortran and C compilers that support OpenMP, together with an MPI library. For compiling such a complex program, NERSC provides compiler wrappers that combine the compilers with the various libraries (including MPI) necessary to run shared- and distributed-memory programs on NERSC systems.

In addition, WRF requires the netCDF library for input and output. WRF can use the parallel netCDF library to read/write netCDF files through multiple MPI tasks simultaneously, taking advantage of the Lustre file system of the Perlmutter scratch space. WRF can also use the file compression functionality introduced in netCDF 4.0, which depends on the HDF5 library. See Balle & Johnsen (2016) for WRF I/O options and their performance.

Note

The netCDF-4 (and underlying HDF5) library provides parallel read/write functionality, which is currently available as one of the I/O options in WRF (README.netcdf4par). However, experiments by a WRF-SIG member found that netCDF-4 parallel I/O is significantly slower than I/O through the parallel netCDF library.

Note

Another useful thing to know about the netCDF library is the limit on the size of a variable in a file, which depends on the netCDF data format (CDF1 = classic: 2 GB; CDF2 = 64-bit offset: 4 GB; netCDF-4 and CDF5: unlimited). See the table "Large File Support" in the NetCDF Users Guide. The current WRF code supports serial I/O for CDF1, CDF2, and netCDF-4; WRF's interface to the parallel netCDF library supports CDF1 and CDF2.
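If you are unsure which of these formats an existing WRF file uses, the ncdump utility (available once the netCDF module is loaded) can report it; the file name below is only an example:

#print the on-disk format ("kind") of a netCDF file
#typical outputs include: classic (CDF1), 64-bit offset (CDF2), netCDF-4, cdf5
ncdump -k wrfout_d01_2023-01-01_00:00:00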

Note

If a user runs a high-resolution, large-domain simulation with more than roughly 1500 x 1500 columns, a 3D variable becomes larger than 4 GB and it is necessary to modify WRF's I/O source code to use the CDF5 format.

Our experience shows that using the netCDF and parallel netCDF libraries provides flexible I/O (serial or parallel netCDF can be selected in the run-time WRF namelist) and much faster I/O on the scratch file system (parallel netCDF I/O is 10--20 times faster than serial I/O). We therefore recommend building WRF with the netCDF (cray-netcdf module) and parallel netCDF (cray-parallel-netcdf module) libraries. This I/O choice is activated by setting a few environment variables when compiling WRF, after loading the netCDF and parallel netCDF modules. With these two modules loaded, we set the following environment variables when compiling WRF:

...
module load cray-hdf5   #the netcdf library depends on hdf5
module load cray-netcdf
module load cray-parallel-netcdf
...

export NETCDF_classic=1               #use classic (CDF1) as default
#use 64-bit offset format (CDF2) of netcdf files
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 
#netcdf4 compression (serial) with the hdf5 module can be very slow
export USE_NETCDF4_FEATURES=0

and then, for high-resolution simulations, select the CDF2 format at run time by setting the namelist variables io_form_history, io_form_restart, etc., to 11 for parallel netCDF I/O (instead of 2 for standard serial netCDF).
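For example, the relevant lines in the &time_control section of namelist.input could look like the following sketch, where 11 selects parallel netCDF and 2 standard serial netCDF; the values shown for the input and boundary streams are only an illustration (see also the note below about reading them through parallel netCDF):

&time_control
 io_form_history  = 11,
 io_form_restart  = 11,
 io_form_input    = 2,
 io_form_boundary = 2,
/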

Note

[A discussion in the WRF user forum](https://forum.mmm.ucar.edu/threads/solved-netcdf-error-when-attempting-to-using-pnetcdf-quilting-with-wrf.9015/) suggests that not only wrfinput but also wrfbdy data can be read through the parallel netCDF library if WPS is compiled appropriately with parallel netCDF support.

Example module loading script

#!/bin/bash
set -e
machname="Perlmutter"
scname=$BASH_SOURCE  #name of this script

echo "loading modules using ${scname}"
echo "target system: ${machname} "

#users may want to unload unnecessary/conflicting modules loaded in .bash_profile. 
#e.g., hugepages. But keep other modules loaded automatically by the system. 
#each user has to edit here:
# module unload craype-hugepages64M

#general modules
module load cpu  
module load PrgEnv-gnu 

#module for WRF file I/O
#order of loading matters!
module load cray-hdf5  #required to load netcdf library
module load cray-netcdf 
module load cray-parallel-netcdf

module list

Build WRF on Perlmutter

WRF's build process starts with running the "configure" csh script that comes with the WRF source code package. This script automatically checks the computing platform and asks for user input about the parallel configuration.

On Perlmutter, we have tested the default gnu environment. Tested inputs to the "configure" csh script are gnu (dm+sm) and basic nesting:

Please select from among the following Linux x86_64 options:
....
32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)
...
Enter selection [1-75] :
Compile for nesting? (0=no nesting, 1=basic,...) [default 0] :

For real cases (not idealized cases like the 2D squall line), we recommend option 35 (dm+sm), based on our experience that 4 threads per MPI rank (dm+sm) performs better than pure MPI (dm) on the same number of nodes. We will update the WRF performance evaluation and scaling on Perlmutter in late 2023.
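If you prefer to drive configure non-interactively from a build script, the two answers can be piped to its standard input. This is only a convenience sketch; verify the option number printed by your WRF version, since the menu can change between releases:

#answer "35" (GNU dm+sm) and "1" (basic nesting) without interactive prompts
printf '35\n1\n' | ./configure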

After running the configure program, we run the "compile" csh script in the top directory of the WRF source code. In the example bash script below, each build step is toggled by a boolean variable; for example, to run only the compile step (after configure has already been run), set

doclean_all=false #true if previously compiled with different configure options
doclean=false
runconf=false
docompile=true

The compile script does several checks and invokes the make command, among other things.

Example WRF build script for Perlmutter

#!/bin/bash -l
set -e
set -o pipefail 

imach="pm"  #target system name. "pm" for Perlmutter.

#change the following boolean variables to run/skip certain compiling steps
doclean_all=true #true if previously compiled with different configure options

doclean=false

runconf=true    #run WRF's configure script; should do this first before compiling

docompile=false  #run WRF's compile script; should do this after configure

debug=false  #true to compile WRF with debug flag (no optimizations, -g flag for debugger, etc.)

# WRF directories
mversion="v4.4"
#WRF-SIG project directory as example; accessible only by WRF-SIG members
wrfroot="/global/cfs/cdirs/m4232/model"  
script_dir="/global/cfs/cdirs/m4232/scripts/build"

export WRF_DIR=${wrfroot}/${imach}/${mversion}/WRF

#Modules --------------------------------------------------------------------
modversion="2023-09"  #year-month of the system software update that introduced the default modules (INC0182147)
loading_script="${script_dir}/load_modules_${modversion}_wrfsig.sh"
source ${loading_script}

#set environment variables used by the WRF build system, reusing environment variables
#set by the modules

export NETCDF_classic=1               #use classic (CDF1) as default
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 #use 64-bit offset format (CDF2) of netcdf files
export USE_NETCDF4_FEATURES=0         #do not use netcdf4 compression (serial), need hdf5 module
#note: configure reports that WRF will not use netcdf4 compression, yet configure.wrf may still
#contain NETCDF4_IO_OPTS = -DUSE_NETCDF4_FEATURES..., which is confusing


export HDF5=$HDF5_DIR
export HDF5_LIB="$HDF5_DIR/lib"
export HDF5_BIN="$HDF5_DIR/bin"

export NETCDF=$NETCDF_DIR
export NETCDF_BIN="$NETCDF_DIR/bin"
export NETCDF_LIB="$NETCDF_DIR/lib"

#create PNETCDF environment variable to use the parallel netcdf library
export PNETCDF=$PNETCDF_DIR  

export LD_LIBRARY_PATH="/usr/lib64":${LD_LIBRARY_PATH}
#export PATH=${NETCDF_BIN}:${PATH}
export PATH=${NETCDF_BIN}:${HDF5_BIN}:${PATH}
export LD_LIBRARY_PATH=${NETCDF_LIB}:${LD_LIBRARY_PATH}


#other special flags to test
export PNETCDF_QUILT="0"  #Quilt output is not stable, better not use it

#check environment variables
echo "LD_LIBRARY_PATH: "$LD_LIBRARY_PATH
echo "PATH: "$PATH
echo "MANPATH: "$MANPATH

echo "NETCDF is $NETCDF"
echo "NETCDF_LIB is $NETCDF_LIB"

echo "HDF5 is $HDF5"
echo "HDF5_LIB is $HDF5_LIB"

echo "PNETCDF: ${PNETCDF}"
echo "PNETCDF_QUILT: ${PNETCDF_QUILT}"

##capture starting time for log file name
idate=$(date "+%Y-%m-%d-%H_%M")
#
##run make in the top directory
cd $WRF_DIR

if [ "$doclean_all" = true ]; then
    ./clean -a
    #"The './clean –a' command is required if you have edited the configure.wrf 
    #or any of the Registry files.", but this deletes configure.wrf....

fi

if [ "$doclean" = true ]; then
    ./clean
fi

#echo "running configure"
if [ "$runconf" = true ]; then

    if [ "$debug" = true ]; then
        echo "configure debug mode"
        ./configure -d
    else
        ./configure
    fi

   ##configure options selected are:
   # 32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)
   # choose 35 for real (not idealized) cases

    configfile="${WRF_DIR}/configure.wrf"

    #the sed commands below will change the following lines in configure.wrf
    #--- original
    #SFC             =       gfortran
    #SCC             =       gcc
    #CCOMP           =       gcc
    #DM_FC           =       mpif90
    #DM_CC           =       mpicc

    #--- edited (FC and CC with MPI)
    #SFC             =       gfortran
    #SCC             =       gcc
    #CCOMP           =       cc
    #DM_FC           =       ftn
    #DM_CC           =       cc

    if [ -f "$configfile" ]; then
        echo "editing configure.wrf"
        #need to remove -cc=$(SCC) in DM_CC
        sed -i 's/-cc=\$(SCC)/ /' ${configfile}
        sed -i 's/mpif90/ftn/' ${configfile}
        sed -i 's/mpicc/cc/' ${configfile}

        #also user can remove the flag -DWRF_USE_CLM from ARCH_LOCAL if not planning to 
        #use the CLM4 land model to speed up compilation
        #sed -i 's/-DWRF_USE_CLM/ /' ${configfile} 

    fi

fi

if [ "$docompile" = true ]; then
    export J="-j 4"  #build in parallel
    echo "J = $J"

    bldlog=${script_dir}/compile_em_${idate}_${imach}.log
    echo  "compile log file is ${bldlog}"

    #run the compile script 
    ./compile em_real &> ${bldlog}

    #check if there is an error in the compile log
    #with "set -e", a grep with no match would exit the script (behavior after the 2022-12 maintenance)
    set +e #release the exit flag before grep

    grep "Problems building executables" compile_em_real_${idate}_${imach}.log     
    RESULT=$?

    #set the exit flag again
    set -e  

    if [ $RESULT -eq 0 ]; then
        echo "compile failed, check ${bldlog}"      
    else
        echo "compile success"
        #sometimes renaming executable with descriptive information is useful
        #cp $WRF_DIR/main/ideal.exe $WRF_DIR/main/ideal_${idate}_${imach}.exe
        #cp $WRF_DIR/main/real.exe $WRF_DIR/main/real_${idate}_${imach}.exe
        #cp $WRF_DIR/main/wrf.exe $WRF_DIR/main/wrf_${idate}_${imach}.exe
        #cp $WRF_DIR/main/ndown.exe $WRF_DIR/main/ndown_${idate}_${imach}.exe
    fi

fi

As seen in the example script, the compiler names need to be edited in the configure.wrf file. Specifically, we need to change the compiler names for MPI applications to compiler wrappers.

  • change "mpif90" to "ftn" for the DM_FC flag
  • change "mpicc" to "cc" for the DM_FC flag
  • keep SFC and SCC to be the base compiler (gfortran and gcc)

These edits allow us to compile WRF on the login node.
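Before compiling, it is worth confirming that the wrappers resolve to the GNU compilers assumed by the configure option above; a quick check from the login node (with PrgEnv-gnu loaded, both should report GNU versions):

ftn --version
cc --version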

Run WRF

Experience of WRF-SIG members and NERSC best practices recommend the following:

  1. Use the scratch space for model execution and set appropriate file striping on the execution directory (see the striping example after this list).

  2. Use the parallel netCDF library for I/O. Time spent writing history and restart files is reduced by at least 30%, and often to 1/10 of serial netCDF I/O (an appropriate stripe setting on the output directory is needed).

  3. For simple use cases such as a single domain (no nesting), use four OpenMP threads instead of using all the available physical cores for MPI tasks. On Perlmutter, for the CONUS 2.5 km benchmark case, an 8-node job with 4 OpenMP threads and 256 MPI ranks runs almost as fast as a 16-node job with 2048 MPI ranks and no OpenMP threads. The latter (16 nodes) is twice as expensive as the former (8 nodes); note that the charge to a project allocation depends on the number of nodes used and the wall-clock hours, among other factors (see Compute Usage Charging). However, a WRF-SIG member found no computational advantage from OpenMP threads for a nested high-resolution (LES) case.
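As a minimal sketch for the striping in item 1, the Lustre settings of the run/output directory can be adjusted with lfs setstripe before the job starts; the stripe count of 8 and the directory path below are placeholders, and users should follow the NERSC Lustre striping recommendations for their file sizes:

#newly created files in the run directory inherit this stripe count
lfs setstripe -c 8 $SCRATCH/WRF/run
#verify the directory striping
lfs getstripe -d $SCRATCH/WRF/run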

Example WRF sbatch script for Perlmutter

#!/bin/bash 
#SBATCH -N 1
#SBATCH -q debug
#SBATCH -t 00:30:00
#SBATCH -J test
#SBATCH -A <account>   #user needs to change this
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<email address>  #and this
#SBATCH -L scratch,cfs
#SBATCH -C cpu
#SBATCH --tasks-per-node=64

pwd
ntile=4  #number of OpenMP threads per MPI task
#need to set the "numtiles" variable in the wrf namelist (namelist.input) to be the same 

#example using the WRF-SIG project CFS directories;
#files only accessible by WRF-SIG members

bindir="/global/common/software/m4232/bin"
binname="wrf_2023-01-09-15_55_pm.exe"
rundir="/pscratch/sd/e/elvis/simulation/WRF/run" #user needs to change this

#Modules --------------------------------------------------------------------
#use the example module-loading script given above; accessible for WRF-SIG members
modversion="2023-09" 
loading_script="/global/cfs/cdirs/m4232/scripts/build/load_modules_${modversion}_wrfsig.sh"
source ${loading_script}

#OpenMP settings:
export OMP_NUM_THREADS=$ntile
export OMP_PLACES=threads  #"true" when not using multiple OpenMP threads (i.e., ntile=1)
export OMP_PROC_BIND=spread

cd $rundir

#run simulation
srun -n 64 -c 4 --cpu_bind=cores ${bindir}/${binname}

#rename and save the process 0 out and err files
cp rsl.error.0000 rsl.error_0_$SLURM_JOBID
cp rsl.out.0000 rsl.out_0_$SLURM_JOBID

To set appropriate options for the srun command (by considering process and thread affinity), users are encouraged to use the jobscript generator.
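One way to sanity-check the srun geometry used above: Perlmutter CPU nodes have 128 physical cores (256 logical CPUs with two hardware threads per core), so a common rule of thumb is to reserve 2 x 128 / tasks-per-node logical CPUs per task. A small sketch of that arithmetic, assuming the 64-tasks-per-node layout of the script:

#Perlmutter CPU node: 128 physical cores, 256 logical CPUs
tasks_per_node=64                               #must match #SBATCH --tasks-per-node
cpus_per_task=$(( 2 * 128 / tasks_per_node ))   #gives 4, matching "-c 4" in the srun line above
echo "use srun -c ${cpus_per_task} --cpu_bind=cores"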

Reference

Balle, T., & Johnsen, P. (2016). Improving I / O Performance of the Weather Research and Forecast ( WRF ) Model. Cray User Group.