
R


R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, such as linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, and clustering, and it is highly extensible.

R provides an Open Source route to express statistical methodologies; it is a GNU project with similarities to the S language and environment. R is an integrated suite of software facilities for data manipulation, calculation, and graphical display. One of R's strengths is the ease with which well-designed, publication-quality plots can be produced, including mathematical symbols and formulae where needed.
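
For example, mathematical notation can be typeset directly in plot annotations using base graphics and plotmath expressions; a minimal sketch:

# Plot the standard normal density with mathematical notation in the
# title and axis labels via plotmath expressions
x <- seq(-3, 3, length.out = 200)
plot(x, dnorm(x), type = "l",
     main = expression(paste("Standard normal density, ", sigma == 1)),
     xlab = expression(x), ylab = expression(f(x)))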

R at NERSC

Quickstart

Type the following commands to launch R:

perlmutter$ module load R
perlmutter$ R

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Available R Modules

There are several versions of R available on Perlmutter via the module system. See the list of currently available versions, and load the version of your choosing with the following commands:

perlmutter$ module -r avail "^R$"

------------- /global/common/software/nersc/pm-2022.12.0/extra_modulefiles -------------
R/4.2.3 (D)

perlmutter$ module load R/4.2.3

Running R on Compute Resources

Login nodes are not well suited for intensive processing, and running large jobs may affect performance for other users. If you have a non-trivial workload, use compute resources for better performance. Read more about compute jobs.

Running R Interactively in JupyterLab

You may run both R and Python interactively in notebooks using the NERSC Jupyter service. The default R kernel available in JupyterLab has a number of commonly used R packages and uses R version 3.6.1 (2019-07-05) -- "Action of the Toes".

Jupyter displays output differently than the shell

Some common R functions will not display their output in Jupyter by default. For example, system('ls') returns no output! These functions typically write to stdout in a way the notebook does not capture; ask for the output as a character vector instead, for example with system('ls', intern=TRUE).

To simulate most of R's regular terminal behaviour while using Jupyter notebooks, add the following snippet to your .Rprofile and call system.jup from your notebooks:

# Run a shell command and print its captured output, mimicking the terminal
system.jup <- function(command){
    cat(base::system(command, intern=TRUE), sep='\n')
}
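
For example, the following prints a directory listing in the notebook output, as it would appear in a terminal:

system.jup('ls -l')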

Running R Interactively via CLI

To run R on a compute node interactively, request an interactive allocation with salloc and run R inside it.

nersc$ salloc --qos=interactive -C cpu --time=234
nersc$ module load R
nersc$ R

Running R via Batch Job

To run R through a batch job, make a batch script similar to the following and submit it via sbatch.

#!/bin/bash
#SBATCH -C cpu
#SBATCH --qos=regular

module load R
R CMD BATCH code.R

The content of code.R might look like the following (R CMD BATCH writes the console output to code.Rout):

j <- 1
imagfilename <- paste('myimag', j, '.pdf', sep='')
pdf(file=imagfilename, width=8, height=8)   # width and height are in inches
x <- 1:10
plot(x, main='R is fun')
dev.off()

Finally, submit your batch job with:

nersc$ sbatch myscript.sh

For more general information on creating batch scripts, see example job scripts.

Creating Custom R Environments

Using Anaconda

We strongly encourage users to use Anaconda to create conflict-free and reproducible R environments. This is typically the quickest way to install R packages, especially if those packages have additional dependencies on other libraries. You may use either conda or mamba, but we have found mamba typically resolves package version dependencies more quickly.

To get started, create a conda environment using mamba and add your desired packages:

perlmutter$ module load conda
perlmutter$ mamba create -n my-custom-r
perlmutter$ source activate my-custom-r
perlmutter$ mamba install -c conda-forge r r-essentials <additional R libs>

Conda Environments can be used in JupyterLab

If the r-irkernel package is installed in your R environment, then once you install a kernelspec file your environment should show up in the list of available kernels in Jupyter.

To install a kernelspec, run the IRkernel::installspec command from R in your environment.

nersc$ source activate my-custom-r
nersc$ mamba install -c conda-forge r-irkernel
nersc$ R
> ename <- Sys.getenv('CONDA_DEFAULT_ENV')
> dname <- trimws(paste("R",getRversion(),Sys.getenv("CONDA_PROMPT_MODIFIER")))
> IRkernel::installspec(name=ename, displayname=dname)
> quit()

See the conda documentation in the Python docs for more tips on managing Anaconda environments.

From Source Packages

Not all R language packages are available to install with mamba via conda-forge, but it is possible to install additional packages from source. In this case, we recommend users still use Anaconda to install prerequisites and other packages, and install all source packages in a separate directory.

After creating a new environment as above, install any prerequisites via mamba and create a new directory to store your source installation for your environment. Then, start R and install the source package, specifying your new directory as the install location.

nersc$ source activate my-custom-r
nersc$ mamba install -c conda-forge <prerequisite packages>

nersc$ mkdir -p ~/.R/srclib/my-custom-r
nersc$ R
> install.packages(<source pkg>, lib='~/.R/srclib/my-custom-r')
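
A package installed this way can be loaded by pointing library() at the same directory via lib.loc (mypkg below is a hypothetical package name, used only for illustration):

> library(mypkg, lib.loc='~/.R/srclib/my-custom-r')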

To avoid passing lib.loc on every call, append this new library location to .libPaths(). To ensure it is added only for your custom conda environment, add the following to your .Rprofile.

# Query the library paths reported by a fresh R session in the named conda environment
.libPaths.env <- function(envname="base") {
    cmd <- paste("bash -c \"source activate",
                  envname,
                  ">/dev/null;",
                  "unset R_HOME;",
                  "R --slave -e 'cat(.libPaths())'\"")
    base::system(cmd, intern=TRUE)
}

# Append the source-package directory only when this session is running
# inside the my-custom-r environment (compare the library paths as one string)
if (identical(paste(.libPaths(), collapse=" "), .libPaths.env('my-custom-r'))) {
    .libPaths(new=c(.libPaths(), "~/.R/srclib/my-custom-r"))
}

If you install source packages for additional environments, reuse the same .libPaths.env() function and add another copy of the second snippet. For example, if you installed a package from source into the directory ~/.R/srclib/even-more-r for use in a second custom environment even-more-r, you would add the following to your .Rprofile:

if (identical(paste(.libPaths(), collapse=" "), .libPaths.env('even-more-r'))) {
    .libPaths(new=c(.libPaths(), "~/.R/srclib/even-more-r"))
}

How to Run R Code in Parallel

The following program illustrates how R can be used for 'coarse-grained' parallelization, which is particularly useful when chunks of the computation are unrelated and do not need to communicate in any way. The example below uses the parallel package to create workers as lightweight processes via forking, which is very useful for optimizing code that uses lapply, sapply, apply, and related functions:

library("parallel")
f = function(x)
{
 sum = 0
 for (i in seq(1,x)) sum = sum + i
 return(sum)
}
n=1000
nCores <- detectCores()
result = mclapply(X=1:n, FUN = f, mc.cores=nCores)
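
Note that detectCores() reports all cores on the node, which can exceed what your batch job was actually allocated. A minimal variation of the call above, assuming your job script sets --cpus-per-task so that Slurm exports SLURM_CPUS_PER_TASK, respects that allocation and falls back to detectCores() otherwise:

# Prefer the CPU count Slurm allocated to this task; fall back to
# detectCores() when the variable is not set (e.g. in an interactive session)
slurmCores <- Sys.getenv("SLURM_CPUS_PER_TASK")
nCores <- if (nzchar(slurmCores)) as.integer(slurmCores) else detectCores()
result <- mclapply(X = 1:n, FUN = f, mc.cores = nCores)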

Performance with R

If you are attempting to use R for large, performance-critical workloads, we strongly recommend that you also review the Python documentation, which has several helpful insights about running an interpreted language efficiently on HPC systems.
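
As a small illustration of why this matters, explicit interpreted loops in R carry significant overhead compared with vectorized built-ins; a minimal sketch you can time for yourself:

# Compare an interpreted R loop against the vectorized sum() on the same data
x <- runif(1e7)
loop_sum <- function(v) {
    s <- 0
    for (e in v) s <- s + e
    s
}
system.time(loop_sum(x))   # interpreted loop
system.time(sum(x))        # vectorized, implemented in C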

References