R¶

The R programming language is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It is particularly well-suited for data analysis—and, in particular, statistics—and visualization, having best-in-class packages in both of these domains.

Highly extensible, it provides a wide variety of statistical tools, such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and publication-quality plots.

Installing R at NERSC¶

The R Module¶

NERSC provides a ready-made R module, along with a number of default packages, which can be loaded as follows:

module load R

We recommend that you install your own R environment if the NERSC provided R module does not fit your needs.

Installing Your Own R Environment¶

We strongly recommend using Conda to install R and the packages you care about in a way that is conflict-free and reproducible. Plus, it tends to be the quickest way to install R packages, especially if they have dependencies.

Conda Installation¶

To get started, create a Conda environment and add your desired packages:

# Creating a custom conda environment
module load conda
conda create -n my-custom-r
conda activate my-custom-r
# Installing R and the libraries of your choice
conda install -c conda-forge r r-essentials <additional R libs>

See our conda documentation for further information on managing Conda environments.

Adding Source Packages¶

Not all R packages are available on the conda-forge repositories, in which case you will likely need to install packages from source.

We recommend using the R install.packages function on top of your Conda installation, putting your dependencies in a dedicated project-specific folder to avoid having various projects interfere with each other:

# Activating your conda R environment 
conda activate my-custom-r
# Creating a dedicated folder for your R-installed dependencies
mkdir -p ~/.R/srclib/my-custom-r
# Starting R and installing dependencies into the dedicated folder
R
> install.packages(<source pkg>, lib='~/.R/srclib/my-custom-r')

This new library location will need to be appended to .libPaths() in order to be picked up by your R environment. To do so, you can add the following to your .Rprofile, which will also ensure that it is loaded only when using your custom conda environment:

.libPaths.env <- function(envname="base") {
    cmd <- paste("bash -c \"source activate",
                  envname,
                  ">/dev/null;",
                  "unset R_HOME;",
                  "R --slave -e 'cat(.libPaths())'\"")
    base::system(cmd, intern=TRUE)
}

if (.libPaths() == .libPaths.env('my-custom-r')) {
    .libPaths(new=c(.libPaths(),"~/.R/srclib/my-custom-r"))
}

Adding further environments is as easy as adding additional if blocks. For example, adding the following would add the even-more-r environment:

if (.libPaths() == .libPaths.env('even-more-r')) {
    .libPaths(new=c(.libPaths(),"~/.R/srclib/even-more-r"))
}

Running R at NERSC¶

Once the module or conda environment is loaded, you can call the R command and start R. However, while you can run trivial commands on login nodes, you will want to use compute nodes to get the best performance available and avoid interfering with other users on those same login nodes.

Interactive Command-line¶

You can run R on a compute node interactively, on the command line, by starting it from within an interactive job:

# Starting an interactive job
salloc --qos=interactive -C cpu --time=234
# Loading the R module
module load R
# Starting R for interactive use
R

See our Interactive Jobs page for further details on this type of allocation.

Slurm Scripts¶

You will want to run a batch job for more intensive, fully scripted, computations. To do so, you would write a Slurm batch script like the following (where code.R would be your R code):

#!/bin/bash
#SBATCH -A <project account>
#SBATCH -C cpu
#SBATCH --qos=regular
#SBATCH --nodes=1
#SBATCH --time=1:00:00

# Loading the R module
module load R

# Running R in batch mode
R CMD BATCH code.R

Then submit it via sbatch as follows (where myscript.sh is the name of your Slurm script):

sbatch myscript.sh

See our page on running jobs and example job scripts for more general information on writing your own batch scripts and requesting specific resources (such as a number of nodes or a GPU).

Using Your Own R Environment in Jupyter¶

You can create a Jupyter kernel based on your own (conda-based) R environment. To do so, you will need to install the r-irkernel package and create a kernelspec file.

The following commands will let you install r-irkernel and set it up within your R environment:

# Adding r-irkernel to your R environment
conda activate my-custom-r
conda install -c conda-forge r-irkernel
# Setting up IRkernel in R
R
> ename <- Sys.getenv('CONDA_DEFAULT_ENV')
> dname <- trimws(paste("R", getRversion(), Sys.getenv("CONDA_PROMPT_MODIFIER")))
> IRkernel::installspec(name=ename, displayname=dname)
> quit()

Once r-irkernel is set up, you can follow our Jupyter documentation which will let you know how to turn a conda environment into a kernel visible in our Jupyter instance.

Fixing the Jupyter Display¶

Some R functions do not display outputs properly in Jupyter (for example, system('ls') returns no output!). This is typically due to those functions sending outputs to stdout, instead of returning strings (as system('ls', intern=TRUE) would do).

Most of this behavior can be fixed by adding the following snippet to your .Rprofile and using system.jup from notebooks:

system.jup <- function(command) {
    cat(base::system(command, intern=TRUE), sep='\n')
}

Parallel R¶

There are many ways to run R code in parallel, especially if you can use a coarse-grained parallelism pattern, in which chunks of computation can be computed independently from each other.

The following example illustrates using the parallel package, which creates workers as lightweight processes via forking, to optimize codes that use lapply, sapply, apply, and related functions:

library("parallel")
f = function(x) {
 sum = 0
 for (i in seq(1, x)) sum = sum + i
 return(sum)
}
n = 1000
nCores <- detectCores()
result = mclapply(X=1:n, FUN=f, mc.cores=nCores)

To go further, we recommend taking a look at the HPC with R Workshop slides. You might also find our Python documentation helpful, as it goes into detail on how one would use an interpreted language efficiently on an HPC system.

Getting Help with R¶

If you have R questions or problems, please contact NERSC's online help desk. We also encourage you to take a look at the following links:

The Official R Project Documentation
Using R for Statistical Computing, a quick R tutorial aimed at introducing the language to NERSC users
Efficient R Programming, a principled R guide
Advanced R, a guide to deepen your understanding of R and get familiar with it if you come from another programming language
The R Inferno, a guide to R's trouble spots, oddities, traps, and glitches
HPC with R, a workshop delving deeper into using R on a high-performance system