R and RStudio on Quest#

Important

R: All versions of R will launch on RHEL8, but user-installed packages, especially those that use system or external libraries, are not guaranteed to work. Such packages may need to be reinstalled in the RHEL8 environment.

Overview#

R is a popular statistical programming language and is available on Quest. This article describes how to run R code on the cluster, either as standalone R scripts or within the RStudio graphical user interface (GUI).

Accessing R and Managing R Packages on Quest#

Like other software, R is available through the module system on Quest. R modules are available for many recent versions of R. A limited number of R packages have been installed at the system level for each R module, but most R packages are installed at the user level. A user-level installation is specific to both the user and the version of R in use, so when you change R versions you will have to reinstall the packages you rely on.

System-level R modules#

Several versions of R are available on Quest as environmental modules. We recommend using R version 4.4.0, but you can check which versions are available with the following command:

$ module spider R

You can make a particular version of R available to use by typing the full module name with the version included as listed in the output from module spider R. Example:

$ module load R/4.4.0

After you have loaded the R module, you can use the command R to start the command line version of R. If you want to run R within RStudio, please refer to the Using R through RStudio on Quest section.

System-level R Packages#

A limited number of R packages have been installed at the system-level for each R module. These packages can be used without downloading and installing them first. You can get a list of the currently installed packages with the command:

installed.packages()

User-level R Packages#

As a user, you can also install additional R packages by running the R command line console and using the install.packages() command, which will pull from the CRAN repository:

install.packages(c("packagename1", "packagename2"))

Keep in mind that some R packages are hosted in repositories other than CRAN. Many bioinformatics and genomics packages in particular, such as SingleR, are released through Bioconductor, which has its own command to download packages:

BiocManager::install("SingleR")

Other R packages are hosted on GitHub, Bitbucket, or other repositories. You can use the remotes package to install such packages directly, in this case from GitHub:

remotes::install_github("owner/repository")  # placeholder: replace with the package's GitHub owner and repository name

Regardless of how you install these user-level R packages, they will be saved to the following location in your home directory on Quest, where X and Y reflect the major and minor version release of the loaded R module, respectively:

~/R/x86_64-pc-linux-gnu-library/X.Y

To illustrate, if you first load the R/4.4.0 module and install SingleR, the package will be saved to ~/R/x86_64-pc-linux-gnu-library/4.4.
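The X.Y naming rule can be sketched in a few lines of shell. This is only an illustration, not a tool provided on Quest: the function name r_user_lib_dir is hypothetical, and the sketch assumes the platform string x86_64-pc-linux-gnu shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Quest): derive the personal library
# directory from a full R version string such as 4.4.0.
r_user_lib_dir() {
    local version="$1"
    # Drop the patch component: 4.4.0 -> 4.4
    local major_minor="${version%.*}"
    printf '%s/R/x86_64-pc-linux-gnu-library/%s\n' "$HOME" "$major_minor"
}

# Print the directory used by the R/4.4.0 module
r_user_lib_dir 4.4.0
```

Because only the major and minor numbers appear in the path, two patch releases of the same minor version would map to the same directory under this rule.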

First-time Installation#

The first time you install an R package on Quest while using an environmental module, you may see a warning like:

Warning in install.packages("glmnet", repos = "https://cloud.r-project.org/") :
'lib = "/hpc/software/R/4.4.0/lib64/R/library"' is not writable
Would you like to use a personal library instead? (y/n)

Answer “y” to use a personal library. Then it will ask you something like:

Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/4.4
to install packages into? (y/n)

Answer “y” again. The installation should then proceed successfully. This will install the packages in your home directory.
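In a batch job there is no terminal in which to answer these prompts. One possible workaround, sketched here on the assumption that R only adds the personal library to its search path when the directory already exists at startup, is to create the directory before launching R. The helper name ensure_r_lib_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a personal R library directory if it is
# missing, confirm that it is writable, and print its path.
ensure_r_lib_dir() {
    local lib_dir="$1"
    mkdir -p "$lib_dir" || return 1
    [ -w "$lib_dir" ] || return 1
    printf '%s\n' "$lib_dir"
}

# Example call with the directory R offered in the prompt above
# (uncomment to create it in your home directory):
# ensure_r_lib_dir "$HOME/R/x86_64-pc-linux-gnu-library/4.4"
```

If the directory already exists, mkdir -p leaves it untouched, so the call is safe to repeat at the top of a submission script.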

For information on troubleshooting R package installation with environmental modules, please see the Troubleshooting Installation of Common R Packages section.

Running R Code on Quest Without RStudio#

Like all computation on Quest, running R code should be done on a compute node, rather than the login node you first land on when accessing Quest from the command line. This section describes the various ways you can use R on Quest, with or without the RStudio graphical user interface (GUI). If you would like to use the RStudio GUI, we recommend using RStudio Server through Quest OnDemand as it provides users more control over both their software environment and the compute resources than the Analytics Nodes do.

Interactive job on the command line#

The simplest way to use R with the computational resources of Quest is through an interactive job on the command line (i.e., without RStudio). This is a quick and useful method of launching R if you have just a few lines of code to run, or if you are testing out some code and don’t need RStudio’s GUI. Once you submit an interactive job like the one below, replacing “a9009” with your Slurm account and a relevant partition, you will be brought to a compute node. This simple example requests 8GB of RAM on 1 core for 4 hours:

$ srun -A a9009 -p a9009 -t 04:00:00 --mem=8G --pty bash -l
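To make the pieces of that request explicit, the sketch below assembles the same command from its parts and prints it without running it. The function name interactive_job_cmd and its argument order are hypothetical. Hours must be given as a plain integer without a leading zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) an interactive-job request.
# Arguments: Slurm account, partition, whole hours of walltime, memory (e.g. 8G).
interactive_job_cmd() {
    local account="$1" partition="$2" hours="$3" mem="$4"
    # The walltime is written as HH:MM:SS; pad the hours to two digits.
    printf 'srun -A %s -p %s -t %02d:00:00 --mem=%s --pty bash -l\n' \
        "$account" "$partition" "$hours" "$mem"
}

# Reproduces the command shown above: -t 04:00:00 and --mem=8G
interactive_job_cmd a9009 a9009 4 8G
```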

Once the job is allocated, you can either load the desired system-level R module or activate a conda/mamba virtual environment containing R. Once one of these is loaded into your compute node session, you can directly launch the R command line console with the command R.

That would look something like this for a system-level module:

$ module load R/4.4.0
$ R

R version 4.4.0 (2024-04-24) -- "Puppy Cup"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

If you instead use R within a conda/mamba virtual environment, launching R in this manner would be akin to the following:

$ conda activate seurat-env
(seurat-env) $ R

R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Now anything you type in the “>” prompt will be R code you can directly execute on the compute node you were allocated. To close your R session, type the command q() to quit and return to the bash shell. Generally speaking, you should answer “n” for “No” when prompted with “Save workspace image? [y/n/c]:” upon quitting.

Batch job using Rscript#

If you already have a working R script that you would like to run “in the background” using Quest’s resources, we recommend submitting a non-interactive batch job. As a simple example, let’s consider the script sleepy.R, which uses message() to print the system time before and after a “sleep” interval of 30 minutes:

File: sleepy.R

# first message
message("I'm tired, I need to take a nap! Wake me 30 minutes from ", Sys.time())

# sleep for 30 minutes (1800 seconds)
Sys.sleep(1800)

# second message
message("I feel so refreshed! The time is now ", Sys.time())

To run sleepy.R in this manner with a system-level module, you will call it inside of the submission script submit-sleepy-Rmodule.sh with the Rscript command:

File: submit-sleepy-Rmodule.sh

#!/bin/bash
#SBATCH --account=a9009        # replace with your Slurm account name
#SBATCH --partition=a9009      # replace with a relevant partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00
#SBATCH --mem=1G
#SBATCH --job-name=sleepR
#SBATCH --output=%j-%x.out

module purge all
module load R/4.4.0

Rscript sleepy.R

To run this script as a non-interactive batch job, save submit-sleepy-Rmodule.sh in the same directory as sleepy.R and run the command sbatch submit-sleepy-Rmodule.sh. You will receive a numeric Slurm job ID after submitting the job. In this example, the results of the message() statements are written to the file %j-%x.out, where %j will be replaced with the Slurm job ID and %x is the Slurm job name. For example, assuming the returned job ID is 9446171, you can print the contents of this output file with the cat command:

$ cat 9446171-sleepR.out
I'm tired, I need to take a nap! Wake me 30 minutes from 2023-12-13 16:04:45
I feel so refreshed! The time is now 2023-12-13 16:34:45
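The way the %j and %x placeholders expand can be sketched as a small text substitution. The function name slurm_output_name is hypothetical, and the sketch only handles the two placeholders discussed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand the %j (job ID) and %x (job name) placeholders
# of an --output pattern. Other Slurm placeholders are not handled.
slurm_output_name() {
    local pattern="$1" job_id="$2" job_name="$3"
    printf '%s\n' "$pattern" | sed -e "s/%j/$job_id/g" -e "s/%x/$job_name/g"
}

# Prints 9446171-sleepR.out for the job in the example above
slurm_output_name '%j-%x.out' 9446171 sleepR
```

In a real session the job ID could be captured at submission time, for example with job_id=$(sbatch --parsable submit-sleepy-Rmodule.sh), and then passed to a helper like this one to locate the output file.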

To run the same script using a conda/mamba virtual environment, your script would look something like this instead:

File: submit-sleepy-mamba.sh

#!/bin/bash
#SBATCH --account=a9009   # replace with your Slurm account name
#SBATCH --partition=a9009 # replace with a relevant partition 
#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=1 
#SBATCH --time=01:00:00 
#SBATCH --mem=1G 
#SBATCH --job-name=sleepR 
#SBATCH --output=%j-%x.out 

module purge all 
module load mamba/23.1.0
eval "$(conda shell.bash hook)"
conda activate simple-r-env

Rscript sleepy.R

The simple-r-env environment was created by running the command module load mamba/23.1.0; mamba create -n simple-r-env -c conda-forge r-base --yes.

Using R through RStudio on Quest#

Many R users prefer working in the RStudio graphical user interface instead of scripting on the command line. To make RStudio available on Quest, we offer the following ways of launching RStudio Server: Quest OnDemand, the Quest Analytics Nodes, and RStudio Server via an interactive job.

Quest OnDemand#

This is our recommended way to use RStudio on Quest. Quest OnDemand is a browser-based interface for Quest located at https://ondemand.quest.northwestern.edu. If you are not on campus, you will need to be on the GlobalProtect VPN for access. RStudio Server interactive jobs can be requested using the “Interactive Apps” dropdown, or from the “RStudio Server” button on the bottom of the landing page.

To meet common additional software requirements of many packages, you may add additional modules to your environment for your RStudio Server session as mentioned above.

Quest Analytics nodes#

The Quest Analytics Nodes allow users with active Quest allocations to use RStudio from a web browser. See Research Computing: Quest Analytics Nodes for an overview of the system. The Quest Analytics Nodes are shared by many researchers. Please be aware of your memory use when analyzing large data sets. Users utilizing a large amount of memory, especially those using over 60GB of RAM, may be asked to move their analysis to other systems. Multicore and parallel processes should not be run on the Analytics nodes. Users needing to run computationally intensive jobs should pick one of the other ways to use R on Quest instead of using the Analytics nodes. Please contact Research Computing at quest-help@northwestern.edu with questions about R memory use or analyzing large data sets.

Connections to the Quest Analytics Nodes are limited to the Northwestern network. If you are connecting from off-campus, you must use the GlobalProtect VPN.

Using any browser, connect to: rstudio.quest.northwestern.edu. Sign in with your NetID and password.

The Analytics Nodes are limited to running a single version of R for all users.

RStudio Server: Interactive job#

RStudio Server can be launched and run on a Quest compute node through an interactive job or batch job on Quest.

To schedule the interactive job from the command line on Quest, ssh into a login node and type:

$ srun -A <slurm_account_name> -p <queue_name> -N 1 --ntasks-per-node=1 --mem-per-cpu=4G --time=04:00:00 --pty bash -l

This example requests a single core for a 4 hour job. Substitute an active Slurm account name and queue name. For example, with Slurm account p12345 and the short queue, this might be:

$ srun -A p12345 -p short -N 1 --ntasks-per-node=1 --mem-per-cpu=4G --time=04:00:00 --pty bash -l

Note that the more cores requested, the longer the wait for the interactive session to start. Do not request more than 1 node for RStudio Server sessions.

Once the session begins, get the name of the compute node the session has landed on by running the command hostname, e.g.,

$ hostname
qnode0372

Next, load the version of R you would like to run and any additional modules that you need for installing or using certain R packages, e.g.,

$ module purge
$ module load R/4.4.0
$ module load hdf5/1.14.1-2-gcc-12.3.0 fftw/3.3.10-gcc-12.3.0 gdal/3.7.0-gcc-12.3.0 nlopt/2.7.1-gcc-12.3.0

After you have loaded these modules, load the rstudio-server/2024.09 module, which will display a shorthand version of the instructions that you see here.

$ module load rstudio-server/2024.09
If you have not already done so, make sure you are running and interactive or batch
job.
Before launching RStudio Server, load the version of R you would like to run and any
additional modules that you need for installing or using certain R packages.

Once this is done, call the command `rserver <port_number>` where <port_number>
should be a value between 8000 and 9000.
Based on the port number and the compute node that RStudio Server is running on, you
will then tunnel to the server using the command:
`ssh -L <port_number>:localhost:<port_number> <netid>@login.quest.northwestern.edu ssh
-N -L <port_number>:localhost:<port_number> qnode<number>` filling in the appropriate
value for <port_number> and <compute_node>
Finally, in your *local browser* you can then put in the URL
`localhost:<port_number>` and connect to your RStudio Server session.

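Following those instructions, start RStudio Server with the command rserver <port_number>, choosing a port number between 8000 and 9000. Once RStudio Server is running on the compute node, open a new terminal window on your local computer, and type: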

$ ssh -L <port_number>:localhost:<port_number> <netid>@login.quest.northwestern.edu ssh -g -N -L <port_number>:localhost:<port_number> qnode<number>

In the command template above, be sure to replace <netid> with your NetID, replace qnode<number> with the name of the compute node, and replace all <port_number> instances with the port number between 8000 and 9000 that you selected. You will be prompted for your Quest password. After you enter it, the command will not return a prompt; this is expected while the tunnel is open.
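Because the same port number appears four times in the template, it is easy to mistype. The sketch below fills in the template and checks the port range before printing the command; it does not open the tunnel itself. The function name tunnel_cmd is hypothetical and abc1234 is a placeholder NetID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the tunnel command for a given
# port, NetID, and compute node, after checking the port range.
tunnel_cmd() {
    local port="$1" netid="$2" node="$3"
    # Reject empty or non-numeric ports
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    if [ "$port" -lt 8000 ] || [ "$port" -gt 9000 ]; then
        echo "port must be between 8000 and 9000" >&2
        return 1
    fi
    printf 'ssh -L %s:localhost:%s %s@login.quest.northwestern.edu ssh -g -N -L %s:localhost:%s %s\n' \
        "$port" "$port" "$netid" "$port" "$port" "$node"
}

# abc1234 is a placeholder NetID; qnode0372 is the node from the hostname example
tunnel_cmd 8765 abc1234 qnode0372
```

The printed line can be copied into a terminal on your local computer once the values look right.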

On your local computer, open up your browser and connect to http://localhost:<port_number>/. Your browser is now connected to the RStudio Server session running on Quest.

Note that your RStudio Server session will quit abruptly when the walltime of the interactive job comes to an end. Save often and be aware of walltime to avoid losing your work.

RStudio File System#

When you connect to RStudio, you will initially be in your home directory. This is /home/<netid> on Quest.

To access a projects directory, click on the button with three small dots on the right in the Files tab. Then enter the full path to your project folder (e.g., /projects/).

Troubleshooting Installation of Common R Packages on Quest#

Individual R packages come from many different developers, who sometimes have different expectations of how your software environment is set up. R packages can also depend on software beyond base R, or on other R packages that they do not install for you. All of these things can make package installation trickier on a cluster system, so we have included documentation on troubleshooting particular R package installs that we have often seen cause difficulty on Quest. Under each package name you will see “Additional module(s) needed” if you need to load modules in addition to R, and “Conflicting modules” if there are modules you should not have in your environment because they conflict with the installation.

All of these instructions assume you are working with R version 4.4.0 and have the ability to load/unload modules (so not the Analytics Nodes). If you need to use a different version of R for your project and are having difficulty with package installation for that version please reach out to quest-help@northwestern.edu for assistance.
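Before starting an installation, it can help to check whether a conflicting module is already loaded. The sketch below assumes that the module system exports the loaded modules as a colon-separated list in the LOADEDMODULES environment variable; if that variable is not set on your system, the output of module list gives the same information. The function name has_loaded_module is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a colon-separated list of loaded modules
# contains any version of the named module (for example glpk).
has_loaded_module() {
    local loaded="$1" name="$2"
    case ":$loaded:" in
        *":$name/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: LOADEDMODULES holds the loaded modules, separated by colons.
if has_loaded_module "${LOADEDMODULES:-}" glpk; then
    echo "glpk is loaded; consider unloading it before installing igraph or Seurat"
fi
```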

igraph#

Additional module(s) needed: None.

Conflicting modules:

glpk/4.53
glpk/4.58
glpk/4.65-gcc-12.3.0

R code for installation:

install.packages("igraph")

Or:

devtools::install_github("igraph/rigraph")

sf#

Additional module(s) needed:

hdf5/1.14.1-2-gcc-12.3.0
gdal/3.7.0-gcc-12.3.0

R code for installation:

install.packages("sf", configure.args = c(sf = "--with-sqlite3-lib=/hpc/software/spack_v20d1/spack/opt/spack/linux-rhel7-x86_64/gcc-12.3.0/sqlite-3.40.1-gzayqyouerp6yxtxcd35gxeorakrlsg4/lib"))

terra#

Additional module(s) needed:

hdf5/1.14.1-2-gcc-12.3.0
gdal/3.7.0-gcc-12.3.0

R code for installation:

install.packages("terra", configure.args = c(terra = "--with-sqlite3-lib=/hpc/software/spack_v20d1/spack/opt/spack/linux-rhel7-x86_64/gcc-12.3.0/sqlite-3.40.1-gzayqyouerp6yxtxcd35gxeorakrlsg4/lib"))

Seurat#

Conflicting modules:

glpk/4.53
glpk/4.58
glpk/4.65-gcc-12.3.0

Additional module(s) needed: None, but many packages used in conjunction with Seurat need hdf5.

R code for installation:

install.packages("Seurat")

Note: if you are installing Seurat for the first time and don’t have any of the dependencies this installation can take upwards of an hour as it will build all dependencies from source. If it fails to install a particular dependency, please try installing that dependency and then retrying the Seurat install. Some users have reported an error that asks to remove a particular file from their home directory during the installation process. If you get this error, remove the file in question and then retry the installation. If you have questions, let us know at quest-help@northwestern.edu.

hdf5r#

Additional module(s) needed:

hdf5/1.14.1-2-gcc-12.3.0

R code for installation:

install.packages("hdf5r")

jpeg#

Additional module(s) needed:

libjpeg-turbo/2.1.5-gcc-12.3.0

R code for installation:

Sys.setenv(JPEG_LIBS = "-L/hpc/software/spack_v20d1/spack/opt/spack/linux-rhel7-x86_64/gcc-12.3.0/libjpeg-turbo-2.1.5-rersiv4gdrubpy3or46q2vjjdho7mc7y/lib64")

install.packages("jpeg")

Using R with Anaconda Virtual Environments#

Anaconda (i.e., conda, miniconda, and mamba) virtual environments are a great alternative to loading environmental modules for a range of applications on Quest. Although Anaconda was originally released to facilitate Python package installations, its repositories now host the vast majority of public R packages as well, especially the most commonly used ones. The benefits of using Anaconda over modules are especially apparent for R, since many R packages compile their underlying code upon installation. Because of this, installing R packages with complex compilation dependencies can often fail unless other modules are loaded into your environment as well. By contrast, R packages distributed by Anaconda come with these dependencies, which greatly simplifies the installation process. The names of R packages are often lowercase and prepended with “r-” in Anaconda’s various repositories. For example, the R package Seurat is named r-seurat in the conda-forge repository.
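The naming convention described above can be sketched as a one-line transformation. The function name conda_r_name is hypothetical, and because the convention holds often rather than always, the result is only a guess to confirm with a package search.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the conda-forge name of an R package by
# lowercasing it and adding the "r-" prefix. This is a convention, not a
# guarantee, so confirm the result with a package search.
conda_r_name() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Prints r-seurat, the name given for Seurat above
conda_r_name Seurat
```

A guessed name could then be checked with a search such as mamba search -c conda-forge r-seurat before creating an environment.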

User-level R packages#

By definition, all R packages that you install with conda, miniconda, or mamba are at the user level. The location of the packages depends on where you specify the environment to be created. It is generally best practice to create a new Anaconda virtual environment for each major piece of software (or related software) you want to use. To illustrate, if you want to install Seurat from conda-forge with mamba, you would first load a mamba module, and install it into a new environment called seurat-env:

$ module load mamba/23.1.0 
$ mamba create -n seurat-env -c conda-forge r-seurat

In this case, the Seurat package, along with all of its dependencies, will be saved within subfolders of ~/.conda/envs/seurat-env. You can also specify another installation location by replacing -n seurat-env with --prefix=/desired/path/to/seurat-env. Once the environment is created, you can activate seurat-env to use the Seurat R package within an interactive or batch job. Please refer to our Anaconda documentation for more information on the topic.