Cluster Resources - selmling/Analytics-and-Data-Exploration GitHub Wiki

Connect to the cluster

SSH into the cluster (replace NETID with your NetID and CLUSTER_HOST with your cluster's login address):

ssh NETID@CLUSTER_HOST

Check your home directory contents:

cd ~
ls
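Typing the full login address every time gets tedious. This is a minimal sketch, assuming OpenSSH on your laptop. It appends a `Host` alias to an SSH config file so that a short name works in place of the address. The function name `add_ssh_alias`, the alias `cluster`, and the `CLUSTER_HOST` placeholder are hypothetical. Substitute your cluster's real login address.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append an OpenSSH "Host" alias to an SSH config file.
# Usage: add_ssh_alias ALIAS HOSTNAME USER [CONFIG_FILE]
add_ssh_alias() {
  local alias_name="$1"
  local host_name="$2"
  local user_name="$3"
  local config_file="${4:-$HOME/.ssh/config}"

  # Make sure the directory and file exist. SSH expects the config not to be
  # writable by other users, so restrict the file to its owner.
  mkdir -p -m 700 "$(dirname "$config_file")"
  touch "$config_file"
  chmod 600 "$config_file"

  # Do nothing if this alias is already defined (exact "Host ALIAS" line).
  if grep -qxF "Host $alias_name" "$config_file"; then
    echo "Host $alias_name already present in $config_file"
    return 0
  fi

  # Append the new entry.
  printf '\nHost %s\n    HostName %s\n    User %s\n' \
    "$alias_name" "$host_name" "$user_name" >> "$config_file"
}
```

For example, `add_ssh_alias cluster CLUSTER_HOST NETID` lets you type `ssh cluster`. Because `scp` reads the same config, the alias also works in destinations such as `cluster:/home/NETID/REPO/data/`.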

Clone and update repositories

Clone a GitHub repo onto the cluster with SSH:

cd ~
git clone git@github.com:USERNAME/REPO.git
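To update a clone that already exists, pull from inside it instead of cloning again. The sketch below wraps both cases in one hypothetical helper, `clone_or_update`. It clones when the target directory has no `.git` folder. Otherwise it fast-forwards the existing clone. Using `--ff-only` is a design choice: the pull refuses to create a merge commit if the cluster copy has diverged from GitHub, so you notice the divergence instead of silently merging.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clone a repository, or fast-forward it if already cloned.
# Usage: clone_or_update REPO_URL TARGET_DIR
clone_or_update() {
  local url="$1"
  local dir="$2"

  if [ -d "$dir/.git" ]; then
    # Existing clone: fetch and fast-forward only (never create a merge commit).
    git -C "$dir" pull --ff-only
  else
    # No clone yet: create one.
    git clone "$url" "$dir"
  fi
}
```

Usage: `clone_or_update git@github.com:USERNAME/REPO.git ~/REPO`.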

Copy files between your laptop and the cluster

Copy one file from your local machine to the cluster:

scp /full/local/path/file.csv NETID@CLUSTER_HOST:/home/NETID/REPO/data/

Copy a whole directory to the cluster:

scp -r /full/local/path/data_dir NETID@CLUSTER_HOST:/home/NETID/REPO/

Download one file from the cluster to your local machine:

scp NETID@CLUSTER_HOST:/home/NETID/REPO/figures/output.pdf /full/local/path/

Copy a missing data file into a repo on the cluster:

scp /full/local/path/laugh_turn_dat.csv NETID@CLUSTER_HOST:/home/NETID/REPO/data/
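For large data files, it can be reassuring to confirm that the copy is byte-identical to the original. This is a minimal sketch: compute a SHA-256 checksum on each machine and compare the two values. The helper name `sha256_of` is hypothetical. It uses `sha256sum` (GNU coreutils, standard on Red Hat). When that command is absent it falls back to `shasum -a 256`, which is what macOS provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SHA-256 hex digest of a file.
# Uses sha256sum (Linux) when available, otherwise shasum -a 256 (macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}
```

Run it on your laptop against /full/local/path/laugh_turn_dat.csv. Then run it on the cluster against /home/NETID/REPO/data/laugh_turn_dat.csv. The two digests should be identical.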

Load R on a Red Hat cluster

See what R versions are available:

module avail R

Load R:

module purge
module load R/4.5.1

Check that R is working:

Rscript -e 'sessionInfo()'
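In batch scripts, it helps to fail early with a clear message when a command the job depends on is missing, for example because a module name was mistyped. This is a minimal sketch, and `require_commands` is a hypothetical helper name. It relies on the shell builtin `command -v`, which also recognizes shell functions (on many clusters `module` itself is one).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every listed command that cannot be found
# (as a program on PATH, a shell function, or a builtin).
# Returns 1 if any are missing, 0 otherwise.
require_commands() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after module load R/4.5.1, `require_commands Rscript || exit 1` stops a script before it tries to run R.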

Create a personal R package library

Create a user library for packages:

mkdir -p ~/R/x86_64-redhat-linux-gnu-library/4.5
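The 4.5 at the end of the path matches the loaded R version. R's own default user library is likewise organized per major.minor version. Packages with compiled code generally need reinstalling after a minor-version upgrade, so it is worth keeping the directory in step with the module. This is a minimal sketch that derives the directory from the module's version string. The helper name `r_user_lib` is hypothetical. The `x86_64-redhat-linux-gnu` platform part is copied from the path above and may differ on other clusters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an R module version (e.g. 4.5.1) to the matching
# per-version user library (e.g. $HOME/R/x86_64-redhat-linux-gnu-library/4.5).
# The default platform string is an assumption taken from this page.
r_user_lib() {
  local version="$1"
  local platform="${2:-x86_64-redhat-linux-gnu}"
  # Drop the patch level: 4.5.1 -> 4.5
  local major_minor="${version%.*}"
  printf '%s/R/%s-library/%s\n' "$HOME" "$platform" "$major_minor"
}
```

For example, with `R_VERSION=4.5.1`, you can run `module load "R/${R_VERSION}"`, then `export R_LIBS_USER="$(r_user_lib "$R_VERSION")"` and `mkdir -p "$R_LIBS_USER"`. A single variable then controls both the module and the library. `Rscript -e '.libPaths()'` shows which library directories R is actually using.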

Install commonly used R packages:

R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages(c("tidyverse","lme4","lmerTest","broom.mixed","furrr","future","progressr","here","cowplot","kableExtra"), repos="https://cloud.r-project.org")'

Test that the packages load:

R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'library(tidyverse); library(lme4); library(lmerTest); library(broom.mixed); library(furrr); library(future); library(progressr); library(here); library(cowplot); library(kableExtra); cat("ok\n")'

Install one missing package:

R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages("broom.mixed", repos="https://cloud.r-project.org")'
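The install and load-test commands above list the same packages twice, which makes it easy for the two lists to drift apart. This is a minimal sketch that keeps the list in one shell variable and builds both R expressions from it. The names `r_char_vector`, `r_install_expr`, `r_load_expr`, and `PKGS` are hypothetical. Package names are assumed to contain no quotes or spaces, which holds for CRAN package names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for keeping one package list.

# Turn shell words into an R character vector literal:
#   r_char_vector lme4 here   ->   c("lme4","here")
r_char_vector() {
  local out="" pkg
  for pkg in "$@"; do
    out="${out:+$out,}\"$pkg\""
  done
  printf 'c(%s)\n' "$out"
}

# R expression that installs the packages from CRAN.
r_install_expr() {
  printf 'install.packages(%s, repos = "https://cloud.r-project.org")\n' "$(r_char_vector "$@")"
}

# R expression that loads every package, then prints "ok".
r_load_expr() {
  printf 'invisible(lapply(%s, library, character.only = TRUE)); cat("ok\\n")\n' "$(r_char_vector "$@")"
}

# The package list used on this page, defined once.
PKGS="tidyverse lme4 lmerTest broom.mixed furrr future progressr here cowplot kableExtra"
```

With these loaded, the install command becomes `R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e "$(r_install_expr $PKGS)"`. The load test is the same command with `r_load_expr` in place of `r_install_expr`. `$PKGS` is left unquoted on purpose so that it splits into one argument per package.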

Example Slurm script for an R job

Create a file called run_analysis.slurm:

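#!/bin/bash
# Resource requests: job name, wall-clock limit (HH:MM:SS), memory, CPU cores,
# and email notifications. Replace YOUR_EMAIL with your address.
#SBATCH --job-name=my_r_job
#SBATCH --time=24:00:00
#SBATCH --mem=64G
#SBATCH --cpus-per-task=8
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=YOUR_EMAIL

# Give PS1 a value in case startup or module scripts read it (see "Common cluster errors").
export PS1=""

# Stop on the first failing command, including failures inside pipelines.
set -eo pipefail

# Start from a clean module environment, then load R and Anaconda.
module purge
module load R/4.5.1
module load anaconda3/2025.6

# Treat unset variables as errors; enabled only after the modules are loaded.
set -u

REPO_DIR="$HOME/REPO"

cd "${REPO_DIR}"

# Run the analysis with the personal package library on R's search path.
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 \
Rscript analyses/analysis_2/your_r_script.R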
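The script requests --cpus-per-task=8. R only uses those cores if the analysis starts that many workers, for example with `future::plan(future::multisession, workers = ...)` when using furrr. This is a minimal sketch that reads the allocation from Slurm's `SLURM_CPUS_PER_TASK` environment variable and falls back to 1 outside Slurm. The helper name `slurm_workers` is hypothetical. Passing the value to the R script as a command-line argument is also an assumption about how your script is written.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of CPUs Slurm allocated to this task,
# or 1 when not running under Slurm (e.g. a quick interactive test).
slurm_workers() {
  local n="${SLURM_CPUS_PER_TASK:-1}"
  case "$n" in
    ''|*[!0-9]*|0) n=1 ;;  # not a positive integer: fall back to 1
  esac
  printf '%s\n' "$n"
}
```

In the Slurm script, the last command would become `Rscript analyses/analysis_2/your_r_script.R "$(slurm_workers)"`. Inside R, `as.integer(commandArgs(trailingOnly = TRUE)[1])` then recovers the count to pass as `workers`.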

Submit and monitor Slurm jobs

Submit a job:

sbatch run_analysis.slurm

Watch your queue live:

watch -n 5 squeue -u NETID

Check your queue once:

squeue -u NETID

Cancel a job:

scancel JOBID

Check job efficiency after it finishes:

seff JOBID
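By default, `sbatch` prints `Submitted batch job JOBID`. With `--parsable` it prints just the ID, followed by `;CLUSTER` on multi-cluster setups. This is a minimal sketch that captures the ID so later commands do not need it copied by hand. `parse_job_id` is a hypothetical helper that handles both output forms.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the numeric job ID from sbatch output.
# Accepts "Submitted batch job 12345", "12345", or "12345;clustername".
parse_job_id() {
  local out="$1"
  out="${out##* }"   # keep the last space-separated word
  out="${out%%;*}"   # drop an optional ";cluster" suffix
  printf '%s\n' "$out"
}
```

Usage: `job_id=$(parse_job_id "$(sbatch --parsable run_analysis.slurm)")`. After that, `squeue -j "$job_id"` and `seff "$job_id"` work without retyping the ID.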

Inspect job logs

Print the Slurm output log (by default it is written to slurm-JOBID.out in the directory you ran sbatch from):

cat slurm-JOBID.out

Follow the log live:

tail -f slurm-JOBID.out
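`tail -f` exits with an error if slurm-JOBID.out does not exist yet, which is common while the job is still pending. GNU `tail -F` keeps retrying until the file appears. Alternatively, the minimal sketch below polls for the file with a timeout before following it. `wait_for_file` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until FILE exists, checking once per second,
# for at most TIMEOUT seconds (default 600). Returns 0 if it appeared, 1 otherwise.
wait_for_file() {
  local file="$1"
  local timeout="${2:-600}"
  local waited=0
  while [ ! -e "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $file" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Usage: `wait_for_file slurm-JOBID.out && tail -f slurm-JOBID.out`.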

Common cluster errors

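If a job fails with "PS1: unbound variable", set -u is most likely active while a shell or module initialization script reads PS1, which is unset in non-interactive batch jobs. Give PS1 a value and turn on set -u only after loading modules: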

export PS1=""
set -eo pipefail
module purge
module load R/4.5.1
set -u
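If you need to load a module later in a script, after set -u is already on, you can instead disable the check around that single command. This is a minimal sketch using a hypothetical helper, `with_nounset_off`. It records whether nounset was enabled, turns it off, runs the command, and then restores the previous setting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command with `set -u` (nounset) temporarily
# disabled, then restore the previous setting. Returns the command's status.
# Note: under `set -e`, a failing command still exits the script at that point.
with_nounset_off() {
  local had_nounset=0
  case $- in
    *u*) had_nounset=1 ;;
  esac
  set +u
  "$@"
  local rc=$?
  if [ "$had_nounset" -eq 1 ]; then
    set -u
  fi
  return "$rc"
}
```

Usage: `with_nounset_off module load R/4.5.1`.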

If R says a package is missing, install it into your user library:

R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages("broom.mixed", repos="https://cloud.r-project.org")'

Useful repo and file commands

Bundle outputs for download:

tar -czvf results_outputs.tgz figures/ analyses/analysis_2/output_dir/
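Before downloading, it helps to confirm that the archive holds what you expect. This is a minimal sketch with a hypothetical `bundle_outputs` helper. It refuses to run if any listed path is missing, so a typo stops the command instead of leaving a partial archive. It then creates the archive and lists its contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tar+gzip the given paths into ARCHIVE after checking
# that each one exists, then list the archive contents as a quick check.
# Usage: bundle_outputs ARCHIVE PATH...
bundle_outputs() {
  local archive="$1"
  shift
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "not found: $p" >&2
      return 1
    fi
  done
  tar -czf "$archive" "$@" || return 1
  tar -tzf "$archive"
}
```

For example, run `bundle_outputs "results_outputs_$(date +%Y%m%d).tgz" figures/ analyses/analysis_2/output_dir/` from inside ~/REPO. Then download the archive with scp as in the copy section above and unpack it locally with `tar -xzf`.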

Typical workflow for running an R project on a cluster

Clone the repo:

cd ~
git clone git@github.com:USERNAME/REPO.git

Enter the repo and install packages:

cd ~/REPO
module purge
module load R/4.5.1
mkdir -p ~/R/x86_64-redhat-linux-gnu-library/4.5
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages(c("tidyverse","lme4"), repos="https://cloud.r-project.org")'

Submit the job:

sbatch run_analysis.slurm

Watch the queue:

watch -n 5 squeue -u NETID

Good sanity checks before a long run

Check your R and module setup:

module purge
module load R/4.5.1
Rscript -e 'sessionInfo()'

Check repo status:

cd ~/REPO
git status
git log --oneline -3

Check that required input files exist:

ls ~/REPO/data
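`ls` shows what is present but not what is missing. This is a minimal sketch of a hypothetical `check_inputs` helper. It takes the data directory and the file names the analysis expects, reports each missing file, and returns non-zero if any are absent. That makes it usable near the top of a Slurm script to stop the job before R starts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report required input files that are missing.
# Usage: check_inputs DATA_DIR FILE...
# Prints each missing file to stderr; returns 1 if any are missing, 0 otherwise.
check_inputs() {
  local dir="$1"
  shift
  local f missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, using the file from the copy section: `check_inputs ~/REPO/data laugh_turn_dat.csv || exit 1`. Replace the file name with your own list of inputs.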