Cluster Resources - selmling/Analytics-and-Data-Exploration GitHub Wiki
Connect to the cluster
SSH into the cluster:
ssh NETID@CLUSTER_HOST
Check your home directory contents:
cd ~
ls
Clone and update repositories
Clone a GitHub repo onto the cluster with SSH:
cd ~
git clone git@github.com:USERNAME/REPO.git
Copy files between your laptop and the cluster
Copy one file from your local machine to the cluster:
scp /full/local/path/file.csv NETID@CLUSTER_HOST:/home/NETID/REPO/data/
Copy a whole directory to the cluster:
scp -r /full/local/path/data_dir NETID@CLUSTER_HOST:/home/NETID/REPO/
Download one file from the cluster to your local machine:
scp NETID@CLUSTER_HOST:/home/NETID/REPO/figures/output.pdf /full/local/path/
Copy a missing data file into a repo on the cluster:
scp /full/local/path/laugh_turn_dat.csv NETID@CLUSTER_HOST:/home/NETID/REPO/data/
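The scp commands in this section all repeat the same remote prefix. As a rough sketch, a small shell function can build that prefix once. The function name cluster_dest and the argument order are made up for illustration, and the /home/USER layout is simply copied from the paths above; it may differ on your cluster.

```shell
# Hypothetical helper: builds "USER@HOST:/home/USER/REPO/SUBDIR/" for scp.
# The /home/USER layout mirrors the paths used on this page (an assumption).
cluster_dest() {
  local user="$1" host="$2" repo="$3" subdir="${4:-}"
  # ${subdir:+$subdir/} adds "SUBDIR/" only when a subdirectory was given.
  printf '%s@%s:/home/%s/%s/%s\n' "$user" "$host" "$user" "$repo" "${subdir:+$subdir/}"
}

# Example (every value is a placeholder):
# scp /full/local/path/file.csv "$(cluster_dest NETID CLUSTER_HOST REPO data)"
```

Quoting the command substitution keeps the destination intact even if a directory name contains spaces.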
Load R on a Red Hat cluster
See what R versions are available:
module avail R
Load R:
module purge
module load R/4.5.1
Check that R is working:
Rscript -e 'sessionInfo()'
Create a personal R package library
Create a user library for packages:
mkdir -p ~/R/x86_64-redhat-linux-gnu-library/4.5
Install commonly used R packages:
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages(c("tidyverse","lme4","lmerTest","broom.mixed","furrr","future","progressr","here","cowplot","kableExtra"), repos="https://cloud.r-project.org")'
Test that the packages load:
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'library(tidyverse); library(lme4); library(lmerTest); library(broom.mixed); library(furrr); library(future); library(progressr); library(here); library(cowplot); library(kableExtra); cat("ok\n")'
Install one missing package:
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages("broom.mixed", repos="https://cloud.r-project.org")'
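The commands in this section repeat the same R_LIBS_USER prefix, and the directory name ends in the major.minor part (4.5) of the loaded R version (4.5.1). A minimal sketch of deriving that path once and exporting it for the session follows. The function names r_minor_version and r_user_lib are hypothetical, and the platform string is copied from the paths above.

```shell
# Hypothetical helper: turn a full R version (e.g. 4.5.1) into major.minor (4.5).
r_minor_version() {
  local full="$1"
  case "$full" in
    *.*.*) printf '%s\n' "${full%.*}" ;;  # strip the patch component
    *)     printf '%s\n' "$full" ;;       # already major.minor
  esac
}

# Build the user library path used on this page for a given R version.
r_user_lib() {
  printf '%s/R/x86_64-redhat-linux-gnu-library/%s\n' "$HOME" "$(r_minor_version "$1")"
}

# Example: create the library and export it for the rest of the shell session.
# mkdir -p "$(r_user_lib 4.5.1)"
# export R_LIBS_USER="$(r_user_lib 4.5.1)"
```

With R_LIBS_USER exported, later Rscript calls in the same session should not need the inline prefix.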
Example Slurm script for an R job
Create a file called run_analysis.slurm:
#!/bin/bash
#SBATCH --job-name=my_r_job
#SBATCH --time=24:00:00
#SBATCH --mem=64G
#SBATCH --cpus-per-task=8
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS
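# Define PS1 so that scripts reading it do not hit an unbound-variable error.
export PS1=""
# Stop on the first failed command, including failures inside pipelines.
set -eo pipefail
module purge
module load R/4.5.1
module load anaconda3/2025.6
# Turn on unset-variable checking only after the modules are loaded.
set -u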
REPO_DIR="$HOME/REPO"
cd "${REPO_DIR}"
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 \
Rscript analyses/analysis_2/your_r_script.R
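The script above requests 8 CPUs, but the R code still has to be told how many workers to start. One possible approach, sketched here, is to read SLURM_CPUS_PER_TASK (which Slurm sets when --cpus-per-task is requested) and pass it on. The function name worker_count and the variable N_WORKERS are invented for this example, and the R script would need to read the value itself.

```shell
# Hypothetical helper: number of workers to use inside the job.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# fall back to 1 when running outside a job (for example on a login node).
worker_count() {
  printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

# Example: expose the count to the R script through an environment variable.
# N_WORKERS is a made-up name; the R script could read it with
# Sys.getenv("N_WORKERS") before setting up its parallel plan.
# export N_WORKERS="$(worker_count)"
```

Taking the count from Slurm rather than hard-coding it keeps the R code in step with the #SBATCH line if the request changes.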
Submit and monitor Slurm jobs
Submit a job:
sbatch run_analysis.slurm
Watch your queue live:
watch -n 5 squeue -u NETID
Check your queue once:
squeue -u NETID
Cancel a job:
scancel JOBID
Check job efficiency after it finishes:
seff JOBID
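The scancel and seff commands above need the job ID that sbatch prints at submission. A small sketch of capturing it in a variable follows. The function name parse_job_id is hypothetical, and it assumes the usual "Submitted batch job N" message or the output of sbatch --parsable.

```shell
# Hypothetical helper: pull the job ID out of sbatch output.
# Handles the default "Submitted batch job 12345" message and the
# "12345" or "12345;cluster" form printed by sbatch --parsable.
parse_job_id() {
  local out="$1"
  out="${out##* }"             # keep the last space-separated word
  printf '%s\n' "${out%%;*}"   # drop an optional ";cluster" suffix
}

# Example:
# JOBID="$(parse_job_id "$(sbatch run_analysis.slurm)")"
# squeue -j "$JOBID"
# seff "$JOBID"
```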
Inspect job logs
Print the Slurm output log:
cat slurm-JOBID.out
Follow the log live:
tail -f slurm-JOBID.out
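For long logs, it can help to jump straight to lines that look like failures. The sketch below wraps grep for that. The function name log_errors and the search patterns are assumptions (R prints "Error" and "Execution halted" when a script stops), so extend them for your own jobs.

```shell
# Hypothetical helper: print numbered lines of a log that look like failures.
# The patterns are a guess at common messages, not an exhaustive list.
log_errors() {
  local log="$1"
  # -n: show line numbers, -i: ignore case, -E: extended regular expressions
  grep -n -i -E 'error|execution halted' "$log"
}

# Example:
# log_errors slurm-JOBID.out
```

The function returns a non-zero status when nothing matches, so it can also be used in an if statement.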
Common cluster errors
If a job fails with "PS1: unbound variable", define PS1 and enable set -u only after the modules are loaded:
export PS1=""
set -eo pipefail
module purge
module load R/4.5.1
set -u
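The error itself can be reproduced in isolation, which may help when debugging a similar failure. The sketch below runs a child bash that reads PS1 under set -u. The function name ps1_read_ok is made up, and the idea that a script sourced during module loading reads PS1 is a plausible explanation rather than something confirmed for this cluster.

```shell
# Hypothetical helper: succeed if reading $PS1 under "set -u" works
# after running the setup snippet passed as $1 in a fresh bash process.
ps1_read_ok() {
  bash -c "$1"'; set -u; : "$PS1"' 2>/dev/null
}

# ps1_read_ok 'unset PS1'      fails: PS1 is unbound under set -u
# ps1_read_ok 'export PS1=""'  succeeds: an empty value still counts as set
```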
If R says a package is missing, install it into your user library:
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages("broom.mixed", repos="https://cloud.r-project.org")'
Useful repo and file commands
Bundle outputs for download:
tar -czvf results_outputs.tgz figures/ analyses/analysis_2/output_dir/
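Before downloading an archive, it is worth confirming that it contains what you expect. A minimal sketch that creates the archive and then lists its contents follows. The function name bundle_outputs is hypothetical; tar -t lists an archive without extracting it.

```shell
# Hypothetical helper: create a gzip-compressed tarball, then list its contents.
# Usage: bundle_outputs ARCHIVE PATH...
bundle_outputs() {
  local archive="$1"; shift
  tar -czf "$archive" "$@" && tar -tzf "$archive"
}

# Example, using the paths from the command above:
# bundle_outputs results_outputs.tgz figures/ analyses/analysis_2/output_dir/
```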
Typical workflow for running an R project on a cluster
Clone the repo:
cd ~
git clone git@github.com:USERNAME/REPO.git
Enter the repo and install packages:
cd ~/REPO
module purge
module load R/4.5.1
mkdir -p ~/R/x86_64-redhat-linux-gnu-library/4.5
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/4.5 Rscript -e 'install.packages(c("tidyverse","lme4"), repos="https://cloud.r-project.org")'
Submit the job:
sbatch run_analysis.slurm
Watch the queue:
watch -n 5 squeue -u NETID
Good sanity checks before a long run
Check your R and module setup:
module purge
module load R/4.5.1
Rscript -e 'sessionInfo()'
Check repo status:
cd ~/REPO
git status
git log --oneline -3
Check that required input files exist:
ls ~/REPO/data
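Reading an ls listing by eye is easy to get wrong when a job needs several inputs. As a sketch, the check can be made explicit by naming the required files. The function name missing_inputs is invented for this example, and laugh_turn_dat.csv is only the file mentioned earlier on this page.

```shell
# Hypothetical helper: report which required files are missing from a directory.
# Prints one missing name per line and returns non-zero if any are missing.
missing_inputs() {
  local dir="$1"; shift
  local f status=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      printf '%s\n' "$f"
      status=1
    fi
  done
  return "$status"
}

# Example: only submit when every input is present.
# missing_inputs ~/REPO/data laugh_turn_dat.csv && sbatch run_analysis.slurm
```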