Installation onto HPC environment - EBI-Metabolights/SAFERnmr GitHub Wiki

Prerequisites

First and foremost, your HPC environment needs access to R. You must also have:

  • An input spectral matrix .RDS file, more on that here
  • A config file (referred to here as params.yaml) that specifies parameter values for the pipeline and tells the R package where to save its intermediate files and outputs; more on that here
  • A healthy compute allowance and permitted maximum wall clock time for jobs
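The first two prerequisites can be checked before any compute time is spent. The following is a minimal sketch, not part of SAFER: the function name `check_prereqs` and its messages are hypothetical, and it only tests that `Rscript` is on the PATH and that the two input files are readable.

```shell
#!/bin/bash
# Hypothetical pre-flight check; the function name and messages are illustrative.
# Usage: check_prereqs /path/to/params.yaml /path/to/spectral_matrix.RDS
check_prereqs() {
  local config="$1" matrix="$2" status=0 f

  # Rscript must be on PATH (on many clusters, only after `module load`).
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "missing: Rscript not found on PATH" >&2
    status=1
  fi

  # Both input files must exist and be readable by the job user.
  for f in "$config" "$matrix"; do
    if [ ! -r "$f" ]; then
      echo "missing: cannot read $f" >&2
      status=1
    fi
  done

  return "$status"
}
```

Called in a job script after the `module load` line, for example as `check_prereqs "$CONFIG" /path/to/matrix.RDS || exit 1`, this stops the job early instead of failing partway through the install steps.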

Notes

The compute resources you need to provision depend entirely on the size of the dataset you intend to run. The memory SAFER needs to process the input spectral matrix grows exponentially with the size of that matrix. The size of the matrix file on disk is not necessarily a good guide to this.
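One reason the file size can mislead is that R compresses .RDS files by default when saving them, so the object is larger once loaded. A rough sketch for inspecting the loaded object is given below. The helper name `rds_summary` is hypothetical, and it assumes the file holds a matrix-like object for which `dim()` is defined.

```shell
# Hypothetical helper: print the dimensions and in-memory size of an .RDS object.
# Assumes the file holds a matrix-like object, so that dim() returns its shape.
# Usage: rds_summary /path/to/spectral_matrix.RDS
rds_summary() {
  Rscript \
    -e 'args <- commandArgs(trailingOnly = TRUE)' \
    -e 'm <- readRDS(args[1])' \
    -e 'cat("dimensions:", dim(m), "\n")' \
    -e 'cat("in-memory size:", format(object.size(m), units = "auto"), "\n")' \
    "$1"
}
```

This reports only the size of the input once loaded. It says nothing about the pipeline's working memory, which is what the job's memory request has to cover.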

The overhead of installing the tool and its dependencies is low, but deleting the installation between runs will increase the run time. Memory profiling is recommended, as extreme values in the config file can cause enormous memory leaks. If you find yourself hitting out-of-memory errors, check the config file first, referring back to the examples provided in this wiki.
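As one possible way to profile memory, the pipeline call can be wrapped in GNU `time -v`, which writes a report that includes the peak resident set size. The sketch below assumes GNU time is installed at `/usr/bin/time` on the compute node. The helper name `peak_rss_kb` and the report file name are hypothetical.

```shell
# Extract the peak resident set size (in kB) from a GNU `time -v` report.
# Usage: peak_rss_kb time_report.txt
peak_rss_kb() {
  awk -F': ' '/Maximum resident set size/ {print $2}' "$1"
}

# Example (not run here): wrap the pipeline call from the job script, then
# read the peak back out of the report.
# /usr/bin/time -v -o time_report.txt \
#   Rscript -e "SAFER::pipeline(params_loc = '$CONFIG')"
# peak_rss_kb time_report.txt
```

The figure is the peak of a single process rather than a total across parallel workers, so treat it as a lower bound when sizing the memory request.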

Make sure your config file is correct! It will save files where it's told, even if the relevant fields in the config file are blank.

Running on LSF

Below is an example script that would enable you to run SAFER as an LSF job:

#!/bin/bash

CONFIG=/path/to/params/params.yaml
LOG_FILE=/path/to/logs/safer_pipeline.log
USER_LIBS=/path/to/R/library

# Use a personal R library so packages can be installed without admin rights.
mkdir -p "$USER_LIBS"
export R_LIBS_USER="$USER_LIBS"

module purge

# Site-specific module name; adjust for your cluster.
module load r-4.0.3-gcc-9.3.0-4l6eluj

# Install devtools and SAFER, check that each loads, then run the pipeline.
# All output is collected in $LOG_FILE.
Rscript -e "install.packages('devtools', repos='https://cran.rstudio.com/')" > "$LOG_FILE" 2>&1
Rscript -e "library(devtools)" >> "$LOG_FILE" 2>&1
Rscript -e "devtools::install_github('EBI-Metabolights/SAFER@main')" >> "$LOG_FILE" 2>&1
Rscript -e "library(SAFER)" >> "$LOG_FILE" 2>&1
Rscript -e "SAFER::pipeline(params_loc = '$CONFIG')" >> "$LOG_FILE" 2>&1

You could then run the above with something like:

bsub -u your-user -J "SAFER-job-001" -q standard -R "rusage[mem=300000]" -M 300000 -n 14 /path/to/that/script.sh
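The example script reinstalls devtools and SAFER on every run. If the R library directory persists between jobs, the install steps can be skipped when a package is already present. The following is a minimal sketch: the helper name `install_if_missing` is hypothetical, and the package name `SAFER` is assumed from the `library(SAFER)` call used in the SLURM example on this page.

```shell
# Install an R package only when it is not already in the library path.
# $1 is the package name; $2 is the R expression that performs the install.
install_if_missing() {
  local pkg="$1" install_expr="$2"
  Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) { $install_expr }"
}

# Example (not run here), reusing the install commands from the job script:
# install_if_missing devtools "install.packages('devtools', repos='https://cran.rstudio.com/')"
# install_if_missing SAFER "devtools::install_github('EBI-Metabolights/SAFER@main')"
```

The trade-off is that a guarded install will not pick up new commits on `main`. Remove the library directory, or call the install command directly, when you want to update.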

Running on SLURM

Below is an example script that would allow you to run SAFER as a SLURM job:

#!/bin/bash
#SBATCH --mem=1000G
#SBATCH -t 5-0:0:0
#SBATCH -p standard
#SBATCH -n 12
#SBATCH -N 1

CONFIG=/path/to/your/params.yaml

# Name the log after the config file so each config gets its own log.
SUBSTR=$(basename "$CONFIG")
LOG_FILE=/path/to/logs/safer/$SUBSTR.log
USER_LIBS=/path/to/R/library
SCRIPT_DIR=/where/you/keep/your/scripts

# Create the personal R library and the log directory if they do not exist.
mkdir -p "$USER_LIBS" "$(dirname "$LOG_FILE")"
export R_LIBS_USER="$USER_LIBS"

module purge
module load r

# Install devtools and SAFER, check that each loads, then run the pipeline.
Rscript -e "install.packages('devtools', repos='https://cran.rstudio.com/')" > "$LOG_FILE" 2>&1
Rscript -e "library(devtools)" >> "$LOG_FILE" 2>&1
Rscript -e "devtools::install_github('EBI-Metabolights/SAFER@main')" >> "$LOG_FILE" 2>&1
Rscript -e "library(SAFER)" >> "$LOG_FILE" 2>&1
Rscript -e "SAFER::pipeline(params_loc = '$CONFIG')" >> "$LOG_FILE" 2>&1

You could then run the above in SLURM with something like:

sbatch --output=/slurm_output/log.txt /path/to/that/script.sh
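The SLURM script names its log after the config file, which keeps logs separate when several configs are run. The short illustration below shows what `basename` yields for the placeholder path used above. The `STEM` variable is a hypothetical variant that drops the `.yaml` suffix.

```shell
CONFIG=/path/to/your/params.yaml

# As in the script above: the file name keeps its extension.
SUBSTR=$(basename "$CONFIG")
echo "$SUBSTR.log"          # params.yaml.log

# Variant: pass the suffix to basename to strip it first.
STEM=$(basename "$CONFIG" .yaml)
echo "$STEM.log"            # params.log
```

Either form works. The point is that two configs with different file names never write to the same log.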

We have had good success running SAFER on NMRBox, and recommend trying that as well. The latest Docker/Singularity image can be used as follows:

# Start an interactive R session inside the container
singularity exec safer_latest.sif R
# Then, at the R prompt:
devtools::document('path/to/local/safer/repo')
pipeline(params_loc = '…/param_template.yaml')
browse_evidence('path_data_directory')

If you get stuck running SAFER under an HPC workload manager, feel free to raise an issue here on GitHub, open a discussion here on GitHub, or email me directly at [email protected]