Installation onto HPC environment - EBI-Metabolights/SAFERnmr GitHub Wiki
Prerequisites
First and foremost, your HPC environment needs to have access to R. Secondly, you must have:
- An input spectral matrix .RDS file, more on that here
- A config file (referred to in these parts as params.yaml) that specifies various parameter values for the pipeline and tells the R package where to save its intermediate files and outputs, more on that here
- A healthy compute allowance and permitted maximum wall clock time for jobs
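The first two prerequisites can be checked before a job is queued, which avoids waiting for a slot only to fail in the first few seconds. The helpers below are a minimal sketch and are not part of SAFER: the function names check_inputs and check_r are hypothetical, and the paths in the usage comment are placeholders.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers (not part of SAFER).

# Returns 0 if every file given as an argument exists and is readable.
check_inputs() {
    local status=0 f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "MISSING: cannot read $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Returns 0 if Rscript is on PATH (for example after `module load r`).
check_r() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "MISSING: Rscript is not on PATH" >&2
        return 1
    fi
}

# Usage, with placeholder paths:
# check_r && check_inputs /path/to/spectral_matrix.RDS /path/to/params.yaml
```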
Notes
The compute resources you need to provision depend entirely on the size of the dataset you intend to run. There is an exponential relationship between the size of the input spectral matrix and the amount of memory required to process it in SAFER. The memory requirement cannot necessarily be inferred from the size of the matrix file itself.
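One reason the file size is a poor guide is that R compresses .RDS files by default, so the object is larger once loaded. The sketch below prints the dimensions and in-memory size of whatever object the file contains. The function name rds_footprint is hypothetical, and the size reported is only that of the loaded input, not of SAFER's working memory, which will be considerably higher.

```shell
#!/bin/bash
# Hypothetical helper (not part of SAFER): inspect the object stored in an .RDS file.
# Requires Rscript on PATH.
rds_footprint() {
    Rscript \
        -e 'f <- commandArgs(trailingOnly = TRUE)[1]' \
        -e 'm <- readRDS(f)' \
        -e 'cat("dimensions:", dim(m), "\n")' \
        -e 'cat("in-memory size:", format(object.size(m), units = "auto"), "\n")' \
        "$1"
}

# Usage, with a placeholder path:
# rds_footprint /path/to/spectral_matrix.RDS
```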
The overhead of installing the tool and its dependencies is low. Deleting the installation between runs will, somewhat obviously, increase the run time. Memory profiling is recommended, as it is possible to have extreme values in the config file that will cause gargantuan memory leaks. If you do find yourself hitting out-of-memory issues, check the config file first and foremost, referring back to the examples provided in this wiki.
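A low-effort way to profile memory is to wrap the pipeline call in GNU time, which records the peak resident set size of the process. This is a sketch that assumes GNU time is installed at /usr/bin/time on your cluster. The function name peak_rss_kb and the log path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the peak resident set size (in kilobytes)
# recorded in a log written by `/usr/bin/time -v -o <log>`.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$1"
}

# Usage, with placeholder paths (CONFIG as defined in the job scripts):
# /usr/bin/time -v -o /path/to/logs/safer_time.log \
#     Rscript -e "SAFER::pipeline(params_loc = '$CONFIG')"
# peak_rss_kb /path/to/logs/safer_time.log
```

Running this on a small subset of your data first gives a baseline to compare against before requesting memory for the full matrix.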
Make sure your config file is correct! The pipeline will save files wherever the config file tells it to, even if the relevant fields in the config file are blank.
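Blank fields are easy to miss by eye, so a quick scan of the config before submitting can help. The sketch below lists every line of a YAML file that has a key but nothing after the colon. It is a plain text match, not a YAML parser: parent keys of nested blocks are listed too, and values such as "" are not caught, so treat the output as a list of lines to look at. The function name blank_yaml_keys is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "line-number:text" for each YAML key with an empty value.
# Exits non-zero when no such line is found (standard grep behaviour).
blank_yaml_keys() {
    grep -nE '^[[:space:]]*[A-Za-z0-9_.-]+:[[:space:]]*(#.*)?$' "$1"
}

# Usage, with a placeholder path:
# blank_yaml_keys /path/to/params/params.yaml
```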
Running on LSF
Below is an example script that would enable you to run SAFER as an LSF job:
#!/bin/bash
CONFIG=/path/to/params/params.yaml
LOG_FILE=/path/to/logs/safer_pipeline.log
USER_LIBS=/path/to/R/library
mkdir -p "$USER_LIBS"
export R_LIBS_USER="$USER_LIBS"
# Start from a clean module environment, then load R (the module name is site-specific)
module purge
module load r-4.0.3-gcc-9.3.0-4l6eluj
# Install devtools and SAFER into R_LIBS_USER; the first redirect truncates the log
Rscript -e "install.packages('devtools', repos='https://cran.rstudio.com/')" > "$LOG_FILE" 2>&1
Rscript -e "library(devtools)" >> "$LOG_FILE" 2>&1
Rscript -e "devtools::install_github('EBI-Metabolights/SAFER@main')" >> "$LOG_FILE" 2>&1
# Confirm the package loads, then run the pipeline with the settings in the config file
Rscript -e "library(SAFER)" >> "$LOG_FILE" 2>&1
Rscript -e "SAFER::pipeline(params_loc = '$CONFIG')" >> "$LOG_FILE" 2>&1
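You could then submit the above with something like bsub -u your-user -J "SAFER-job-001" -q standard -R "rusage[mem=300000]" -M 300000 -n 14 /path/to/that/script.sh. The units of the mem and -M values depend on how your cluster is configured, so check them against your site's documentation.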
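A script written as a plain sequence of Rscript calls carries on to the next call even when an earlier one fails, so a failed install only shows up later as a confusing pipeline error. The wrapper below is a minimal sketch of a fail-fast alternative. The name run_step and the default log file name are hypothetical. Note that install.packages only issues a warning when an install fails, so it is the following library() call that returns a non-zero status and stops the job.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of SAFER): run a command, append its output
# to LOG_FILE, and return non-zero if the command fails.
LOG_FILE="${LOG_FILE:-safer_pipeline.log}"

run_step() {
    echo "== $(date '+%F %T') running: $*" >> "$LOG_FILE"
    if ! "$@" >> "$LOG_FILE" 2>&1; then
        echo "== FAILED: $*" >> "$LOG_FILE"
        return 1
    fi
}

# Usage inside the job script:
# run_step Rscript -e "library(devtools)" || exit 1
# run_step Rscript -e "devtools::install_github('EBI-Metabolights/SAFER@main')" || exit 1
```

The same wrapper works unchanged in a SLURM job script, since it only relies on the exit status of each command.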
Running on SLURM
Below is an example script that would allow you to run SAFER as a SLURM job:
#!/bin/bash
#SBATCH --mem=1000G
#SBATCH -t 5-0:0:0
#SBATCH -p standard
#SBATCH -n 12
#SBATCH -N 1
# Name the log after the config file so that runs with different configs stay separate
CONFIG=/path/to/your/params.yaml
SUBSTR=$(basename "$CONFIG")
LOG_FILE=/path/to/logs/safer/$SUBSTR.log
USER_LIBS=/path/to/R/library
mkdir -p "$USER_LIBS" "$(dirname "$LOG_FILE")"
export R_LIBS_USER="$USER_LIBS"
# Start from a clean module environment, then load R (the module name is site-specific)
module purge
module load r
# Install devtools and SAFER into R_LIBS_USER; the first redirect truncates the log
Rscript -e "install.packages('devtools', repos='https://cran.rstudio.com/')" > "$LOG_FILE" 2>&1
Rscript -e "library(devtools)" >> "$LOG_FILE" 2>&1
Rscript -e "devtools::install_github('EBI-Metabolights/SAFER@main')" >> "$LOG_FILE" 2>&1
Rscript -e "library(SAFER)" >> "$LOG_FILE" 2>&1
# Run the pipeline with the settings in the config file
Rscript -e "SAFER::pipeline(params_loc = '$CONFIG')" >> "$LOG_FILE" 2>&1
You could then submit the above in SLURM with something like sbatch --output=/slurm_output/log.txt /path/to/that/script.sh.
Running on NMRBox
We have had good success running SAFER on NMRBox, and recommend trying that as well.
If you are getting stuck running SAFER in an HPC workload manager, feel free to raise an issue here on GitHub, open a discussion here on GitHub, or email me directly at [email protected]