Running SAFER - EBI-Metabolights/SAFERnmr GitHub Wiki

Steps for the SAFER annotation pipeline:

Make sure you have gone through the setup steps.
Pull the latest version of SAFER from the main branch.
Open a terminal and type ‘R’, or open RStudio (either will start R).
Build the package by typing:

devtools::document('../GitHub/SAFER')

Note: change the filepath to point to your local clone of the repository.

Note: if warnings suggest running an rm() command on a function, execute those commands and then repeat the build step.

You should see the following output:

Open a previously used param.yaml file or see parameter file setup

- Check that tmp.dir is set to a location that exists or can be created (no nested folders). For example, to keep this run in a directory called current_run under …/Documents, set tmp.dir to …/Documents/current_run. A timestamped directory will be created there for each run you do on the machine you're using.
- Check that the study parameters are correct.
- Check that the following are set to full filepaths on your machine:
  - …/lib.data.RDS: this file should match your dataset's spectrometer frequency as closely as possible, and can be pulled from our list of GISSMO slices here: http://ftp.ebi.ac.uk/pub/databases/metabolights/studies/mariana/gissmo_ref/
  - …/spectral.matrix.RDS: options for this include:
    - supply your own spectral matrix (see specifications)
    - pull from MetaboLights via a study page
    - use one of the pre-converted MTBLS study matrices: http://ftp.ebi.ac.uk/pub/databases/metabolights/studies/mariana/spectral_matrices/
- Check the corrpocket params.

Note: If not enough features are found, try the following (in order):

- open up the half.window by 50% (from the default 0.03 ppm to ~0.05 ppm),
- raise the noise.percentile to ~0.99, or
- lower the r.cutoff to as little as ~0.6.

If half.window is too small, peak pairs may be missed; if it is opened too wide, computational demands increase and inter-peak relationships may be captured. In general, the maximum expected J-coupling observed in a multiplet is a good starting place, and this parameter shouldn't have much effect because it only provides a seed for STORM. The noise.percentile may also be too low (too strict). This should be near the top of this graph (here, set to 0.95):

More on this in the FSE description. As a last resort, you could also lower the r.cutoff for STORM if the dataset is inherently very misaligned. Note, however, that this will decrease the specificity of features, since r.cutoff is a STOCSY threshold. You may need more samples to capture misaligned features.
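The relaxation order described in the note above can be sketched in R. This is only an illustration: `relax_corrpocket` is a hypothetical helper, not a SAFER function, and it assumes the three steps are applied cumulatively (each attempt keeps the earlier relaxations). The starting `r.cutoff` of 0.8 is a placeholder, since the default is not stated here; edit the corresponding values in your own parameter file rather than relying on this helper.

```r
# Sketch only: `relax_corrpocket` is a hypothetical helper, not part of SAFER.
# `params` is a plain named list holding the three values discussed above.
# `attempt` is 1, 2 or 3; relaxations are assumed to accumulate in order.
relax_corrpocket <- function(params, attempt) {
  stopifnot(attempt %in% 1:3)

  # Attempt 1: open half.window by 50% (e.g. 0.03 ppm -> 0.045 ppm).
  params$half.window <- params$half.window * 1.5

  # Attempt 2: also raise noise.percentile to about 0.99.
  if (attempt >= 2) {
    params$noise.percentile <- max(params$noise.percentile, 0.99)
  }

  # Attempt 3: also lower r.cutoff, but not below about 0.6.
  if (attempt >= 3) {
    params$r.cutoff <- min(params$r.cutoff, 0.6)
  }

  params
}

# Placeholder starting values (r.cutoff = 0.8 is an assumption for the demo).
start <- list(half.window = 0.03, noise.percentile = 0.95, r.cutoff = 0.8)

for (attempt in 1:3) {
  p <- relax_corrpocket(start, attempt)
  cat(sprintf("attempt %d: half.window=%.3f noise.percentile=%.2f r.cutoff=%.2f\n",
              attempt, p$half.window, p$noise.percentile, p$r.cutoff))
}
```

Rerun the pipeline after each change and stop at the first attempt that yields enough features, so that specificity is not reduced more than necessary.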

Run the pipeline:

pipeline(params_loc = '…/param_template.yaml')

Note: change the filepath to match the param.yaml file you modified above! The file will be copied to tmp.dir, so it can be anywhere on your machine.
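Before calling pipeline(), the checks listed earlier (parameter file present, tmp.dir creatable, data files at full filepaths) can be automated with base R. The sketch below is an assumption-laden illustration: `preflight` and its argument names are hypothetical and not part of SAFER, and it interprets "no nested folders" as meaning that the parent of tmp.dir must already exist. The demo uses R's session temp directory so that it runs anywhere.

```r
# Sketch only: `preflight` is a hypothetical helper, not a SAFER function.
# It returns a character vector of problems; an empty vector means all checks passed.
preflight <- function(params_file, tmp_dir, data_files) {
  problems <- character(0)

  # The parameter file passed to pipeline() must exist.
  if (!file.exists(params_file)) {
    problems <- c(problems, paste("parameter file not found:", params_file))
  }

  # Assumption: tmp.dir must already exist, or its parent must exist
  # so that it can be created without creating nested folders.
  if (!dir.exists(tmp_dir) && !dir.exists(dirname(tmp_dir))) {
    problems <- c(problems, paste("tmp.dir parent folder does not exist:", tmp_dir))
  }

  # Every data file (e.g. the library and spectral matrix RDS files) must exist.
  missing <- data_files[!file.exists(data_files)]
  if (length(missing) > 0) {
    problems <- c(problems, paste("file not found:", missing))
  }

  problems
}

# Demo with throwaway paths in the R session temp directory.
demo_dir <- tempdir()
demo_params <- file.path(demo_dir, "params.yaml")
file.create(demo_params)

problems <- preflight(demo_params, file.path(demo_dir, "current_run"), character(0))
print(problems)
```

With your own paths in place of the demo values, call pipeline(params_loc = ...) only when the returned vector is empty; otherwise each entry names a path to fix.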

You should see the pipeline log scripts begin to print, starting with the FSE module:

That’s it! For a typical dataset of 250 spectra x 130K points, about 2-10K features will be generated. Expected runtime is ~1 h on 50 cores.