Data processing - EBI-Metabolights/SAFERnmr GitHub Wiki

Preprocessing and processing

What kind of data can be used?

For now, SAFER accepts 1D NMR data only.

While SAFER avoids traditional alignment and peak-picking methods, it does require that spectra have undergone basic 1D NMR pre-processing, including:

zero filling
line broadening (may not be as important)
zero- and first-order phasing
Fourier transformation

as well as basic processing steps such as:

some form of chemical shift referencing/calibration (ideally to TSP or DSS, although chemical shift tolerance can be set quite high to account for others)
baseline correction, if necessary (the degree to which this is important is still being explored)
solvent regions and end regions do not need to be removed
rough normalization – do NOT use scaled data. Calculations include mean-centering when appropriate.
remove NA values and check for columns of all-zero values. Fill with noise if need be.
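
The last item can be sketched in R as follows. This is a minimal illustration, not part of SAFER itself: the toy matrix `X`, the way `noise_sd` is estimated, and the choice to fill NA values with noise rather than drop them are all assumptions made for the example. With real data, the noise level would more sensibly be estimated from a signal-free region of the spectra.

```r
# Hypothetical toy data: 3 spectra x 6 spectral points (intensities only)
set.seed(1)
X <- matrix(abs(rnorm(18)), nrow = 3)
X[2, 4] <- NA   # simulate a missing value
X[, 6]  <- 0    # simulate an all-zero column

# Rough noise level from the finite, non-zero intensities (assumption:
# real data would use a signal-free region instead)
noise_sd <- sd(X[is.finite(X) & X != 0])

# Columns in which no sample has a non-zero value (NAs ignored)
zero_cols <- colSums(X != 0, na.rm = TRUE) == 0

# Points to fill: every NA, plus every point in an all-zero column
fill <- is.na(X) |
  matrix(zero_cols, nrow = nrow(X), ncol = ncol(X), byrow = TRUE)
X[fill] <- rnorm(sum(fill), mean = 0, sd = noise_sd)

# Confirm that no NA values and no all-zero columns remain
stopifnot(!anyNA(X), all(colSums(X != 0) > 0))
```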

Input format for spectral matrix

The spectral matrix provided to SAFER should be an .RDS file containing a single matrix. The first row is the vector of chemical shift (ppm) values, one per spectral point (the columns of the matrix). Each additional row holds the spectral intensity values for one sample. Sample names can be provided as row names, if desired.
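
As a rough sketch of this layout, the R snippet below builds a small matrix and writes it with base R's `saveRDS()`. The object names, sample names, and output path are placeholders invented for the example, and whether SAFER expects any particular name on the ppm row is not specified here.

```r
# Hypothetical example: 3 spectra sampled at 5 chemical shift points
ppm <- seq(10, 0, length.out = 5)
intensities <- matrix(
  runif(15), nrow = 3,
  dimnames = list(c("sample_A", "sample_B", "sample_C"), NULL)
)

# First row = ppm axis, remaining rows = one spectrum per sample
spec_matrix <- rbind(ppm, intensities)

# Basic shape checks before saving
stopifnot(is.matrix(spec_matrix),
          ncol(spec_matrix) == length(ppm),
          !anyNA(spec_matrix))

# Placeholder path; point this wherever your pipeline reads its input
out_file <- file.path(tempdir(), "spectral_matrix.RDS")
saveRDS(spec_matrix, out_file)

# Round-trip check: the file should read back as the same matrix
reloaded <- readRDS(out_file)
stopifnot(identical(reloaded, spec_matrix))
```

Keeping the ppm axis inside the matrix means a single file carries both the axis and the intensities, so the two cannot drift out of sync.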

Library files

At present, we have formatted a set of library files at different field strengths using the GISSMO database: https://ftp.ebi.ac.uk/pub/databases/metabolights/studies/mariana/gissmo_ref/

However, HPLC fractions, some 2D data, and other datasets could conceivably be used as reference spectra for some interesting use cases. Contact us for help with formatting your data to use as a library.