Alignment and retention time correction - adelabriere/SLAW GitHub Wiki

The retention time correction used in SLAW was developed to be applied online, potentially to thousands of samples. To do so, the algorithm corrects retention time in two passes. First, it tries to detect reference features. Then the maximum deviation of these reference features is evaluated in each file, and the retention time is corrected using a LoRANSAC algorithm that relies on these estimated deviations. The parameters related to the retention time correction are the following:

  • grouping/num_references: (Integer) The number of reference features to extract. These features are only used to estimate the parameters of the LoRANSAC algorithm; the RT correction itself takes all features into account.
  • grouping/ppm: (Float) The expected maximum mass deviation in ppm, used to match reference peaks.
  • grouping/dmz: (Float) The expected maximum mass deviation in m/z, in Daltons. It is also used as the bin size.
  • grouping/drt: (Float) The expected maximum retention time deviation in minutes. It is used both to match the reference peaks and to set the kernel of the density estimation.
  • grouping/alpha: (Float) A regularization term that penalizes large RT deviations at the beginning of the gradient during retention time correction.
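To give an intuition for the second pass, here is a minimal sketch of a robust fit that maps observed retention times onto reference retention times. It uses plain RANSAC with a straight-line model, not the LoRANSAC variant that SLAW uses, and the model SLAW actually fits is not described on this page. The function name `ransac_rt_fit` and the `threshold` argument (standing in for the deviation estimated from the reference features) are hypothetical.

```python
import numpy as np

def ransac_rt_fit(rt_ref, rt_obs, threshold, n_iter=200, seed=0):
    """Fit rt_ref ~ slope * rt_obs + intercept while ignoring outliers.

    Illustrative plain RANSAC, not SLAW's LoRANSAC implementation.
    `threshold` is the largest residual (in minutes) accepted for an inlier.
    """
    rt_ref = np.asarray(rt_ref, dtype=float)
    rt_obs = np.asarray(rt_obs, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iter):
        # Draw two distinct reference features and fit the line through them.
        i, j = rng.choice(len(rt_ref), size=2, replace=False)
        if rt_obs[i] == rt_obs[j]:
            continue  # vertical line, cannot be used as a model
        slope = (rt_ref[j] - rt_ref[i]) / (rt_obs[j] - rt_obs[i])
        intercept = rt_ref[i] - slope * rt_obs[i]
        residuals = np.abs(slope * rt_obs + intercept - rt_ref)
        inliers = residuals <= threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no usable pair of reference features was sampled")
    # Refit on all inliers of the best candidate model.
    slope, intercept = np.polyfit(rt_obs[best_inliers], rt_ref[best_inliers], 1)
    return slope, intercept, best_inliers

# Ten reference features shifted by 0.05 min, one of them badly matched.
rt_ref = np.linspace(1.0, 10.0, 10)
rt_obs = rt_ref + 0.05
rt_obs[3] += 2.0
slope, intercept, inliers = ransac_rt_fit(rt_ref, rt_obs, threshold=0.02)
rt_corrected = slope * rt_obs + intercept
```

In this toy example the badly matched feature is excluded from the fit, so the recovered model is close to a pure shift of -0.05 minutes.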

grouping/ppm and grouping/drt are not used as hard thresholds to match reference peaks; they are used to compute a distance. They should therefore remain low.
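The exact distance used by SLAW is not given here. As a rough illustration of how tolerances can scale a distance instead of acting as cut-offs, the sketch below divides the mass deviation (in ppm) and the retention time deviation (in minutes) by their respective tolerances and combines them. The function names, the Euclidean combination, and the default values are assumptions made for this example.

```python
import math

def match_distance(mz_ref, rt_ref, mz, rt, ppm=10.0, drt=0.1):
    """Tolerance-scaled distance between a reference peak and a candidate.

    Hypothetical formula: each deviation is expressed in units of its
    tolerance, so a value of 1.0 on one axis means "at the expected maximum".
    """
    mz_dev_ppm = abs(mz - mz_ref) / mz_ref * 1e6
    rt_dev = abs(rt - rt_ref)
    return math.hypot(mz_dev_ppm / ppm, rt_dev / drt)

def best_match(reference, candidates, ppm=10.0, drt=0.1):
    """Return the index of the candidate closest to the reference peak."""
    if not candidates:
        return None
    mz_ref, rt_ref = reference
    distances = [match_distance(mz_ref, rt_ref, mz, rt, ppm, drt)
                 for mz, rt in candidates]
    return distances.index(min(distances))

reference = (200.0, 5.0)                        # (m/z, RT in minutes)
candidates = [(200.002, 5.0), (200.0005, 5.02)]
closest = best_match(reference, candidates)
```

With such a scaling, inflating either tolerance shrinks that axis' contribution to the distance, so deviations along it stop discriminating between candidates. This is one way to read the advice to keep both values low.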

After this step, the peaks are matched together using binning along the mass axis and kernel density estimation, similar to the XCMS algorithm. In each mass bin, whose size is given by grouping/dmz, a kernel density estimation is performed along the retention time dimension, and peaks are clustered together if they fall under the same density peak. The width of the Gaussian kernel is taken as grouping/drt divided by 3.
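The grouping step can be sketched as follows. This is a simplified illustration written with NumPy, not SLAW's code: the function name `group_peaks`, the fixed (non-overlapping) mass bins, the evaluation grid, and the rule of cutting groups at local density minima are all assumptions.

```python
import numpy as np

def group_peaks(mz, rt, dmz=0.01, drt=0.1):
    """Assign a group label to each peak (illustrative sketch only)."""
    mz = np.asarray(mz, dtype=float)
    rt = np.asarray(rt, dtype=float)
    sigma = drt / 3.0                       # Gaussian kernel width
    labels = np.full(mz.shape, -1, dtype=int)
    next_label = 0
    bins = np.floor(mz / dmz).astype(int)   # fixed mass bins of size dmz
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        rt_bin = rt[idx]
        # Evaluate the (unnormalized) density on a fine grid along RT.
        grid = np.arange(rt_bin.min() - 3 * sigma,
                         rt_bin.max() + 3 * sigma, sigma / 5)
        z = (grid[:, None] - rt_bin[None, :]) / sigma
        density = np.exp(-0.5 * z ** 2).sum(axis=1)
        # Local minima of the density separate neighbouring density peaks.
        inner = np.arange(1, len(grid) - 1)
        is_min = ((density[inner] < density[inner - 1])
                  & (density[inner] <= density[inner + 1]))
        cuts = grid[inner[is_min]]
        # Peaks lying between the same two cuts share a density peak.
        local = np.searchsorted(cuts, rt_bin)
        _, local = np.unique(local, return_inverse=True)
        labels[idx] = next_label + local
        next_label += int(local.max()) + 1
    return labels

mz = [200.001, 200.002, 200.003, 350.101]
rt = [5.00, 5.01, 6.00, 5.00]
labels = group_peaks(mz, rt, dmz=0.01, drt=0.1)
```

Here the first two peaks share a group because they fall in the same mass bin and under the same density peak. The third peak is in the same mass bin but one minute later, and the fourth is in a different mass bin, so each of them gets its own group.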

The recommended way to tune these parameters is to use SLAW's parameter optimization.