Parameters optimization - adelabriere/SLAW GitHub Wiki

Parameter optimization in SLAW is available for all three peak picking methods. It uses a derivative-free response surface optimization method similar to IPO. To enable it, switch the optimization/need_optimization parameter from false to true. All processing files created by this step are placed in the temp_optim subfolder of the output folder. The final parameter set overwrites the parameters.txt file, and the initial parameters.txt is saved as initial_parameters.txt.

The parameters of the grouping or peak picking step are optimized only if they include a range in the parameters file.

Optimization is then performed on a subset of files, taken from the QC files if that information is provided and from any other files otherwise. To speed up the optimization process, only a subset of the mass range is considered. This subset consists of 100 mass bins of size 1, selected to ensure that intense peaks and their isotopes are extracted. The number of files used can be tuned with the optimization/files_used parameter. These files can also be filtered to remove any point below a threshold with the optimization/noise_threshold parameter.
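The filtering and mass-bin selection described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the ranking rule (summed intensity per bin) is an assumption, since the exact criterion SLAW uses to pick its 100 bins is not described here.

```python
import math
from collections import defaultdict


def filter_noise(points, noise_threshold):
    """Drop every (mz, intensity) point whose intensity is below the threshold."""
    return [(mz, inten) for mz, inten in points if inten >= noise_threshold]


def select_mass_bins(points, n_bins=100, bin_size=1.0):
    """Return the indices of the n_bins most intense mass bins.

    Bin i covers [i * bin_size, (i + 1) * bin_size).
    Assumption: bins are ranked by summed intensity; SLAW's rule may differ.
    """
    totals = defaultdict(float)
    for mz, inten in points:
        totals[math.floor(mz / bin_size)] += inten
    ranked = sorted(totals, key=totals.get, reverse=True)[:n_bins]
    return sorted(ranked)


def keep_selected(points, bin_indices, bin_size=1.0):
    """Keep only the points that fall inside one of the selected bins."""
    selected = set(bin_indices)
    return [(mz, inten) for mz, inten in points
            if math.floor(mz / bin_size) in selected]
```

Restricting the data to a few intense bins keeps each peak picking run short, which matters because peak picking is repeated for every sampled parameter set.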

At most optimization/num_iterations sampling steps are performed. In each step, optimization/number_of_points points (always at least 20 and at most 60) are sampled so that they are equally spaced within the current parameter bounds. Peak picking is performed and a metric is evaluated for each sampled point. The sampled points are then used to estimate a polynomial surface using LASSO regression, and the maximum of this surface is estimated using the 'L-BFGS-B' algorithm as implemented in Python. The variables with non-zero coefficients are extracted (at most 3) and the range of these parameters is reduced. The estimated position of the maximum on the polynomial surface is also added to the points evaluated in the next iteration.
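A single sampling step of this kind could look roughly like the sketch below. It is not SLAW's code. It assumes NumPy, SciPy and scikit-learn are available, and several details are guesses made for illustration: the polynomial degree (2), the way equally spaced levels are combined across parameters (independent shuffling), the LASSO penalty alpha, the shrink factor (0.5) and the rule for ranking variables. The score argument stands in for "run peak picking and evaluate the metric".

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures


def sampling_step(score, bounds, number_of_points=20, extra_point=None,
                  shrink=0.5, alpha=1e-3, seed=0):
    """One response-surface step: sample, fit, maximize, shrink the bounds."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lower, width = bounds[:, 0], bounds[:, 1] - bounds[:, 0]
    n_points = max(20, min(60, int(number_of_points)))  # documented 20..60 limits

    # Equally spaced levels per parameter, shuffled independently (unit scale).
    unit = np.column_stack(
        [rng.permutation(np.linspace(0.0, 1.0, n_points)) for _ in lower]
    )
    if extra_point is not None:  # maximum estimated by the previous step
        unit = np.vstack([unit, (np.asarray(extra_point) - lower) / width])

    # Evaluate the metric on every sampled parameter set.
    scores = np.array([score(lower + u * width) for u in unit])

    # Second-order polynomial surface fitted with LASSO.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    model = Lasso(alpha=alpha, max_iter=100_000)
    model.fit(poly.fit_transform(unit), scores)

    # Maximize the surface (minimize its negative) inside the unit box.
    def negative_surface(u):
        return -float(model.predict(poly.transform(u.reshape(1, -1)))[0])

    result = minimize(negative_surface, unit[np.argmax(scores)],
                      method="L-BFGS-B", bounds=[(0.0, 1.0)] * len(lower))
    best = lower + result.x * width

    # Importance of a variable: total |coefficient| of the terms that use it.
    uses_variable = (poly.powers_ > 0).astype(float)
    importance = np.abs(model.coef_) @ uses_variable
    selected = [i for i in np.argsort(importance)[::-1][:3] if importance[i] > 0]

    # Shrink the range of the selected variables around the estimated maximum.
    new_bounds = bounds.copy()
    for i in selected:
        half = shrink * width[i] / 2.0
        new_bounds[i, 0] = max(bounds[i, 0], best[i] - half)
        new_bounds[i, 1] = min(bounds[i, 1], best[i] + half)
    return best, new_bounds
```

Calling sampling_step again with the returned new_bounds, and with best passed as extra_point, mimics how the estimated maximum is carried over to the next iteration.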

Some parameters are difficult to optimize, either because they depend on all the samples (notably those related to alignment) or because they depend on the preprocessing of the data and are very hard to estimate correctly. These notably include the raw data noise (noise_level_ms1 and noise_level_ms2) and the feature-level noise (peaks_deconvolution/noise_level). These parameters should therefore be tuned manually before any optimization.

The SLAW optimization metric for both alignment and peak picking is the harmonic mean of two metrics, one representing sensitivity and the other representing robustness. More details are available in the original publication.
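For reference, the harmonic mean of two scores can be computed as in the sketch below. The function name is hypothetical, and how SLAW defines and scales the sensitivity and robustness terms is left to the publication. The sketch only assumes that both are non-negative numbers on comparable scales.

```python
def harmonic_mean_score(sensitivity, robustness):
    """Harmonic mean of two non-negative scores (0 if either score is 0)."""
    if sensitivity < 0 or robustness < 0:
        raise ValueError("scores must be non-negative")
    if sensitivity == 0 or robustness == 0:
        return 0.0
    return 2.0 * sensitivity * robustness / (sensitivity + robustness)
```

A harmonic mean rewards balance: scores of 0.9 and 0.1 combine to 0.18, whereas their arithmetic mean would be 0.5, so a parameter set cannot score well by being sensitive but not robust.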

The optimization process used by SLAW has the following parameters:

optimization/need_optimization: (Boolean) Whether the parameters should be optimized at all. If set to true, only the parameters that have a range are optimized.
optimization/noise_threshold: (Float) A noise threshold used to filter the mzML files.
optimization/files_used: (Integer) The number of QC files that will be used during optimization.
optimization/num_iterations: (Integer) The maximum number of sampling steps. If a sampling step brings no improvement, the optimization stops. Typically 10 iterations are enough.
optimization/number_of_points: (Integer) The number of points sampled at each step. It should be at least 20; lower values are set to 20, and at most 60 points are used. A higher number of points gives better results.
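The way num_iterations and number_of_points interact can be pictured with the small loop below. It is a sketch, not SLAW's implementation: sampling_step is a hypothetical callable that receives the number of points to sample and returns the best score found in that step, and "no improvement" is assumed to mean a score that does not exceed the best score so far.

```python
def run_optimization(sampling_step, num_iterations=10, number_of_points=20):
    """Repeat sampling steps until one fails to improve or the limit is hit."""
    # Values outside the documented 20..60 range are clamped.
    number_of_points = max(20, min(60, number_of_points))
    best_score = float("-inf")
    history = []
    for _ in range(num_iterations):
        score = sampling_step(number_of_points)
        history.append(score)
        if score <= best_score:
            break  # no improvement after this step: stop early
        best_score = score
    return best_score, history
```

With this stopping rule, num_iterations acts only as an upper limit, so setting it somewhat higher than needed costs nothing once the score stops improving.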

The results of the optimization are stored in the temp_optim subfolder of the output directory. Two important files are summary_par_peakpicking.csv and summary_par_alignment.csv, which store the values of the parameters as well as the corresponding optimization scores. The mzML files obtained from the raw files after filtering and mass range selection can be found in output/temp_optim/mzML.