# Optimization Strategies
Choosing the right optimizer settings, sampling methods, and acquisition functions can significantly impact the efficiency and effectiveness of finding optimal merge parameters.
## Optimizer Recommendations (General Guidelines)
These are starting points; the best settings depend heavily on the specific models being merged, the merge method, and the scorer used. Experimentation is key!
- **Short Runs (1-10 exploitation iterations):**
  - Goal: Find some improvement quickly.
  - Optimizer: BayesOpt or Optuna.
  - Acquisition Function (BayesOpt): `poi` (Probability of Improvement) favors exploiting known good areas.
  - Sampler (BayesOpt Initial / Optuna Startup): `latin_hypercube` (LHS) or `random` are often sufficient for initial coverage.
  - BayesOpt `bounds_transformer` (`optimizer.bounds_transformer.enabled`): `True` can help narrow the search quickly, potentially sacrificing global exploration.
  - Optuna Pruner: Less critical for very short runs.
- **Medium Runs (10-50 exploitation iterations):**
  - Goal: Balance exploration and exploitation.
  - Optimizer: BayesOpt or Optuna.
  - Acquisition Function (BayesOpt): `ei` (Expected Improvement) with `xi` around `0.01` to `0.1` provides a good balance.
  - Sampler (BayesOpt Initial / Optuna Startup): `sobol` or `halton` provide better, more uniform coverage of the parameter space than pure random, which is beneficial for higher dimensions (many parameters). `lhs` is also a good option. Optuna's `TPESampler` (default) or `QMCSampler` often work well here.
  - BayesOpt `bounds_transformer` (`optimizer.bounds_transformer.enabled`): `True` or `False`; experiment based on whether you think the optimum is likely within a smaller region. `True` might speed up convergence if the initial points were good; `False` allows broader exploration.
  - Optuna Pruner: `median` pruning can start eliminating unpromising trials.
- **Long Runs (50+ exploitation iterations):**
  - Goal: Prioritize thorough exploration to find the global optimum.
  - Optimizer: BayesOpt or Optuna.
  - Acquisition Function (BayesOpt): `ucb` (Upper Confidence Bound) with `kappa` around `2.5` to `5.0` encourages exploring uncertain regions. Consider using `kappa_decay` to shift towards exploitation later in the run (see the sketch after this list).
  - Sampler (BayesOpt Initial / Optuna Startup): `sobol` is generally recommended for its uniform coverage in high dimensions. Optuna's `TPESampler` or `QMCSampler` (`sobol` or `halton` type) are strong choices.
  - BayesOpt `bounds_transformer` (`optimizer.bounds_transformer.enabled`): Consider starting with `False` to avoid prematurely discarding regions, enabling it later if convergence stalls.
  - Optuna Pruner: `median` or `successive_halving` can save significant time by stopping unpromising trials early.
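
If you drive the underlying `bayesian-optimization` library by hand, the long-run recommendation maps onto its suggest/register loop roughly as sketched below. This assumes the classic `bayes_opt` 1.x API (`UtilityFunction` was reworked in 2.x); `merge_and_score` and its parameters are hypothetical stand-ins for sd-optim's actual merge-and-score pipeline.

```python
from bayes_opt import BayesianOptimization, UtilityFunction

def merge_and_score(alpha: float, beta: float) -> float:
    """Hypothetical placeholder: run a merge with these weights, return its score."""
    return -((alpha - 0.3) ** 2) - ((beta - 0.7) ** 2)

optimizer = BayesianOptimization(
    f=None,  # drive the loop manually so kappa can decay between iterations
    pbounds={"alpha": (0.0, 1.0), "beta": (0.0, 1.0)},
    random_state=42,
)

# ucb with a high initial kappa (broad exploration); after 20 iterations
# kappa_decay shrinks kappa so later evaluations exploit the best regions.
acq = UtilityFunction(kind="ucb", kappa=5.0, kappa_decay=0.95, kappa_decay_delay=20)

for _ in range(60):  # a "long run": 50+ exploitation iterations
    point = optimizer.suggest(acq)
    optimizer.register(params=point, target=merge_and_score(**point))
    acq.update_params()  # applies kappa_decay once the delay has passed

print(optimizer.max)  # best score and parameters found
```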
## Sampling Methods (Initial Points / BayesOpt `sampler`)

These methods determine how the initial `init_points` (BayesOpt) or `n_startup_trials` (Optuna, effectively) are chosen before the main optimization loop begins. A good initial sampling helps the optimizer build a better model of the search space.
- `random`: Purely random sampling. Simple, but can lead to clustering and undersampling in high dimensions.
- `latin_hypercube` (LHS): Generates points that are evenly distributed across each parameter dimension individually, while still being random overall. Good for moderate dimensions (roughly up to 15-20 parameters).
- `sobol`: A quasi-random low-discrepancy sequence. Ensures very uniform coverage, especially good for high-dimensional spaces (20+ parameters), avoiding gaps and clusters. Excellent for reproducible exploration.
- `halton`: Another quasi-random low-discrepancy sequence. Similar to Sobol but can exhibit correlations in very high dimensions. Often works well in moderate dimensions.
(Note: For Optuna, the specific startup behavior depends on the chosen `optimizer.sampler.type`. TPE often uses random sampling initially, while QMC uses its specific sequence from the start.)
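
To see why low-discrepancy sequences are preferred as the parameter count grows, you can compare the uniformity of each method directly. The sketch below uses `scipy.stats.qmc`, which implements the same sampler families; the dimension and point counts are arbitrary, and lower discrepancy means more uniform coverage.

```python
import numpy as np
from scipy.stats import qmc

dim, n = 8, 32  # e.g. 8 merge parameters, 32 initial points
rng = np.random.default_rng(0)

samples = {
    "random": rng.random((n, dim)),
    "latin_hypercube": qmc.LatinHypercube(d=dim, seed=0).random(n),
    "sobol": qmc.Sobol(d=dim, seed=0).random_base2(m=5),  # 2**5 = 32 points
    "halton": qmc.Halton(d=dim, seed=0).random(n),
}

# Discrepancy measures how far a point set deviates from perfectly
# uniform coverage of the unit hypercube (lower is better).
for name, pts in samples.items():
    print(f"{name:16s} discrepancy = {qmc.discrepancy(pts):.5f}")
```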
## Acquisition Functions (Bayesian Optimization Only)
The acquisition function guides the exploitation phase (`n_iters`) by deciding which point in the parameter space looks most promising to evaluate next, based on the surrogate model (Gaussian Process).
- `poi` (Probability of Improvement):
  - Mechanism: Calculates the probability that a point will yield a score better than the current best score by at least `xi`.
  - Behavior: Tends to be exploitative, focusing heavily on areas already known to be good. Can get stuck in local optima. Useful for quick refinement if you're already close.
  - Parameter (`xi`): Controls the minimum required improvement. Larger `xi` demands more potential improvement before a point looks attractive, pushing the search toward uncertain regions (more exploration). Smaller `xi` accepts marginal gains near the current best (more exploitation).
- `ei` (Expected Improvement):
  - Mechanism: Calculates the expected amount by which a point will improve upon the current best score, considering both the predicted mean score and the uncertainty (variance) at that point.
  - Behavior: Offers a good balance between exploration (points with high uncertainty, even if the mean isn't the highest) and exploitation (points with high predicted mean score). Often a good default choice.
  - Parameter (`xi`): A small value added to avoid over-exploitation near the current best. Larger `xi` slightly favors more exploration; `0.01` is a common default.
- `ucb` (Upper Confidence Bound):
  - Mechanism: Selects points based on an optimistic upper bound of the score, calculated as `predicted_mean + kappa * predicted_stddev`.
  - Behavior: Directly balances exploration and exploitation via the `kappa` parameter. Higher `kappa` encourages more exploration (visiting uncertain regions, even if the predicted mean is lower). Lower `kappa` favors exploitation (sticking closer to high predicted means).
  - Parameter (`kappa`): Controls the exploration trade-off. Typical values range from 1.0 to 5.0.
  - Parameter (`kappa_decay`): Gradually reduces `kappa` over iterations (after `kappa_decay_delay`), shifting focus from exploration to exploitation as the optimization progresses.
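
All three reduce to short closed-form expressions over the surrogate's predicted mean and standard deviation. A minimal sketch of the standard formulas (the candidate means, uncertainties, and best-so-far score below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

def poi(mu, sigma, best, xi=0.01):
    # Probability the score beats the current best by at least xi.
    return norm.cdf((mu - best - xi) / sigma)

def ei(mu, sigma, best, xi=0.01):
    # Expected improvement over (best + xi) under the Gaussian posterior.
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def ucb(mu, sigma, kappa=2.5):
    # Optimistic bound: predicted_mean + kappa * predicted_stddev.
    return mu + kappa * sigma

mu = np.array([0.80, 0.75, 0.60])     # predicted mean score at 3 candidates
sigma = np.array([0.01, 0.05, 0.20])  # predicted uncertainty at each
best = 0.79                           # best score observed so far

print("poi:", poi(mu, sigma, best))  # favors the near-certain marginal gain
print("ei: ", ei(mu, sigma, best))   # trades off gain size against certainty
print("ucb:", ucb(mu, sigma))        # the most uncertain candidate wins here
```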
## Optuna Samplers and Pruners
Optuna offers a wider variety of samplers and pruners. Refer to the Optuna documentation for detailed explanations of each sampler (`TPE`, `CMAES`, `QMC`, `GP`, etc.) and pruner (`MedianPruner`, `SuccessiveHalvingPruner`). Configure them via the `optimizer.sampler` and `optimizer.use_pruning`/`optimizer.pruner_type` settings in `config.yaml`.
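
For reference, here is a minimal sketch of the Optuna objects those settings correspond to. How sd-optim maps `config.yaml` keys onto these constructors is an assumption here; the Optuna API itself is standard, while the `alpha` parameter and the scoring stand-in are hypothetical.

```python
import optuna

sampler = optuna.samplers.TPESampler(n_startup_trials=10, seed=42)
# Quasi-random startup coverage instead, as discussed above:
# sampler = optuna.samplers.QMCSampler(qmc_type="sobol", seed=42)

pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=2)

study = optuna.create_study(direction="maximize", sampler=sampler, pruner=pruner)

def objective(trial: optuna.Trial) -> float:
    alpha = trial.suggest_float("alpha", 0.0, 1.0)  # a hypothetical merge weight
    score = 1.0 - (alpha - 0.3) ** 2                # stand-in for the real scorer
    trial.report(score, step=0)  # intermediate reports are what enable pruning
    if trial.should_prune():
        raise optuna.TrialPruned()
    return score

study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```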