Optimization Strategies

Choosing the right optimizer settings, sampling methods, and acquisition functions can significantly impact the efficiency and effectiveness of finding optimal merge parameters.

Optimizer Recommendations (General Guidelines)

These are starting points; the best settings depend heavily on the specific models being merged, the merge method, and the scorer used. Experimentation is key!

  • Short Runs (1-10 exploitation iterations):

    • Goal: Find some improvement quickly.
    • Optimizer: BayesOpt or Optuna.
    • Acquisition Function (BayesOpt): poi (Probability of Improvement) - Favors exploiting known good areas.
    • Sampler (BayesOpt Initial / Optuna Startup): latin_hypercube (LHS) or random are often sufficient for initial coverage.
    • BayesOpt bounds_transformer (optimizer.bounds_transformer.enabled): True - Can help narrow the search quickly, potentially sacrificing global exploration.
    • Optuna Pruner: Less critical for very short runs.
  • Medium Runs (10-50 exploitation iterations):

    • Goal: Balance exploration and exploitation.
    • Optimizer: BayesOpt or Optuna.
    • Acquisition Function (BayesOpt): ei (Expected Improvement) with xi around 0.01 to 0.1. Provides a good balance.
    • Sampler (BayesOpt Initial / Optuna Startup): sobol or halton provide more uniform coverage of the parameter space than pure random, which is beneficial in higher dimensions (many parameters); latin_hypercube is also a good option. Optuna's TPESampler (default) or QMCSampler often work well here.
    • BayesOpt bounds_transformer (optimizer.bounds_transformer.enabled): True or False. Experiment based on whether you think the optimum is likely within a smaller region. True might speed up convergence if the initial points were good; False allows broader exploration.
    • Optuna Pruner: median pruning can start eliminating unpromising trials.
  • Long Runs (50+ exploitation iterations):

    • Goal: Prioritize thorough exploration to find the global optimum.
    • Optimizer: BayesOpt or Optuna.
    • Acquisition Function (BayesOpt): ucb (Upper Confidence Bound) with kappa around 2.5 to 5.0. Encourages exploring uncertain regions. Consider using kappa_decay to shift towards exploitation later in the run (see the sketch after this list).
    • Sampler (BayesOpt Initial / Optuna Startup): sobol is generally recommended for its uniform coverage in high dimensions. Optuna's TPESampler or QMCSampler (sobol or halton type) are strong choices.
    • BayesOpt bounds_transformer (optimizer.bounds_transformer.enabled): Consider starting with False to avoid prematurely discarding regions, potentially enabling it later if convergence stalls.
    • Optuna Pruner: median or successive_halving can save significant time by stopping unpromising trials early.
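
For the BayesOpt path, the long-run profile above maps fairly directly onto the underlying bayes_opt package. Below is a minimal standalone sketch (assuming the bayes_opt 1.4-style API; the objective here is a hypothetical stand-in for sd-optim's actual merge-and-score loop, which is driven by config.yaml rather than written by hand):

```python
from bayes_opt import BayesianOptimization, UtilityFunction

def black_box(alpha, beta):
    # Hypothetical stand-in for merging and scoring; sd-optim's real
    # objective merges models and scores generated images instead.
    return -((alpha - 0.3) ** 2 + (beta - 0.7) ** 2)

# Long-run profile: ucb with a high kappa for exploration, decayed after a
# delay so later iterations shift towards exploitation.
acq = UtilityFunction(kind="ucb", kappa=4.0, kappa_decay=0.95, kappa_decay_delay=20)

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={"alpha": (0.0, 1.0), "beta": (0.0, 1.0)},
    random_state=1,
)
optimizer.maximize(init_points=10, n_iter=50, acquisition_function=acq)
print(optimizer.max)  # best score and parameters found
```

bayes_opt also exposes SequentialDomainReductionTransformer, passed as bounds_transformer= to BayesianOptimization; that is the mechanism toggled by optimizer.bounds_transformer.enabled.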

Sampling Methods (Initial Points / BayesOpt sampler)

These methods determine how the initial init_points (BayesOpt) or startup trials (Optuna's n_startup_trials) are chosen before the main optimization loop begins. Good initial sampling helps the optimizer build a better model of the search space; a sketch comparing the four methods follows this list.

  • random: Purely random sampling. Simple, but can lead to clustering and undersampling in high dimensions.
  • latin_hypercube (LHS): Generates points that are evenly distributed across each parameter dimension individually, while still being random overall. Good for moderate dimensions (up to roughly 15-20 parameters).
  • sobol: A quasi-random low-discrepancy sequence. Ensures very uniform coverage, especially good for high-dimensional spaces (20+ parameters), avoiding gaps and clusters. Excellent for reproducible exploration.
  • halton: Another quasi-random low-discrepancy sequence. Similar to Sobol but can exhibit correlations in very high dimensions. Often works well in moderate dimensions.

(Note: For Optuna, the specific startup behavior depends on the chosen optimizer.sampler.type. TPE often uses random initially, while QMC uses its specific sequence from the start).
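
For intuition about how these methods differ, the sketch below draws the same number of points from each using SciPy's scipy.stats.qmc module. This is purely an illustration: the dimension count and bounds are made up, and sd-optim's own implementation may differ.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
n_points, n_dims = 8, 3  # e.g. 8 init_points over 3 merge parameters

samplers = {
    "random": rng.random((n_points, n_dims)),
    "latin_hypercube": qmc.LatinHypercube(d=n_dims, seed=0).random(n_points),
    "sobol": qmc.Sobol(d=n_dims, seed=0).random(n_points),
    "halton": qmc.Halton(d=n_dims, seed=0).random(n_points),
}

# All samples live in the unit hypercube; scale them to parameter bounds,
# e.g. two alphas in [0, 1] and a hypothetical weight in [0, 2].
l_bounds, u_bounds = [0.0, 0.0, 0.0], [1.0, 1.0, 2.0]
for name, sample in samplers.items():
    scaled = qmc.scale(sample, l_bounds, u_bounds)
    print(name, scaled.round(3))
```

Printing the scaled points side by side makes the difference visible: the random draw clusters and leaves gaps, while the low-discrepancy sequences spread points much more evenly.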

Acquisition Functions (Bayesian Optimization Only)

The acquisition function guides the exploitation iterations (n_iters) by deciding which point in the parameter space looks most promising to evaluate next, based on the surrogate model (a Gaussian Process). A sketch computing all three functions follows this list.

  • poi (Probability of Improvement):

    • Mechanism: Calculates the probability that a point will yield a score better than the current best score by at least xi.
    • Behavior: Tends to be exploitative, focusing heavily on areas already known to be good. Can get stuck in local optima. Useful for quick refinement if you're already close.
    • Parameter (xi): Controls the minimum required improvement. With xi near zero, poi strongly favors points right next to the current best (pure exploitation); larger xi demands a bigger potential improvement, pushing the search toward more uncertain regions (more exploration).
  • ei (Expected Improvement):

    • Mechanism: Calculates the expected amount by which a point will improve upon the current best score, considering both the predicted mean score and the uncertainty (variance) at that point.
    • Behavior: Offers a good balance between exploration (points with high uncertainty, even if the mean isn't the highest) and exploitation (points with high predicted mean score). Often a good default choice.
    • Parameter (xi): A small value that discourages over-exploitation near the current best. Larger xi favors slightly more exploration. 0.01 is a common default.
  • ucb (Upper Confidence Bound):

    • Mechanism: Selects points based on an optimistic upper bound of the score, calculated as predicted_mean + kappa * predicted_stddev.
    • Behavior: Directly balances exploration and exploitation via the kappa parameter. Higher kappa encourages more exploration (visiting uncertain regions, even if the predicted mean is lower). Lower kappa favors exploitation (sticking closer to high-predicted means).
    • Parameter (kappa): Controls the exploration trade-off. Typical values range from 1.0 to 5.0.
    • Parameter (kappa_decay): Gradually reduces kappa over iterations (after kappa_decay_delay), shifting focus from exploration to exploitation as the optimization progresses.
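
To make the three mechanisms concrete, here is a minimal sketch computing each acquisition value from a surrogate's predicted mean and standard deviation, using the standard closed forms these descriptions correspond to (the candidate numbers are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def poi(mean, std, best, xi=0.0):
    # Probability that f(x) exceeds best + xi under the Gaussian surrogate.
    return norm.cdf((mean - best - xi) / std)

def ei(mean, std, best, xi=0.01):
    # Expected improvement over best + xi, weighing mean against uncertainty.
    z = (mean - best - xi) / std
    return (mean - best - xi) * norm.cdf(z) + std * norm.pdf(z)

def ucb(mean, std, kappa=2.5):
    # Optimistic upper bound: predicted_mean + kappa * predicted_stddev.
    return mean + kappa * std

# Two hypothetical candidates: a "safe" point near the current best and a
# more uncertain one. Note how ucb with a high kappa prefers the uncertain
# point, while poi leans towards the safe one.
best = 0.80
mean = np.array([0.81, 0.75])  # predicted scores
std = np.array([0.01, 0.10])   # surrogate uncertainty
print("poi:", poi(mean, std, best))
print("ei: ", ei(mean, std, best))
print("ucb:", ucb(mean, std, kappa=2.5))
```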

Optuna Samplers and Pruners

Optuna offers a wider variety of samplers and pruners. Refer to the Optuna documentation (https://optuna.readthedocs.io/) for detailed explanations of each sampler (TPE, CMA-ES, QMC, GP, etc.) and pruner (MedianPruner, SuccessiveHalvingPruner). Configure them via the optimizer.sampler and optimizer.use_pruning / optimizer.pruner_type settings in config.yaml.
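
Note that pruners can only act when the objective reports intermediate values. The sketch below shows the reporting pattern in a standalone Optuna study (illustrative only; in sd-optim these choices come from the config keys above, and the per-step scoring loop is hypothetical):

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    weight = trial.suggest_float("weight", 0.0, 1.0)
    score = 0.0
    # Hypothetical per-step loop (e.g. scoring one generated image per step):
    # reporting intermediate values is what lets the pruner act at all.
    for step in range(10):
        score += 1.0 - (weight - 0.5) ** 2  # toy per-step score
        trial.report(score / (step + 1), step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return score / 10

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(n_startup_trials=10, seed=42),
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
)
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```

Swapping TPESampler for optuna.samplers.QMCSampler(qmc_type="sobol"), or MedianPruner for optuna.pruners.SuccessiveHalvingPruner(), corresponds to the sampler and pruner choices discussed in the run-length guidelines above.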