Multi-objective optimization

AlphaEvolve

Let's dive into the multi-objective optimization strategy used in AlphaEvolve, as described in the paper.


🎯 What Is Multi-Objective Optimization (MOO)?

Instead of optimizing for a single metric (e.g. accuracy or runtime), multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives simultaneously. For example:

  • In neural network design:

    • maximize accuracy
    • minimize training time
    • minimize model size
  • In matrix decomposition:

    • minimize reconstruction loss
    • minimize the rank
    • prefer near-integer entries

🧠 Why MOO in AlphaEvolve?

In scientific and algorithmic discovery, single-metric optimization can be limiting. AlphaEvolve’s tasks often require balancing trade-offs, e.g.:

| Task | Metrics involved |
|---|---|
| Matrix multiplication | reconstruction loss, numerical stability, discretization, number of scalar multiplications |
| Circuit optimization | area, power, timing, correctness |
| Kernel tiling | runtime, generalization across shapes, correctness |
| Math problems | score function, simplicity of solution, symmetry, integrality |

So AlphaEvolve needs to optimize across a vector of metrics, not just one scalar.


🛠️ How AlphaEvolve Implements MOO

1. Evaluation Function h() Returns a Dict of Metrics

```python
def evaluate(program) -> dict:
    return {
        "accuracy": 0.87,
        "log_loss": 0.35,
        "simplicity": 0.92,
        "discretization_loss": 0.10,
        # ... (additional task-specific metrics)
    }
```

This multi-dimensional feedback is attached to every program stored in the database.
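
As a rough sketch of what "attached to every program" could mean in practice (the `Program` and `ProgramDatabase` names below are illustrative, not the paper's API), the metric dictionary simply travels with each stored candidate:

```python
from dataclasses import dataclass, field

@dataclass
class Program:
    code: str
    metrics: dict = field(default_factory=dict)  # filled in by evaluate()

class ProgramDatabase:
    """Toy stand-in for the evolutionary program database."""

    def __init__(self):
        self.programs = []

    def add(self, code: str, metrics: dict) -> None:
        # Every candidate is stored together with its full metric vector,
        # so later sampling can query any individual metric.
        self.programs.append(Program(code, metrics))

    def best_by(self, metric: str) -> Program:
        # Champion under a single metric (higher is assumed better here).
        return max(self.programs, key=lambda p: p.metrics.get(metric, float("-inf")))
```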


2. Pareto Frontier Tracking

AlphaEvolve uses ideas from Pareto optimization:

  • A program A dominates program B if A is at least as good in every metric and strictly better in at least one.
  • The Pareto frontier consists of all non-dominated programs.

This encourages maintaining diverse elite candidates, each representing a trade-off.
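
A minimal sketch of both ideas, assuming every metric is normalized so that higher is better (the function names are mine, not the paper's):

```python
def dominates(a: dict, b: dict, metrics: list) -> bool:
    # a dominates b: at least as good everywhere, strictly better somewhere.
    return all(a[m] >= b[m] for m in metrics) and any(a[m] > b[m] for m in metrics)

def pareto_frontier(population: list, metrics: list) -> list:
    # Keep only candidates that no other candidate dominates.
    return [p for p in population
            if not any(dominates(q, p, metrics) for q in population if q is not p)]

pop = [{"accuracy": 0.9, "simplicity": 0.4},
       {"accuracy": 0.8, "simplicity": 0.7},
       {"accuracy": 0.7, "simplicity": 0.3}]
pareto_frontier(pop, ["accuracy", "simplicity"])  # keeps the first two trade-offs
```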


3. Prompt Construction Using Diverse Metric Champions

When building a prompt, AlphaEvolve:

  • Samples top candidates under different metrics

    • e.g., one with best runtime, one with best simplicity, one with best robustness
  • This encourages LLMs to synthesize ideas from a variety of high-performing programs

This diversity injects “creative tension” into the evolutionary loop.
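
One way this sampling step could look in code (a sketch only; `best_by` follows the toy database above, and these helper names are assumptions, not the actual prompt sampler):

```python
def sample_metric_champions(db, metrics=("runtime", "simplicity", "robustness")):
    # One champion per metric of interest, de-duplicated in case a single
    # program wins on several metrics at once. Assumes metrics are stored
    # so that higher is better (e.g., a runtime *score*).
    champions = {}
    for m in metrics:
        best = db.best_by(m)
        champions[id(best)] = (m, best)
    return list(champions.values())

def build_prompt(task_description: str, champions) -> str:
    parts = [task_description]
    for metric, prog in champions:
        parts.append(f"# Strong on '{metric}':\n{prog.code}")
    parts.append("# Propose a variant that combines the strengths above.")
    return "\n\n".join(parts)
```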


4. Metric-Aware Mutation Selection

When selecting a parent program to mutate, AlphaEvolve may:

  • Prioritize under-represented niches (e.g., high simplicity but mediocre accuracy)
  • Use a score-weighted sampling that balances exploration and exploitation across objectives

This supports MAP-Elites-style exploration, where each "cell" of a feature space holds the best program found for a particular niche of trade-offs.
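
A toy version of such niche-aware parent selection, in the spirit of MAP-Elites (the one-dimensional binning and the assumption that the niche metric lies in [0, 1] are simplifications made for brevity):

```python
import random
from collections import defaultdict

def select_parent(programs, niche_metric="simplicity", fitness_metric="accuracy", bins=10):
    # 1) Bucket programs into niches along one feature (assumed to lie in [0, 1]).
    niches = defaultdict(list)
    for p in programs:
        cell = min(int(p.metrics[niche_metric] * bins), bins - 1)
        niches[cell].append(p)
    # 2) Explore: pick a random occupied niche, so under-represented trade-offs
    #    still get chosen as parents.
    cell = random.choice(list(niches))
    # 3) Exploit: return that niche's best program under the fitness metric.
    return max(niches[cell], key=lambda p: p.metrics[fitness_metric])
```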


5. Metric-Aware Filtering and Early Discarding

Using evaluation cascades, AlphaEvolve may discard a candidate early if:

  • It fails to meet minimum thresholds in key metrics
  • e.g., “Don’t bother with programs that take 10× longer to run even if accuracy is slightly better”
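
A minimal sketch of such a cascade (the staged structure is from the paper's description; the concrete interface below is an assumption):

```python
def cascaded_evaluate(program, stages):
    # `stages` is ordered from cheapest to most expensive; each stage returns
    # a dict of metrics, or None to reject the candidate outright.
    metrics = {}
    for stage in stages:
        result = stage(program)
        if result is None:
            return None  # failed a baseline threshold; skip the costly stages
        metrics.update(result)
    return metrics
```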

🧪 Example: Matrix Multiplication (Section 3.1)

Here’s how multiple metrics shaped the search:

  • Primary objective: lower rank of tensor decomposition

  • Secondary metrics:

    • Reconstruction loss (how well the matrix product is reconstructed)
    • Discretization loss (are coefficients near integers/halves?)
    • Stability (avoid huge numbers)
    • Simplicity (fewer terms, symmetric structure)

LLM prompts explicitly encouraged tradeoffs like:

“Prefer integer or half-integer coefficients; penalize large values and promote near-symmetry in factors.”

Programs were scored across these dimensions. Top performers under different metrics were used in later prompts.
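
For instance, the discretization loss could be as simple as the total distance of the coefficients from the nearest half-integers (one plausible formulation, not necessarily the paper's exact definition):

```python
def discretization_loss(coefficients):
    # Distance of each coefficient to the nearest multiple of 0.5, summed;
    # exact integer or half-integer factors give a loss of 0.
    return sum(abs(c - round(2 * c) / 2) for c in coefficients)

discretization_loss([1.0, 0.5, -0.48])  # ≈ 0.02: almost fully discretized
```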


📈 Benefit of MOO in AlphaEvolve

From the paper:

“Even when a single metric is of particular interest, optimizing multiple metrics often improves results for the target metric.”

This aligns with a well-known principle in evolutionary computation: diversity improves evolvability.


🧰 Summary Table

| Component | Role in MOO |
|---|---|
| `evaluate()` | Returns a dictionary of scalar metrics |
| Program DB | Stores and indexes multi-metric scores |
| Prompt Sampler | Samples diverse "winners" across different metrics |
| Evolutionary Loop | Promotes programs on/near the Pareto front |
| Early Filtering | Discards programs failing baseline thresholds |
| Search Strategy | Uses niche/novelty-aware sampling (MAP-Elites inspired) |

Would you like a visualization (e.g., a Pareto front plot or a MOO-aware prompt construction flow)?