AlphaEvolve - chunhualiao/public-docs GitHub Wiki

To explain how AlphaEvolve works, let's walk through a concrete example: discovering a faster algorithm for multiplying 4×4 complex-valued matrices. This breakthrough by AlphaEvolve improved on Strassen's 1969 result for the first time in 56 years.


🧠 Step-by-Step: AlphaEvolve's Evolutionary Discovery

Step 1: Define the Problem (Human Setup)

  • The user (a scientist or engineer) defines:

    • The task: find a program that computes 4×4 complex matrix multiplication using as few scalar multiplications as possible.
    • An evaluation function h(program) that runs the code and returns metrics such as the number of multiplications and correctness.
    • An initial baseline program, even a naive one, as the starting point.
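The evaluation function h(program) can be sketched as a small harness that checks a candidate routine against a trusted reference and reports the metrics the search optimizes. The names below (evaluate, naive_matmul) and the metrics dict are hypothetical, chosen only to illustrate the setup; AlphaEvolve's real evaluator is more elaborate.

```python
import numpy as np

def evaluate(candidate_matmul, num_mults, trials=100, tol=1e-9):
    """Hypothetical evaluator h(program): verifies a candidate 4x4 complex
    matmul routine against NumPy on random inputs and returns metrics."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
        b = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
        if not np.allclose(candidate_matmul(a, b), a @ b, atol=tol):
            return {"correct": False, "multiplications": num_mults}
    return {"correct": True, "multiplications": num_mults}

def naive_matmul(a, b):
    """Baseline starting point: the textbook algorithm,
    which uses 4**3 = 64 scalar multiplications."""
    c = np.zeros((4, 4), dtype=complex)
    for i in range(4):
        for j in range(4):
            for k in range(4):
                c[i, j] += a[i, k] * b[k, j]
    return c
```

Any program that passes this harness with a smaller multiplication count than the baseline is an improvement the search can build on.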

Step 2: Set up the Evolutionary Pipeline

AlphaEvolve orchestrates a pipeline composed of:

  • A program database with past versions
  • A prompt sampler that builds detailed prompts for LLMs
  • An ensemble of LLMs (e.g., Gemini 2.0 Pro and Flash) that propose modifications
  • An evaluator pool that executes and scores the modified programs
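The program database in this pipeline can be sketched as a store of (program, metrics) pairs that the sampler later queries for promising parents. The class and field names below are hypothetical, for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class ProgramEntry:
    """One evaluated candidate: its source code and the evaluator's metrics."""
    code: str
    metrics: dict

@dataclass
class ProgramDatabase:
    """Minimal sketch of the database of past versions."""
    entries: list = field(default_factory=list)

    def add(self, entry: ProgramEntry) -> None:
        self.entries.append(entry)

    def best(self, key: str = "multiplications"):
        """Return the correct program with the lowest value of `key`."""
        correct = [e for e in self.entries if e.metrics.get("correct")]
        return min(correct, key=lambda e: e.metrics[key], default=None)
```

The prompt sampler would draw the current best entries (plus diverse alternatives) from this database to build the next LLM prompt.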

Step 3: Generate and Evaluate Candidates

In each evolutionary cycle:

  1. Select a parent from the program database.

  2. Build a rich prompt that includes the parent program, top past solutions, their scores, and evaluator feedback.

  3. The LLM generates a diff, proposing changes to the program.

    • Changes may involve loss functions, optimizers, initialization, regularization, etc.
    • For example, it might suggest using adamw instead of adam, or adding stochastic noise for exploration.

    Example diff:

    <<<<<<< SEARCH
    return optax.adam(learning_rate)
    =======
    return optax.adamw(learning_rate, weight_decay=1e-4)
    >>>>>>> REPLACE
    
  4. Apply the diff to generate a new candidate.

  5. Evaluate:

    • Run the program with randomized seeds.
    • Use cascaded testing to filter out weak candidates early.
    • Add auxiliary metrics like simplicity or stability if needed.
  6. Store the new program and scores in the database.
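The diff-application step in the cycle above can be sketched as a simple SEARCH/REPLACE substitution. This helper is hypothetical; AlphaEvolve's actual diff machinery parses the full <<<<<<< / >>>>>>> block format, but the core operation is the same textual replacement:

```python
def apply_diff(code: str, search: str, replace: str) -> str:
    """Apply one SEARCH/REPLACE edit to a parent program's source.
    Raises if the SEARCH block does not appear, so malformed LLM
    diffs are rejected instead of silently producing no change."""
    if search not in code:
        raise ValueError("SEARCH block not found in parent program")
    return code.replace(search, replace, 1)

# Applying the example diff from Step 3:
parent = (
    "def make_optimizer(learning_rate):\n"
    "    return optax.adam(learning_rate)\n"
)
child = apply_diff(
    parent,
    "return optax.adam(learning_rate)",
    "return optax.adamw(learning_rate, weight_decay=1e-4)",
)
```

The resulting child program is then compiled and run by the evaluator pool; only candidates that pass the cascaded tests are stored back in the database.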


Step 4: Iterate and Evolve

AlphaEvolve repeats the above process, guided by:

  • Evolutionary principles: balancing exploitation of good candidates with exploration of new ideas, using the MAP-Elites algorithm.
  • Meta prompt evolution: it can even evolve its own prompt strategies.
  • Multiple metrics: optimizing not just performance but also generalizability or interpretability.
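The exploitation/exploration balance via MAP-Elites can be sketched in a few lines. The idea: partition candidates into cells of a behavior grid (keyed by some feature descriptor) and keep only the best candidate per cell, so diverse niches survive even when they are globally suboptimal. This is a generic MAP-Elites sketch, not AlphaEvolve's actual implementation:

```python
def map_elites_insert(grid: dict, candidate, descriptor, fitness: float) -> dict:
    """Insert a candidate into a MAP-Elites grid (sketch).

    `grid` maps a feature descriptor (e.g. a tuple of binned program
    properties) to the (candidate, fitness) pair that currently
    dominates that cell. A candidate is kept only if its cell is empty
    or it beats the incumbent's fitness."""
    cell = grid.get(descriptor)
    if cell is None or fitness > cell[1]:
        grid[descriptor] = (candidate, fitness)
    return grid
```

Parents for the next generation are then sampled across occupied cells, which preserves structurally different solutions instead of collapsing onto a single lineage.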

In the 4×4 matrix case, AlphaEvolve:

  • Started from a generic gradient-based tensor decomposition method.

  • Evolved sophisticated versions with:

    • Cyclical clipping of values
    • Discretization loss for generating near-integer values
    • “Hallucination loss” to promote exploration
    • Gradient noise, weight decay, soft clipping, and custom annealing schedules

After 15 mutation steps, it found a solution using 48 multiplications, breaking a long-standing barrier.
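Of the evolved loss terms above, the discretization loss is the easiest to illustrate: it penalizes each entry of the continuous decomposition by its distance to the nearest point on a coarse grid, nudging the optimizer toward exact (half-)integer values. The formulation below is a hedged sketch on real values; AlphaEvolve's exact loss (and its complex-valued handling) is not reproduced here.

```python
import numpy as np

def discretization_loss(factors, grid=0.5):
    """Sketch of a discretization loss: sum of squared distances from
    each factor entry to the nearest multiple of `grid`. Zero iff every
    entry already lies exactly on the grid."""
    x = np.concatenate([np.asarray(f, dtype=float).ravel() for f in factors])
    residual = x - grid * np.round(x / grid)
    return float(np.sum(residual ** 2))
```

Added to the main reconstruction objective with a small weight, a term like this lets gradient descent search in continuous space while still converging to the discrete coefficients an exact algorithm needs.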


🔬 Technical Innovations Highlighted

| Innovation | Description |
| --- | --- |
| Full-code evolution | Evolves not just one function but entire files, including loss, optimizer, initialization, etc. |
| Program database | Stores information about every evaluated program and guides future exploration (via MAP-Elites). |
| LLM-guided diffs | Edits are context-aware and structured, ensuring coherent code changes. |
| Automated evaluation | Programs are scored purely through machine-executable criteria, avoiding hallucinated results. |
| Meta-prompting | AlphaEvolve evolves its own instructions to the LLMs. |
| Parallelized evaluation | Distributes computation for speed, using up to 100 compute-hours per program. |
| Multi-objective optimization | Balances performance, simplicity, robustness, and other metrics. |

🏁 Result

The discovered algorithm:

  • Performs 4×4 complex matrix multiplication with 48 scalar multiplications
  • Surpasses the 49 multiplications required by recursively applying Strassen's 1969 algorithm
  • Was proven correct and exact over complex numbers
  • Represents a true scientific discovery in linear algebra
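The multiplication counts being compared follow from simple arithmetic: the naive algorithm needs 4³ = 64 scalar multiplications, while Strassen's algorithm replaces the 8 block products of a 2×2 split with 7, so applying it recursively to a 4×4 matrix costs 7 × 7 = 49. A small check of that recurrence:

```python
def strassen_mult_count(n: int) -> int:
    """Scalar multiplications used by Strassen's algorithm on an n x n
    matrix (n a power of two): 7 recursive block products per halving."""
    if n == 1:
        return 1
    return 7 * strassen_mult_count(n // 2)

naive_4x4 = 4 ** 3          # 64: textbook algorithm
strassen_4x4 = strassen_mult_count(4)  # 49: Strassen applied recursively
alphaevolve_4x4 = 48        # the discovered decomposition
```

Shaving a single multiplication off 49 may look marginal, but because these algorithms apply recursively to larger block matrices, the saving compounds at every level of recursion.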

🤝 Summary

AlphaEvolve doesn't just write better code: it acts like an automated research assistant, using evolutionary search and LLMs to make scientific and algorithmic discoveries. In this example, it advanced the state of the art in matrix multiplication by iterating over code with machine-verifiable improvements.
