Architecture: MoE Routing - kennetholsenatm-gif/q_mini_wasm_v2 GitHub Wiki
The router uses max-plus (tropical) semiring algebra:
| Operation | Tropical Form | Standard Equivalent |
|---|---|---|
| Addition | max(a, b) | a + b |
| Multiplication | a + b | a × b |
Selects K active experts from N total based on input affinity.
Hypersimplex capacity: C(N,K) distinct routing regions.
| Config | Experts | Active | Capacity |
|---|---|---|---|
| Small | 4 | 2 | 6 regions |
| Medium | 8 | 2 | 28 regions |
| Large | 16 | 4 | 1820 regions |
An auxiliary load-balancing loss encourages uniform expert utilization:
L_balance = N × Σ(f_i × P_i)
where f_i is the fraction of tokens routed to expert i and P_i is the average routing probability for expert i.
Sparse routing achieves exponential combinatorial capacity without proportional compute cost.

Forward-Forward Training
Each layer learns independently using local goodness metrics. No gradient chain required.
Tropical inner product: goodness = max(x_i + w_i) — under the max-plus semiring, the standard product x_i × w_i becomes x_i + w_i and the sum becomes max.
| Sample Type | Goal |
|---|---|
| Positive | Maximize goodness |
| Negative | Minimize goodness |
Hebbian rule (gradient-free):
w_new = w_old + lr × (goodness_pos - goodness_neg) × input
Layers train independently, so training is fully parallelizable across SYCL work groups. Advantages of this approach:
- No vanishing gradients
- Local learning (no global coordination)
- Hardware-friendly (integer operations)
- Natural fit for ternary state space
SYCL provides hardware-agnostic parallelism for GPU and multi-core CPU acceleration.
| Operation | Parallelism Strategy |
|---|---|
| Tableau updates | Row-level parallelism |
| Modulo-3 arithmetic | Sub-group vectorization |
| MoE routing | Expert-level parallelism |
| Forward-Forward | Layer-level parallelism |
```cpp
queue.submit([&](handler& h) {
    h.parallel_for(range{n}, [=](id<1> i) {
        // Parallel tableau row update
    });
});
```

Enable SYCL at configure time:

```shell
cmake -DUSE_SYCL=ON ..
cmake --build .
```

Requires Intel oneAPI or DPC++ toolchain.
- [Stabilizer Tableau](Stabilizer Tableau) — operations being parallelized
- [Build Guide](Guides-SYCL Setup.md) — SYCL environment setup