# 243-Expert Config
This guide provides specific configuration recommendations for deploying 243-expert Mixture of Experts (MoE) models at scale.
Hardware requirements.

**Minimum:**
- Memory: 16 GB RAM (32 GB recommended)
- Storage: 10 GB (for expert weights and checkpoints)
- CPU: 8 cores (16 threads)
- GPU: Optional but recommended (SYCL-compatible)

**Recommended:**
- Memory: 64 GB RAM
- GPU: Intel Arc / NVIDIA with 16 GB VRAM
- Storage: NVMe SSD with 50 GB free
- Network: 1 Gbps (for distributed training)
#include "core/moe/unified_config.hpp"
#include "core/moe/unified_router.hpp"
// Use the built-in large-scale configuration
auto config = UnifiedMoEConfig::LargeScale();
// Create pre-configured router
auto router = Create243ExpertRouter();
// Ready to use!auto config = MoEConfigBuilder()
.TotalExperts(243)
.ActiveExperts(16)
.SpecializationDim(128)
.Topology(UnifiedMoEConfig::TopologyType::HIERARCHICAL)
.HierarchicalParams(16, 16) // 16 experts per cluster, 16 clusters
.EnableLoadBalancing(true)
.EnergyBudget(ternary::EnergyTrit::MEDIUM)
    .Build();
```

Router state memory scales with these components:

- Experts: 243 × specialization_dim × 2 bits
- Entanglement: edges × (from + to + strength) × 4 bytes
- Routing weights: 243 × routing_dim × 2 bits
- Cluster metadata: clusters × cluster_size × 4 bytes
- Load tracking: 243 × counters × 8 bytes
- Overhead: ~20% for buffers and alignment
For the default configuration (specialization_dim = 128, routing_dim = 64):

- Expert specializations: 243 × 128 × 2 = 62,208 bits ≈ 7.8 KB
- Entanglement (small-world): ~1,000 edges × 12 bytes = 12 KB
- Routing weights: 243 × 64 × 2 = 31,104 bits ≈ 3.9 KB
- Cluster metadata: 16 × 16 × 4 = 1,024 bytes
- Load tracking: 243 × 8 = 1,944 bytes
- Total base: ~27 KB
- With 20% overhead: ~32 KB for router state
Per-expert networks (typical 3-layer):

- Weights: 3 layers × 64×128 × 2 bits = 6 KB per expert
- 243 experts × 6 KB = 1,458 KB ≈ 1.4 MB
- Total memory: ~1.5 MB minimum
- Recommended: 16-64 MB for buffers, caching, and growth
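The arithmetic above fits in a small helper. Below is a minimal sketch; the function and parameter names are illustrative, not part of the library API (the router's own `EstimateMemoryUsage()` appears later in this guide):

```cpp
#include <cstddef>
#include <cstdio>

// Rough memory estimate mirroring the formulas above.
// All names and constants here are illustrative.
size_t EstimateMoEMemoryBytes(size_t experts, size_t spec_dim, size_t routing_dim,
                              size_t clusters, size_t cluster_size, size_t edges,
                              size_t layers, size_t layer_in, size_t layer_out) {
    size_t router_bytes =
        experts * spec_dim * 2 / 8 +     // specializations (2-bit trits)
        experts * routing_dim * 2 / 8 +  // routing weights
        edges * 12 +                     // entanglement: from + to + strength
        clusters * cluster_size * 4 +    // cluster metadata
        experts * 8;                     // load-tracking counters
    router_bytes += router_bytes / 5;    // ~20% overhead
    size_t expert_bytes = layers * layer_in * layer_out * 2 / 8;  // per expert
    return router_bytes + experts * expert_bytes;
}

int main() {
    // Default 243-expert configuration from this guide
    size_t bytes = EstimateMoEMemoryBytes(243, 128, 64, 16, 16, 1000, 3, 64, 128);
    printf("~%.2f MB\n", bytes / (1024.0 * 1024.0));  // ~1.45 MB
}
```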
| Experts | Specialization Dim | Router Memory | Expert Networks (3-layer) | Total |
|---|---|---|---|---|
| 16 | 64 | 2 KB | 96 KB | ~100 KB |
| 64 | 128 | 8 KB | 384 KB | ~400 KB |
| 128 | 128 | 15 KB | 768 KB | ~800 KB |
| 243 | 128 | 27 KB | 1.4 MB | ~1.5 MB |
| 243 | 256 | 54 KB | 2.8 MB | ~3 MB |
Topology decision tree:

```
Number of Experts?
├── ≤ 32   → Use RING topology
│   └── Simple, fast, good locality
├── 33-128 → Use SMALL_WORLD topology
│   └── Balance of locality and connectivity
└── ≥ 129  → Use HIERARCHICAL topology
    └── Required for 243-expert scale
```
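The same decision rule in code, as a minimal sketch (assumes the `TopologyType` enum used throughout this guide):

```cpp
// Pick a topology from the expert count, following the decision tree above.
UnifiedMoEConfig::TopologyType ChooseTopology(size_t num_experts) {
    if (num_experts <= 32)  return UnifiedMoEConfig::TopologyType::RING;
    if (num_experts <= 128) return UnifiedMoEConfig::TopologyType::SMALL_WORLD;
    return UnifiedMoEConfig::TopologyType::HIERARCHICAL;  // 243-expert scale
}
```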
Ring topology:

```cpp
config.topology = UnifiedMoEConfig::TopologyType::RING;
config.use_hierarchical_selection = false;
```

Characteristics:
- O(n) edges (2 per expert)
- Fast message passing
- Good for small clusters
- Simple implementation
Small-world topology:

```cpp
config.topology = UnifiedMoEConfig::TopologyType::SMALL_WORLD;
config.small_world_rewiring_prob = 0.3; // 30% rewiring
config.small_world_k = 4;               // 4 neighbors
config.use_hierarchical_selection = true;
```

Characteristics:
- O(n log n) average path length
- Sparse: O(n) edges
- Efficient for medium scale
- Watts-Strogatz model
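For intuition, here is a minimal sketch of Watts-Strogatz construction: a ring where each node links to its `k` nearest neighbors, with each edge rewired to a random target with probability `p`. This is illustrative only; the router builds its topology internally from `small_world_k` and `small_world_rewiring_prob`:

```cpp
#include <random>
#include <utility>
#include <vector>

// Watts-Strogatz small-world graph (illustrative sketch).
std::vector<std::pair<int, int>> BuildSmallWorld(int n, int k, double p,
                                                 unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<std::pair<int, int>> edges;
    for (int i = 0; i < n; ++i) {
        for (int j = 1; j <= k / 2; ++j) {
            int target = (i + j) % n;  // regular ring neighbor
            if (coin(rng) < p) {       // rewire with probability p
                do { target = pick(rng); } while (target == i);
            }
            edges.emplace_back(i, target);
        }
    }
    return edges;  // O(n * k/2) edges, short average path length
}
```

For example, `BuildSmallWorld(243, 4, 0.3)` mirrors the configuration above.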
Hierarchical topology:

```cpp
config.topology = UnifiedMoEConfig::TopologyType::HIERARCHICAL;
config.cluster_size = 16; // Experts per cluster
config.num_clusters = 16; // 16 × 16 = 256 capacity
config.use_hierarchical_selection = true;
```

Characteristics:

- Two-level routing: cluster → expert
- O(n) total edges (sparse)
- Scales to thousands of experts
- Matches distributed memory hierarchies
How it works:
- Route to cluster (select 2-4 clusters)
- Route within cluster (select top-K from cluster)
- Total complexity: O(√n) instead of O(n)
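A minimal sketch of the two-level selection (the scores stand in for the router's specialization matches; the real router does this inside `Route()`):

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Two-level top-K: rank clusters first, then score experts only inside
// the probed clusters, so only ~clusters_to_probe * cluster_size experts
// are examined per token. Illustrative sketch only.
std::vector<int> HierarchicalTopK(
    const std::vector<float>& cluster_scores,              // [num_clusters]
    const std::vector<std::vector<float>>& expert_scores,  // [cluster][expert]
    int clusters_to_probe, int k) {
    // Level 1: rank clusters (cheap: one score per cluster)
    std::vector<int> order(cluster_scores.size());
    std::iota(order.begin(), order.end(), 0);
    std::partial_sort(order.begin(), order.begin() + clusters_to_probe, order.end(),
        [&](int a, int b) { return cluster_scores[a] > cluster_scores[b]; });

    // Level 2: score experts only inside the probed clusters
    std::vector<std::pair<float, int>> candidates;  // (score, global expert id)
    for (int c = 0; c < clusters_to_probe; ++c) {
        int cluster = order[c];
        int size = static_cast<int>(expert_scores[cluster].size());
        for (int e = 0; e < size; ++e)
            candidates.push_back({expert_scores[cluster][e], cluster * size + e});
    }
    std::partial_sort(candidates.begin(), candidates.begin() + k,
                      candidates.end(), std::greater<>());
    std::vector<int> selected;
    for (int i = 0; i < k; ++i) selected.push_back(candidates[i].second);
    return selected;
}
```

With 16 clusters of 16 experts and three clusters probed, only 48 of 243 experts are scored per token.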
| Total Experts | Recommended K | Notes |
|---|---|---|
| 16 | 4 | 25% activated |
| 64 | 8 | 12.5% activated |
| 128 | 12 | 9.4% activated |
| 243 | 16 | 6.6% activated |
Top-K configuration:

```cpp
// For 243 experts, activate top 16
config.active_experts = 16;
// Enable parallel Top-K selection
config.use_parallel_topk = true;
config.topk_batch_size = 64;
```

Tuning K:

```cpp
// Higher K = more compute, potentially better quality
config.active_experts = 32; // More experts per token

// Lower K = less compute, faster inference
config.active_experts = 8;  // Fewer experts per token
```

Load balancing:

```cpp
// Enable load balancing
config.enable_load_balancing = true;
// Rebalance every 100 routing operations
config.load_balance_interval = 100;
// Load balancing loss weight
// Higher = more aggressive balancing (but may hurt specialization)
config.load_balance_alpha = 0.01f; // 1% load balancing penalty
```

Monitoring load balance:

```cpp
auto stats = router.GetLoadStats();
// Imbalance score: 0 = perfect, 1 = worst
float score = stats.imbalance_score;
if (score < 0.2) {
// Excellent: experts well-utilized
} else if (score < 0.5) {
// Good: some imbalance (healthy specialization)
} else {
// Poor: rebalance needed
router.RebalanceLoads();
}
```

Good specialization shows as uneven utilization:
- Some experts: 5-10% utilization (specialists)
- Other experts: 0.5-1% utilization (rare patterns)
- Imbalance score: 0.2-0.4
Bad balance (no specialization):
- All experts: ~4% utilization (uniform)
- Imbalance score: < 0.1
Energy budgets are expressed as a trit:

```cpp
enum class EnergyTrit {
    LOW,    // < 0.5 pJ per operation
    MEDIUM, // 0.5-1.0 pJ per operation
    HIGH    // > 1.0 pJ per operation
};
```

Per-deployment presets:

```cpp
// Edge / Battery-powered device
config.energy_budget = ternary::EnergyTrit::LOW;
config.energy_aware_routing = true;
config.active_experts = 8; // Fewer experts = less energy
// Data center / Desktop
config.energy_budget = ternary::EnergyTrit::MEDIUM;
config.active_experts = 16;
// High-performance computing
config.energy_budget = ternary::EnergyTrit::HIGH;
config.active_experts = 32; // More experts for quality
```

Energy-aware routing:

```cpp
// Router prefers low-energy paths when energy budget is tight
config.energy_aware_routing = true;
// Energy cost is factored into routing scores
// Low-energy experts preferred when close in specialization
```
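How the energy term might be folded into a routing score under a tight budget, as a hedged sketch (the penalty weights and function are illustrative, not the library's API):

```cpp
// Combine specialization match with an energy penalty whose weight grows
// as the budget tightens. Illustrative sketch only.
float EnergyAwareScore(float specialization_match, float expert_energy_pj,
                       ternary::EnergyTrit budget) {
    float penalty =
        (budget == ternary::EnergyTrit::LOW)    ? 0.5f :
        (budget == ternary::EnergyTrit::MEDIUM) ? 0.2f :
                                                  0.05f;
    return specialization_match - penalty * expert_energy_pj;
}
```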
Choosing the specialization dimension:

| Use Case | Recommended Dimension | Rationale |
|---|---|---|
| Simple tasks | 64 | Fast routing, less memory |
| General NLP | 128 | Good balance |
| Code generation | 256 | Complex patterns need more dimensions |
| Multimodal | 512 | Diverse input types |
```cpp
config.specialization_dim = 128; // Standard for 243-expert
```

Higher dimension:
- ✅ Better discrimination between experts
- ✅ More nuanced specialization
- ❌ More memory per expert
- ❌ Slower routing computation
Validate the configuration before deployment:

```cpp
// Check configuration is valid for 243 experts
bool valid = router.Validate243Config();
// Should verify:
// - Hierarchical selection enabled
// - Appropriate topology
// - Cluster sizes match
```

Estimate memory:

```cpp
// Estimate memory usage before deployment
size_t bytes = router.EstimateMemoryUsage();
std::cout << "Estimated memory: " << (bytes / 1024 / 1024) << " MB" << std::endl;
// Ensure available memory > 2x estimate for buffers
assert(available_memory > 2 * bytes);
```

Benchmark latency before going live:

```cpp
// Benchmark routing latency
std::vector<std::vector<ternary::Trit>> test_inputs(1000, input);
auto start = std::chrono::high_resolution_clock::now();
for (const auto& inp : test_inputs) {
router.Route(inp);
}
auto end = std::chrono::high_resolution_clock::now();
auto avg_latency = std::chrono::duration_cast<std::chrono::microseconds>(
    end - start).count() / test_inputs.size();
// ... (truncated)
// See source for complete code
```

A production configuration putting it all together:

```cpp
// === 243-EXPERT PRODUCTION CONFIG ===
auto config = UnifiedMoEConfig::LargeScale();
// Topology: Hierarchical for 243-expert scale
config.topology = UnifiedMoEConfig::TopologyType::HIERARCHICAL;
config.cluster_size = 16;
config.num_clusters = 16;
// Routing: Top-16 of 243
config.active_experts = 16;
config.use_hierarchical_selection = true;
// ... (truncated)
// See source for complete code
```

Docker deployment:

```dockerfile
FROM qminiwasm/moe:latest
# Configure 243-expert model
ENV MOE_TOTAL_EXPERTS=243
ENV MOE_ACTIVE_EXPERTS=16
ENV MOE_TOPOLOGY=hierarchical
ENV MOE_CLUSTER_SIZE=16
# Memory limits
ENV MEMORY_LIMIT=4g
# ... (truncated)
# See source for complete code
```

Kubernetes deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: moe-243-expert
spec:
replicas: 3
template:
spec:
containers:
- name: moe
image: qminiwasm/moe:v2.0
# ... (truncated)
# See source for complete code
```

**High memory usage.** Symptoms: OOM errors, swapping.
Solutions:
- Check topology (should be HIERARCHICAL, not DENSE)
- Reduce specialization_dim (try 64 instead of 128)
- Use smaller active_experts (try 8 instead of 16)
**Slow routing.** Symptoms: high latency, poor throughput.
Solutions:
- Enable hierarchical_selection
- Enable parallel_topk
- Reduce cluster_size (try 8 instead of 16)
- Use SYCL acceleration
**Poor specialization.** Symptoms: uniform utilization, low model quality.
Solutions:
- Reduce load_balance_alpha (try 0.001)
- Increase training data diversity
- Check that Forward-Forward training is working
- Verify negative samples are being generated
**Routing errors.** Symptoms: router throws exceptions, some experts unused.
Solutions:
- Check config.enable_load_balancing is true
- Verify expert registration succeeded
- Check that all 243 experts have valid specializations
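A pre-flight check along these lines can catch the last two issues before serving traffic. The accessors below mirror calls shown elsewhere in this guide (`GetExpert`, `GetSpecialization`), but the exact signatures are assumptions:

```cpp
// Verify load balancing is on and all 243 experts are registered with a
// non-empty specialization. Accessor signatures are assumptions.
template <typename Router>
bool PreflightCheck(const Router& router, const UnifiedMoEConfig& config) {
    if (!config.enable_load_balancing) return false;
    for (size_t i = 0; i < 243; ++i) {
        const auto* expert = router.GetExpert(i);
        if (expert == nullptr || expert->GetSpecialization().empty()) return false;
    }
    return true;
}
```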
To migrate an existing 64-expert deployment:

```cpp
// 1. Export 64-expert configuration
auto old_config = UnifiedMoEConfig::MediumScale();
// 2. Create 243-expert configuration
auto new_config = UnifiedMoEConfig::LargeScale();
// 3. Copy relevant settings
new_config.specialization_dim = old_config.specialization_dim;
new_config.active_experts = old_config.active_experts; // Keep same K
// 4. Enable hierarchical routing
// ... (truncated)
// See source for complete code
```

When scaling up, you can preserve knowledge from smaller deployments:

```cpp
// 1. Load old 64-expert router
auto old_router = LoadRouter("64_expert.chk");
// 2. Create new 243-expert router
auto new_router = Create243ExpertRouter();
// 3. Copy specializations from old experts to new
// (cluster 0 experts in new get old expert specializations)
for (size_t i = 0; i < 64; ++i) {
auto spec = old_router.GetExpert(i)->GetSpecialization();
new_router.UpdateSpecialization(i, spec);
// ... (truncated)
// See source for complete code
```

See also:

- [Expert Network Architecture Guide](Expert Networks)
- [MoE Training Guide](MOE Training)
- [API Reference: UnifiedMoEConfig](API-Unified Config.md)
- [Performance Tuning](Guides-Performance Tuning.md)
Version: 1.0
Last Updated: April 2026