Model Routing - JoelBondoux/AtlasMind GitHub Wiki
Model Routing
The model router selects the best LLM for each request based on budget preference, speed preference, task profile, provider health, and the runtime-refreshed provider model catalog.
Supported Providers
| Provider | ID | Pricing Model | Catalog source | Notes |
|---|---|---|---|---|
| Anthropic | anthropic | Pay-per-token | Runtime discovery via adapter discoverModels() / listModels() | One seed model is registered before refresh completes |
| OpenAI | openai | Pay-per-token | Runtime discovery via /models on the OpenAI-compatible adapter | One seed model is registered before refresh completes |
| GitHub Copilot | copilot | Subscription | Runtime discovery from the VS Code Language Model API | Starts with copilot/default, then refreshes to live Copilot-visible models |
| Google (Gemini) | google | Pay-per-token | Runtime discovery via the Gemini OpenAI-compatible /models endpoint | One seed model is registered before refresh completes |
| Mistral | mistral | Pay-per-token | Runtime discovery via /models on the OpenAI-compatible adapter | One seed model is registered before refresh completes |
| DeepSeek | deepseek | Pay-per-token | Runtime discovery via /models on the OpenAI-compatible adapter | One seed model is registered before refresh completes |
| z.ai | zai | Pay-per-token | Runtime discovery via /models on the OpenAI-compatible adapter | One seed model is registered before refresh completes |
| Local | local | Free | Static local fallback adapter | Currently only local/echo-1 |
The short model names you may see initially are seed entries, not AtlasMind's intended final provider catalog. On activation, and whenever the user clicks Refresh Model Metadata, Atlas scans providers for their live model list and merges that runtime discovery into the router.
Catalog Refresh And Seed Models
AtlasMind uses a two-stage catalog strategy:
- registerDefaultProviders() seeds one minimal model per provider so routing works immediately.
- refreshProviderModelsCatalog() runs on startup and on manual refresh.
- Providers with discoverModels() contribute rich runtime metadata directly.
- Providers with only listModels() contribute IDs, which Atlas enriches using the well-known catalog and heuristics.
- If refresh fails, the existing seeded/static provider catalog remains in place.
This means the provider table should be read as dynamic discovery capability, not a hardcoded model inventory.
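The two-stage strategy can be sketched roughly as follows. This is a minimal illustration, not AtlasMind's actual code: the ModelInfo, ProviderAdapter, and Catalog shapes and the helper names seedCatalog, mergeDiscovered, and refreshProvider are assumptions made for the example; only discoverModels() and listModels() are named on this page.

```typescript
// Hypothetical shapes for illustration; AtlasMind's real types may differ.
interface ModelInfo {
  id: string;
  contextWindow?: number;
}

interface ProviderAdapter {
  providerId: string;
  discoverModels?: () => Promise<ModelInfo[]>; // rich runtime metadata
  listModels?: () => Promise<string[]>; // IDs only, enriched later
}

type Catalog = Record<string, ModelInfo[]>;

// Stage 1: one minimal seed model per provider so routing works immediately.
function seedCatalog(seeds: Record<string, string>): Catalog {
  const catalog: Catalog = {};
  for (const providerId of Object.keys(seeds)) {
    catalog[providerId] = [{ id: seeds[providerId] }];
  }
  return catalog;
}

// Merge discovered models into the existing entries, keyed by model ID.
function mergeDiscovered(catalog: Catalog, providerId: string, discovered: ModelInfo[]): void {
  const byId: Record<string, ModelInfo> = {};
  for (const model of catalog[providerId] ?? []) byId[model.id] = model;
  for (const model of discovered) byId[model.id] = { ...byId[model.id], ...model };
  catalog[providerId] = Object.keys(byId).map((id) => byId[id]);
}

// Stage 2: refresh from the live provider. Any failure leaves the catalog untouched.
async function refreshProvider(catalog: Catalog, adapter: ProviderAdapter): Promise<boolean> {
  try {
    let discovered: ModelInfo[] = [];
    if (adapter.discoverModels) {
      discovered = await adapter.discoverModels();
    } else if (adapter.listModels) {
      discovered = (await adapter.listModels()).map((id) => ({ id }));
    }
    mergeDiscovered(catalog, adapter.providerId, discovered);
    return true;
  } catch (err) {
    return false; // the seeded/static entries remain in place
  }
}
```

Because the merge only runs after discovery succeeds, a thrown error in either adapter call leaves the seeded entries exactly as they were.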
Metadata Enrichment
Discovered model IDs are normalized and resolved through this precedence chain:
- Runtime hint from discoverModels()
- Well-known entry from src/providers/modelCatalog.ts
- Name-based heuristic fallback in inferModelMetadata()
The well-known catalog improves pricing, capability, context-window, and premium-request metadata for models that were discovered dynamically. It does not replace runtime discovery.
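As a rough sketch of the precedence chain, the resolver below tries each source in order and records which one supplied the metadata. The ModelMetadata fields, the WELL_KNOWN table and its single made-up entry, the normalization rule, and the inferFromName heuristic are all assumptions for illustration; the real logic lives in src/providers/modelCatalog.ts and inferModelMetadata() and may resolve fields individually rather than as a whole record.

```typescript
// Hypothetical metadata shape; field names are assumptions.
interface ModelMetadata {
  contextWindow: number;
  capabilities: string[];
}

type MetadataSource = "runtime" | "well-known" | "heuristic";

// Stand-in for the well-known catalog; the entry is made up for this example.
const WELL_KNOWN: Record<string, ModelMetadata> = {
  "example/known-model": { contextWindow: 200_000, capabilities: ["reasoning", "code"] },
};

// Stand-in for inferModelMetadata(); the real heuristics are not shown on this page.
function inferFromName(id: string): ModelMetadata {
  const capabilities = id.includes("code") ? ["code"] : [];
  return { contextWindow: 32_000, capabilities };
}

function resolveMetadata(
  rawId: string,
  runtimeHint?: ModelMetadata,
): ModelMetadata & { source: MetadataSource } {
  const id = rawId.trim().toLowerCase(); // assumed normalization
  // 1. A runtime hint from discovery wins when present.
  if (runtimeHint) return { ...runtimeHint, source: "runtime" };
  // 2. Otherwise fall back to a well-known entry.
  const known = WELL_KNOWN[id];
  if (known) return { ...known, source: "well-known" };
  // 3. Last resort: guess from the model name.
  return { ...inferFromName(id), source: "heuristic" };
}
```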
Adding API Keys
- Open Command Palette → AtlasMind: Manage Model Providers
- Click Set Key for the provider
- Keys are stored in VS Code's SecretStorage, never in settings or source
Provider Health
- The router tracks per-provider health status
- Unhealthy providers lose the health bonus (0 instead of +1.25 in the scoring formula) and are deprioritised
- Health updates via setProviderHealth(), typically after request failures
Selection Algorithm
1. Candidate Filtering
Models pass through six gates:
| Gate | Rule |
|---|---|
| Enabled | Provider and model must both be enabled |
| Health | Provider must be marked healthy |
| Whitelist | If agent has allowedModels, model must be in the list |
| Capabilities | Model must support all requiredCapabilities from the task profile |
| Budget gate | Model's budget tier must be in the allowed set for the configured budget mode |
| Speed gate | Model's speed tier must be in the allowed set for the configured speed mode |
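The gates in the table above amount to a chain of early-exit checks. The sketch below is one way to express them; the Candidate and RoutingConstraints shapes and the function names are hypothetical, while allowedModels and requiredCapabilities are the names used on this page.

```typescript
type BudgetTier = "cheap" | "balanced" | "expensive";
type SpeedTier = "fast" | "balanced" | "considered";

// Hypothetical shapes; field names are assumptions for this sketch.
interface Candidate {
  id: string;
  modelEnabled: boolean;
  providerEnabled: boolean;
  providerHealthy: boolean;
  capabilities: string[];
  budgetTier: BudgetTier;
  speedTier: SpeedTier;
}

interface RoutingConstraints {
  allowedModels?: string[]; // agent whitelist, if any
  requiredCapabilities: string[];
  allowedBudgetTiers: BudgetTier[];
  allowedSpeedTiers: SpeedTier[];
}

function passesGates(m: Candidate, c: RoutingConstraints): boolean {
  if (!m.modelEnabled || !m.providerEnabled) return false; // Enabled
  if (!m.providerHealthy) return false; // Health
  if (c.allowedModels && !c.allowedModels.includes(m.id)) return false; // Whitelist
  if (!c.requiredCapabilities.every((cap) => m.capabilities.includes(cap))) return false; // Capabilities
  if (!c.allowedBudgetTiers.includes(m.budgetTier)) return false; // Budget gate
  return c.allowedSpeedTiers.includes(m.speedTier); // Speed gate
}

function filterCandidates(models: Candidate[], c: RoutingConstraints): Candidate[] {
  return models.filter((m) => passesGates(m, c));
}
```

Only models that survive every gate go on to scoring.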
2. Scoring
Each candidate is scored using:
score = (cheapness × budgetWeight) + (speedProxy × speedWeight)
+ (qualityProxy × qualityWeight) + taskFit + healthBonus
| Factor | How it's computed |
|---|---|
| Cheapness | 1 / max(0.0001, effectiveCost) — lower cost → higher score |
| Speed proxy | fast = 1.5, balanced = 1.0, considered = 0.6 |
| Quality | reasoning = 1.5, code = 1.2, other = 1.0 |
| Task fit | Bonus for matching preferred capabilities and task phase |
| Health bonus | +1.25 for healthy providers, 0 for unhealthy |
3. Weighting
Weights are controlled by budget and speed mode:
| Budget Mode | Budget Weight |
|---|---|
| cheap | 3.0 |
| balanced | 1.5 |
| expensive | 0.5 |
| auto | 1.5 |
| Speed Mode | Speed Weight |
|---|---|
| fast | 3.0 |
| balanced | 1.5 |
| considered | 0.75 |
| auto | 1.5 |
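Putting the scoring formula and the weight tables together gives something like the sketch below. It is an illustration under assumptions: this page does not state a value for qualityWeight, so 1.0 is used as a placeholder default, and the rule that a model with both reasoning and code capabilities takes the reasoning quality value is a guess. The ScoringInput shape and function names are hypothetical.

```typescript
type BudgetMode = "cheap" | "balanced" | "expensive" | "auto";
type SpeedMode = "fast" | "balanced" | "considered" | "auto";
type SpeedTier = "fast" | "balanced" | "considered";

// Values copied from the weighting and factor tables.
const BUDGET_WEIGHT: Record<BudgetMode, number> = { cheap: 3.0, balanced: 1.5, expensive: 0.5, auto: 1.5 };
const SPEED_WEIGHT: Record<SpeedMode, number> = { fast: 3.0, balanced: 1.5, considered: 0.75, auto: 1.5 };
const SPEED_PROXY: Record<SpeedTier, number> = { fast: 1.5, balanced: 1.0, considered: 0.6 };

// Hypothetical input shape for this sketch.
interface ScoringInput {
  effectiveCost: number;
  speedTier: SpeedTier;
  capabilities: string[];
  taskFit: number; // bonus computed from the task profile
  providerHealthy: boolean;
}

function qualityProxy(capabilities: string[]): number {
  if (capabilities.includes("reasoning")) return 1.5; // assumed to take precedence over code
  if (capabilities.includes("code")) return 1.2;
  return 1.0;
}

// qualityWeight is not specified on this page; 1.0 is an assumed default.
function scoreCandidate(
  m: ScoringInput,
  budgetMode: BudgetMode,
  speedMode: SpeedMode,
  qualityWeight: number = 1.0,
): number {
  const cheapness = 1 / Math.max(0.0001, m.effectiveCost);
  const healthBonus = m.providerHealthy ? 1.25 : 0;
  return (
    cheapness * BUDGET_WEIGHT[budgetMode] +
    SPEED_PROXY[m.speedTier] * SPEED_WEIGHT[speedMode] +
    qualityProxy(m.capabilities) * qualityWeight +
    m.taskFit +
    healthBonus
  );
}
```

For a healthy fast-tier reasoning model with effectiveCost 0.5 and no task-fit bonus, balanced budget and speed modes give 2 × 1.5 + 1.5 × 1.5 + 1.5 × 1.0 + 0 + 1.25 = 8.0.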
Budget Modes
| Mode | Allowed Model Tiers | Best For |
|---|---|---|
| cheap | cheap only | Bulk operations, simple queries |
| balanced | cheap + balanced | General development (default) |
| expensive | cheap + balanced + expensive | Architecture, complex reasoning |
| auto | Adapts per task profile | Let the profiler decide |
Budget tier classification (by total price per 1K tokens):
| Tier | Price Range |
|---|---|
| Cheap | ≤ $0.0015 / 1K |
| Balanced | ≤ $0.008 / 1K |
| Expensive | > $0.008 / 1K |
Auto Budget Mode
When budget is auto, the task profiler adjusts:
- High reasoning → balanced + expensive
- Medium reasoning → cheap + balanced
- Low reasoning → cheap + balanced
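The budget tables above translate into two small functions: one that classifies a model by price and one that resolves the allowed tiers for a mode. In this sketch, "total price" is assumed to mean input price plus output price per 1K tokens, and the function names and the ReasoningLevel type are hypothetical.

```typescript
type BudgetTier = "cheap" | "balanced" | "expensive";
type BudgetMode = BudgetTier | "auto";
type ReasoningLevel = "low" | "medium" | "high";

// Assumes "total price" is input plus output price, both in USD per 1K tokens.
function classifyBudgetTier(inputPer1K: number, outputPer1K: number): BudgetTier {
  const total = inputPer1K + outputPer1K;
  if (total <= 0.0015) return "cheap";
  if (total <= 0.008) return "balanced";
  return "expensive";
}

// Allowed tiers for the three fixed modes, from the Budget Modes table.
const FIXED_BUDGET_TIERS: Record<BudgetTier, BudgetTier[]> = {
  cheap: ["cheap"],
  balanced: ["cheap", "balanced"],
  expensive: ["cheap", "balanced", "expensive"],
};

function allowedBudgetTiers(mode: BudgetMode, reasoning: ReasoningLevel): BudgetTier[] {
  if (mode !== "auto") return FIXED_BUDGET_TIERS[mode];
  // Auto mode: only high reasoning unlocks the expensive tier.
  return reasoning === "high" ? ["balanced", "expensive"] : ["cheap", "balanced"];
}
```

Note that in auto mode a high-reasoning task excludes the cheap tier entirely, whereas the fixed expensive mode still allows it.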
Speed Modes
| Mode | Allowed Model Tiers | Best For |
|---|---|---|
| fast | fast only | Quick edits, simple lookups |
| balanced | fast + balanced | General development (default) |
| considered | balanced + considered | Planning, architecture, deep analysis |
| auto | Adapts per task profile | Let the profiler decide |
Speed tier classification:
| Tier | Criteria |
|---|---|
| Fast | No reasoning capability AND context ≤ 128K |
| Considered | Has reasoning capability AND context ≥ 200K |
| Balanced | Everything else |
Auto Speed Mode
When speed is auto, the task profiler adjusts:
- High reasoning → balanced + considered
- Otherwise → fast + balanced
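Speed tiers follow the same pattern as budget tiers. The sketch below assumes that "128K" and "200K" mean 128,000 and 200,000 tokens (the page does not say whether binary multiples are intended) and uses hypothetical function names.

```typescript
type SpeedTier = "fast" | "balanced" | "considered";
type SpeedMode = SpeedTier | "auto";

// Assumes "128K" and "200K" mean 128,000 and 200,000 tokens.
function classifySpeedTier(hasReasoning: boolean, contextWindow: number): SpeedTier {
  if (!hasReasoning && contextWindow <= 128_000) return "fast";
  if (hasReasoning && contextWindow >= 200_000) return "considered";
  return "balanced"; // everything else
}

// Allowed tiers for the three fixed modes, from the Speed Modes table.
const FIXED_SPEED_TIERS: Record<SpeedTier, SpeedTier[]> = {
  fast: ["fast"],
  balanced: ["fast", "balanced"],
  considered: ["balanced", "considered"],
};

function allowedSpeedTiers(mode: SpeedMode, highReasoning: boolean): SpeedTier[] {
  if (mode !== "auto") return FIXED_SPEED_TIERS[mode];
  // Auto mode: high reasoning shifts the window toward slower, deeper models.
  return highReasoning ? ["balanced", "considered"] : ["fast", "balanced"];
}
```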
Task Profile Scoring
The task profiler infers phase, modality, and reasoning intensity. This influences scoring:
| Task Phase | Scoring Bonus |
|---|---|
| planning | +0.9 for reasoning models |
| execution with code modality | +0.7 for code models |
| synthesis | +0.9 for reasoning models |
Preferred capabilities from the profile add:
- +1.0 for reasoning match
- +0.6 for other capability matches
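The phase bonuses and preferred-capability bonuses add up to the task-fit term used in scoring. A minimal sketch, assuming the two kinds of bonus simply sum; the TaskProfile shape and the taskFitBonus name are hypothetical, and the profiler may recognise phases beyond the three listed here.

```typescript
type TaskPhase = "planning" | "execution" | "synthesis";

// Hypothetical profile shape; only the listed phases and the "code" modality come from this page.
interface TaskProfile {
  phase: TaskPhase;
  modality: string;
  preferredCapabilities: string[];
}

function taskFitBonus(modelCapabilities: string[], profile: TaskProfile): number {
  const has = (cap: string): boolean => modelCapabilities.includes(cap);
  let bonus = 0;

  // Phase bonus from the table above.
  if ((profile.phase === "planning" || profile.phase === "synthesis") && has("reasoning")) {
    bonus += 0.9;
  }
  if (profile.phase === "execution" && profile.modality === "code" && has("code")) {
    bonus += 0.7;
  }

  // Preferred-capability bonus: +1.0 for reasoning, +0.6 for any other match.
  for (const cap of profile.preferredCapabilities) {
    if (has(cap)) bonus += cap === "reasoning" ? 1.0 : 0.6;
  }
  return bonus;
}
```

For example, a model with reasoning and code capabilities on a planning task that prefers both would earn 0.9 + 1.0 + 0.6 = 2.5 under these assumptions.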
Subscription Quota Management
For subscription providers (e.g. GitHub Copilot):
Premium Request Multiplier
Some models consume multiple quota units per request. For example, Claude 4 Opus via Copilot costs 3× per request.
effectiveCost = costPerRequestUnit × premiumRequestMultiplier
Conservation Threshold
When remaining quota drops below 30% of total:
- The router interpolates effective cost from subscription rate toward API rate
- This naturally biases selection toward cheaper models as quota depletes
- At 0% remaining, subscription models are treated as pay-per-token
Quota Exhaustion
When remainingRequests ≤ 0:
- The provider is treated exactly like pay-per-token
- Models are scored at their listed API prices
- No subscription bonus applies
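The three quota rules can be combined into one effective-cost function. This page says the router "interpolates" below the 30% threshold but does not give the curve, so the sketch assumes linear interpolation; the SubscriptionState shape, its listedApiCost and totalRequests fields, and the function name are hypothetical.

```typescript
// Hypothetical state shape; only costPerRequestUnit, premiumRequestMultiplier,
// and remainingRequests are names taken from this page.
interface SubscriptionState {
  costPerRequestUnit: number;
  premiumRequestMultiplier: number;
  listedApiCost: number; // pay-per-token cost of the same request
  remainingRequests: number;
  totalRequests: number;
}

const CONSERVATION_THRESHOLD = 0.3; // 30% of total quota

function effectiveSubscriptionCost(s: SubscriptionState): number {
  const subscriptionCost = s.costPerRequestUnit * s.premiumRequestMultiplier;

  // Quota exhausted: score at the listed API price.
  if (s.remainingRequests <= 0 || s.totalRequests <= 0) return s.listedApiCost;

  const remainingFraction = s.remainingRequests / s.totalRequests;
  if (remainingFraction >= CONSERVATION_THRESHOLD) return subscriptionCost;

  // Assumed linear blend: t is 0 at the threshold and 1 when the quota is empty.
  const t = 1 - remainingFraction / CONSERVATION_THRESHOLD;
  return subscriptionCost + (s.listedApiCost - subscriptionCost) * t;
}
```

With a subscription cost of 1 and a listed API cost of 3, a provider at 15% remaining quota sits halfway through the conservation band and would be scored at 2 under the linear assumption.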
Parallel Slot Selection
When the task scheduler needs multiple models running in parallel (e.g. during /project):
- selectModelsForParallel(slots, constraints) is called
- First slot filled with the best subscription/free model
- Remaining slots filled with pay-per-token candidates
- As parallelSlots increases, subscription advantage is dampened to allow overflow
The damping formula blends subscription cost toward listed API cost:
slotBlend = min(1, (parallelSlots - 1) / 3)
effectiveCost = subscriptionCost + (listedCost - subscriptionCost) × slotBlend
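The damping formula is easy to check numerically. The sketch below transcribes it directly, assuming parallelSlots is at least 1; the function names are hypothetical.

```typescript
// Fraction of the way from subscription cost to listed API cost.
// Assumes parallelSlots >= 1, so the result stays within [0, 1].
function slotBlend(parallelSlots: number): number {
  return Math.min(1, (parallelSlots - 1) / 3);
}

function dampedEffectiveCost(subscriptionCost: number, listedCost: number, parallelSlots: number): number {
  return subscriptionCost + (listedCost - subscriptionCost) * slotBlend(parallelSlots);
}
```

With a subscription cost of 1 and a listed cost of 4, one slot keeps the full subscription advantage (cost 1), two slots blend one third of the way (cost 2), and four or more slots remove the advantage entirely (cost 4).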
Cost Estimation
The CostTracker records after each request:
- Input tokens and output tokens
- Model pricing
- Running session total in USD
Use /cost or AtlasMind: Show Cost Summary to view the breakdown.
Agents can set costLimitUsd to cap per-task spending. If the limit is reached, the task is terminated with a cost-exceeded message.
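A tracker of this kind might look like the sketch below. It is not the real CostTracker: the class name SessionCostTracker, the ModelPricing shape, the per-1K-token pricing unit, and the isCostLimitReached helper are assumptions for illustration; only costLimitUsd is a name taken from this page.

```typescript
// Assumed pricing unit: USD per 1K tokens, matching the budget tier table.
interface ModelPricing {
  inputPer1K: number;
  outputPer1K: number;
}

class SessionCostTracker {
  private totalUsd = 0;

  // Records one request and returns its cost in USD.
  record(inputTokens: number, outputTokens: number, pricing: ModelPricing): number {
    const cost =
      (inputTokens / 1000) * pricing.inputPer1K +
      (outputTokens / 1000) * pricing.outputPer1K;
    this.totalUsd += cost;
    return cost;
  }

  // Running session total in USD.
  getSessionTotalUsd(): number {
    return this.totalUsd;
  }
}

// Per-task cap check; an undefined limit means the agent set no cap.
function isCostLimitReached(taskSpendUsd: number, costLimitUsd?: number): boolean {
  return costLimitUsd !== undefined && taskSpendUsd >= costLimitUsd;
}
```

A task loop would add each request's returned cost to its own running spend and stop with a cost-exceeded message once isCostLimitReached returns true.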
Configuration
| Setting | Default | Description |
|---|---|---|
| atlasmind.budgetMode | balanced | Budget preference: cheap, balanced, expensive, auto |
| atlasmind.speedMode | balanced | Speed preference: fast, balanced, considered, auto |
These can also be adjusted via the Configuration settings panel.