Model Routing - JoelBondoux/AtlasMind GitHub Wiki

Model Routing

The model router selects the best LLM for each request based on budget preference, speed preference, task profile, provider health, and the runtime-refreshed provider model catalog.

Supported Providers

Provider	ID	Pricing Model	Catalog source	Notes
Anthropic	`anthropic`	Pay-per-token	Runtime discovery via adapter `discoverModels()` / `listModels()`	One seed model is registered before refresh completes
OpenAI	`openai`	Pay-per-token	Runtime discovery via `/models` on the OpenAI-compatible adapter	One seed model is registered before refresh completes
GitHub Copilot	`copilot`	Subscription	Runtime discovery from the VS Code Language Model API	Starts with `copilot/default`, then refreshes to live Copilot-visible models
Google	`google`	Pay-per-token	Runtime discovery via the Gemini OpenAI-compatible `/models` endpoint	One seed model is registered before refresh completes
Mistral	`mistral`	Pay-per-token	Runtime discovery via `/models` on the OpenAI-compatible adapter	One seed model is registered before refresh completes
DeepSeek	`deepseek`	Pay-per-token	Runtime discovery via `/models` on the OpenAI-compatible adapter	One seed model is registered before refresh completes
z.ai	`zai`	Pay-per-token	Runtime discovery via `/models` on the OpenAI-compatible adapter	One seed model is registered before refresh completes
Local	`local`	Free	Static local fallback adapter	Currently only `local/echo-1`

The short model names you may see initially are seed entries, not AtlasMind's final provider catalog. On activation, and whenever the user clicks Refresh Model Metadata, AtlasMind queries each provider for its live model list and merges the discovered models into the router.

Catalog Refresh And Seed Models

AtlasMind uses a two-stage catalog strategy:

registerDefaultProviders() seeds one minimal model per provider so routing works immediately.
refreshProviderModelsCatalog() runs on startup and on manual refresh.
Providers with discoverModels() contribute rich runtime metadata directly.
Providers with only listModels() contribute IDs, which AtlasMind enriches using the well-known catalog and heuristics.
If refresh fails, the existing seeded/static provider catalog remains in place.
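The seed-then-refresh flow above can be sketched roughly as follows. This is a minimal illustration, not AtlasMind's implementation: the `CatalogEntry` and `CatalogAdapter` shapes, `mergeCatalog`, and `refreshCatalog` are hypothetical names, and the rule that a discovered entry replaces a seed entry with the same ID is an assumption. Only `discoverModels()` and `listModels()` come from the text.

```typescript
// Hypothetical shapes for illustration; AtlasMind's real types may differ.
interface CatalogEntry {
  id: string;
  contextWindow?: number;
}

interface CatalogAdapter {
  discoverModels?: () => Promise<CatalogEntry[]>; // rich runtime metadata
  listModels?: () => Promise<string[]>;           // model IDs only
}

// Merge discovered entries over the existing ones, keyed by model ID.
// Assumption: a discovered entry replaces a seed entry with the same ID.
function mergeCatalog(existing: CatalogEntry[], discovered: CatalogEntry[]): CatalogEntry[] {
  const byId = new Map<string, CatalogEntry>();
  for (const entry of existing) byId.set(entry.id, entry);
  for (const entry of discovered) byId.set(entry.id, entry);
  return Array.from(byId.values());
}

// Refresh one provider. Any failure leaves the existing catalog untouched.
async function refreshCatalog(
  adapter: CatalogAdapter,
  existing: CatalogEntry[],
  enrich: (id: string) => CatalogEntry,
): Promise<CatalogEntry[]> {
  try {
    if (adapter.discoverModels) {
      return mergeCatalog(existing, await adapter.discoverModels());
    }
    if (adapter.listModels) {
      const ids = await adapter.listModels();
      return mergeCatalog(existing, ids.map((id) => enrich(id)));
    }
    return existing; // static provider: nothing to discover
  } catch {
    return existing; // refresh failed: seeded/static catalog stays in place
  }
}
```

Keeping the merge as a pure function separate from the async refresh makes the fallback path easy to reason about: every exit from `refreshCatalog` returns either the merged catalog or the untouched existing one.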

This means the provider table should be read as dynamic discovery capability, not a hardcoded model inventory.

Metadata Enrichment

Discovered model IDs are normalized and resolved through this precedence chain:

Runtime hint from discoverModels()
Well-known entry from src/providers/modelCatalog.ts
Name-based heuristic fallback in inferModelMetadata()

The well-known catalog improves pricing, capability, context-window, and premium-request metadata for models that were discovered dynamically. It does not replace runtime discovery.
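One way to picture the precedence chain is as layered object spreads, where later layers override earlier ones. The sketch below assumes per-field precedence and a trim-and-lowercase normalization; both are assumptions, as are the `ModelMetadata` fields, the sample catalog entry, and the `inferFromName` stand-in for `inferModelMetadata()`.

```typescript
// Hypothetical metadata shape; field names are illustrative only.
interface ModelMetadata {
  contextWindow: number;
  capabilities: string[];
}

// Stand-in for the well-known catalog (src/providers/modelCatalog.ts).
// The entry below is invented for the example.
const wellKnownCatalog = new Map<string, Partial<ModelMetadata>>([
  ["example/known-model", { contextWindow: 128000, capabilities: ["code"] }],
]);

// Stand-in for inferModelMetadata(): a crude name-based guess.
function inferFromName(modelId: string): ModelMetadata {
  const looksLikeReasoner = /reason|think/.test(modelId);
  return {
    contextWindow: 32000,
    capabilities: looksLikeReasoner ? ["reasoning"] : [],
  };
}

// Later spreads win, so precedence is: runtime hint > well-known > heuristic.
function resolveMetadata(
  modelId: string,
  runtimeHint?: Partial<ModelMetadata>,
): ModelMetadata {
  const id = modelId.trim().toLowerCase(); // normalization rule is assumed
  return { ...inferFromName(id), ...wellKnownCatalog.get(id), ...runtimeHint };
}
```

With this layering, a runtime hint that only carries a context window still inherits capabilities from the well-known entry, which matches the idea that the catalog enriches discovery without replacing it.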

Adding API Keys

Open Command Palette → AtlasMind: Manage Model Providers
Click Set Key for the provider
Keys are stored in VS Code's SecretStorage — never in settings or source

Provider Health

The router tracks per-provider health status
Unhealthy providers are deprioritised: they fail the health gate during candidate filtering and receive no health bonus (0 instead of +1.25) in scoring
Health is updated through setProviderHealth(), typically after request failures

Selection Algorithm

1. Candidate Filtering

Models pass through six gates:

Gate	Rule
Enabled	Provider and model must both be enabled
Health	Provider must be marked healthy
Whitelist	If agent has `allowedModels`, model must be in the list
Capabilities	Model must support all `requiredCapabilities` from the task profile
Budget gate	Model's budget tier must be in the allowed set for the configured budget mode
Speed gate	Model's speed tier must be in the allowed set for the configured speed mode
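The gates in the table can be read as a chain of early-exit checks. The following is a sketch under assumed data shapes: `CandidateModel`, `RoutingConstraints`, `passesGates`, and `filterCandidates` are hypothetical names, and only `allowedModels` and `requiredCapabilities` are taken from the text.

```typescript
type BudgetTier = "cheap" | "balanced" | "expensive";
type SpeedTier = "fast" | "balanced" | "considered";

// Hypothetical candidate shape for illustration.
interface CandidateModel {
  id: string;
  modelEnabled: boolean;
  providerEnabled: boolean;
  providerHealthy: boolean;
  capabilities: string[];
  budgetTier: BudgetTier;
  speedTier: SpeedTier;
}

interface RoutingConstraints {
  allowedModels?: string[];        // agent whitelist, if any
  requiredCapabilities: string[];  // from the task profile
  allowedBudgetTiers: BudgetTier[];
  allowedSpeedTiers: SpeedTier[];
}

function passesGates(m: CandidateModel, c: RoutingConstraints): boolean {
  if (!m.providerEnabled || !m.modelEnabled) return false;               // Enabled
  if (!m.providerHealthy) return false;                                  // Health
  if (c.allowedModels && !c.allowedModels.includes(m.id)) return false;  // Whitelist
  const hasAllRequired = c.requiredCapabilities.every((cap) => m.capabilities.includes(cap));
  if (!hasAllRequired) return false;                                     // Capabilities
  if (!c.allowedBudgetTiers.includes(m.budgetTier)) return false;        // Budget gate
  return c.allowedSpeedTiers.includes(m.speedTier);                      // Speed gate
}

function filterCandidates(models: CandidateModel[], c: RoutingConstraints): CandidateModel[] {
  return models.filter((m) => passesGates(m, c));
}
```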

2. Scoring

Each candidate is scored using:

score = (cheapness × budgetWeight) + (speedProxy × speedWeight)
      + (qualityProxy × qualityWeight) + taskFit + healthBonus

Factor	How it's computed
Cheapness	`1 / max(0.0001, effectiveCost)` — lower cost → higher score
Speed proxy	fast = 1.5, balanced = 1.0, considered = 0.6
Quality proxy	reasoning = 1.5, code = 1.2, other = 1.0
Task fit	Bonus for matching preferred capabilities and task phase
Health bonus	+1.25 for healthy providers, 0 for unhealthy
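As a rough illustration, the formula and factor table translate into a single function. Several details here are assumptions: the text does not give a value for `qualityWeight`, so the caller supplies it; a model with both reasoning and code capabilities is assumed to take the reasoning value; and `taskFit` is passed in precomputed. The type and function names are hypothetical.

```typescript
type SpeedTier = "fast" | "balanced" | "considered";

// Hypothetical input shape for the sketch.
interface ScoredModel {
  effectiveCost: number;
  speedTier: SpeedTier;
  capabilities: string[];
  providerHealthy: boolean;
}

interface ScoreWeights {
  budgetWeight: number;
  speedWeight: number;
  qualityWeight: number; // not specified in the text; supplied by the caller here
}

const SPEED_PROXY: Record<SpeedTier, number> = { fast: 1.5, balanced: 1.0, considered: 0.6 };

function qualityProxy(capabilities: string[]): number {
  if (capabilities.includes("reasoning")) return 1.5; // assumed to win over "code"
  if (capabilities.includes("code")) return 1.2;
  return 1.0;
}

function scoreModel(model: ScoredModel, weights: ScoreWeights, taskFit: number): number {
  // The 0.0001 floor keeps free models from producing an infinite score.
  const cheapness = 1 / Math.max(0.0001, model.effectiveCost);
  const healthBonus = model.providerHealthy ? 1.25 : 0;
  return (
    cheapness * weights.budgetWeight +
    SPEED_PROXY[model.speedTier] * weights.speedWeight +
    qualityProxy(model.capabilities) * weights.qualityWeight +
    taskFit +
    healthBonus
  );
}
```

Because cheapness is a reciprocal, the unit in which `effectiveCost` is expressed strongly affects how much the cost term contributes relative to the other factors.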

3. Weighting

Weights are controlled by budget and speed mode:

Budget Mode	Budget Weight
`cheap`	3.0
`balanced`	1.5
`expensive`	0.5
`auto`	1.5

Speed Mode	Speed Weight
`fast`	3.0
`balanced`	1.5
`considered`	0.75
`auto`	1.5
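The two weight tables are plain lookups, which could be expressed as follows. The constant and function names are hypothetical; the numeric values are the ones listed above.

```typescript
type BudgetMode = "cheap" | "balanced" | "expensive" | "auto";
type SpeedMode = "fast" | "balanced" | "considered" | "auto";

const BUDGET_WEIGHT: Record<BudgetMode, number> = {
  cheap: 3.0,
  balanced: 1.5,
  expensive: 0.5,
  auto: 1.5,
};

const SPEED_WEIGHT: Record<SpeedMode, number> = {
  fast: 3.0,
  balanced: 1.5,
  considered: 0.75,
  auto: 1.5,
};

// Resolve both weights for the configured modes.
function weightsFor(budgetMode: BudgetMode, speedMode: SpeedMode) {
  return {
    budgetWeight: BUDGET_WEIGHT[budgetMode],
    speedWeight: SPEED_WEIGHT[speedMode],
  };
}
```

Typing the tables as `Record<Mode, number>` means the compiler flags any mode that is added to the union without a matching weight.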

Budget Modes

Mode	Allowed Model Tiers	Best For
cheap	cheap only	Bulk operations, simple queries
balanced	cheap + balanced	General development (default)
expensive	cheap + balanced + expensive	Architecture, complex reasoning
auto	Adapts per task profile	Let the profiler decide

Budget tier classification (by total price per 1K tokens):

Tier	Price Range
Cheap	≤ $0.0015 / 1K
Balanced	≤ $0.008 / 1K
Expensive	> $0.008 / 1K
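The thresholds in the table map onto a small classifier. The function name is hypothetical, and how the total price is derived from input and output prices is not stated in the text, so the sketch takes the total as a single number.

```typescript
type BudgetTier = "cheap" | "balanced" | "expensive";

// totalPricePer1k: total USD price per 1K tokens, as used in the table above.
function classifyBudgetTier(totalPricePer1k: number): BudgetTier {
  if (totalPricePer1k <= 0.0015) return "cheap";
  if (totalPricePer1k <= 0.008) return "balanced";
  return "expensive";
}
```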

Auto Budget Mode

When budget is auto, the task profiler adjusts:

High reasoning → balanced + expensive
Medium reasoning → cheap + balanced
Low reasoning → cheap + balanced
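Combining the budget mode table with the auto rules gives a function from mode and reasoning intensity to the allowed tier set. This is a sketch: `allowedBudgetTiers` and the `ReasoningLevel` type are hypothetical names for whatever the task profiler actually exposes.

```typescript
type BudgetTier = "cheap" | "balanced" | "expensive";
type BudgetMode = BudgetTier | "auto";
type ReasoningLevel = "low" | "medium" | "high"; // assumed profiler output

function allowedBudgetTiers(mode: BudgetMode, reasoning: ReasoningLevel): BudgetTier[] {
  switch (mode) {
    case "cheap":
      return ["cheap"];
    case "balanced":
      return ["cheap", "balanced"];
    case "expensive":
      return ["cheap", "balanced", "expensive"];
    case "auto":
      // Only high-reasoning tasks unlock the expensive tier (and drop cheap).
      return reasoning === "high" ? ["balanced", "expensive"] : ["cheap", "balanced"];
  }
}
```

Note that reasoning intensity only matters in `auto` mode; the fixed modes ignore it.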

Speed Modes

Mode	Allowed Model Tiers	Best For
fast	fast only	Quick edits, simple lookups
balanced	fast + balanced	General development (default)
considered	balanced + considered	Planning, architecture, deep analysis
auto	Adapts per task profile	Let the profiler decide

Speed tier classification:

Tier	Criteria
Fast	No reasoning capability AND context ≤ 128K
Considered	Has reasoning capability AND context ≥ 200K
Balanced	Everything else
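The speed tier criteria can be sketched as a two-input classifier. Treating "128K" and "200K" as 128,000 and 200,000 tokens is an assumption (the real thresholds could be expressed differently), and the function name is hypothetical.

```typescript
type SpeedTier = "fast" | "balanced" | "considered";

function classifySpeedTier(hasReasoning: boolean, contextWindow: number): SpeedTier {
  // Assumption: "K" means 1,000 tokens in the table above.
  if (!hasReasoning && contextWindow <= 128000) return "fast";
  if (hasReasoning && contextWindow >= 200000) return "considered";
  return "balanced"; // everything else
}
```

Under these rules a reasoning model with a small context window and a non-reasoning model with a very large one both land in the balanced tier.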

Auto Speed Mode

When speed is auto, the task profiler adjusts:

High reasoning → balanced + considered
Otherwise → fast + balanced
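The speed mode table and the auto rules can be combined in the same way as for budget. As before this is only a sketch: `allowedSpeedTiers` is a hypothetical name, and the boolean `highReasoning` flag stands in for the profiler's reasoning-intensity output.

```typescript
type SpeedTier = "fast" | "balanced" | "considered";
type SpeedMode = SpeedTier | "auto";

function allowedSpeedTiers(mode: SpeedMode, highReasoning: boolean): SpeedTier[] {
  switch (mode) {
    case "fast":
      return ["fast"];
    case "balanced":
      return ["fast", "balanced"];
    case "considered":
      return ["balanced", "considered"];
    case "auto":
      return highReasoning ? ["balanced", "considered"] : ["fast", "balanced"];
  }
}
```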

Task Profile Scoring

The task profiler infers phase, modality, and reasoning intensity. This influences scoring:

Task Phase	Scoring Bonus
`planning`	+0.9 for reasoning models
`execution` with code modality	+0.7 for code models
`synthesis`	+0.9 for reasoning models

Preferred capabilities from the profile add:

+1.0 for reasoning match
+0.6 for other capability matches
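The phase bonuses and preferred-capability bonuses might combine into the task fit term roughly like this. The sketch assumes the bonuses are additive and that capabilities are plain string tags such as "reasoning" and "code"; the `ProfiledTask` shape and `taskFitBonus` name are hypothetical.

```typescript
// Hypothetical task profile shape for illustration.
interface ProfiledTask {
  phase: "planning" | "execution" | "synthesis";
  modality: string;
  preferredCapabilities: string[];
}

function taskFitBonus(task: ProfiledTask, modelCapabilities: string[]): number {
  const has = (cap: string) => modelCapabilities.includes(cap);
  let bonus = 0;

  // Phase bonuses from the table above.
  if ((task.phase === "planning" || task.phase === "synthesis") && has("reasoning")) {
    bonus += 0.9;
  }
  if (task.phase === "execution" && task.modality === "code" && has("code")) {
    bonus += 0.7;
  }

  // Preferred capability matches; assumed to stack with the phase bonus.
  for (const cap of task.preferredCapabilities) {
    if (has(cap)) bonus += cap === "reasoning" ? 1.0 : 0.6;
  }
  return bonus;
}
```

For example, a planning task that prefers reasoning, routed to a reasoning-capable model, would receive 0.9 + 1.0 under these assumptions.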

Subscription Quota Management

For subscription providers (e.g. GitHub Copilot):

Premium Request Multiplier

Some models consume multiple quota units per request. For example, Claude 4 Opus via Copilot costs 3× per request.

effectiveCost = costPerRequestUnit × premiumRequestMultiplier

Conservation Threshold

When remaining quota drops below 30% of total:

The router interpolates effective cost from subscription rate toward API rate
This naturally biases selection toward cheaper models as quota depletes
At 0% remaining, subscription models are treated as pay-per-token

Quota Exhaustion

When remainingRequests ≤ 0:

The provider is treated exactly like pay-per-token
Models are scored at their listed API prices
No subscription bonus applies
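The multiplier, conservation threshold, and exhaustion rules can be put together in one cost function. The text says the router interpolates toward the API rate but does not give the curve, so the linear blend below is an assumption, as are the `SubscriptionQuota` shape (apart from `remainingRequests`) and the function name. It also assumes the subscription cost and API cost are expressed in comparable units.

```typescript
interface SubscriptionQuota {
  remainingRequests: number;
  totalRequests: number; // hypothetical field name
}

function subscriptionEffectiveCost(
  costPerRequestUnit: number,
  premiumRequestMultiplier: number,
  apiCost: number,
  quota: SubscriptionQuota,
  conservationThreshold = 0.3,
): number {
  const subscriptionCost = costPerRequestUnit * premiumRequestMultiplier;

  // Exhausted quota: score exactly like a pay-per-token model.
  if (quota.totalRequests <= 0 || quota.remainingRequests <= 0) return apiCost;

  const remainingFraction = quota.remainingRequests / quota.totalRequests;
  if (remainingFraction >= conservationThreshold) return subscriptionCost;

  // Assumed linear blend: 0 at the threshold, 1 when nothing remains.
  const blend = 1 - remainingFraction / conservationThreshold;
  return subscriptionCost + (apiCost - subscriptionCost) * blend;
}
```

With a 3× multiplier, a unit cost of 0.01, and an API cost of 0.09, the effective cost under these assumptions is 0.03 while quota is plentiful, 0.06 at 15% remaining, and 0.09 once the quota is gone.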

Parallel Slot Selection

When the task scheduler needs multiple models running in parallel (e.g. during /project):

selectModelsForParallel(slots, constraints) is called
First slot filled with the best subscription/free model
Remaining slots filled with pay-per-token candidates
As parallelSlots increases, subscription advantage is dampened to allow overflow

The damping formula blends subscription cost toward listed API cost:

slotBlend = min(1, (parallelSlots - 1) / 3)
effectiveCost = subscriptionCost + (listedCost - subscriptionCost) × slotBlend
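The damping formula is small enough to check by hand. The sketch below wraps it in a hypothetical helper; the lower clamp at 0 for slot counts below 1 is an added guard, not something the text specifies.

```typescript
function parallelEffectiveCost(
  subscriptionCost: number,
  listedCost: number,
  parallelSlots: number,
): number {
  // slotBlend = min(1, (parallelSlots - 1) / 3); the max(0, ...) is an added guard.
  const slotBlend = Math.max(0, Math.min(1, (parallelSlots - 1) / 3));
  return subscriptionCost + (listedCost - subscriptionCost) * slotBlend;
}
```

With one slot the blend is 0 and the subscription cost applies unchanged. Each extra slot adds one third of the gap to the listed API cost, so from four slots upward the subscription model is scored at its full listed price.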

Cost Estimation

The CostTracker records after each request:

Input tokens and output tokens
Model pricing
Running session total in USD

Use /cost or AtlasMind: Show Cost Summary to view the breakdown.

Agents can set costLimitUsd to cap per-task spending. If the limit is reached, the task is terminated with a cost-exceeded message.
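A minimal sketch of the bookkeeping described here, assuming prices are quoted per 1K tokens as in the budget tier table. `SessionCostTracker`, `TokenPricing`, and `costLimitReached` are hypothetical names and do not reflect the real `CostTracker` API; only `costLimitUsd` comes from the text.

```typescript
// Assumed pricing shape: USD per 1K tokens, split by direction.
interface TokenPricing {
  inputPer1k: number;
  outputPer1k: number;
}

class SessionCostTracker {
  private totalUsd = 0;

  // Record one request and return its cost in USD.
  record(inputTokens: number, outputTokens: number, pricing: TokenPricing): number {
    const costUsd =
      (inputTokens / 1000) * pricing.inputPer1k +
      (outputTokens / 1000) * pricing.outputPer1k;
    this.totalUsd += costUsd;
    return costUsd;
  }

  get sessionTotalUsd(): number {
    return this.totalUsd;
  }
}

// A task with no costLimitUsd is never cut off by this check.
function costLimitReached(taskSpendUsd: number, costLimitUsd?: number): boolean {
  return costLimitUsd !== undefined && taskSpendUsd >= costLimitUsd;
}
```

The session total and the per-task limit are kept separate here because the text describes the running total as session-wide while `costLimitUsd` caps a single task.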

Configuration

Setting	Default	Description
`atlasmind.budgetMode`	`balanced`	Budget preference: cheap, balanced, expensive, auto
`atlasmind.speedMode`	`balanced`	Speed preference: fast, balanced, considered, auto

These can also be adjusted via the Configuration settings panel.