Architecture Decisions
This is a living document. New ADRs are added as the system evolves. Fully superseded ADRs from pre-4.6.3 have been removed. Active ADRs are append-only. Partially superseded ADRs use a **Superseded:** annotation after Status.
ADR-1: Single Container over Docker-per-Team
Status: Accepted
Context: Early designs considered spinning up a separate Docker container per team to achieve process-level isolation. This would mirror how some multi-agent systems partition workloads by process boundary.
Decision: Run the entire system in a single Docker container. The AI SDK session engine provides per-session isolation via scoped cwd, tool guards, and MCP server configuration. No OS-level container boundary is needed between teams.
Consequences:
- Simpler deployment: one image, one container, one restart policy.
- Lower resource overhead: no per-team container lifecycle management.
- Isolation is cooperative (tool-guard-level), not enforced by the OS kernel. This is an explicit trust-model trade-off — see ADR-12.
- A bug that escapes tool-guard isolation has no container boundary to stop it. Mitigated by defense-in-depth guards and tool allowlists.
ADR-3: Rules-First over Code-First Architecture
Status: Accepted
Context: v2 encoded agent behavior in ~4,000–5,000 lines of TypeScript. Behavior changes required code changes, test runs, and deployments.
Decision: Agent behavior is defined in markdown rule files. System prompts are assembled from rule files at session spawn time. Code handles only structural concerns: session lifecycle, routing, enforcement, and persistence. Rules define what agents do, how they behave, and what they prioritize.
Consequences:
- Behavior changes are file edits, not deployments. Non-engineers can modify agent behavior.
- Rules can conflict. Conflict detection must happen at load time, not runtime (see rule validation requirement).
- The rule cascade (org-rules/ → team-rules/) is a code-enforced contract. Agents cannot unilaterally override it.
- The codebase shrinks dramatically. The rule corpus grows over time.
ADR-4: Tool Wrapper Pattern and Governance Guards for Invariant Enforcement (Defense in Depth)
Status: Accepted (updated — SDK hooks replaced by inline guards + withAudit() wrappers)
Context: The system must prevent agents from reading files outside their workspace, logging secrets, or taking actions outside their tool allowlist.
Decision: Enforce invariants at two layers: (1) inline guards in each tool's execute() function (assertInsideBoundary(), assertGovernanceAllowed(), assertBashSafe() from tool-guards.ts) that block prohibited operations before execution, and (2) withAudit() wrappers (tool-audit.ts) that log all tool calls with credential scrubbing. No single layer is trusted alone.
Consequences:
- Guard functions are called directly inside each tool — no separate interception layer that can be bypassed.
- ~350 lines of guard + audit code (tool-guards.ts: 229 lines, tool-audit.ts: 117 lines).
- Audit log becomes the authoritative record. Anomalies in audit output are detectable even if a guard is bypassed.
- Never rely on a single enforcement point. This is a hard design rule.
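A minimal sketch of the two-layer pattern this ADR describes, assuming illustrative signatures — the real `tool-guards.ts` / `tool-audit.ts` functions may differ:

```typescript
// Sketch only: illustrative signatures, not the actual tool-guards.ts / tool-audit.ts API.
import * as path from 'node:path';

type ToolHandler<A, R> = (args: A) => Promise<R>;

// Layer 1: inline guard called at the top of a tool's execute().
function assertInsideBoundary(teamDir: string, requestedPath: string): void {
  const resolved = path.resolve(teamDir, requestedPath);
  if (!resolved.startsWith(path.resolve(teamDir) + path.sep)) {
    throw new Error(`Path escapes team workspace: ${requestedPath}`);
  }
}

// Layer 2: withAudit() wrapper that logs every call with scrubbed arguments.
function withAudit<A, R>(
  toolName: string,
  handler: ToolHandler<A, R>,
  scrub: (s: string) => string,
): ToolHandler<A, R> {
  return async (args: A) => {
    console.log(`[audit] ${toolName} called: ${scrub(JSON.stringify(args))}`);
    try {
      const result = await handler(args);
      console.log(`[audit] ${toolName} ok`);
      return result;
    } catch (err) {
      console.log(`[audit] ${toolName} failed: ${scrub(String(err))}`);
      throw err;
    }
  };
}
```

Neither layer is trusted alone: the guard blocks the operation, and the audit wrapper records the attempt either way.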
ADR-5: config.yaml for Team Manifest with Inline API Keys via Provider Profiles
Status: Accepted (updated — .env files replaced by provider-resolver)
Superseded: Credentials section superseded by Architecture-Decisions#ADR-32.
Context: Team configuration needs to specify scope, allowed tools, MCP servers, provider profile, and credentials. Storing secrets inline in config.yaml would risk accidental exposure in version control or log output. An earlier iteration stored actual secret values in per-team /data/secrets/{team}.env files; this was replaced in the three-tier data model (see ADR-17).
Decision: config.yaml is the team manifest. It contains tool allowlists, MCP server references, and a provider_profile name. Scope keywords have moved to the SQLite scope_keywords table (see ADR-21). Actual API keys are stored inline in the admin-managed provider config (/data/config/providers.yaml) and resolved by the provider-resolver at spawn time into SecretString[] values. There are no per-team .env files. The Org MCP injects resolved credentials as session env, wrapped in SecretString so they are scrubbed from all log output.
Consequences:
- config.yaml is safe to version-control and inspect. No secrets in it.
- API keys are centrally managed by admins in `/data/config/providers.yaml` (config volume), not scattered in per-team files.
- The Org MCP and provider-resolver own resolution — teams never see raw secret values directly.
- Credential scrubbing targets the `SecretString[]` set built at spawn time; no pattern registry needed beyond that set.
- Removing per-team `.env` files simplifies the filesystem layout: `.run/` is pure runtime state, `/data/` is admin config.
ADR-6: Deny-by-Default Tool Allowlists per Team
Status: Accepted
Context: An earlier design gave all teams access to all SDK tools and restricted behavior through rules alone. GPT-5.4 review flagged this as insufficient: a rule violation could grant a team access to tools it should never have.
Decision: Each team's config.yaml specifies an explicit allowed_tools list. Any tool not on the list is blocked by the Org MCP and tool-guard layer. The default is deny. Teams must explicitly opt in to each tool (e.g., Bash, Write, MCP tool patterns like mcp__loggly-mcp__*).
Consequences:
- Tool access is code-enforced, not rule-enforced. Rules alone cannot grant tool access.
- New MCP server tools must be explicitly added to allowlists before a team can use them.
- Reduces blast radius of compromised or misbehaving agents significantly.
- Config management overhead: allowlists must be updated when teams legitimately need new tools.
ADR-7: Trigger Engine over SDK Scheduling (SDK Has No Built-in Cron)
Status: Accepted
Context: Several team workflows require scheduled execution (e.g., nightly health checks, periodic log reviews). The AI SDK provides no built-in scheduling or cron capability.
Decision: Implement a Trigger Engine (~250 lines) in the bootstrap layer. It supports three trigger types via a formal TypeScript interface: schedule (cron), message, and keyword. Trigger configs are stored in the SQLite trigger_configs table and managed via 6 MCP tools (create_trigger, enable_trigger, disable_trigger, list_triggers, test_trigger, update_trigger). At startup, loadFromStore() loads all active triggers from SQLite. When a trigger fires, it calls delegateTask(team, task) to enqueue in the task queue. SQLite-backed deduplication and per-source rate limiting prevent duplicate task injection. A circuit breaker automatically disables triggers after consecutive failures.
Consequences:
- Scheduling is a first-class system feature, not an afterthought.
- The formal `Trigger` interface makes adding new trigger types straightforward.
- Deduplication state persists across restarts (SQLite). Trigger events are idempotent.
- The Trigger Engine is custom code that must be tested. Failure modes: missed triggers (cron drift), duplicate tasks (dedup failure), runaway schedule execution.
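A sketch of the trigger shape and firing path described above. The field names and dependency signatures are assumptions for illustration, not the actual `Trigger` interface:

```typescript
// Sketch of a Trigger shape consistent with ADR-7; field names are illustrative.
type TriggerType = 'schedule' | 'message' | 'keyword';

interface Trigger {
  name: string;
  team: string;                 // team whose queue receives the task
  type: TriggerType;
  schedule?: string;            // cron expression when type === 'schedule'
  keyword?: string;             // match pattern when type === 'keyword'
  enabled: boolean;
  taskTemplate: string;         // task description passed to delegateTask()
}

// When a trigger fires, the engine dedupes, rate-limits, then enqueues.
async function onTriggerFire(
  trigger: Trigger,
  eventId: string,
  deps: {
    seenBefore(eventId: string): boolean;       // SQLite-backed dedup
    rateLimited(source: string): boolean;       // per-source rate limit
    delegateTask(team: string, task: string): Promise<void>;
  },
): Promise<void> {
  if (!trigger.enabled) return;
  if (deps.seenBefore(eventId) || deps.rateLimited(trigger.name)) return;
  await deps.delegateTask(trigger.team, trigger.taskTemplate);
}
```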
ADR-8: Credential Scrubbing in Logs as Hard Requirement
Status: Accepted
Context: Structured logs record every tool call, session spawn, and audit wrapper execution. Secret values (API keys, tokens, passwords) could appear in log output if passed as arguments or returned in tool results.
Decision: Credential scrubbing is a non-negotiable hard requirement. All secrets are wrapped in a SecretString class whose toString() method returns [REDACTED]. The withAudit() wrapper (tool-audit.ts) and credential-scrubber.ts intercept log output and scrub known secret patterns before any transport (stdout, file, external). Credential extraction is centralized via extractStringCredentials() in domain/credential-utils.ts. Raw secret values must never appear in any log.
Consequences:
- Audit logs are safe to share and ship to external log aggregators.
- `SecretString` must be used everywhere secrets are handled. Bypassing it (e.g., string interpolation) creates a gap. Code review must enforce this.
- Scrubbing adds minor processing overhead per log line. Acceptable at expected log volumes.
- If a secret pattern is not registered, it will not be scrubbed. Secret registration must be part of the secret injection process.
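A minimal sketch of the `SecretString` idea. The `reveal()` accessor name is an assumption; the real class likely carries more (scrub-set registration, comparison helpers):

```typescript
// Minimal SecretString sketch — redacts on any implicit stringification.
class SecretString {
  readonly #value: string;

  constructor(value: string) {
    this.#value = value;
  }

  // Template literals and loggers that call toString() see only the marker.
  toString(): string {
    return '[REDACTED]';
  }

  toJSON(): string {
    return '[REDACTED]';
  }

  // Explicit, auditable accessor for the few code paths that need the raw value (assumed name).
  reveal(): string {
    return this.#value;
  }
}

const key = new SecretString('sk-example-not-a-real-key');
console.log(`Using key ${key}`);        // "Using key [REDACTED]"
console.log(JSON.stringify({ key }));   // {"key":"[REDACTED]"}
```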
ADR-9: Priority Admission-Order Task Queue (No Mid-Task Preemption)
Status: Accepted
Superseded (partial): The "one session per team" invariant is relaxed by #ADR-41 Daily-ops vs Org-ops Concurrency for daily-ops operations, and the trigger-type set is extended by #ADR-42 window Trigger Type for Continuous Watch. Priority admission order and no-mid-task preemption remain in force for org-ops and for the structural path.
Context: Each team runs one session. Multiple callers may delegate tasks to the same team concurrently. A naive FIFO queue ignores urgency. Preemption (interrupting a running task for a higher-priority one) was considered but creates complex rollback and consistency problems.
Decision: Each team has a priority admission-order queue managed by the Org MCP in SQLite. Tasks are ordered by priority level, then FIFO within the same priority. Once a task starts executing, it runs to completion — no preemption. Only pending tasks can be reordered. Queue depth and current task are visible via get_status.
Carve-out (ADR-34): The trigger engine may abort a trigger's own stale session when the overlap policy requires replacement. This is not cross-task preemption — it is the engine reclaiming a stuck resource of the same trigger. The no-preemption guarantee for cross-task priority ordering is unaffected. See Triggers#Cancellation Mechanism.
Consequences:
- High-priority tasks jump the queue, but never interrupt a running task. Callers must account for this.
- No rollback logic needed. Task state machine is simple: pending → running → done/failed/cancelled (cancelled added by ADR-34 for overlap policy).
- A long-running low-priority task blocks all queued tasks until it finishes. This is a known trade-off.
- Queue state persists in SQLite. On restart, pending tasks are re-queued.
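A sketch of the admission-order dequeue this ADR implies — highest priority first, FIFO within a priority. The column names and the "higher number = more urgent" convention are assumptions, not the actual `task_queue` schema:

```typescript
// Sketch: pick the next pending task for a team. Column names are illustrative.
interface QueueRow {
  id: number;
  team: string;
  priority: number;      // assumed convention: higher number = more urgent
  created_at: string;
}

interface Db {
  all(sql: string, ...params: unknown[]): QueueRow[];
}

function nextTask(db: Db, team: string): QueueRow | undefined {
  const rows = db.all(
    `SELECT id, team, priority, created_at
       FROM task_queue
      WHERE team = ? AND status = 'pending'
      ORDER BY priority DESC, created_at ASC, id ASC
      LIMIT 1`,
    team,
  );
  return rows[0];
}
```

Because only pending rows are ever selected, a running task is never displaced — the no-preemption guarantee falls out of the query shape.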
ADR-10: Disposable Sessions with Durable Workflow State in SQLite
Status: Accepted
Context: AI SDK streamText() sessions are ephemeral by design. Depending on session continuity for workflow state creates fragility: a crash loses all in-flight state.
Decision: Sessions are explicitly treated as disposable. No workflow state lives only in a session. All durable state — org tree (parent/child relationships), task queues, team status, trigger dedup state, escalation correlation IDs, memory entries — is persisted to SQLite. On restart, the org tree is loaded from SQLite, sessions are re-spawned fresh from config.yaml, and pending tasks are re-queued from durable state. Memory (context continuity) is in SQLite per-team, not in sessions.
Consequences:
- Restarts are clean and deterministic. No partial session state to reconcile.
- Re-spawning sessions on restart means agents re-read their rules and memory fresh. This is a feature, not a bug.
- SQLite is a single point of failure. Nightly automated backups are required (see pre-mortem requirement). Key stored separately from data.
- Session IDs are never persisted. If a session dies mid-task, the task is marked failed and re-queued.
ADR-11: Central Approved Provider Profiles (Not Per-Team Raw Config)
Status: Accepted
Context: Teams need to select an AI provider and model. Allowing teams to specify raw API URLs and keys in config.yaml creates security and governance risk: a team could point at an unapproved model or leak keys.
Decision: Provider configuration is centrally managed in /data/config/providers.yaml (admin-managed, not team-managed). This file defines named profiles (e.g., default-sonnet, default-haiku, oauth) with provider URL, API key reference, and model. Teams select a profile by name in config.yaml: provider_profile: default-sonnet. Teams cannot specify raw URLs or inline keys. The Org MCP resolves the profile and sets the appropriate session environment variables.
Consequences:
- Admins control which models and providers are approved. Teams cannot circumvent this.
- Switching all teams to a new model version is a one-line change in providers.yaml.
- Teams have no visibility into raw API keys. Keys are resolved by the Org MCP at spawn time.
- Adds an admin-managed file that must be kept in sync with available provider credentials.
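A sketch of what profile resolution at spawn time might look like. The profile fields, env var names, and the resolver signature are illustrative assumptions, not the real `providers.yaml` schema or provider-resolver API:

```typescript
// Illustrative only — field and env var names are assumptions.
interface ProviderProfile {
  baseUrl: string;
  model: string;
  apiKey: string; // raw value, only ever held inside the resolver
}

type ProviderConfig = Record<string, ProviderProfile>; // parsed /data/config/providers.yaml

// Resolve a team's provider_profile name into session env at spawn time.
function resolveProviderProfile(
  profiles: ProviderConfig,
  profileName: string,
): { env: Record<string, string>; secrets: string[] } {
  const profile = profiles[profileName];
  if (!profile) {
    throw new Error(`Unknown provider_profile: ${profileName}`);
  }
  return {
    env: {
      ANTHROPIC_BASE_URL: profile.baseUrl, // env var names are examples only
      ANTHROPIC_API_KEY: profile.apiKey,
      ANTHROPIC_MODEL: profile.model,
    },
    // Secret values also feed the scrub set used by the audit layer (ADR-8).
    secrets: [profile.apiKey],
  };
}
```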
ADR-12: Cooperative Isolation Model (SDK-Level, Not OS-Level)
Status: Accepted
Context: Isolation between teams could be enforced at the OS level (separate containers, separate processes, mandatory access controls) or at the SDK/application level (scoped cwd, env, hook-enforced boundaries, deny-by-default tool allowlists).
Decision: v3 uses cooperative isolation at the SDK level. Each team session gets a scoped cwd, scoped env (only its own resolved secrets), scoped MCP servers (only tools it is allowed), and guard-enforced file path boundaries. This is an explicit trust-model choice: teams are expected to operate within their boundaries, and the tool guards + audit wrappers enforce the contract.
Consequences:
- A sufficiently motivated or buggy agent that finds an SDK escape path has no OS boundary to stop it. This risk is accepted for v3. The mitigation is defense-in-depth (tool guards + tool allowlists + audit wrappers) and the assumption that the Claude model is cooperative.
- Single-container deployment is simple and cheap to operate.
- If the cooperative model proves insufficient in practice, OS-level isolation can be added in a future version without changing the application architecture.
- This decision must be documented clearly in operational runbooks so operators understand the trust assumptions.
ADR-13: Uniform Recursive Team Design (All Nodes Identical, Rules Define Scope)
Status: Accepted
Partially superseded by ADR-40: The main agent is now an exception to "all nodes identical" — it routes and delegates only, has no subagents, no skills, and no learning/reflection triggers.
Context: Some hierarchical agent systems use distinct node types: a special "orchestrator" type at the top, "worker" types at the leaves, and "manager" types in between. This requires different code paths for each type.
Decision: All team nodes are structurally identical. Every team is a query() session with a config.yaml, a rule set, a cwd (the team directory), an optional task queue, and optional child teams. What a node does (orchestrate, execute, monitor) is defined entirely by its rules, not by its type. The Org MCP applies the same spawn, delegate, escalate, and shutdown logic to every node regardless of depth.
Consequences:
- No special-casing in code for orchestrator vs. worker vs. manager nodes.
- The hierarchy can grow to arbitrary depth without code changes.
- A team's behavior can be changed entirely by swapping its rules. The structure is inert.
- Risk: rules must be carefully authored to prevent a leaf node from trying to act as an orchestrator. Rule validation at load time mitigates this.
ADR-14: Skills Separate from Agent Identity (Modular HOW vs. WHO)
Status: Accepted
Superseded by ADR-40: "The orchestrator can also invoke skills directly" is no longer valid. Orchestrators always delegate to subagents (ADR-40).
Context: Initial designs bundled agent personality, scope, and procedural knowledge into a single file. This made it hard to reuse procedures across agents or evolve HOW something is done without touching agent identity.
Decision: Agent definitions (subagents/*.md) define WHO: personality, perspective, and a list of skill references. Skills (skills/*.md) define HOW: step-by-step procedures for specific tasks. They are stored and managed separately. An agent references skills; it does not contain them. Multiple agents can reference the same skill. Skills can be evolved independently.
Consequences:
- HOW-level changes (procedure updates) do not touch agent identity files.
- New agents are assembled from existing skills with minimal new authoring.
- The skill library grows as a reusable corpus across the org.
- Skills must be written to be agent-agnostic (no assumptions about caller identity).
- Skill loading at spawn time adds to system prompt size. Large skill sets may need lazy loading via Document MCP.
ADR-15: Rule Cascade with org-rules/ and team-rules/ Directories
Status: Accepted
Context: Some rules should apply to an entire subtree of the org (e.g., "all teams under Engineering follow this coding standard"). Others should apply only to the specific team. A flat rule file per team cannot express this distinction.
Decision: Each team has two rule directories: org-rules/ (cascades to the current team and all its descendants) and team-rules/ (applies only to the current team). Both live under .run/teams/{name}/ (runtime workspace volume). Admin org-rules live in /data/rules/ (config volume) and apply to all agents. System rules are baked into the image. The Org MCP builds the systemPrompt at spawn by concatenating: system rules (baked in) → admin org-rules (/data/rules/) → ancestor org-rules/ (root to parent, in .run/teams/) → team's own org-rules/ → team's team-rules/. All .md files in each directory are included.
Consequences:
- Org-wide policies are defined once and automatically inherited. No duplication.
- A team can override or supplement inherited behavior via `team-rules/` without affecting siblings.
- Rule conflicts are possible (ancestor rule vs. team rule). Load-time conflict detection and explicit precedence markers are required.
- System prompts grow with tree depth. Deep hierarchies may hit context length limits. Mitigated by keeping rule files concise and using Document MCP for large reference material.
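A sketch of the concatenation order described in this ADR's decision. The `readAllMd` helper and the exact path handling are illustrative assumptions:

```typescript
// Sketch of the cascade order; directory layout follows this ADR, helper names are illustrative.
async function buildSystemPrompt(
  teamPath: string[],                           // root-to-leaf team names, e.g. ['engineering', 'backend']
  readAllMd: (dir: string) => Promise<string>,  // concatenates every .md file in a directory
): Promise<string> {
  const parts: string[] = [];

  parts.push(await readAllMd('/app/system-rules'));  // 1. system rules (baked into image)
  parts.push(await readAllMd('/data/rules'));        // 2. admin org-rules (config volume)

  // 3. ancestor org-rules/ from root to parent, then the team's own org-rules/.
  for (const team of teamPath) {
    parts.push(await readAllMd(`.run/teams/${team}/org-rules`));
  }

  // 4. the team's own team-rules/ (leaf only).
  const leaf = teamPath[teamPath.length - 1];
  parts.push(await readAllMd(`.run/teams/${leaf}/team-rules`));

  return parts.filter(Boolean).join('\n\n');
}
```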
ADR-16: On-Demand Team Spawning (Spawn When Needed, Shutdown on Idle)
Status: Accepted
Context: Teams could be spawned at system startup and kept alive indefinitely, or spawned on first use and shut down when idle. Persistent sessions consume resources and accumulate context that may drift from desired behavior.
Decision: Teams are spawned on demand — when the first delegate_task call targets them and no session is already running. A configurable idle timeout triggers automatic shutdown via shutdown_team. On restart or re-activation, a fresh session is spawned from config.yaml and memory. Sessions are intentionally disposable (see ADR-10). This is the "spawn when needed, shutdown on idle" model.
Consequences:
- System resources scale with actual work being done, not with the number of configured teams.
- Each fresh spawn re-reads rules and memory, ensuring agents start with current configuration.
- First-task latency includes session spawn time. For latency-sensitive use cases, a warm-up spawn can be triggered explicitly.
- Idle timeout must be tuned per deployment. Too short: excessive spawn churn. Too long: stale sessions consuming resources.
- Session warmup state (tool caches, MCP handshakes) is lost on shutdown. This is accepted.
ADR-17: Three-Tier Data Model (Image + Config Volume + Runtime Workspace)
Status: Accepted
Context: Early versions stored team configurations, rules, and runtime state in a mixed layout under /data/. This made it hard to distinguish what was immutable system behavior, what was admin-managed configuration, and what was runtime state that teams generate during operation. Teams were treated as pre-configured entities rather than runtime entities spawned on demand.
Decision: The system uses a strict three-tier data model:
- System rules (baked into image): Invariants compiled into the container image at build time. Workspace boundary enforcement, secret handling, communication path rules. Cannot be changed without a new image build. Invisible to agents — enforced entirely in code.
- Config volume (`/data/`): Admin-managed configuration mounted at container start. Includes provider profiles (`/data/config/providers.yaml`), admin org-rules (`/data/rules/`), trigger definitions, and logging config. Admins edit these files; agents cannot modify them. Survives container restarts; updated by redeploying config.
- Runtime workspace volume (`.run/`): All runtime state generated during operation. Team directories (`.run/teams/{name}/`), SQLite database (`.run/openhive.db`), and backups (`.run/backups/`). Teams are runtime entities: they do not exist in config until spawned. Team config, rules, memory, and credentials live here.
Motivation: Teams are runtime entities, not pre-configured fixtures. Separating immutable system behavior (image) from admin configuration (/data/) from runtime state (.run/) makes each tier independently replaceable: upgrade the image without touching config or state; update admin rules without restarting; wipe .run/ to reset all runtime state without affecting system behavior or admin config.
Consequences:
- The three tiers have different operational lifecycles: image = redeploy, `/data/` = remount/edit, `.run/` = runtime mutations.
- Admins control the config volume; teams control their `.run/teams/{name}/` subtree within the guard-enforced boundary.
- There are no per-team `.env` files. API keys live in `/data/config/providers.yaml` (admin-managed), not in `.run/`.
- A clean recovery is possible by restoring `.run/openhive.db` from `.run/backups/` without touching image or config.
- Deep hierarchies store all rule files in `.run/`; this volume must be sized appropriately for the org tree depth.
ADR-18: No Environment Variables for Configuration
Status: Accepted
Context: Early iterations used environment variables (OPENHIVE_DATA_DIR, OPENHIVE_RUN_DIR, OPENHIVE_SYSTEM_RULES_DIR, OPENHIVE_LISTEN_ADDRESS, OPENHIVE_LOG_LEVEL, OPENHIVE_ENABLE_API_MESSAGE) to configure the system. This created configuration sprawl across docker-compose.yml, Dockerfile, and code.
Decision: All configuration lives in files under /data/config/. No environment variables for system behavior. Paths are hardcoded to the Docker image layout (/data, /app/.run, /app/system-rules). The only exception is OPENHIVE_LISTEN_PORT which defaults to 8080.
Configuration files:
- `/data/config/config.yaml` — System settings (log level)
- `/data/config/providers.yaml` — API keys, model selection, provider profiles
- `/data/config/channels.yaml` — Channel adapters (Discord, WebSocket)
- `.run/teams/{name}/config.yaml` — Per-team config (generated at runtime)
Consequences:
- Single source of truth: all config is in `/data/config/` (mounted as a volume).
- No env var sprawl in docker-compose.yml.
- Narrow exception: the `TZ` environment variable. Standard Linux `TZ` (e.g., `America/New_York`) is permitted for cron schedule evaluation. Default: `America/New_York`. Cron expressions in `trigger_configs` are evaluated in the configured TZ. All log timestamps remain UTC regardless of the `TZ` setting. This does not reopen general app configuration through environment variables. See Triggers#Timezone Handling.
- `BootstrapDeps` still accepts path overrides for testing (`deps?.dataDir`, etc.), but production uses hardcoded defaults.
- Log level is configured in `config.yaml` as `log_level: info|debug|warn|error|trace`.
ADR-19: Full Tool Permission Inside Container
Status: Accepted
Context: Early designs used deny-by-default tool allowlists (allowed_tools: [Read, Write, Edit, ...]) and a tool-safety.md rule that told agents "Bash is denied by default." This made sense for a multi-tenant system but is unnecessary when the entire system runs inside a single Docker container that IS the security boundary.
Decision: Agents inside the container get full tool access by default. allowed_tools: ['*'] is a wildcard that means "allow all tools." The activeTools resolver recognizes '*' as "include all tools" without filtering. The tool-safety.md seed rule was removed.
Consequences:
- Agents can use Bash, Read, Write, Edit, Glob, Grep, and all MCP tools without restriction.
- The Docker container boundary IS the security boundary, not the tool allowlist.
- Per-team tool restrictions are still possible by setting `allowed_tools` to a specific list in the team's `config.yaml`. This is opt-in, not the default.
- Governance guards (workspace boundary, cross-team write blocking) still apply — they prevent agents from writing to other teams' directories regardless of tool permissions.
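A sketch of an allowlist resolver that honors both the `'*'` wildcard from this ADR and the MCP-style patterns from ADR-6 (e.g., `mcp__loggly-mcp__*`). The function name and signature are assumptions, not the actual activeTools resolver:

```typescript
// Sketch: resolve a team's allowed_tools into the set of active tools.
function resolveActiveTools(allToolNames: string[], allowedTools: string[]): string[] {
  if (allowedTools.includes('*')) {
    return allToolNames; // ADR-19 default: full tool access inside the container
  }
  const matchers = allowedTools.map((pattern) =>
    pattern.endsWith('*')
      ? (name: string) => name.startsWith(pattern.slice(0, -1)) // prefix pattern, e.g. mcp__loggly-mcp__*
      : (name: string) => name === pattern,
  );
  // Deny by default: a tool is active only if some allowlist entry matches it.
  return allToolNames.filter((name) => matchers.some((match) => match(name)));
}

// resolveActiveTools(['Read', 'Bash', 'mcp__loggly-mcp__search'], ['Read', 'mcp__loggly-mcp__*'])
// → ['Read', 'mcp__loggly-mcp__search']
```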
ADR-21: LLM-Driven Routing with list_teams (Supersedes ADR-20)
Status: Accepted
Context: ADR-20 introduced keyword-gated scope admission: a recursive CTE computed each team's "effective scope" (own keywords UNION descendants' keywords), and delegate_task/query_team rejected tasks whose description did not match any keyword. In practice, this created several problems: (1) keyword extraction from natural-language task descriptions was brittle and produced false rejections; (2) maintaining keyword lists added operator burden without proportional safety benefit; (3) the LLM already has the context to make better routing decisions than a keyword matcher.
Decision: Remove the automated scope admission gate from delegate_task and query_team. Add a new list_teams tool that returns child teams with their descriptions, scope keywords, status, and queue depth. The parent LLM calls list_teams to inspect available teams, then makes the routing decision itself. Scope keywords remain in the scope_keywords SQLite table but serve as routing hints for the LLM and learning domain signal for autonomous learning, not as an automated gate. The effective scope recursive CTE is removed.
Consequences:
- No more false rejections from keyword mismatches. The LLM routes based on full task context, team descriptions, and keywords together.
- `delegate_task` and `query_team` no longer run scope validation — they only validate parent-child relationships. This simplifies the code path.
- Routing quality depends on the LLM's judgment. A poorly prompted parent agent could route tasks to the wrong team. Mitigated by clear team descriptions and scope keywords as hints.
- The `list_teams` tool adds a round-trip before delegation, but the cost is negligible compared to the task execution itself.
- Scope keywords are still useful: they appear in `list_teams` output, help the LLM understand each team's domain without reading full configs, and define the learning domain for autonomous learning.
ADR-22: Inline AI SDK Tools Replace Organization MCP Server
Status: Accepted (Supersedes ADR-2)
Context: ADR-2 introduced a custom Organization MCP Server (~700 lines: http-server.ts 132, registry.ts 289, registry-types.ts 57, plus mcp-bridge.ts 119) served over HTTP/JSON-RPC. Both the MCP server and AI SDK sessions run in the same Node.js process. HTTP transport serializes/deserializes tool calls that never leave the process boundary. Built-in tools (Read, Write, Edit, Glob, Grep, Bash) already use the inline tool() pattern — the Org MCP was the sole outlier.
Decision: Replace the Organization MCP Server with inline AI SDK tool() definitions. Org tools become org-tools.ts, trigger tools become trigger-tools.ts, browser tools become browser-tools.ts. A new web_fetch tool is added (web-fetch-tool.ts). Guard logic extracted to guards.ts. MCP protocol retained only for @playwright/mcp (child process over stdio) and future external MCP servers. The handlers/ transport layer (http-server.ts, registry.ts, registry-types.ts) and sessions/mcp-bridge.ts are removed. Tool handler files (handlers/tools/*.ts) are preserved unchanged.
Consequences:
- ~597 lines of HTTP/JSON-RPC transport code removed. Tool calls are direct function invocations.
- All tools share the same inline guard and audit patterns.
- Tool names change from mcp__org__spawn_team to spawn_team (config migration required for allowed_tools).
- mcp_servers: ['org'] becomes a no-op.
- ensureOrgMcp() removed.
ADR-23: Prompt Cache Boundary for Rule Cascade
Status: Accepted
Context: The rule cascade produces a system prompt that is largely static for a given deployment. Only the trailing portion changes between turns. AI providers (Anthropic) support prompt caching that avoids re-processing a static prefix, reducing latency and cost.
Decision: The prompt builder returns {staticPrefix, dynamicSuffix} instead of a flat string. Static prefix: system rules + admin org-rules + tool usage guide + HTTP rules. Dynamic suffix: core instructions (contains cwd), tool availability, credentials, ancestor org-rules, team rules, skills, memory, conversation history. The AI engine passes cache control hints to the provider. Tool definitions sorted per-partition before merging for stable checksums. Inspired by Claude Code's SYSTEM_PROMPT_DYNAMIC_BOUNDARY.
Consequences:
- Prompt cache hit rates increase for teams with stable rule sets. Cost/latency reduction scales with prompt size.
- Rule file edits invalidate cache for affected teams.
- The boundary marker is provider-agnostic; AI engine translates to provider-specific cache control.
- Requires interface changes in prompt-builder.ts, message-handler.ts, and ai-engine.ts.
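A sketch of the two-part prompt shape. Partition membership follows the decision above; the helper names and parameter structure are illustrative, not the actual prompt-builder.ts interface:

```typescript
// Sketch of the {staticPrefix, dynamicSuffix} return described in ADR-23.
interface CacheablePrompt {
  staticPrefix: string;   // cache-friendly: identical across turns for a given deployment
  dynamicSuffix: string;  // changes per team / per turn; never cached
}

function buildCacheablePrompt(parts: {
  systemRules: string;
  adminOrgRules: string;
  toolUsageGuide: string;
  httpRules: string;
  coreInstructions: string;   // contains cwd
  dynamicSections: string[];  // tool availability, credentials, ancestor org-rules,
                              // team rules, skills, memory, conversation history
}): CacheablePrompt {
  return {
    staticPrefix: [parts.systemRules, parts.adminOrgRules, parts.toolUsageGuide, parts.httpRules].join('\n\n'),
    dynamicSuffix: [parts.coreInstructions, ...parts.dynamicSections].join('\n\n'),
  };
}
```

The AI engine then attaches the provider-specific cache control hint at the boundary between the two strings.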
ADR-24: Typed Task Queue
Status: Accepted
Context: The task queue treats all tasks as undifferentiated work items. In practice, tasks have distinct origins and lifecycle needs. The isInternalTask() function parsed options JSON for {internal: true}. The correlationId prefix 'trigger:' was used for type detection. These are ad-hoc hacks that don't scale.
Decision: Add a type field to the task queue with four values: delegate, trigger, escalation, bootstrap. Each type carries type-specific defaults for notification, redaction, and circuit breaker behavior. The isInternalTask() hack is removed. query_team remains synchronous (no task type needed). Migration backfills existing bootstrap and trigger rows using options JSON and correlationId patterns, with dual-read fallback during transition.
Consequences:
- Task processing becomes a type switch instead of scattered boolean checks.
- New types can be added by extending the union.
- Migration is backward-compatible (DEFAULT 'delegate').
- Existing bootstrap/trigger rows require backfill SQL.
- correlationId is NOT removed — still used by circuit breaker for trigger naming.
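A sketch of the type union and a single place for type-specific defaults. The particular default values and field names here are illustrative assumptions, not the shipped behavior:

```typescript
// Sketch: the four task types from ADR-24 with assumed per-type defaults.
type TaskType = 'delegate' | 'trigger' | 'escalation' | 'bootstrap';

interface TaskTypeDefaults {
  notifyCaller: boolean;        // whether completion is reported back
  redactOutput: boolean;        // whether results are scrubbed before notification
  countsTowardBreaker: boolean; // whether failures feed the trigger circuit breaker
}

function defaultsFor(type: TaskType): TaskTypeDefaults {
  switch (type) {
    case 'delegate':   return { notifyCaller: true,  redactOutput: false, countsTowardBreaker: false };
    case 'trigger':    return { notifyCaller: false, redactOutput: false, countsTowardBreaker: true };
    case 'escalation': return { notifyCaller: true,  redactOutput: true,  countsTowardBreaker: false };
    case 'bootstrap':  return { notifyCaller: false, redactOutput: false, countsTowardBreaker: false };
  }
}
```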
ADR-25: Structured Task Workflow (Discover, Plan, Act, Verify, Done)
Status: Accepted
Context: Agents receiving tasks tend to jump directly to implementation without adequate discovery or verification. Structured workflows improve completion quality, but code-orchestrated pipelines add complexity and rigidity.
Decision: Implement as a system rule only (task-workflow.md), not a code-orchestrated pipeline. Five phases: Discover, Plan, Act, Verify, Done. The rule instructs agents to be especially thorough with discover/plan for structural changes (team spawn/shutdown, trigger create/update) — presenting the plan to the user and waiting for confirmation. No code-level phase enforcement. Rules-first approach (ADR-3). Inspired by Claude Code's Ralph Loop plugin.
Consequences:
- Zero code added. Compliance depends on model instruction-following.
- The workflow can be customized per team via team-rules overrides.
- No phase state stored in SQLite.
- If compliance proves insufficient, code enforcement can be added later.
ADR-26: Online Skill Repository via Vercel Skills Ecosystem
Status: Accepted
Context: Teams bootstrap skills from scratch with no cross-team or cross-deployment reuse. Common patterns are reimplemented repeatedly. Meanwhile, the Vercel Skills ecosystem (skills.sh) has 269+ reusable skills with 91K+ installs across 40+ coding agents, including contributions from Vercel Labs, Anthropic, and the community.
Decision: Integrate with the Vercel skills ecosystem as OpenHive's primary skill repository. Add search_skill_repository as an inline AI SDK tool that queries skills.sh for matching skills. Flow: understand user intent → search → if match ≥60% similarity, download the Vercel SKILL.md → tailor to OpenHive's skill format and user's specific needs → write locally → pre-assume works → test. Repository sources (skills.sh endpoint, curated GitHub repos) are pre-defined in the container, not user-configurable. Trust signals derived from ecosystem data: install counts, source reputation, GitHub stars.
Consequences:
- Leverages existing 269+ skills without building a custom registry
- Cross-agent compatibility: skills contributed to the ecosystem benefit all 40+ supported agents
- Format conversion needed: Vercel SKILL.md (simple name+description+markdown) → OpenHive skill format (Purpose, Steps, Inputs, Outputs, Error Handling). LLM handles this during tailoring.
- Online dependency with graceful degradation (fall back to create from scratch)
- No custom manifest schema to maintain — we consume the existing format
- Vercel skills are coding-agent-focused; many may not be directly applicable to orchestration tasks. The 60% threshold and LLM judgment filter these.
ADR-27: Topic-Based Conversation Threading
Status: Accepted
Context: Current architecture processes one message at a time per WebSocket (per-socket serialization). Second messages are rejected. Users cannot work on multiple topics concurrently or reply to specific ongoing conversations.
Decision: Introduce topic-based threading:
- Classification: Server-side only. At 0 active topics, the message always starts a new topic (no LLM needed). At 1 active topic, the main agent evaluates during normal processing whether the message continues the current topic or starts a new one (no separate classification step). At 2+ active topics, a lightweight LLM call classifies the message to an existing topic or creates a new one.
- Parallel sessions: Each topic gets its own main agent `streamText()` session. Child teams are unaware of topics — tasks queue normally.
- Storage: New `topics` SQLite table (id, channel_id, name, description, state, created_at, last_activity). `task_queue` and `channel_interactions` gain a `topic_id` column.
- Protocol: WebSocket carries interleaved responses with `topic_name` as informational context. New `topic_list` message type.
- Adapters: All adapters send flat messages. Topics are internal — no external threading features (e.g., Discord threads). The multi-topic classifier can route a single message to multiple topics.
- Recovery: Topics live in SQLite; restart marks all sessions as `idle`; topics are rehydrated on the next message from `channel_interactions` history.
Consequences:
- Users can work on multiple topics without waiting
- No latency added in common case (0-1 topics)
- SQLite schema: +1 table, +2 columns
- WebSocket clients must handle multiplexed responses
- Child teams unaware of topics — tasks queue normally
ADR-29: SQLite-Only Memory System (Supersedes ADR-28)
Status: Accepted
Context: The two-tier memory system from ADR-28 (filesystem source of truth + SQLite search index) created unnecessary complexity. The SQLite index had to stay in sync with filesystem files, the memory_files table existed solely to track sync state, and the IMemoryStore interface wrapped simple file I/O. Filesystem storage also lacked transactional guarantees — concurrent writes used last-write-wins semantics. A multi-AI design review (11 participants, 2 rounds) reached near-unanimous consensus that SQLite should be the sole store.
Decision: Remove the filesystem memory directory (.run/teams/{name}/memory/) entirely. Make the SQLite memories table the sole source of truth. Key design elements:
- 6 memory types with mandatory aliases: `identity`, `lesson`, `decision`, `context` (auto-injected, budget-capped at ~50 entries), `reference`, `historical` (search only). Type is never auto-changed by the system.
- Supersede mechanism: memories cannot be silently overwritten. The agent must provide a `supersede_reason` to replace an existing entry. Old entries remain searchable for the audit trail.
- Search tables (`memory_chunks`, `memory_chunks_fts`, `embedding_cache`) are derived from `memories.content` via application-triggered re-indexing on INSERT/UPDATE.
- Data access layer enforces `WHERE team_name = ?` on all queries — no raw SQL access.
- `init-context.md` is replaced by `team-rules/team-context.md` (permanent team context, auto-injected via rule cascade). The `.bootstrapped` marker is replaced by the `org_tree.bootstrapped` column.
See Memory-System for full schema, tool specifications, and search pipeline.
Consequences:
- Single source of truth eliminates sync bugs between filesystem and SQLite
- ACID transactions replace last-write-wins — no more race conditions on concurrent writes
- Supersede audit trail provides traceable knowledge evolution that filesystem overwrites lacked
- Per-team isolation shifts from path validation to data access layer enforcement — consistent with existing cooperative isolation model (ADR-12)
- The `memory_files` sync tracking table is eliminated
- The `IMemoryStore` filesystem interface is replaced by SQLite store operations
- Embedding search is unchanged in capability — still requires an OpenAI-compatible provider
- Team shutdown must now include `memories` and `memory_chunks` cleanup
ADR-30: Inbound Sender Trust Model
Status: Accepted
Context: Currently all inbound messages are processed without sender validation. Any user who can reach a channel adapter (Discord, WebSocket) can inject messages that the main agent processes as legitimate input. This creates a prompt injection risk from untrusted sources — a malicious sender on a public Discord server could craft messages that manipulate agent behavior, exfiltrate data via tool calls, or trigger unintended delegations. There is no mechanism to distinguish trusted operators from unknown or explicitly denied senders.
Decision: Introduce a code-enforced TrustGate between ChannelRouter and TopicClassifier that evaluates every inbound message before it reaches the LLM. The trust policy is configured in channels.yaml under a trust: section with default_policy (deny/allow), per-channel policies, sender allowlists/denylists, and channel-specific overrides.
Trust evaluation follows a strict 6-step order: (1) sender denylist check, (2) sender_trust database lookup, (3) sender allowlist check, (4) channel-specific overrides, (5) channel-level policy, (6) default policy. The first match wins.
Implementation adds ~550 lines of new TypeScript. Two new SQLite tables are created: sender_trust (columns: channel_type, channel_id, sender_id, trust_level, granted_by) for persistent per-sender trust grants, and trust_audit_log (append-only, no automatic retention) for recording all trust evaluation decisions. A trust_decision column is added to the existing channel_interactions table to record the outcome for each message. Three new org tools are provided: add_trusted_sender, revoke_sender_trust, and list_trusted_senders. A new system rule file /app/system-rules/sender-trust.md documents the trust model for agents.
Consequences:
- Backward compatible: if no `trust:` section is present in `channels.yaml`, all senders are allowed and a startup warning is logged advising the operator to configure a trust policy
- All rejection responses are static templates from TrustGate code — rejected messages are never routed through the LLM, eliminating prompt injection risk from untrusted sources
- WebSocket X-Sender-Id limitation: sender identity is assertion-based (the client declares its own ID via header), suitable for internal/trusted networks only — not a substitute for authentication
- Failure modes: if SQLite is inaccessible, channels with a `deny` policy fail closed (all messages rejected). Operators can use Discord with an allowlisted sender or direct DB access as fallback
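A sketch of the 6-step evaluation order with first-match-wins semantics. The type names and the lookup structure are illustrative assumptions, not the actual TrustGate implementation:

```typescript
// Sketch: trust evaluation order from ADR-30, first match wins.
type TrustDecision = 'allow' | 'deny-silent' | 'deny-static';

interface TrustContext {
  inDenylist: boolean;                       // step 1: channels.yaml sender denylist
  dbTrustLevel: 'trusted' | null;            // step 2: sender_trust table lookup
  inAllowlist: boolean;                      // step 3: channels.yaml sender allowlist
  channelOverride: 'allow' | 'deny' | null;  // step 4: channel-specific override
  channelPolicy: 'allow' | 'deny' | null;    // step 5: channel-level policy
  defaultPolicy: 'allow' | 'deny';           // step 6: default policy
}

function evaluateTrust(ctx: TrustContext): TrustDecision {
  if (ctx.inDenylist) return 'deny-silent';            // explicit deny: no response (anti-enumeration)
  if (ctx.dbTrustLevel === 'trusted') return 'allow';
  if (ctx.inAllowlist) return 'allow';
  if (ctx.channelOverride) return ctx.channelOverride === 'allow' ? 'allow' : 'deny-static';
  if (ctx.channelPolicy) return ctx.channelPolicy === 'allow' ? 'allow' : 'deny-static';
  return ctx.defaultPolicy === 'allow' ? 'allow' : 'deny-static'; // unknown sender: static "Not authorized"
}
```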
ADR-31: Admin Dashboard
Status: Accepted
Context: Operators currently have no visual interface for inspecting system state. Diagnosing issues requires direct SQLite queries and log file inspection. As the system grows in complexity (org trees, task queues, triggers, memory entries, conversation threads), operators need a consolidated view of system health and runtime state without requiring database access.
Decision: Add an admin dashboard served as vanilla HTML/CSS/JS by Fastify via @fastify/static from /app/public/. The dashboard communicates with the backend through a REST API mounted under /api/v1. The API is read-only with one exception: trigger enable/disable toggling. No authentication is implemented — the dashboard is designed for local-only access, with operators expected to use a reverse proxy for remote access scenarios. The dashboard runs in the same Docker container as the rest of the system, requiring no additional infrastructure.
Consequences:
- No build step: vanilla HTML/CSS/JS means no bundler, no transpiler, no node_modules for the frontend — files are served as-is from `/app/public/`
- 7 views planned for v1: health overview, org-tree visualization, task queue inspection, log viewer, memory browser, trigger management, and conversation history
- Excluded from v1: rule editing, raw SQL access, team CRUD operations, and authentication — these are deferred to avoid scope creep and security surface area in the initial release
- Local-only access model means no auth code to maintain, but operators exposing the dashboard remotely without a reverse proxy will have an unauthenticated admin interface — this risk is accepted with clear documentation
- `@fastify/static` adds one production dependency; the REST API reuses existing SQLite queries from the data access layer
ADR-32: Team Vault
Status: Accepted (Supersedes credentials section of ADR-5)
Context: Teams need two classes of durable data beyond memory: secrets (API keys, tokens) and team state (operational config, learning journals). Memory (ADR-29) is for prompt-injected knowledge, not machine-consumed state. Credentials in config.yaml are a filesystem artifact inconsistent with the SQLite-only direction.
Decision: Add a team_vault SQLite table with enforced write-access separation: is_secret = 1 rows are system-managed (read-only to teams via vault_get), is_secret = 0 rows are team-writable via vault_set/vault_delete. Legacy credential accessor removed; bootstrap migration moves credentials from config.yaml to vault.
Consequences:
- Single source of truth for credentials (SQLite) replaces filesystem artifact
- Write-access separation preserves the invariant: teams cannot modify their own secrets
- Generic team state storage enables operational data without misusing memory
See Team-Configuration#Credentials for access patterns, Team-Configuration#Data Store Decision Tree for vault vs memory vs files guidance, Organization-Tools#vault-tools.ts for tool definitions, and Scenarios#15. Credential Migration for the migration walkthrough.
ADR-33: Autonomous Learning System
Status: Accepted
Superseded: Bootstrap section superseded by Architecture-Decisions#ADR-35. Team-level learning trigger approach superseded by Architecture-Decisions#ADR-40 — learning is now subagent-level.
Context: Teams start with only bootstrap context and accumulate knowledge only through assigned tasks. Relevant external knowledge (best practices, tool updates, security advisories) goes undiscovered. Manual knowledge curation doesn't scale.
Decision: Add a system skill (/app/system-rules/skills/learning-cycle.md) implementing a 6-phase learning cycle (Journal Read → Topic Analysis → Web Discovery → Validation → Storage → Journal Update). A learning trigger learning-cycle-{subagent} is created per subagent at bootstrap (active with readiness gates per ADR-35, scoped to subagents per ADR-40). The skill composes existing tools (web_fetch, memory, vault) — no new tools or tables. Main agent has no learning triggers (no subagents).
Key architectural decisions:
- Cross-domain corroboration: 3+ independent root domains → high confidence (lesson); 2 → medium; 1 → reference only. Mirror/syndicated content counts as one source.
- Vault journal, not memory: Learning progression state (`learning:{team}:{subagent}:journal`) is operational data stored in the vault, keeping memory focused on knowledge.
- Parent-only trigger management: Subagents cannot create, enable, or disable their own learning triggers.
- Non-notifying default: Results are stored in memory; significant findings are escalated via `escalate()`.
Consequences:
- Autonomous knowledge acquisition using existing infrastructure (triggers, memory, web tools, vault)
- Cross-domain corroboration prevents single-source misinformation from becoming lessons
- Journal loss is non-fatal (treated as first run; existing memories prevent duplicates)
See Self-Evolution#Autonomous Learning for the full 6-phase cycle, vault journal structure, duration budget, and interaction with window triggers. See Self-Evolution#Runtime Tool Bundle Check for the required tools gate. See Team-Configuration#Autonomous Learning for configuration defaults.
ADR-34: Trigger Instance Overlap Policy (Skip-Then-Replace)
Status: Accepted
Context: Under the one-session-per-team model (ADR-9), a trigger can fire while a previous task from the same trigger is still pending or running — for example, a nightly cron task that takes longer than 24 hours, or a keyword trigger that matches twice before the first task starts. Without overlap detection, the engine unconditionally enqueues a new task via delegateTask, leading to duplicate work and resource waste. Existing deduplication (event IDs with TTLs) prevents the same event from firing twice, but does not prevent a new firing while a previous instance is still in the pipeline. Always-skip would miss real work if the old instance is stuck. Always-replace would discard partial work from the old instance prematurely.
Decision: The trigger engine applies a graduated overlap policy when a trigger fires while its previous instance is still active (pending or running). The default policy is skip-then-replace:
- First overlap — skip this firing, alert the user. The old instance continues.
- Second consecutive overlap — cancel the old instance (mark the task as `cancelled`), start a new instance, alert the user.
This is configurable per trigger via the overlap_policy field in trigger_configs:
| Policy | Behavior |
|---|---|
| `skip-then-replace` | Default. First overlap skips + alerts. Second consecutive overlap cancels old + starts new + alerts. |
| `always-skip` | Every overlap is skipped. Old instance always runs to completion. |
| `always-replace` | Every overlap immediately cancels old + starts new. |
| `allow` | No overlap detection. `active_task_id` is not tracked (stays NULL). Pre-ADR-34 behavior. |
Three new columns in trigger_configs:
- `overlap_policy` (TEXT, default `'skip-then-replace'`) — the policy
- `overlap_count` (INTEGER, default 0) — consecutive overlap counter
- `active_task_id` (INTEGER, nullable) — soft reference to the current task (pending or running) in `task_queue`. Not used under the `allow` policy.
The overlap check runs after deduplication and rate limiting but before delegateTask. Non-replacement paths (skip, normal fire) execute within a single SQLite transaction. Replacement is a multi-step sequence with defined failure ordering. A skipped overlap consumes the firing event — it is not replayed later.
Cancellation mechanism. When replacement occurs, the engine: (1) marks the old task as cancelled in task_queue, (2) calls session.abort() to terminate the AI SDK session (in-memory, not transactional), (3) enqueues the replacement task. Steps 1 and 3 are SQLite transactions; step 2 is a best-effort in-memory abort. Failure ordering is defined: if abort fails, the DB state is already correct and cleanup happens on idle timeout or restart. A stale-outcome guard coerces any late done/failed from the old session to cancelled, preventing status overwrites and circuit breaker corruption. See Triggers#Cancellation Mechanism for the full sequence.
A new terminal task status cancelled is introduced (pending → running → done | failed | cancelled). This is distinct from failed because cancellation is not a failure of the trigger's logic — it must not increment the circuit breaker's consecutive_failures counter. cancelled tasks are not reset to pending on restart (they are terminal).
Relation to ADR-9 (no preemption): ADR-9 prohibits priority-based preemption of different tasks in the queue. Overlap cancellation is a narrow carve-out: the trigger engine may abort its own trigger's stale session when the overlap policy requires replacement. This is not one task preempting another — it is the engine reclaiming a stuck resource of the same trigger. ADR-9's no-preemption guarantee for cross-task priority ordering remains intact. A corresponding carve-out is noted in ADR-9.
Consequences:
- Default `skip-then-replace` balances patience (first overlap waits) with pragmatism (second overlap replaces). Operators can tune per trigger.
- The `cancelled` task status prevents the overlap policy from corrupting the circuit breaker's failure tracking.
- `overlap_count` is reset on trigger disable/re-enable (no stale skip count carries over). `active_task_id` is not cleared on disable — a running task completes normally, and the stale reference check handles it at the next fire. Both are reset on restart (consistent with ADR-10: disposable sessions).
- The `allow` policy preserves pre-ADR-34 behavior. Under `allow`, `active_task_id` is never set, so no single-slot tracking limitation applies. Switching from `allow` to a tracking policy takes effect on the next firing — no migration of existing state is needed since there is none.
- Stale `active_task_id` references (task in terminal state but reference not yet cleared) are handled defensively — the engine verifies the task is still active (`pending` or `running`) before applying the overlap policy. Only terminal states (`done`, `failed`, `cancelled`) are treated as stale.
- `test_trigger` does not participate in overlap tracking — it neither sets `active_task_id` nor triggers overlap detection.
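A sketch of the overlap decision only (not the full cancellation sequence). The function shape and the "count is incremented after a skip" assumption are illustrative:

```typescript
// Sketch: decide what to do when a trigger fires while its previous instance may still be active.
type OverlapPolicy = 'skip-then-replace' | 'always-skip' | 'always-replace' | 'allow';
type OverlapAction = 'fire' | 'skip' | 'replace';

function decideOverlap(
  policy: OverlapPolicy,
  activeTaskStatus: 'pending' | 'running' | 'done' | 'failed' | 'cancelled' | null,
  overlapCount: number, // consecutive overlaps already recorded for this trigger
): OverlapAction {
  // 'allow' never tracks active_task_id, so every firing proceeds normally.
  if (policy === 'allow') return 'fire';

  // Terminal or missing references are stale — fire normally.
  const stillActive = activeTaskStatus === 'pending' || activeTaskStatus === 'running';
  if (!stillActive) return 'fire';

  switch (policy) {
    case 'always-skip':       return 'skip';
    case 'always-replace':    return 'replace';
    case 'skip-then-replace': return overlapCount === 0 ? 'skip' : 'replace'; // first overlap skips, second replaces
  }
}
```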
ADR-35: Bootstrap Learning Triggers as Active
Status: Accepted
Context: ADR-33 bootstraps learning triggers as disabled, requiring the parent to call enable_trigger before learning can begin. In practice, every parent enables the trigger immediately after spawn, making the disabled-by-default state a ceremony step that delays learning without adding safety. Three of six production teams never had learning enabled because no parent remembered to call enable_trigger.
Decision: Bootstrap creates the learning trigger with state='active' and the existing daily cron (0 2 * * * + per-team jitter). Three readiness gates are evaluated at runtime when the trigger fires:
- `bootstrapped=1` — team has completed its bootstrap task
- `scope_keywords` present — at least one scope keyword exists for topic derivation
- All 6 required tools present — `web_fetch`, `vault_set`, `vault_get`, `memory_save`, `memory_search`, `memory_list`
If any gate fails, the skill logs a warning naming the failing gate(s) and skips execution. The trigger remains active — the next scheduled firing re-evaluates gates. The parent retains full authority to disable or update the trigger via disable_trigger / update_trigger.
Supersedes ADR-33 bootstrap section. ADR-33's "disabled by default" bootstrap behavior is replaced by "active with readiness gates." All other ADR-33 decisions (session validation, corroboration, journal, deprioritized sources) remain unchanged.
See Triggers#Learning Trigger, Self-Evolution#Autonomous Learning, Team-Configuration#Autonomous Learning.
Consequences:
- Teams begin learning as soon as readiness gates pass — no parent ceremony step.
- Gate-skip-and-log means a trigger created before tools are provisioned is harmless; it simply skips until ready.
- Parents can still disable learning at any time — the authorization model is unchanged.
- Existing trigger management tools (enable/disable/update/test) work identically.
- The readiness gate check adds negligible overhead (3 SQLite reads per firing).
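A small sketch of the gate check described above. The accessor names are assumptions; only the three gate conditions and the required-tool list come from this ADR:

```typescript
// Sketch: evaluate the three readiness gates before a learning cycle runs.
function learningGatesPass(
  deps: {
    isBootstrapped(team: string): boolean;      // org_tree.bootstrapped = 1
    scopeKeywordCount(team: string): number;    // rows in scope_keywords for the team
    availableTools(team: string): Set<string>;
  },
  team: string,
): { ok: boolean; failing: string[] } {
  const required = ['web_fetch', 'vault_set', 'vault_get', 'memory_save', 'memory_search', 'memory_list'];
  const failing: string[] = [];
  if (!deps.isBootstrapped(team)) failing.push('bootstrapped');
  if (deps.scopeKeywordCount(team) === 0) failing.push('scope_keywords');
  if (!required.every((t) => deps.availableTools(team).has(t))) failing.push('required_tools');
  // Caller logs a warning naming the failing gate(s) and skips this firing; the trigger stays active.
  return { ok: failing.length === 0, failing };
}
```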
ADR-36: Main Team Task Processing and Dead-Letter Detection
Status: Accepted
Superseded: Dead-Letter Detection section superseded by Architecture-Decisions#ADR-38. "Root-team owns maintenance triggers" superseded by Architecture-Decisions#ADR-40 — main agent has no subagents, no learning/reflection. Main Team Bootstrap and Root-Team Escalation sections remain active.
Context: The main team runs a task consumer like any other team, but its operational role is unique: it handles user-facing interactions, delegates to children, and runs maintenance. There is no mechanism to detect tasks stuck in pending or running for abnormally long periods — a crashed child or blocked session leaves tasks silently unprocessed. The current docs describe root escalation as an error (Scenarios.md:303-306). In production, 5 escalations sat pending for 1-4 days because main never dequeued them.
Decision:
Main Team Bootstrap
Main bootstraps with bootstrapped=1 and a task consumer enabled. Maintenance operations (rule updates, org-tree changes) run in dedicated topic sessions per Conversation-Threading. The main team processes its own task queue like any child team — escalations, trigger adjustments, and governance tasks all flow through the standard workflow (see Task-Workflow).
Dead-Letter Detection
Superseded — see Architecture-Decisions#ADR-38.
Root-Team Escalation
The root team (main) has no parent. The escalation chain is: child → parent → main → user. Main escalates directly to the user when it cannot address an issue. (Superseded by ADR-40: Main agent has no subagents, no learning/reflection triggers — it routes and delegates only.) See Triggers#Reflection Trigger and Durability-Recovery#Stall Detection.
Consequences:
- Alert routing reuses existing escalation chain; no new notification infrastructure.
- Root-team special case avoids infinite escalation loop.
ADR-37: Self-Reflection Sessions
Status: Accepted
Superseded by ADR-40: Team-level reflection approach superseded — reflection is now subagent-level, main agent excluded.
Context: Teams improve reactively through the evolution flow (detect problem → propose fix) and proactively through learning (discover external knowledge). Neither mechanism provides structured introspection: reviewing recent task outcomes to identify systematic inefficiencies or recurring failures. In production, every fix has been user-initiated — zero autonomous self-improvement occurred.
Decision: Add a reflection-cycle skill (/app/system-rules/skills/reflection-cycle.md) that runs a 6-phase introspective cycle:
- JOURNAL READ — Load the reflection journal from the vault (`vault_get("reflection:{team}:{subagent}:journal")`)
- EVIDENCE GATHER — Query completed tasks via `list_completed_tasks` for outcome patterns (failures, duration outliers, user corrections)
- DIAGNOSE — Identify the single highest-impact inefficiency or failure pattern
- PROPOSE — Draft ONE skill/rule change targeting accuracy or efficiency only. No scope expansion, no new capabilities.
- APPLY — Apply through the standard evolution flow (governance guards enforced). Cooldown: same skill cannot be modified in consecutive cycles.
- JOURNAL UPDATE — Record diagnosis, proposal, outcome, and next focus in vault
Constraints:
- ONE change per cycle. Multiple diagnoses are journaled for future cycles.
- Changes target accuracy (fewer failures) or efficiency (faster completion) only — no scope expansion, no new capabilities.
- Teams can only modify their own team-owned skills and plugins (governance enforced).
- Max duration: 15 minutes per session.
Schedule: 1 hour after the learning trigger (3 AM base + per-team jitter). Per subagent, named reflection-cycle-{subagent} (e.g., reflection-cycle-learner). Main agent excluded (ADR-40). Same readiness gates as learning (Architecture-Decisions#ADR-35): bootstrapped=1, scope_keywords present, required tools present.
Required tools: vault_get, vault_set, memory_save, memory_search, memory_list, list_completed_tasks
See Self-Evolution#Self-Reflection, Triggers#Reflection Trigger, Organization-Tools.
Consequences:
- Teams gain a structured mechanism for self-improvement based on their own task outcomes.
- ONE-change-per-cycle + cooldown prevents cascading self-modifications.
- Governance guards prevent scope creep — reflection cannot grant new capabilities.
- Journal provides audit trail of all reflection decisions and their outcomes.
- 15-minute budget limits resource consumption per reflection session.
ADR-38: Task Queue Stall Detection (Engine-Level Infrastructure)
Status: Accepted
Supersedes: Dead-Letter Detection section of Architecture-Decisions#ADR-36
Context: ADR-36 introduced dead-letter detection as a scheduled trigger consuming an LLM session every 10 minutes for a SQL scan. The detection logic requires no LLM judgment — this is unnecessary overhead coupled to the main team's trigger pipeline.
Decision: Stall detection is a periodic infrastructure check (setInterval every 10 min in task-consumer.ts), not a trigger. Thresholds: pending > 1 hour → warning; pending/running > 24 hours → error + escalation. Alerts route to sourceChannelId when present, otherwise escalate() up the hierarchy. Always-on, non-configurable, no schema changes.
Consequences:
- Stalled tasks surfaced within 10 minutes (unchanged behavior)
- No LLM sessions consumed for stall checks
- One fewer trigger on the main team
- Always-on — no admin toggle to accidentally disable
See Durability-Recovery#Stall Detection for operational details and Scenarios#18. Stall Detection for the end-to-end walkthrough.
ADR-39: Plugin-First Invariant
Status: Accepted
Context: When teams create skills — whether from the Vercel skills ecosystem or from scratch — agents sometimes embed external operations (API calls, data parsing) directly in skill steps instead of registering them as plugin tools first. This produces skills that bypass the security scan, cannot be reused across skills, and violate the plugin/skill separation documented in Skills#Skill + Plugin Workflow. The root cause is that the plugin-first pattern was described as a convention, not enforced as an invariant.
Decision: Establish a hard invariant: every external operation must be a registered plugin tool before a skill can reference it. Skills are orchestration only — they wire plugin tools together, interpret output, and make decisions. They never contain raw API calls, HTTP requests, or direct data-source access.
Invariant rule:
- Skill ## Steps must not contain raw API calls or direct external access
- All external operations delegate to tools declared in ## Required Tools
- Each Required Tool maps to a plugin file at .run/teams/{name}/plugins/{tool_name}.ts (see the sketch after this list)
- Built-in tools (memory_search, delegate_task, vault_get, etc.) are exempt — they are not external operations
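For illustration, a hedged sketch of what one plugin file under .run/teams/{name}/plugins/ might look like. The export shape, parameter names, and the fetch target are assumptions; the only point is that the raw HTTP call lives in the plugin, where the typecheck and security scan run, never in a skill's ## Steps.

```typescript
// .run/teams/financial-news/plugins/fetch_headlines.ts — illustrative plugin tool.
// A skill that uses it only lists fetch_headlines under ## Required Tools.

export const name = "fetch_headlines";                    // export shape is an assumption
export const description = "Fetch latest headlines for a ticker from a news API";

export async function execute(input: { ticker: string; limit?: number }) {
  // The raw external call lives here, inside the plugin — never inline in a skill step.
  const url = `https://example-news.invalid/v1/headlines?ticker=${input.ticker}&limit=${input.limit ?? 10}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`news API returned ${res.status}`);
  return res.json();
}
```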
Canonical workflow (4-step summary):
flowchart LR
A[1. Search skills.sh] --> B[2. Create/extract plugins]
B --> C[3. Create skill]
C --> D[4. Wire to subagent]
style A fill:#e1f5fe
style B fill:#fff3e0
style C fill:#e8f5e9
style D fill:#f3e5f5
- Search — search_skill_repository for existing skills ≥60% match
- Plugins first — extract/create plugin tools via register_plugin_tool(), verify each (typecheck + security scan)
- Skill last — write the skill markdown with ## Required Tools referencing the plugins
- Wire to subagent — add the skill to the target subagent's ## Skills section
For the full adoption flow with match decision tree, see Skill-Repository#Skill Adoption Flow.
Consequences:
- Skills become pure orchestration — judgment, sequencing, decision-making only
- Plugin tools are single source of truth for all external integrations
- Security scanning covers all external operations (no bypass via inline skill steps)
- Existing skills with inline API calls must be refactored: extract operations to plugins, update Required Tools
Supersedes: None. Extends Architecture-Decisions#ADR-26 by mandating plugin extraction as part of the adoption flow.
ADR-40: Subagent-Only Execution
Status: Accepted
Context: The architecture described orchestrators as capable of invoking skills directly or handling simple tasks without subagents. In practice, this created inconsistency: some teams used subagents while others had orchestrators executing skills directly. Learning and reflection cycles were documented as team-level processes, but conceptually they belong to the entity that actually performs work — the subagent.
Decision: Establish three strict invariants:
- Subagent-only execution. Orchestrators ALWAYS delegate to subagents. No direct skill invocation by orchestrators. A {subagent}.md is always loaded into context for task execution.
- Main agent routes only. The main agent routes user requests to child teams and delegates. It has no subagents, no skills, and performs no direct execution.
- Subagent-level learning and reflection. Learning and reflection cycles run at the subagent level, not the team/orchestrator level. The main agent has no learning or reflection triggers.
Additional decisions:
- Propose+confirm model for self-evolution. Subagents can identify issues in their own files (subagent.md, skills/*.md, plugins/*.ts) and draft proposals, but they escalate to the orchestrator for confirmation before any changes are applied. The orchestrator reviews and applies approved changes (governance guards still enforce scope).
- Trigger targeting. A new optional subagent field in trigger configuration. If set, the orchestrator routes the triggered task directly to the named subagent (deterministic, no LLM cost). If null, the orchestrator reads subagent definitions and decides via LLM reasoning. Learning and reflection triggers always specify subagent. Constraint: skill may only be provided when subagent is also provided. A hedged configuration sketch follows this list.
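A hedged sketch of a trigger configuration using the optional subagent field, expressed as a TypeScript literal; the field set beyond type, subagent, and skill, and the cron value, are illustrative assumptions.

```typescript
// Illustrative trigger_configs entries — exact schema is an assumption.
interface TriggerConfig {
  name: string;
  type: "schedule" | "message" | "keyword" | "window";
  cron?: string;
  subagent?: string | null;   // if set: deterministic routing, no LLM cost
  skill?: string | null;      // only valid when subagent is also provided (ADR-40)
}

const triggers: TriggerConfig[] = [
  {
    name: "reflection-cycle-learner",
    type: "schedule",
    cron: "0 4 * * *",        // 1 hour after the learning trigger (value is illustrative)
    subagent: "learner",      // learning/reflection triggers always specify subagent
    skill: "reflection-cycle",
  },
  {
    name: "inbound-support",
    type: "message",
    subagent: null,           // orchestrator decides routing via LLM reasoning
  },
];
```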
Consequences:
- Every task execution flows through a subagent. No ambiguity about who executes work.
- Orchestrators become pure managers — they select subagents but never follow skills themselves.
- Learning/reflection journals and findings belong to specific subagents, not teams as a whole.
- The main agent's role is strictly limited: routing and delegation only.
- The propose+confirm model adds an orchestrator approval step to self-evolution, trading autonomy for oversight.
Supersedes: ADR-33 team-level learning trigger approach (learning is now subagent-level). ADR-37 team-level reflection approach (reflection is now subagent-level). ADR-14 "orchestrator can also invoke skills directly" (orchestrators always delegate to subagents).
ADR-41: Daily-ops vs Org-ops Concurrency
Status: Accepted
Partial supersede of ADR-9: the "one session per team" invariant is relaxed for daily-ops.
Context: ADR-9's "one session per team" invariant forces every operation to serialize. The diagnostic evidence at Scenarios#19-Minute Trading Cycle Evidence shows an 11-call sequential query_team chain consuming 19 minutes for work that could run concurrently. Many team operations are read-heavy or append-only (memory queries, peer queries, log writes) and do not require the preemption guarantee that ADR-9 gives the structural path.
Decision: Classify every team operation as daily-ops (parallel allowed) or org-ops (serialized via a per-team mutex). A team may process up to max_concurrent_daily_ops (default 5) concurrent sessions; org-ops block new daily-ops admission and wait for in-flight daily-ops to drain before running.
Concurrency rule:
- Daily-ops (read-heavy, append-only, per-key mutations): parallel up to max_concurrent_daily_ops (default 5).
- Org-ops (structural changes, governance mutations): single-flight per-team mutex.
- Per-key exception: mutable stores like memories use per-subject_key locking so concurrent writes to different keys proceed in parallel while same-key writes serialize. Derived indexes (e.g., memory_chunks_fts) follow the parent store's lock.
Org-ops tool set (single-flight per team): spawn_team, shutdown_team, update_team, modify_subagent, memory-schema edits, register_plugin_tool, update_trigger.
Drain policy: When an org-op queues while daily-ops are in flight, the mutex waits for in-flight daily-ops to finish, blocks new daily-ops admission, then runs. No mid-flight abort.
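A minimal sketch of the admission logic described above, under stated assumptions: the class name, the polling-based counting semaphore, and the promise-chained mutex are illustrative, not the shipped implementation.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Per-team admission: daily-ops share a counting semaphore (cap = max_concurrent_daily_ops),
// org-ops are single-flight and drain in-flight daily-ops before running.
class TeamAdmission {
  private dailyInFlight = 0;
  private orgOpQueued = false;
  private orgChain: Promise<unknown> = Promise.resolve();

  constructor(private maxConcurrentDailyOps = 5) {}   // default per ADR-41

  async runDailyOp<T>(op: () => Promise<T>): Promise<T> {
    // Block new daily-ops while an org-op is queued or the cap is reached.
    while (this.orgOpQueued || this.dailyInFlight >= this.maxConcurrentDailyOps) {
      await sleep(50);
    }
    this.dailyInFlight++;
    try { return await op(); } finally { this.dailyInFlight--; }
  }

  runOrgOp<T>(op: () => Promise<T>): Promise<T> {
    // Chain org-ops so they remain single-flight per team.
    const run = this.orgChain.then(async () => {
      this.orgOpQueued = true;                           // stop admitting new daily-ops
      while (this.dailyInFlight > 0) await sleep(50);    // drain in-flight work, no mid-flight abort
      try { return await op(); } finally { this.orgOpQueued = false; }
    });
    this.orgChain = run.catch(() => undefined);
    return run;
  }
}
```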
Consequences:
- Peer fan-out via query_teams becomes wall-clock max(child_duration) instead of sum(...).
- SQLite WAL writer serialization is the architectural bound; flagged for review at > 10 teams × > 3 daily-ops each.
- Priority-admission order from ADR-9 is preserved for the structural path; daily-ops admission is unordered within the cap.
See Architecture#Execution Model for the canonical pool + mutex diagram and Organization-Tools for per-tool class tagging.
ADR-42: window Trigger Type for Continuous Watch
Status: Accepted
Partial supersede of ADR-9: extends the trigger-type set with a fourth kind (window). Sessions remain disposable per ADR-10.
Context: Some duties require the team to be "on watch" during a specific window (e.g., market hours) with sub-minute responsiveness — not a discrete 15-minute cron. Vercel AI SDK's streamText has no pause/resume primitive, and Anthropic times out idle streams at ~10 minutes, so a literal long-lived self-motivated session is not architecturally achievable on our stack. Polling via a new trigger type delivers functional continuity while preserving ADR-10's disposable-session guarantee.
Decision: Add window as a fourth trigger type inside the existing Trigger Engine (ADR-7), peer to schedule / message / keyword. The engine opens the window on a cron expression, fires ticks at tick_interval_ms cadence while open, and closes on the exit cron. Each tick dispatches a fresh disposable session.
window config fields (stored in trigger_configs):
| Field | Purpose |
|---|---|
| watch_window | Cron expression defining when polling is active |
| tick_interval_ms | Cadence within the window (default 30000) |
| max_tokens_per_window | Hard cap per watch_window occurrence |
| max_ticks_per_window | Hard cap on tick count per window |
| overlap_policy | Reuses the existing trigger overlap policy (skip-then-replace, always-skip, always-replace, allow) — applies when a prior tick is still running |
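For concreteness, a hedged example of one window trigger's configuration expressed as a TypeScript literal; the wrapping field names (name, type) and every value shown are illustrative, only the field set comes from the table above.

```typescript
// Illustrative window trigger config — polls during US market hours, every 30 s.
const marketWatch = {
  name: "market-hours-watch",
  type: "window",
  watch_window: "30 9 * * 1-5",     // cron defining when polling is active (value is illustrative)
  tick_interval_ms: 30_000,         // default cadence
  max_tokens_per_window: 200_000,   // hard cap per window occurrence (illustrative)
  max_ticks_per_window: 780,        // hard cap on ticks per window (illustrative)
  overlap_policy: "always-skip",    // skip a tick if the prior one is still running
};
```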
Correctness model: Each tick spawns a fresh disposable session per ADR-10. Tick idempotency is the team's responsibility — subagents persist progress keys in memory (e.g., last_scan_cursor, last_event_id) so repeat ticks do not duplicate work. Rate limiting against external sources uses existing web_fetch with an optional rate_limit_key parameter plus per-team per-domain token buckets documented in Team-Configuration.
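A minimal sketch of the tick-idempotency pattern under stated assumptions: the memory_search/memory_save signatures, the cursor key, and the fetchEventsSince/processEvents helpers are illustrative stand-ins, not shipped APIs.

```typescript
// Stubs for built-in tools and hypothetical helpers used below.
declare function memory_search(key: string): Promise<string | undefined>;      // assumed key lookup
declare function memory_save(key: string, value: string): Promise<void>;       // assumed key/value save
declare function fetchEventsSince(cursor?: string): Promise<{ id: string }[]>; // hypothetical plugin tool
declare function processEvents(events: { id: string }[]): Promise<void>;       // hypothetical skill work

// Each tick is a fresh disposable session (ADR-10); the cursor persisted in memory
// keeps repeated ticks from duplicating work.
const CURSOR_KEY = "news-scanner:last_event_id";   // namespaced <subagent_name>:<cursor_name>

export async function onTick() {
  const cursor = await memory_search(CURSOR_KEY);
  const events = await fetchEventsSince(cursor);

  if (events.length === 0) {
    // Nothing new since the cursor: return silently rather than doing work.
    return { action: "noop", reason: "no new events since last cursor" };
  }

  await processEvents(events);
  await memory_save(CURSOR_KEY, events[events.length - 1].id);
  return { action: "processed", count: events.length };
}
```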
Warm session cache is explicitly deferred. This ADR ships polling only. A future ADR (to be numbered on creation — not ADR-44, which is reserved for the Activation Decision Framework) may add a warm-session cache after polling proves its value; that ADR will need to reconcile with ADR-10's "sessions are disposable" statement and explicitly exclude vault secret reads from any cache.
Consequences:
- User-facing "continuous watch" semantics without a persistent LLM stream.
- Each tick start pays one prompt-cache warmup cost (inherent to ADR-23).
- Window boundaries: "complete current tick, do not start new ticks past window end"; holiday calendar is a plugin; watch_window cron uses the server's configured timezone (see Triggers#Timezone Handling).
- No parallel scheduler — scheduling remains unified under the Trigger Engine (ADR-7).
- Provider-side stream-abort risk (Anthropic ~10-min idle timeout) is sidestepped entirely because each tick is a bounded call.
See Triggers#window Trigger Type for the canonical state machine and Tool-Guidelines#Why window ticks feel long-running for the continuous-watch explainer.
ADR-43: Work-Handoff via enqueue_parent_task
Status: Accepted
Context: During a window tick, a child team may detect an event that warrants the parent's immediate action (e.g., a news-scanner detects breaking news mid-market). The existing escalate() tool is a notification channel — it does not create work in the parent's queue. Forcing the parent to poll the child is inefficient; coupling children directly to peers would violate the hierarchy.
Decision: Add enqueue_parent_task(task, priority, correlation_id?) as a new tool. Only the immediate parent is targetable (enforced by the existing parent-child check used by query_team). The payload carries context only, not subagent directives — the parent's orchestrator still decides routing. The existing escalate() notification tool is unchanged.
correlation_id is an opaque string supplied by the caller (e.g., a UUID). If omitted, the engine generates one. The engine uses it for dedup (same ID within 60 s is deduplicated) and rate-cap accounting.
Hierarchy preservation (ADR-40): The child does not bypass the parent's orchestrator. It writes a row to the parent's task_queue with the specified priority; the parent's task consumer dequeues it and routes through the orchestrator → subagent chain exactly as ADR-40 requires. This is push-based delegation, not peer-to-peer skill invocation.
Guard: per-child per-minute rate cap (default 10 / min). Same correlation_id within 60 s is also deduplicated as an implementation-level safeguard.
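A hedged usage sketch from the child's side. The three-argument call shape follows the decision above (task, priority, correlation_id?); the priority values, payload fields, and randomUUID use are assumptions for illustration.

```typescript
import { randomUUID } from "node:crypto";

// Assumed tool signature — priority levels are illustrative.
declare function enqueue_parent_task(
  task: unknown,
  priority: "low" | "normal" | "high",
  correlation_id?: string,
): Promise<void>;

// Inside a financial-news window tick: breaking news warrants parent action now.
// The payload carries context only — the parent's orchestrator still decides routing.
await enqueue_parent_task(
  {
    summary: "Breaking: example-co announces surprise guidance cut",
    evidence: ["https://example.invalid/article/123"],   // illustrative context fields
    detected_at: new Date().toISOString(),
  },
  "high",          // lands in the parent's task_queue and is admitted per ADR-9 priority
  randomUUID(),    // correlation_id: same ID within 60 s is deduplicated
);
```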
Consequences:
- ADR-40 invariant (orchestrators always delegate; main routes only) is preserved.
- escalate() remains informational; enqueue_parent_task handles work handoff.
- Work-handoff storms are bounded by the rate cap.
- Parent queue's priority admission (ADR-9) ensures high-priority handoff tasks jump ahead of routine work.
See Organization-Tools#enqueue_parent_task for the canonical flowchart.
ADR-44: Activation Decision Framework (Tier 1 System Rule)
Status: Accepted
Context: The existing Task Routing Decision Framework in Tool-Guidelines answers who does work (main → team → subagent). It does not teach LLMs when work should be on-demand vs trigger-scheduled vs continuously watched. Without a Tier 1 rule (ADR-3), every team — including future ones — will default to on-demand query_team even when a trigger or window is the correct activation. This is a system-wide gap, not a per-team coaching issue.
Decision: Introduce the Activation Decision Framework as a Tier 1 system rule in the rule cascade. Every team's orchestrator and subagents inherit it via the existing rule distribution mechanism — no team-level YAML edits required.
The framework answers one question: "What activates this work?"
- User asks now → on-demand (query_team / query_teams)
- Recurring clock → schedule trigger
- Inbound user message → message trigger
- Keyword in channel → keyword trigger
- Continuous watch in a window → window trigger (ADR-42)
Activation vs Routing — two independent decisions:
- Activation (new) answers when → choose trigger or on-demand mode.
- Routing (existing) answers who → choose team and subagent.
- An orchestrator consults Activation first, then Routing.
Design inspiration: The Hermes Agent (NousResearch) validates the cron + memory + silent-no-op pattern for delivering continuous-watch UX without a persistent LLM stream. OpenHive adopts that pattern — periodic ticks, memory cursors, and a no-op return contract — scoped to the window trigger type.
See Tool-Guidelines#Why window ticks feel long-running for the full continuous-watch explainer (guard-on-duty analogy), the no-op tick contract ({ action: "noop", reason: string }), and cursor discipline (namespaced <subagent_name>:<cursor_name>).
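To pin down the no-op contract and cursor naming in code, a hedged TypeScript rendering: the type and function names are assumptions, while the { action: "noop", reason: string } shape and the <subagent_name>:<cursor_name> namespacing come from the text above.

```typescript
// The silent no-op return a window-tick subagent emits when there is nothing to do.
interface NoopTick {
  action: "noop";
  reason: string;             // e.g. "no new filings since cursor"
}

// Cursor discipline: cursors are namespaced <subagent_name>:<cursor_name>.
function cursorKey(subagentName: string, cursorName: string): string {
  return `${subagentName}:${cursorName}`;   // e.g. "news-scanner:last_scan_cursor"
}
```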
Tier 1 distribution:
- The framework is delivered via the existing rule cascade to every team's orchestrator and subagents.
- Rules-Architecture enumerates Tier 1 topics; this ADR adds "activation-mode selection" to that list.
- Every existing team (main, trading, financial-news, stock, paper-trader) and every future team inherits the framework automatically.
Consequences:
- Every team self-selects the right activation mode without per-team configuration.
- The framework is a strong signal, not an enforced constraint — the LLM may still pick on-demand when a trigger would be correct. Self-Evolution reflection cycles flag repeated on-demand use of a pattern that should be trigger-driven.
- Orchestrator subagent templates embed "Before delegating, consult the Activation Decision Framework" as a standard reminder.
- The no-op contract and cursor discipline are enforceable via subagent Responsibilities (see Subagents#Window-Trigger Subagents).
See Tool-Guidelines#Activation Decision Framework for the canonical decision flowchart.