Scenarios - Z-M-Huang/openhive GitHub Wiki
End-to-end operational walkthroughs. Each scenario shows the complete flow from user action to final result, including failure modes and system behavior. For component details, see the linked pages.
Convention: Scenarios show the flow (who calls what, in what order). They do not duplicate component documentation -- they link to the canonical pages for details.
spawn_team is a two-step async flow. The handler returns immediately with a queued status; the bootstrap session runs in the background and the originating channel is notified when it completes.
1. User messages: "Create a QA team"
2. Main agent calls `spawn_team(name: "qa", description: "...", scope_accepts: ["testing and QA automation", "end-to-end test strategy"], init_context: "...")`
3. `spawn_team` handler (see Organization-Tools#org-tools.ts):
   - a. Validates: name not already in org tree (duplicate check before scaffolding)
   - b. Scaffolds directory: `.run/teams/qa/` with `config.yaml`, `org-rules/`, `team-rules/` (including `team-context.md`), `skills/`, `subagents/`
   - c. Registers in org tree (SQLite `org_tree` table)
   - d. Stores scope keywords (SQLite `scope_keywords` table)
   - e. Enqueues bootstrap task (type: `bootstrap`, priority: critical)
   - f. Returns `{ status: 'queued', bootstrap_task_id: "task_abc123", message_for_user: "QA team is being set up — I'll let you know when it's ready." }` immediately
4. Main agent echoes `message_for_user` to the user — the caller MUST relay this message so the user understands the team is initialising asynchronously.
5. TaskConsumer dequeues the bootstrap task and calls `handleMessage(init_context)` in a fresh session for the QA team.
6. Bootstrap session creates subagent definitions, skills, plugins, and learning/reflection triggers.
7. On completion, TaskConsumer posts `"[qa] Team bootstrapped and ready."` to the originating channel.
8. Parent sees `(initializing)` via `get_status` while bootstrap runs, then `active` after.
spawn_team rejects at step 3a (before any scaffolding occurs)
- --> Returns error: `Team 'qa' already exists`
- --> No filesystem or SQLite changes made

Step 3a passes, 3b fails mid-way
- --> Rollback: removes partially scaffolded directory, returns error to parent
- --> No partial state left in SQLite or filesystem (spawn-team.ts rollback logic)

Steps 3a-3f succeed, but bootstrap session hits max_turns or errors in step 5
- --> Task marked `failed`; parent sees "Bootstrap failed" via `get_status`
- --> Team exists in org tree but may need manual intervention or `shutdown_team`
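The steps above can be sketched in TypeScript. This is an illustrative sketch of the two-step async contract (validate, scaffold with rollback, enqueue, return immediately), not the actual spawn-team.ts code; the `Deps` interface is a hypothetical stand-in for the real filesystem and SQLite calls:

```typescript
type SpawnResult =
  | { status: "queued"; bootstrap_task_id: string; message_for_user: string }
  | { status: "error"; error: string };

interface Deps {
  teamExists(name: string): boolean;      // org_tree duplicate check
  scaffold(name: string): void;           // may throw mid-way
  removeScaffold(name: string): void;     // rollback helper
  register(name: string): void;           // org_tree + scope_keywords
  enqueueBootstrap(name: string): string; // returns task id
}

function spawnTeam(name: string, deps: Deps): SpawnResult {
  // Step a: duplicate check happens before any filesystem change
  if (deps.teamExists(name)) {
    return { status: "error", error: `Team '${name}' already exists` };
  }
  try {
    deps.scaffold(name); // step b
  } catch (e) {
    deps.removeScaffold(name); // rollback: no partial state left
    return { status: "error", error: `Scaffold failed: ${String(e)}` };
  }
  deps.register(name); // steps c-d
  const taskId = deps.enqueueBootstrap(name); // step e
  // Step f: return immediately; bootstrap runs in the background
  return {
    status: "queued",
    bootstrap_task_id: taskId,
    message_for_user: `${name} team is being set up — I'll let you know when it's ready.`,
  };
}
```

The key property is that the error paths mirror the failure modes listed above: duplicate names fail before any side effect, and a mid-scaffold failure triggers rollback before the handler returns.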
sequenceDiagram
participant User
participant Main as Main Agent
participant Handler as spawn_team handler
participant FS as Filesystem
participant DB as SQLite
participant TC as TaskConsumer
participant QA as QA Session
User->>Main: "Create a QA team"
Main->>Handler: spawn_team(name:"qa", ...)
Handler->>DB: Check org_tree for "qa"
alt Duplicate name
DB-->>Handler: exists
Handler-->>Main: Error: "Team 'qa' already exists"
else Name available
DB-->>Handler: not found
Handler->>FS: Scaffold .run/teams/qa/
Handler->>DB: Insert org_tree + scope_keywords
Handler->>DB: Enqueue bootstrap task (critical, bootstrap_task_id)
Handler-->>Main: { status: 'queued', bootstrap_task_id, message_for_user }
Main-->>User: "QA team is being set up — I'll let you know when it's ready."
Note over TC,QA: Step B runs asynchronously
TC->>DB: Dequeue bootstrap task
TC->>QA: handleMessage(init_context)
QA-->>TC: Bootstrap complete
TC->>Main: "[qa] Team bootstrapped and ready."
Main-->>User: "[qa] Team bootstrapped and ready."
end
1. User messages: "Review the PR on the auth module"
2. Main agent calls `list_teams()` to see available children (see Organization-Tools#LLM-Driven Routing)
3. Main agent picks "engineering" based on description/keywords -- calls `delegate_task("engineering", "Review PR on auth module", priority: "high")`
4. `delegate_task` handler validates: "engineering" is a direct child of caller (delegate-task.ts:51)
5. Task enqueued in SQLite `task_queue` (type: `delegate`, priority: high, `sourceChannelId` from caller)
6. TaskConsumer dequeues task -- fresh `handleMessage()` call creates new session for engineering
7. Engineering orchestrator reads subagent definitions, picks the best-fit subagent (e.g., `code-reviewer`), and delegates the task to it (ADR-40)
8. Result stored in `task_queue` -- notification routed to `sourceChannelId`
delegate_task validates parent-child, not just ancestry -- caller must be direct parent
- --> Returns error: `Team 'engineering' is not a child of 'main'`
- --> Must traverse through intermediate teams (escalate or re-route)

Steps 1-7 proceed, but engineering team cannot handle the task
- --> Engineering calls `escalate({message: "This is a frontend issue, not backend"})`
- --> Escalation creates type: `escalation` task for parent (Main)
- --> Main receives escalation, re-routes to correct team

Steps 1-3 proceed, but "engineering" not in org tree
- --> `delegate_task` returns error: `Team 'engineering' not found`
- --> Main agent may spawn the team first, then delegate
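The two validation failures above reduce to one check over the org tree. A minimal sketch, assuming a flat team-to-parent map (the real delegate-task.ts works against SQLite, and `validateDelegate` is a hypothetical name):

```typescript
type OrgTree = Map<string, string | null>; // team -> parent (null = root)

type DelegateError = "not_found" | "not_direct_child" | null;

// Direct parent-child is required: being an ancestor is not enough.
function validateDelegate(tree: OrgTree, caller: string, target: string): DelegateError {
  if (!tree.has(target)) return "not_found";               // Team 'X' not found
  if (tree.get(target) !== caller) return "not_direct_child"; // must traverse
  return null; // OK to enqueue
}
```

So `main` can delegate to `engineering`, but not directly to a grandchild like `backend`; the task must traverse through the intermediate team.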
sequenceDiagram
participant User
participant Main as Main Agent
participant Handler as delegate_task handler
participant DB as SQLite
participant TC as TaskConsumer
participant Eng as Engineering Session
User->>Main: "Review the PR on auth module"
Main->>Main: list_teams() → picks "engineering"
Main->>Handler: delegate_task("engineering", task, "high")
Handler->>DB: Validate direct parent-child
alt Not direct child
DB-->>Handler: validation fails
Handler-->>Main: Error: not a child
else Valid
Handler->>DB: Enqueue task (type:delegate, priority:high)
Handler-->>Main: Task queued
TC->>DB: Dequeue task
TC->>Eng: handleMessage(task)
Eng->>Eng: Process task
alt Task succeeds
Eng-->>TC: Result
TC->>DB: Store result
TC->>Main: Notification via sourceChannelId
else Cannot handle
Eng->>Handler: escalate({message:"frontend issue"})
Handler->>DB: Enqueue escalation for Main
TC->>Main: Escalation notification
Main->>Main: Re-route to correct team
end
end
1. Parent team needs quick info -- calls `query_team("ops", "What's the current deployment status?")`
2. `query_team` handler validates caller is direct parent of target (query-team.ts:58 checks parent-child, not ancestry)
3. Calls `queryRunnerRef()` -- `handleMessage()` creates fresh session for ops with the query
4. Ops session processes query synchronously (runs to completion within the parent's tool call)
5. Response returned directly to parent as tool result (no task queue, no notification)
queryRunnerRef is undefined or providers not configured (query-team.ts:63)
- --> Returns error: `providers not configured` (same error message for both cases)
- --> Parent sees error, can retry after providers are configured and bootstrap completes

Query completes but LLM produces no text (hits max_turns with only tool calls)
- --> `queryRunner` returns empty string or throws
- --> Parent receives error, can rephrase query or use `delegate_task` instead
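The synchronous contract and both failure modes can be condensed into one function. A hedged sketch: `queryRunner` stands in for `queryRunnerRef()`, and the function name and result shape are illustrative, not the actual query-team.ts API:

```typescript
type QueryResult =
  | { ok: true; text: string }
  | { ok: false; error: string };

// No task queue: the query runs to completion inside the parent's tool call.
async function queryTeam(
  queryRunner: ((team: string, q: string) => Promise<string>) | undefined,
  team: string,
  query: string
): Promise<QueryResult> {
  if (!queryRunner) {
    // Covers both "runner undefined" and "providers not configured"
    return { ok: false, error: "providers not configured" };
  }
  const text = await queryRunner(team, query);
  if (!text.trim()) {
    // max_turns with only tool calls: surface as error so the parent
    // can rephrase or fall back to delegate_task
    return { ok: false, error: "empty response" };
  }
  return { ok: true, text };
}
```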
sequenceDiagram
participant Parent as Parent Agent
participant Handler as query_team handler
participant DB as SQLite
participant Runner as queryRunnerRef
participant Ops as Ops Session
Parent->>Handler: query_team("ops", "deployment status?")
Handler->>DB: Validate direct parent-child
alt Not direct parent
Handler-->>Parent: Error: not authorized
else Valid
Handler->>Runner: queryRunnerRef()
alt Runner unavailable or no providers
Runner-->>Handler: undefined / error
Handler-->>Parent: Error: "providers not configured"
else Runner available
Runner->>Ops: handleMessage(query)
Ops->>Ops: Process query synchronously
alt Response text produced
Ops-->>Runner: Response text
Runner-->>Handler: Response
Handler-->>Parent: Direct tool result
else Empty response (max_turns, no text)
Ops-->>Runner: Empty string / error
Runner-->>Handler: Error
Handler-->>Parent: Error: empty response
Note over Parent: Can rephrase or use delegate_task
end
end
end
1. Keyword trigger "deploy-monitor" fires on keyword match in a channel (see Triggers#Execution Flow)
2. Engine checks: trigger is active, not rate-limited, event not deduplicated, overlap policy allows firing
3. Trigger engine calls `delegateTask(team, task)` -- enqueues in `task_queue` (type: `trigger`, correlationId: `trigger:deploy-monitor:<uuid>`)
4. `TRIGGER_NOTIFY_INSTRUCTION` appended to task content by task-consumer.ts
5. TaskConsumer dequeues -- fresh `handleMessage()` call processes task
6. LLM includes `{"notify": true/false}` in response
7. `parseLlmNotifyDecision()` extracts decision, `stripNotifyBlock()` cleans response
8. If `notify: true` -- response routed to `sourceChannelId` (channel where keyword was detected)
9. If `notify: false` -- result stored in `task_queue` only (no channel notification)
1. Schedule trigger "daily-report" fires at 9:00 AM (see Triggers#Execution Flow)
2. Engine checks: trigger is active, not rate-limited, event not deduplicated, overlap policy allows firing
3. Trigger engine calls `delegateTask(team, task)` -- enqueues in `task_queue` (type: `trigger`, correlationId: `trigger:daily-report:<uuid>`)
4. `TRIGGER_NOTIFY_INSTRUCTION` appended to task content by task-consumer.ts
5. TaskConsumer dequeues -- fresh `handleMessage()` call processes task
6. Schedule triggers have no `sourceChannelId` -- results are stored in `task_queue` but not pushed to any channel
7. If the LLM determines the result warrants attention, it calls `escalate()` to notify the parent team
Event ID already processed within TTL window
- --> Trigger does not fire; silently skipped (see Triggers#Execution Flow)
Trigger source exceeded rate limit
- --> Trigger does not fire; logged as rate-limited
Trigger is in pending or disabled state
- --> Trigger does not fire; handler not registered in engine
Task fails -- trigger engine increments failure counter (see Triggers#Circuit Breaker)
- --> After N consecutive failures (`failure_threshold`, default 3) -- trigger auto-disabled
- --> Logged as warning, can be re-enabled via `enable_trigger`

Response has no {"notify": ...} block
- --> Fail-safe: default to `notify: true` (never silently suppress)

Keyword or message trigger has no sourceChannelId (unexpected — these should always have one)
- --> Logged as error: `Task notification has no sourceChannelId -- cannot route` (index.ts:241)
- --> Result stored but no notification sent
Note: Schedule triggers are inherently non-notifying — missing sourceChannelId on a schedule trigger is expected, not an error. See Triggers#Notification Routing & Policy.
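The notify handshake, including the fail-safe default, fits in a few lines. The function names mirror the ones documented above, but this regex-based implementation is an illustrative sketch, not the actual task-consumer.ts code:

```typescript
const NOTIFY_BLOCK = /\{\s*"notify"\s*:\s*(true|false)\s*\}/;

// Extract the {"notify": true/false} decision from the team's response.
function parseLlmNotifyDecision(response: string): boolean {
  const m = response.match(NOTIFY_BLOCK);
  // Fail-safe: no decision block -> notify: true (never silently suppress)
  return m ? m[1] === "true" : true;
}

// Remove the decision block before the response is routed to a channel.
function stripNotifyBlock(response: string): string {
  return response.replace(new RegExp(NOTIFY_BLOCK, "g"), "").trim();
}
```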
sequenceDiagram
participant Cron as Cron / Event Source
participant Engine as Trigger Engine
participant DB as SQLite
participant TC as TaskConsumer
participant Team as Team Session
participant Ch as Channel Adapter
Cron->>Engine: Timer/event fires trigger
Engine->>Engine: Check: active? dedup? rate-limit? overlap?
alt Dedup/rate-limit/inactive/overlap-skip
Engine-->>Engine: Skip (no task created)
else Passes checks
Engine->>DB: Enqueue task (type:trigger, correlationId)
TC->>DB: Dequeue task
Note over TC: Appends TRIGGER_NOTIFY_INSTRUCTION
TC->>Team: handleMessage(task + notify instruction)
Team->>Team: Process task
Team-->>TC: Response with {"notify": true/false}
TC->>TC: parseLlmNotifyDecision()
alt No notify decision in response
Note over TC: Fail-safe: default to notify: true
end
alt keyword/message trigger + notify: true
TC->>Ch: Route response to sourceChannelId + topicId
else notify: false OR schedule trigger
TC->>DB: Store result only
Note over Team: Schedule: escalate() if significant
end
end
1. Child team encounters issue outside its scope
2. Calls `escalate({message: "Need database access for migration", reason: "no DB credentials"})` (schema is `{message, reason?}` -- there is no `severity` field; see escalate.ts:14)
3. `escalate` handler validates parent exists, generates escalation correlation ID
4. Task enqueued for parent (type: `escalation`, priority: high)
5. Parent session processes escalation, takes action (e.g., delegates to DB team)
escalate called by main (root) team — the root team has no parent, so escalate returns an error. The main agent communicates directly with the user via the channel adapter; it does not need an escalation path. Learning and reflection findings originate from child-team subagents (ADR-40), which escalate through the normal child → parent → main → user chain.
sequenceDiagram
participant Child as Child Team
participant Handler as escalate handler
participant DB as SQLite
participant TC as TaskConsumer
participant Parent as Parent Team
Child->>Handler: escalate({message, reason?})
Handler->>DB: Look up parent of child
alt Root team (no parent)
DB-->>Handler: no parent found
Handler-->>Child: Error: root team cannot escalate
else Has parent
Handler->>DB: Enqueue escalation task (type:escalation, priority:high)
Handler-->>Child: Escalation sent
TC->>DB: Dequeue escalation
TC->>Parent: handleMessage(escalation)
Parent->>Parent: Decide action
Parent->>Parent: delegate_task to appropriate team
end
1. Parent decides child team's work is done
2. Calls `shutdown_team(name: "marketing-q4", cascade: false)`
3. `shutdown_team` handler (see Organization-Tools#org-tools.ts):
   - a. Deletes all `task_queue` rows for team (shutdown-team.ts:87 -- rows are DELETED via `removeByTeam`, not marked failed)
   - b. Deletes all `memories` and `memory_chunks` rows for team
   - c. Removes triggers for team (see Triggers#Per-Team Registry)
   - d. Removes team from org tree
   - e. Terminates session
   - f. Deletes team directory (`.run/teams/marketing-q4/`)
4. Team data is gone. The team name can be re-used -- a fresh `spawn_team` with the same name creates a new team.
cascade: true -- shuts down team AND all descendants
- --> Each descendant goes through the same shutdown sequence
- --> Order: leaves first (deepest children), then upward
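Leaves-first ordering is a post-order traversal of the org tree: every descendant is shut down before its parent. A minimal sketch, assuming an in-memory parent-to-children map (the real handler reads the org tree from SQLite):

```typescript
type Children = Map<string, string[]>; // team -> direct children

// Post-order traversal: deepest descendants first, the root team last.
function shutdownOrder(children: Children, root: string): string[] {
  const order: string[] = [];
  const visit = (team: string) => {
    for (const child of children.get(team) ?? []) visit(child);
    order.push(team); // a parent is shut down only after all its descendants
  };
  visit(root);
  return order;
}
```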
sequenceDiagram
participant Parent as Parent Agent
participant Handler as shutdown_team handler
participant DB as SQLite
participant FS as Filesystem
participant Session as Team Session
Parent->>Handler: shutdown_team("marketing-q4", cascade:false)
Handler->>DB: Delete task_queue rows (removeByTeam)
Handler->>DB: Delete memories + memory_chunks rows
Handler->>DB: Remove triggers
Handler->>DB: Remove from org_tree
Handler->>Session: Terminate session
Handler->>FS: Delete .run/teams/marketing-q4/
Handler-->>Parent: Team shut down
Note over Parent: Name "marketing-q4" is now available for re-use
1. Team calls `browser_navigate(url: "https://example.com/api/docs")`
2. **Gate 1:** Check team has `browser:` config (browser-tools.ts:18)
3. **Gate 2:** `validateBrowserUrl()` checks SSRF + domain allowlist (see Browser-Proxy#SSRF Protection)
4. `BrowserRelay.callTool()` forwards to `@playwright/mcp` child process
5. `@playwright/mcp` navigates Chromium, returns result
6. Result returned to team session
Team's config.yaml has no browser: section
- --> Gate 1 fails (browser-tools.ts:18): returns `{success: false, error: "browser tools not enabled for this team"}`

@playwright/mcp not installed or init failed at startup
- --> Browser tools not registered in tool set at all (conditional registration)
- --> Model cannot invoke browser tools (they do not appear in `activeTools`)

URL points to private IP (e.g., 169.254.169.254 AWS metadata)
- --> `validateBrowserUrl()` rejects immediately
- --> Error: `URL blocked: private/reserved address`

Team has browser.allowed_domains, URL hostname does not match
- --> `validateBrowserUrl()` rejects
- --> Error: `URL hostname not in allowed domains`
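The two URL gates can be approximated as follows. This is an illustrative sketch only: it checks literal IPv4 addresses against the private/reserved ranges and then the optional domain allowlist, whereas the real `validateBrowserUrl()` also has to consider DNS resolution (a public hostname can resolve to a private IP):

```typescript
// Returns an error message, or null when the URL passes both gates.
function checkUrl(raw: string, allowedDomains?: string[]): string | null {
  const host = new URL(raw).hostname;
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (m) {
    const [a, b] = [Number(m[1]), Number(m[2])];
    const isPrivate =
      a === 10 || a === 127 ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 169 && b === 254); // link-local, incl. cloud metadata endpoints
    if (isPrivate) return "URL blocked: private/reserved address";
  }
  if (allowedDomains &&
      !allowedDomains.some(d => host === d || host.endsWith("." + d))) {
    return "URL hostname not in allowed domains";
  }
  return null; // OK to forward to the relay
}
```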
sequenceDiagram
participant Team as Team Session
participant BT as browser_navigate handler
participant Val as validateBrowserUrl
participant Relay as BrowserRelay
participant PW as @playwright/mcp
Team->>BT: browser_navigate("https://example.com/...")
BT->>BT: Gate 1: browser: config exists?
alt No browser config
BT-->>Team: Error: "browser tools not enabled"
else Config present
BT->>Val: validateBrowserUrl(url)
alt SSRF blocked
Val-->>BT: Rejected: private/reserved IP
BT-->>Team: Error: "URL blocked"
else Domain not in allowlist
Val-->>BT: Rejected: hostname not allowed
BT-->>Team: Error: "URL hostname not in allowed domains"
else URL valid
Val-->>BT: OK
BT->>Relay: callTool("browser_navigate", url)
alt Relay unavailable
Relay-->>BT: Error
BT-->>Team: Error: "BrowserRelay unavailable"
else Relay OK
Relay->>PW: Navigate Chromium
PW-->>Relay: Page result
Relay-->>BT: Result
BT-->>Team: Navigation result
end
end
end
1. Parent agent calls `create_trigger(team: "ops", name: "daily-health", type: "schedule", config: {cron: "0 9 * * *"}, task: "Run health check")`
2. Trigger created in `pending` state (see Triggers#Trigger State Machine)
3. Parent calls `test_trigger(team: "ops", trigger_name: "daily-health")` -- enqueues a one-shot task; returns `taskId` for tracking (does NOT return the task result directly)
4. Parent checks task result via `get_status` -- then calls `enable_trigger(team: "ops", trigger_name: "daily-health")`
5. Trigger moves to `active` state; handler registered in engine
Note: create_trigger does NOT validate cron expressions at creation time (create-trigger.ts:31). Invalid cron expressions will fail at runtime when the trigger engine attempts to schedule the handler.
Name must match /^[a-z0-9]+(-[a-z0-9]+)*$/ (create-trigger.ts:11)
- --> Returns validation error with expected format
Cron expression is syntactically invalid -- enable_trigger sets state to active in SQLite before attempting handler registration (enable-trigger.ts:40). If node-cron rejects the expression at schedule() time (schedule.ts:18), the trigger is left in active state but without a running handler.
- --> No automatic rollback to `pending` -- the trigger appears active but does not fire
- --> Caller can use `disable_trigger` then fix the cron expression via `update_trigger`
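The name rule is the documented one (create-trigger.ts:11). Since `create_trigger` skips cron validation, a caller can run its own rough pre-check before enabling; the field-count guard below is a hypothetical sketch, not node-cron's actual validator:

```typescript
// Documented slug rule: lowercase alphanumerics, single hyphens between groups.
const TRIGGER_NAME = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function validateTriggerName(name: string): boolean {
  return TRIGGER_NAME.test(name);
}

// Rough pre-check only: node-cron expressions have 5 or 6 space-separated
// fields. This does not validate field contents or ranges.
function looksLikeCron(expr: string): boolean {
  const fields = expr.trim().split(/\s+/);
  return fields.length === 5 || fields.length === 6;
}
```

Catching a malformed expression before `enable_trigger` avoids the silent-failure state described above, where SQLite shows `active` but no handler is scheduled.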
sequenceDiagram
participant Parent as Parent Agent
participant CT as create_trigger
participant TT as test_trigger
participant ET as enable_trigger
participant DB as SQLite
participant Engine as Trigger Engine
Parent->>CT: create_trigger("ops", "daily-health", schedule, ...)
CT->>CT: Validate slug format
alt Invalid name
CT-->>Parent: Error: invalid trigger name
else Valid name
CT->>DB: Insert trigger (state: pending)
CT-->>Parent: Trigger created (pending)
end
Parent->>TT: test_trigger("ops", trigger_name: "daily-health")
TT->>DB: Enqueue one-shot task (type: trigger)
TT-->>Parent: {taskId} (track via get_status)
Parent->>ET: enable_trigger("ops", trigger_name: "daily-health")
ET->>DB: Set state: active
ET->>Engine: Register handler
alt Cron expression valid
Engine-->>ET: Handler registered
ET-->>Parent: Trigger enabled
else Invalid cron at runtime
Engine-->>ET: node-cron rejects expression
Note over ET: DB still shows active (no rollback)
ET-->>Parent: Trigger enabled (but will not fire)
Note over Parent: Fix: disable_trigger + update_trigger cron + re-enable
end
1. Team calls `web_fetch(url: "https://api.example.com/status", method: "GET")`
2. SSRF check via `validateBrowserUrl()` (same protection as browser tools; see Browser-Proxy#SSRF Protection)
3. Domain allowlist check against team's `browser.allowed_domains` (if configured)
4. HTTP request made with configurable timeout
5. Returns `{status, headers, body}` (body truncated at limit)
Same as browser_navigate -- private/reserved IPs blocked
- --> `validateBrowserUrl()` rejects immediately
- --> Error: `URL blocked: private/reserved address`
HTTP request exceeds timeout
- --> Returns error with timeout information
sequenceDiagram
participant Team as Team Session
participant WF as web_fetch handler
participant Val as validateBrowserUrl
participant HTTP as HTTP Client
Team->>WF: web_fetch("https://api.example.com/status", "GET")
WF->>Val: validateBrowserUrl(url)
alt SSRF blocked
Val-->>WF: Rejected: private/reserved IP
WF-->>Team: Error: "URL blocked"
else Domain not in allowlist
Val-->>WF: Rejected: hostname not allowed
WF-->>Team: Error: "URL hostname not in allowed domains"
else URL valid
Val-->>WF: OK
WF->>HTTP: GET https://api.example.com/status
alt Timeout
HTTP-->>WF: Timeout error
WF-->>Team: Error: request timed out
else Success
HTTP-->>WF: {status, headers, body}
WF-->>Team: {status, headers, body}
end
end
1. User messages: "Create a skill for frontend code review"
2. Main agent identifies engineering team as the owner — delegates via `delegate_task("engineering", "Create a code review skill for frontend")`
3. Engineering orchestrator routes to its skill-builder subagent
4. Subagent calls `search_skill_repository("frontend code review best practices")`
5. Repository returns two matches:
   - "frontend-design" from anthropics/skills (222K installs, 78% match)
   - "code-review-guidelines" from community (12K installs, 85% match)
6. Subagent presents both to user (via escalation) with install counts, sources, and match scores
7. User picks "code-review-guidelines" (higher match score)
8. Subagent downloads the SKILL.md content from the GitHub source
9. **Extract/create plugins (plugin-first per ADR-39):** Subagent identifies executable operations the skill needs. Registers each as a plugin tool via `register_plugin_tool({ tool_name, source_code })` — e.g., `diff_analyzer.ts` for parsing PR diffs
10. Subagent tailors: reads the downloaded SKILL.md, converts to OpenHive format (Purpose, Steps, Inputs, Outputs, Error Handling), adds `## Required Tools` listing the plugins, adds a team-specific review checklist, adapts to the team's tech stack
11. Subagent writes the adapted skill to `.run/teams/engineering/skills/code-review.md`
12. **Wire to subagent:** Adds the skill to the target subagent's `## Skills` section in `subagents/code-reviewer.md` (see Skills#4-Step Creation Workflow)
13. Subagent tests the skill by invoking it on a sample task
14. Result flows back: subagent → orchestrator → main → user: "Created code-review skill, adapted from community/code-review-guidelines (85% match, 12K installs), wired to code-reviewer subagent"
search_skill_repository returns no results ≥60%. Agent creates the skill from scratch per Skills#Initial Skill Creation — analyzes the team's purpose and generates a custom skill file.
Network error when querying skills.sh. Agent logs warning and creates the skill from scratch. Never blocks.
sequenceDiagram
participant User
participant Main as Main Agent
participant Eng as Engineering Orchestrator
participant SA as skill-builder Subagent
participant Repo as skills.sh
User->>Main: "Create a skill for frontend code review"
Main->>Eng: delegate_task("engineering", "Create code review skill")
Eng->>SA: invoke skill-builder subagent
SA->>Repo: search_skill_repository("frontend code review")
Repo-->>SA: [{name: "code-review-guidelines", match: 85%, installs: 12K}]
SA->>Eng: escalate("Found match, need user confirmation")
Eng->>Main: escalate to user
Main->>User: "Found 'code-review-guidelines' (85% match). Use this?"
User->>Main: "Yes"
Main->>Eng: "User confirmed"
Eng->>SA: "Proceed with match"
SA->>Repo: Download SKILL.md content
SA->>SA: Generate plugins (ADR-39) + tailor skill
SA->>SA: Wire to subagents/code-reviewer.md
SA-->>Eng: "Skill created and wired"
Eng-->>Main: result
Main-->>User: "Created code-review skill, wired to code-reviewer subagent"
- User sends: "Add 2FA to the login page"
- 0 active topics → new topic-1 ("Add 2FA") created automatically, no classification needed
- Main agent begins working: researching auth libraries, delegating to engineering team
- While topic-1 is processing, user sends: "What's the deploy status?"
- 1 active topic → main agent evaluates during processing and recognizes this is unrelated to 2FA
- New topic-2 ("Deploy status") created automatically
- Topic-2 session starts in parallel
- Topic-2 responds quickly: "Last deploy was 2 hours ago, all green"
  - Response arrives with `topic_id: "t-def456"`, `topic_name: "Deploy status"`
- Topic-1 continues working in the background
- User sends: "Use WebAuthn instead of TOTP"
- 2 active topics → lightweight LLM classification call
- Classified to topic-1 ("Add 2FA") based on semantic match
- Topic-1 session receives the message and adjusts approach
User's message is routed to the wrong topic. The response doesn't match context. User clarifies with a follow-up message — the classifier re-routes to the correct topic based on content matching.
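The count-based routing rule above (0 topics create, 1 topic the active agent judges, 2+ a lightweight classifier call) can be sketched as a single dispatch function. `classify` is a stand-in for the LLM/agent decision; the shapes are illustrative, not the actual TopicClassifier API:

```typescript
type Route = { action: "create" } | { action: "route"; topicId: string };

function routeMessage(
  activeTopics: string[],
  // Returns a matching topic id, or null when the message is unrelated.
  classify: (msg: string, topics: string[]) => string | null,
  msg: string
): Route {
  // 0 active topics: new topic, no classification needed
  if (activeTopics.length === 0) return { action: "create" };
  // 1 topic: the active agent evaluates; 2+: a lightweight LLM call.
  // Both reduce to "match an existing topic or create a new one".
  const match = classify(msg, activeTopics);
  return match ? { action: "route", topicId: match } : { action: "create" };
}
```

A misroute (the edge case above) simply means `classify` returned the wrong topic id; the user's clarifying follow-up gives the classifier better content to match on.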
sequenceDiagram
participant User
participant TC as TopicClassifier
participant T1 as Topic-1: Add 2FA
participant T2 as Topic-2: Deploy Status
User->>TC: "Add 2FA to the login page"
Note over TC: 0 topics → new
TC->>T1: Create topic, handleMessage()
T1->>T1: Researching, delegating...
User->>TC: "What's the deploy status?"
Note over TC: 1 topic → agent evaluates → unrelated → new
TC->>T2: Create topic, handleMessage()
T2-->>User: "Last deploy 2h ago, all green" [topic: Deploy Status]
User->>TC: "Use WebAuthn instead of TOTP"
Note over TC: 2 topics → LLM call → matches "Add 2FA"
TC->>T1: Route to existing topic
T1-->>User: "Switching to WebAuthn..." [topic: Add 2FA]
1. Operations team completes an incident response. During the session, it saves a lesson:
   - Calls `memory_save(key: "redis-timeout", content: "Redis connections timeout after 30s under load. Increase pool size to 20.", type: "lesson")`
   - New row inserted in `memories` table with `team_name: "operations"`, `is_active: 1`
   - Content is chunked and indexed in `memory_chunks` + `memory_chunks_fts`
2. Three days later, a new session starts. The lesson is auto-injected into the system prompt's memory section under `[LESSON]`, making it available in the agent's reasoning context.
3. During this session, the agent investigates a new Redis issue and discovers the original lesson was wrong:
   - Calls `memory_search(query: "redis connection pool")` → returns the `redis-timeout` entry with score 0.92
   - Agent realizes the root cause was actually a connection leak, not pool size
   - Agent asks user: "I have a memory that says Redis timeouts are fixed by increasing pool size to 20, but I'm seeing a connection leak pattern. Which is correct?"
   - User confirms: "It was a connection leak. The pool size change masked it temporarily."
4. Agent supersedes the old memory with the correction:
   - Calls `memory_save(key: "redis-timeout", content: "Redis timeouts were caused by a connection leak in the retry handler, not pool exhaustion. Fix: close connections in the finally block.", type: "lesson", supersede_reason: "User confirmed original diagnosis was wrong. Pool size increase masked the real issue (connection leak in retry handler).")`
   - Old entry: `is_active` set to `0`
   - New entry: `is_active: 1`, `supersedes_id` points to old entry, `supersede_reason` recorded
5. Later, `memory_search(query: "redis timeout")` returns both entries:
   - Active entry (score 0.95): the corrected lesson
   - Superseded entry (score 0.72, marked `[SUPERSEDED]`): the original wrong diagnosis — still searchable for audit trail
Agent tries to save a new entry with a key that already exists but forgets the reason:
- Calls `memory_save(key: "redis-timeout", content: "New content...", type: "lesson")`
- Tool rejects: `"Active memory 'redis-timeout' already exists. Provide supersede_reason to replace it, or use a different key."`
- Agent must either provide a reason or choose a different key

Agent notices a typo in an existing memory:
- Calls `memory_save(key: "redis-timeout", content: "...corrected typo...", type: "lesson", supersede_reason: "minor correction")`
- Same mechanics, no user verification needed
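The supersede mechanics above reduce to a small invariant: at most one active row per key, with the replacement linking back to what it replaced. An in-memory sketch standing in for the SQLite `memories` table (`memorySave` is illustrative, not the actual tool handler):

```typescript
interface MemoryRow {
  id: number;
  key: string;
  content: string;
  is_active: 0 | 1;
  supersedes_id?: number;
  supersede_reason?: string;
}

// Returns the new row, or the rejection message on a key collision
// without a supersede_reason.
function memorySave(
  rows: MemoryRow[],
  key: string,
  content: string,
  supersedeReason?: string
): MemoryRow | string {
  const active = rows.find(r => r.key === key && r.is_active === 1);
  if (active && !supersedeReason) {
    return `Active memory '${key}' already exists. Provide supersede_reason to replace it, or use a different key.`;
  }
  const row: MemoryRow = { id: rows.length + 1, key, content, is_active: 1 };
  if (active) {
    active.is_active = 0;          // old entry kept for the audit trail
    row.supersedes_id = active.id; // new entry links back
    row.supersede_reason = supersedeReason;
  }
  rows.push(row);
  return row;
}
```

Because superseded rows are deactivated rather than deleted, searches can still surface them marked `[SUPERSEDED]`, which is what makes the audit trail in step 5 possible.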
sequenceDiagram
participant Agent as Operations Agent
participant MT as memory_save
participant DB as SQLite (memories)
participant Search as memory_search
participant User
Note over Agent: Session 1: Incident response
Agent->>MT: memory_save("redis-timeout", "Increase pool to 20", type:"lesson")
MT->>DB: INSERT (is_active=1)
Note over Agent: Session 2: New investigation
Agent->>Search: memory_search("redis connection pool")
Search->>DB: FTS5 + vector query
DB-->>Search: redis-timeout (score: 0.92)
Search-->>Agent: [{key: "redis-timeout", snippet: "...pool size to 20...", score: 0.92}]
Agent->>User: "Memory says pool size fix, but I see connection leak. Which is correct?"
User-->>Agent: "Connection leak. Pool size masked it."
Agent->>MT: memory_save("redis-timeout", "Connection leak in retry handler", type:"lesson", supersede_reason:"Pool size was wrong diagnosis")
MT->>DB: UPDATE old row (is_active=0)
MT->>DB: INSERT new row (supersedes_id=old.id, is_active=1)
Note over Agent: Session 3: Future search
Agent->>Search: memory_search("redis timeout")
Search-->>Agent: Active: corrected lesson (0.95)<br/>Superseded: original diagnosis (0.72, marked)
- Discord user (ID in sender_trust DB with trust_level="trusted") sends a message
- ChannelRouter forwards to TrustGate with channelType="discord", senderId="112233445566"
- TrustGate evaluates: denylist (not found) → DB (found, trusted) → allow
- Message proceeds to TopicClassifier → normal processing
- Admin later grants trust to a new sender via the `add_trusted_sender` tool
- New sender's next message succeeds
Unknown sender (not in allowlist or DB) sends a message on a deny-policy channel.
- --> TrustGate returns static "Not authorized." response
- --> Message never reaches TopicClassifier or LLM
- --> Decision logged to trust_audit_log with reason="default_policy_deny"
Sender in denylist or with trust_level="denied" sends a message.
- --> TrustGate issues silent deny — no response at all
- --> Prevents enumeration (sender cannot determine if the system exists)
- --> Decision logged to trust_audit_log with reason="sender_denylist"
SQLite becomes inaccessible during operation.
- --> Deny-policy channels: TrustGate fails closed (all messages denied)
- --> Startup warning logged if no trust: section in channels.yaml
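A condensed sketch of the evaluation order (denylist, then DB, then allowlist, then default policy; the channel-specific override and channel-level policy steps are elided for brevity). The shapes are illustrative stand-ins for the real config and `sender_trust` table:

```typescript
type Verdict = "allow" | "deny_silent" | "deny_static";

interface TrustConfig {
  denylist: Set<string>;
  allowlist: Set<string>;
  db: Map<string, "trusted" | "denied">; // sender_trust stand-in
  defaultPolicy: "allow" | "deny";
}

function evaluate(cfg: TrustConfig, senderId: string): Verdict {
  if (cfg.denylist.has(senderId)) return "deny_silent";  // step 1: no response at all
  const dbLevel = cfg.db.get(senderId);                  // step 2: DB lookup
  if (dbLevel === "denied") return "deny_silent";
  if (dbLevel === "trusted") return "allow";
  if (cfg.allowlist.has(senderId)) return "allow";       // step 3
  // steps 4-5 (channel overrides / channel policy) elided; step 6:
  return cfg.defaultPolicy === "allow" ? "allow" : "deny_static";
}
```

Note the two deny shapes: denylisted senders get nothing back (anti-enumeration), while unknown senders on a deny-policy channel get the static "Not authorized." response.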
sequenceDiagram
participant User
participant Adapter as Channel Adapter
participant Router as ChannelRouter
participant TG as TrustGate
participant TC as TopicClassifier
participant Main as Main Agent
User->>Adapter: Send message
Adapter->>Router: Forward (channelType, senderId, message)
Router->>TG: Evaluate trust (channelType, senderId)
Note over TG: 6-step evaluation order:<br/>1. Sender denylist check<br/>2. DB lookup (sender_trust)<br/>3. Sender allowlist check<br/>4. Channel-specific overrides<br/>5. Channel-level policy<br/>6. Default policy
alt Denylist match (trust_level="denied")
TG-->>Router: DENY (silent)
Note over Router: No response sent to user<br/>Logged: reason="sender_denylist"
else Unknown sender + deny-policy channel
TG-->>Router: DENY (static message)
Router-->>Adapter: "Not authorized."
Adapter-->>User: "Not authorized."
Note over Router: Logged: reason="default_policy_deny"
else Trusted (allowlist or DB)
TG-->>Router: ALLOW
Router->>TC: Forward message
TC->>Main: Route to topic
Main-->>TC: Response
TC-->>Router: Response
Router-->>Adapter: Response
Adapter-->>User: Response
end
- Operator opens browser to container port (e.g., http://localhost:8080)
- Dashboard loads — SPA shell with navigation
- System Health Overview displays: uptime, SQLite size, team count, queue backlogs
- Operator navigates to Live Org Tree — sees team hierarchy with status and queue depth
- Operator checks Task Queue Dashboard — filters by team, sees pending/running/done/failed/cancelled
- Operator browses Log Viewer — searches by level, team, time range
- Operator opens Memory Browser — views memories by team/type, supersede chains
- Operator opens Trigger Manager — enables a disabled trigger via toggle (one of two write operations; the other is plugin lifecycle actions)
- Operator checks Conversation History — sees message flow and topic states
Dashboard port not accessible (firewall, container not running).
- --> Operator can use Discord or direct DB access as fallback
Note: No authentication. Operator manages network access. For remote access, use Traefik, Cloudflare Tunnel, or similar reverse proxy with their own auth.
sequenceDiagram
participant Op as Operator
participant Browser
participant SPA as Dashboard SPA
participant API as REST API
participant DB as SQLite
Op->>Browser: Open http://localhost:8080
alt Dashboard reachable
Browser->>SPA: Load SPA shell
SPA->>API: GET /api/v1/overview
API->>DB: Query uptime, team count, queue stats
DB-->>API: System metrics
API-->>SPA: Health overview
SPA-->>Op: Dashboard loaded
Op->>SPA: Navigate to Trigger Manager
SPA->>API: GET /api/v1/triggers?team=ops-team
API-->>SPA: Trigger list with states
Op->>SPA: Enable a disabled trigger
SPA->>API: POST /api/v1/triggers/:id/enable
API->>DB: Update trigger state
DB-->>API: Updated
API-->>SPA: Trigger state changed
Note over SPA: Two write mutations: trigger toggle + plugin lifecycle
else Dashboard unreachable
Browser-->>Op: Connection refused
Note over Op: Fallback: Discord or direct DB access
end
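The read path and the single trigger write in the diagram can be driven with plain HTTP. A minimal sketch assuming the routes shown above (`/api/v1/overview`, `/api/v1/triggers`, `/api/v1/triggers/:id/enable`); the real API surface may differ:

```typescript
// Hypothetical dashboard API helpers. Routes are taken from the diagram;
// everything else (names, base URL) is an assumption.
const BASE = "http://localhost:8080";

function overviewUrl(): string {
  return `${BASE}/api/v1/overview`;
}

function triggersUrl(team?: string): string {
  const u = new URL(`${BASE}/api/v1/triggers`);
  if (team) u.searchParams.set("team", team);
  return u.toString();
}

function enableTriggerUrl(id: string): string {
  // One of the dashboard's two write operations
  // (the other is plugin lifecycle actions).
  return `${BASE}/api/v1/triggers/${encodeURIComponent(id)}/enable`;
}

async function enableTrigger(id: string): Promise<void> {
  const res = await fetch(enableTriggerUrl(id), { method: "POST" });
  if (!res.ok) throw new Error(`enable failed: ${res.status}`);
}
```

Because there is no authentication, anything that can reach the port can call `enableTrigger` — which is why network access control (or a reverse proxy with auth) is the operator's responsibility.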
- Operator runs bootstrap with `--dry-run` flag to preview what credential migration would change
- Dry-run output shows: which teams have `credentials:` in config.yaml, which vault entries would be created, which runtime artifacts contain legacy credential accessor references
- Operator reviews output and confirms migration is safe to proceed
- Bootstrap runs migration: for each team with `credentials:` in config.yaml, insert each key-value pair into `team_vault` with `is_secret=1`, `updated_by='migration'`. Config.yaml-wins rule: if a key exists in both config.yaml and vault, the config.yaml value overwrites the vault entry
- Runtime artifact scan: search `.run/teams/{name}/{skills,subagents,team-rules,org-rules}/*.md` for legacy credential accessor references, replace each occurrence with `vault_get`
- Remove `credentials:` section from each team's config.yaml
- Team sessions start with vault-based credential access — no behavioral change from the team's perspective
Migration encounters an error for a specific team (e.g., malformed credentials block, write failure).
- --> Team is quarantined (skipped); migration continues for remaining teams
- --> Other teams start normally with vault-based credentials
- --> Error logged with team name and failure reason
- --> Operator fixes the issue and re-runs migration for the quarantined team
Operator re-runs migration after a partial failure or after updating config.yaml credentials.
- --> Config.yaml-wins rule ensures re-running migration overwrites stale vault data with current config.yaml values
- --> Vault entries created by a previous run are overwritten, not duplicated
- --> Safe to re-run migration any number of times (idempotent)
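The config.yaml-wins upsert that makes re-runs idempotent can be sketched as a pure merge. `VaultRow` and `migrateTeam` are illustrative names, not the actual bootstrap API:

```typescript
// Sketch of the config.yaml-wins merge rule. Field names mirror the
// team_vault columns described above; the function name is hypothetical.
interface VaultRow {
  value: string;
  is_secret: 1;
  updated_by: string;
}

type Vault = Map<string, VaultRow>;

// Upsert every credential from config.yaml into the vault. Because the
// config value always overwrites an existing vault entry, re-running the
// migration overwrites stale data rather than duplicating it.
function migrateTeam(configCredentials: Record<string, string>, vault: Vault): Vault {
  for (const [key, value] of Object.entries(configCredentials)) {
    vault.set(key, { value, is_secret: 1, updated_by: "migration" });
  }
  return vault;
}
```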
sequenceDiagram
participant Op as Operator
participant Boot as Bootstrap
participant Cfg as Config Scanner
participant Vault as team_vault (SQLite)
participant FS as Filesystem (.run/teams/)
participant Team as Team Session
Op->>Boot: --dry-run
Boot->>Cfg: Scan all config.yaml files
Cfg-->>Boot: Teams with credentials: [ops, eng, qa]
Boot->>FS: Scan .run/teams/*/skills,subagents,team-rules,org-rules/*.md
FS-->>Boot: Files with legacy credential accessor refs: [3 files]
Boot-->>Op: Dry-run report (teams, vault entries, artifact refs)
Op->>Boot: Run migration
Boot->>Cfg: Scan config.yaml for each team
loop For each team with credentials:
alt Migration succeeds
Boot->>Vault: INSERT/UPDATE team_vault (is_secret=1, updated_by='migration')
Note over Vault: Config.yaml-wins: overwrites existing vault entries
Boot->>FS: Replace legacy credential accessor → vault_get in artifacts
Boot->>Cfg: Remove credentials: section from config.yaml
else Migration fails for team
Boot-->>Boot: Quarantine team, log error
Note over Boot: Continue with remaining teams
end
end
Boot-->>Op: Migration complete (N migrated, M quarantined)
Boot->>Team: Start team sessions
Team->>Vault: vault_get(key) for credential access
Vault-->>Team: Credential value
Bootstrap creates active learning-cycle-{subagent} triggers per subagent with readiness gates checked at runtime (ADR-35, ADR-40). Each trigger targets a specific subagent for deterministic routing. Main agent has no learning triggers (no subagents). For the full 6-phase cycle, vault journal structure, and configuration defaults, see Self-Evolution#Autonomous Learning.
- Schedule trigger `learning-cycle-learner` fires for the `learner` subagent in ops-team (see Triggers#Execution Flow). Orchestrator routes deterministically — no LLM cost.
- Readiness gates pass: `bootstrapped=1`, `scope_keywords` present, 6-tool bundle available.
- Journal Read: `vault_get("learning:ops-team:learner:journal")` — loads prior state (or initializes empty on first run).
- Topic Analysis: Derives topics from `scope_keywords`, ranks by task-history gaps, checks existing memories to avoid re-learning.
- Web Discovery: Fetches sources via `web_fetch` + browser tools, skipping cached URLs (TTL-based) and deprioritized sources (90-day expiry).
- Validation: Cross-domain corroboration — 3+ independent root domains → high confidence (lesson), 2 → medium (lesson), 1 → low (reference only). Near-duplicate/mirror content counts as one source.
- Storage: `memory_save` with deterministic key `learn:{topic_slug}:{claim_hash}` for dedup. Capped at `max_learnings_per_session` (default 5).
- Journal Update: Persists progress, prunes expired deprioritized sources, sets next focus.
- Duration budget: If `max_duration_minutes` (default 30) exceeded, in-progress operation completes, then skips to journal update and exits gracefully.
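The corroboration thresholds can be sketched as a pure function. This sketch counts root domains naively (last two hostname labels, which mishandles suffixes like `co.uk`) and assumes the caller has already collapsed mirror content:

```typescript
// Sketch of the cross-domain corroboration rule: 3+ independent root
// domains = high (lesson), 2 = medium (lesson), 1 = low (reference only).
type Confidence = "high" | "medium" | "low";

function corroborate(sourceUrls: string[]): { confidence: Confidence; kind: "lesson" | "reference" } {
  // Naive root-domain extraction: docs.example.com -> example.com.
  const roots = new Set(
    sourceUrls.map((u) => new URL(u).hostname.split(".").slice(-2).join("."))
  );
  if (roots.size >= 3) return { confidence: "high", kind: "lesson" };
  if (roots.size === 2) return { confidence: "medium", kind: "lesson" };
  return { confidence: "low", kind: "reference" };
}
```

Collapsing to root domains is what makes a mirror on a subdomain count as one source rather than two.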
- Task fails — journal NOT updated (no partial corruption)
- Circuit breaker increments; after 3 consecutive failures: trigger auto-disabled (see Triggers#Circuit Breaker)
- Non-fatal: treated as first run — existing memories prevent duplicate storage via dedup keys
- Topic coverage lost but no incorrect data injected
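The dedup keys that make a lost journal non-fatal can be sketched deterministically. Only the `learn:{topic_slug}:{claim_hash}` shape comes from the walkthrough; the slug and hash conventions below are assumptions:

```typescript
import { createHash } from "node:crypto";

// Hypothetical construction of the deterministic memory key. Because the
// same topic + claim always yields the same key, re-learning after a lost
// journal overwrites rather than duplicates prior memories.
function topicSlug(topic: string): string {
  return topic.toLowerCase().trim().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
}

function learnKey(topic: string, claim: string): string {
  const hash = createHash("sha256").update(claim.trim()).digest("hex").slice(0, 12);
  return `learn:${topicSlug(topic)}:${hash}`;
}
```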
sequenceDiagram
participant Cron as Schedule Trigger
participant SA as learner Subagent
participant Skill as learning-cycle skill
participant Vault as vault_get / vault_set
participant Web as web_fetch / browser
participant Mem as memory_save / memory_search
Cron->>SA: delegateTask (deterministic routing)
SA->>Skill: follow learning-cycle.md
Skill->>Skill: Readiness gates + 6-tool check
Skill->>Vault: vault_get(journal)
Vault-->>Skill: prior state (or empty)
Skill->>Skill: Derive topics from scope_keywords
Skill->>Web: Fetch sources (skip cached + deprioritized)
Web-->>Skill: Candidate findings
Skill->>Skill: Cross-domain corroboration
Skill->>Mem: memory_save (lessons + references)
Skill->>Vault: vault_set(journal with updated progress)
alt Duration budget exceeded
Skill->>Vault: vault_set(journal — partial progress)
Note over Skill: Graceful exit
end
alt Web tools fail
Web-->>Skill: Error
Note over Skill: Task fails — journal untouched
end
- Organization-Tools -- Tool definitions and guardrails
- SDK-Integration -- Session engine and tool assembly
- Triggers -- Trigger engine, circuit breaker, management tools
- Memory-System -- Memory types, schema, search pipeline, supersede mechanism
- Browser-Proxy -- Browser automation, SSRF protection, web fetch
- Durability-Recovery -- Crash recovery scenarios (recovery-specific, not operational)
- Skill-Repository -- Online skill discovery via Vercel skills ecosystem
- Conversation-Threading -- Topic-based parallel conversations
- Architecture-Decisions#ADR-30 -- Sender trust gate architecture decision
- Architecture-Decisions#ADR-33 -- Autonomous learning system architecture decision
- Request-Processing -- Request processing pipeline
- Channel-Adapters -- Channel adapter configuration and trust policies
- Admin-Dashboard -- Dashboard panels, API, and deployment
- Architecture-Decisions#ADR-35 -- Readiness gates for trigger execution
- Architecture-Decisions#ADR-38 -- Stall detection architecture decision
- Architecture-Decisions#ADR-37 -- Self-reflection architecture decision
- Architecture-Decisions#ADR-41 -- Daily-ops vs org-ops concurrency
- Architecture-Decisions#ADR-42 -- `window` trigger type for continuous watch
- Architecture-Decisions#ADR-43 -- Work-handoff via `enqueue_parent_task`
- Architecture-Decisions#ADR-44 -- Activation Decision Framework
- Tool-Guidelines#Activation Decision Framework -- When to use on-demand vs triggers
- Schedule trigger `reflection-cycle-learner` fires the `reflection-cycle` skill for the `learner` subagent in the operations team at 3 AM (see Triggers#Reflection Trigger). Orchestrator routes directly to the subagent (deterministic routing per ADR-40).
- READINESS GATES: Skill checks `bootstrapped=1`, `scope_keywords` present, and required tool bundle (`vault_get`, `vault_set`, `memory_save`, `memory_search`, `memory_list`, `list_completed_tasks`). All gates pass -- proceed
- JOURNAL READ: `vault_get("reflection:ops-team:learner:journal")` returns previous journal with `next_focus: "task completion time"`
- EVIDENCE GATHER: `list_completed_tasks` retrieves last 50 completed tasks. Pattern analysis identifies 12 tasks with >10 min completion time for simple queries
- DIAGNOSE: Highest-impact issue: slow response template loading adds ~3 min per task. Evidence: 12/50 tasks affected
- PROPOSE: Draft change: add early-exit to response template skill when input matches a known pattern. Before/after comparison shows expected 60% reduction in affected task completion time
- APPLY: Subagent escalates proposal to orchestrator for confirmation (propose+confirm model per ADR-40). Orchestrator reviews, approves, and applies the change via Edit tool with governance enforcement
- JOURNAL UPDATE: `vault_set("reflection:ops-team:learner:journal", ...)` records diagnosis, proposal, outcome (applied), next focus ("error rate patterns")
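The journal record written at the end of each run might look like the following. Field names are assumptions inferred from this walkthrough, not a documented schema:

```typescript
// Illustrative shape of the reflection journal stored in team_vault.
interface ReflectionJournal {
  last_run: string;          // ISO timestamp
  diagnosis: string | null;
  proposal: string | null;
  outcome: "applied" | "no_action";
  next_focus: string;
}

function updateJournal(
  prev: ReflectionJournal,
  run: { diagnosis?: string; proposal?: string; applied: boolean; nextFocus?: string },
  now: Date = new Date(),
): ReflectionJournal {
  return {
    last_run: now.toISOString(),
    diagnosis: run.diagnosis ?? null,
    proposal: run.proposal ?? null,
    outcome: run.applied ? "applied" : "no_action",
    // When nothing actionable was found, next focus stays unchanged.
    next_focus: run.nextFocus ?? prev.next_focus,
  };
}
```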
- Trigger fires, readiness gates pass
- Evidence gathered -- all tasks within normal parameters
- Diagnosis: no significant inefficiency detected
- Change skipped -- journal updated with "no action taken", next focus unchanged
- Trigger fires
- Readiness gate check: `list_completed_tasks` not in team's `allowed_tools`
- Skill logs warning: `"Reflection skipped: missing tool list_completed_tasks"`
- Exit without error. Trigger remains active for next firing
sequenceDiagram
participant TE as Trigger Engine
participant Orch as Orchestrator
participant SA as Subagent (learner)
participant Skill as reflection-cycle skill
participant Tasks as list_completed_tasks
participant Vault as team_vault
participant Gov as Governance
TE->>Orch: reflection-cycle-learner fires (3 AM)
Orch->>SA: Route to learner (deterministic)
alt Readiness gates pass
SA->>Skill: Follow reflection-cycle.md steps
Skill->>Vault: vault_get("reflection:ops-team:learner:journal")
Vault-->>Skill: Previous journal (next_focus: "task completion time")
Skill->>Tasks: list_completed_tasks (last 50)
Tasks-->>Skill: Task outcomes with duration data
alt Actionable issue found
Note over Skill: Diagnose: slow template loading (12/50 tasks)
Skill->>SA: Propose: add early-exit to template skill
SA->>Orch: escalate() proposal for confirmation
Orch->>Gov: Governance check (scope, cooldown)
Gov-->>Orch: Approved
Orch->>SA: Apply change via Edit tool
SA->>Skill: Change applied
Skill->>Vault: vault_set("reflection:ops-team:learner:journal", {applied, next_focus})
else No actionable issue
Note over Skill: All tasks within normal parameters
Skill->>Vault: vault_set("reflection:ops-team:learner:journal", {no_action, next_focus unchanged})
end
else Tool missing (e.g., list_completed_tasks)
SA->>Skill: Readiness gate check
Skill-->>SA: warn("Reflection skipped: missing tool")
Note over SA: Exit without error. Trigger remains active.
end
- Stall detector periodic scan runs (every 10 minutes, engine-level infrastructure — see Architecture-Decisions#ADR-38)
- Scan queries `task_queue` for `pending` tasks older than 1 hour and `pending`/`running` tasks older than 24 hours
- One task found: `task_id=42`, team `ops-team`, status `pending`, age 2 hours 15 minutes
- Warning-level alert: logged at `warn` level. Alert routed to originating channel (if available) or escalated
- Next scan (10 min later): same task still pending, age 2 hours 25 minutes -- warning repeated
- Team processes the task before the 24-hour threshold -- task transitions to `done`, no further alerts
- Task `task_id=99`, team `research-team`, status `running`, age 25 hours
- Error-level alert: logged at `error` level. Alert escalated through hierarchy
- Main team delivers escalation to user via channel adapter (root team escalation path)
- Operator investigates -- blocked session due to missing API credentials. Credentials added, task resumes
- Stall detector scan runs
- No tasks exceed either threshold
- Logged at `debug` level: "Stall detection scan clean"
- No alerts generated
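The two thresholds reduce to a small classification function. A sketch under the ADR-38 rules described above (`classifyStall` is an illustrative name):

```typescript
// Warn: pending tasks older than 1 hour. Error: pending/running tasks
// older than 24 hours. Terminal statuses never stall.
type TaskStatus = "pending" | "running" | "done" | "failed" | "cancelled";

const HOUR_MS = 60 * 60 * 1000;

function classifyStall(status: TaskStatus, ageMs: number): "error" | "warn" | "none" {
  const active = status === "pending" || status === "running";
  if (active && ageMs > 24 * HOUR_MS) return "error"; // escalate through hierarchy
  if (status === "pending" && ageMs > HOUR_MS) return "warn"; // route to originating channel
  return "none";
}
```

Note that a `running` task under 24 hours never warns: the 1-hour threshold applies only to tasks that have not started.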
sequenceDiagram
participant SD as Stall Detector
participant DB as task_queue (SQLite)
participant Log as Logger
participant Ch as Channel Adapter
participant Main as Main Agent
participant User
loop Every 10 minutes
SD->>DB: Query pending >1hr, pending/running >24hr
alt Stalled task found (<24hr)
DB-->>SD: task_id=42, ops-team, pending, 2hr 15min
SD->>Log: warn("Stalled task", {task_id, team, age})
alt sourceChannelId present
SD->>Ch: Route warning to originating channel
else No sourceChannelId (schedule-triggered)
SD->>Main: escalate() warning to parent team
end
Note over SD: Next scan: re-check same task
else Stalled task found (>24hr)
DB-->>SD: task_id=99, research-team, running, 25hr
SD->>Log: error("Critical stall", {task_id, team, age})
SD->>Main: escalate() through hierarchy
Main->>User: "Task stalled >24hr in research-team"
else Clean scan
DB-->>SD: No tasks exceed thresholds
SD->>Log: debug("Stall detection scan clean")
end
end
This scenario demonstrates the activation primitives introduced by ADR-41 through ADR-44 working together: a team on continuous watch during a bounded window detects an event, hands work to its parent, and the parent fans out a parallel research query across peers before executing a mutation. The walkthrough uses a generic "inventory-watcher" analogue and can be re-read with domain-specific names (trading, security, news, etc.) without structural changes.
Before ADR-41/42, an orchestrator that needed answers from three peers invoked query_team sequentially because each child enforced the ADR-9 "one session per team" invariant. The live database recorded a single cycle of 1,136,410 ms ≈ 19 minutes — eleven query_team calls, none overlapping. Each arrow below is a distinct blocking call:
sequenceDiagram
participant T as parent-orch
participant F as peer-A
participant S as peer-B
participant P as peer-C
Note over T: 19:10:04 cycle starts
T->>F: query_team (19:11:53)
F-->>T: result (200,891 ms)
T->>F: query_team (19:15:19)
F-->>T: empty (198,960 ms)
T->>S: query_team (19:18:51)
S-->>T: result (85,104 ms)
T->>P: query_team (19:20:21)
P-->>T: result (164,105 ms)
T->>S: query_team (19:23:42)
S-->>T: result (114,058 ms)
T->>P: query_team (19:25:47)
P-->>T: result (26,499 ms)
T->>P: query_team (19:27:10)
P-->>T: result (23,046 ms)
T->>F: query_team (19:30:33)
F-->>T: result (7,250 ms)
T->>S: query_team (19:30:46)
S-->>T: result (78,945 ms)
T->>P: query_team (19:32:14)
P-->>T: result (70,427 ms)
T->>P: mutate (19:33:35)
P-->>T: result (42,893 ms)
Note over T: cycle ends (≈19 min)
Wall-clock of the cycle equals the sum of child durations, not the max. ADR-41 removes the per-team single-flight for daily-ops so peers can run concurrently; query_teams (see Organization-Tools#query_teams — Parallel Fan-out) exposes that concurrency as a single tool call.
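The arithmetic behind "sum vs. max" is worth making concrete. Using the eleven measured call durations from the trace above:

```typescript
// The eleven blocking query_team calls from the live-database cycle (ms).
const childDurationsMs = [
  200_891, 198_960, 85_104, 164_105, 114_058,
  26_499, 23_046, 7_250, 78_945, 70_427, 42_893,
];

// Sequential query_team: wall-clock is the sum of child durations.
const sequentialMs = childDurationsMs.reduce((a, b) => a + b, 0);

// query_teams fan-out: wall-clock is the max of child durations.
const parallelMs = Math.max(...childDurationsMs);

// Sequential query time alone is ~1,012,178 ms; the 19-minute cycle total
// also includes orchestrator think-time between calls. The parallel bound
// is ~200,891 ms, roughly a 5x wall-clock reduction on query time.
```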
- Bootstrap (setup). Parent orchestrator creates a `window` trigger on the watcher child: `create_trigger(team="watcher", name="market-hours-watch", type="window", config={ watch_window: "30 9-16 * * 1-5", tick_interval_ms: 30000, max_tokens_per_window: 200000, max_ticks_per_window: 800, overlap_policy: "always-skip" }, subagent="event-scanner", skill="scan-window")`. Trigger is created `pending`, verified with `test_trigger`, then enabled (see Triggers#window Trigger Type and Tool-Guidelines#Trigger Tools).
- Window opens. At 09:30, the cron expression matches — engine transitions `market-hours-watch` from `WindowClosed` to `WindowOpen` (see Triggers#window Trigger Type state diagram).
- Routine tick (no-op). At 09:30:30 the engine dispatches a fresh disposable session to the watcher (ADR-10). The `event-scanner` subagent reads its cursor `event-scanner:last_scan_cursor` from memory, fetches the upstream feed via `web_fetch(url, rate_limit_key="upstream-feed")`, finds no new items past the cursor, and returns `{ "action": "noop", "reason": "no new items since 2026-04-15T09:30:00Z" }`. The engine records the cursor update and produces no notification, no parent-queue insertion. See Tool-Guidelines#No-op Tick Contract.
- Eventful tick. At 10:47:30 a new item appears. The scanner records it, advances its cursor, and calls `enqueue_parent_task({ task: "new actionable event detected: <context>", priority: "high", correlation_id: "<uuid>" })`. The payload carries context only, no subagent directive — the parent still routes (ADR-40/ADR-43). See Organization-Tools#enqueue_parent_task — Work Handoff.
- Parent dequeues. The parent orchestrator dequeues the high-priority task, consults the Activation Decision Framework (Tool-Guidelines#Activation Decision Framework), and recognises this is on-demand research (a specific event is already in hand) rather than recurring work.
- Parallel fan-out. Parent calls `query_teams([{team: "peer-A", query: "..."}, {team: "peer-B", query: "..."}, {team: "peer-C", query: "..."}], default_timeout_ms: 150000)`. Three child sessions run concurrently — wall-clock is `max(child durations)` rather than the sum. All three are daily-ops per ADR-41 and share the parent's SQLite WAL without serialisation.
- Partial-failure tolerance. `peer-C` times out; `peer-A` and `peer-B` return `{ok: true, result_or_error: ...}`. Parent evaluates partial results per Tool-Guidelines#query_teams Partial Failure — quorum met, proceeds without retry.
- Mutation step. Parent delegates a single mutating task to a specialist child via `delegate_task` (org-ops-free — `delegate_task` itself is daily-ops; any structural change inside the specialist remains single-flight per-team). Specialist returns within its own cap.
- Window closes. At 16:00 the cron no longer matches. Any in-progress tick completes; no new ticks start. Engine transitions to `WindowClosed`. The learner trigger (if configured for this team) then runs after window close per [[Self-Evolution#Interaction with `window` Triggers]].
The scanner detects an item, emits enqueue_parent_task, but crashes before writing the updated cursor.
- → Next tick re-reads the unchanged cursor, detects the same item again, and would emit a duplicate handoff.
- → Guard: `enqueue_parent_task` dedup window (default 60 s by `correlation_id`) absorbs the duplicate within the guard window. See Organization-Tools#enqueue_parent_task.
- → Outside the dedup window, the subagent's idempotency invariant requires gating the external side effect on cursor advancement; re-queuing the same event after the dedup expiry is a subagent-level bug (see Subagents#Window-Trigger Subagents Responsibilities template).
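The dedup guard can be sketched as a sliding per-`correlation_id` window. Class and method names are illustrative, and whether the real guard refreshes the timestamp on an absorbed duplicate is an assumption:

```typescript
// Hypothetical sketch of the enqueue_parent_task dedup guard: a repeated
// handoff with the same correlation_id within the window (default 60 s)
// is absorbed; outside the window it passes through again.
class DedupWindow {
  private lastSeen = new Map<string, number>(); // correlation_id -> ms

  constructor(private windowMs = 60_000) {}

  /** Returns true if the handoff should be enqueued, false if absorbed. */
  accept(correlationId: string, nowMs: number): boolean {
    const last = this.lastSeen.get(correlationId);
    this.lastSeen.set(correlationId, nowMs); // sliding window (assumption)
    return last === undefined || nowMs - last > this.windowMs;
  }
}
```

This is why the guard only covers the crash-replay race: a subagent that keeps re-emitting the same event past the window will get through, which the walkthrough classifies as a subagent-level bug.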
Other failure modes (rate-limited handoff storms, query_teams partial failure, target saturation) are documented at their canonical locations: Organization-Tools#enqueue_parent_task — Work Handoff and Tool-Guidelines#query_teams Partial Failure.
sequenceDiagram
participant Eng as Trigger Engine
participant W as watcher subagent
participant Mem as memory (cursor)
participant P as parent-orch
participant A as peer-A
participant B as peer-B
participant C as peer-C
participant Sp as specialist
Note over Eng: Window opens at 09:30 (watch_window cron)
loop tick_interval_ms = 30s
Eng->>W: dispatch fresh session (ADR-10)
W->>Mem: read event-scanner:last_scan_cursor
alt no new items
W-->>Eng: {action:"noop", reason:"..."} (see Tool-Guidelines#No-op)
Note over Eng: No notification, no parent queue, cursor updated
else event detected
W->>Mem: write updated cursor + event id
W->>P: enqueue_parent_task(context, priority:high, correlation_id)
Note over P: Parent orchestrator routes per ADR-40
P->>P: consult Activation Framework → on-demand fan-out
par daily-ops parallel
P->>A: query_teams child
and
P->>B: query_teams child
and
P->>C: query_teams child
end
A-->>P: {team:"A", ok:true, result_or_error:"..."}
B-->>P: {team:"B", ok:true, result_or_error:"..."}
C-->>P: {team:"C", ok:false, result_or_error:"saturation"}
Note over P: Partial failure tolerated (Tool-Guidelines#query_teams Partial Failure)
P->>Sp: delegate_task(mutation with synthesized context)
Sp-->>P: result
end
end
Note over Eng: Window closes at 16:00 — no new ticks
| Concept | Canonical page |
|---|---|
| `query_teams` fan-out sequence | Organization-Tools#query_teams — Parallel Fan-out |
| `enqueue_parent_task` handoff flowchart | Organization-Tools#enqueue_parent_task — Work Handoff |
| `window` trigger state machine | Triggers#window Trigger Type |
| Activation Decision Framework | Tool-Guidelines#Activation Decision Framework |
| Daily-ops vs org-ops pool + mutex | Architecture#Execution Model |
When a user reports an issue, the fix flows through the full 5-layer hierarchy: Main → Orchestrator → Subagent → Skill → Plugin.
sequenceDiagram
participant User
participant Main as Main Agent
participant Orch as Team Orchestrator
participant SA as Subagent
participant SK as Skill
participant PL as Plugin
User->>Main: "Loggly alerts aren't working"
Note over Main: Routes only. Identifies ops-team.
Main->>Orch: delegate_task("ops-team", "Fix loggly alert issue")
Note over Orch: Reads subagent defs → picks loggly-monitor
Orch->>SA: invoke loggly-monitor subagent
Note over SA: Context loaded: loggly-monitor.md + skills + task
SA->>SK: follow alert-check skill steps
SK->>PL: loggly_fetch.ts → fetch recent alerts
PL-->>SK: alert data
SK->>PL: classify_entries.ts → analyze
PL-->>SK: classification
SK-->>SA: diagnosis complete
SA-->>Orch: "Fixed: alert threshold was misconfigured"
Orch-->>Main: result
Main-->>User: "Fixed. Alert threshold in loggly was misconfigured."