Tool Guidelines - Z-M-Huang/openhive GitHub Wiki

Tool Guidelines

This is a system rule (tool-guidelines.md) baked into the container image. It defines how agents choose which tool to use, when to delegate vs handle directly, and what structural changes require user confirmation. Every agent in the system receives this rule at the highest cascade priority.

For how system rules fit into the rule cascade, see Rules-Architecture#System Rules. For the structured workflow agents follow when executing tasks, see Task-Workflow.


Activation Decision Framework

Before asking who does the work, ask when the work should run. An orchestrator consults this framework first; routing decisions come after.

Activation vs Routing — two independent decisions:

  • Activation (this section) answers when → choose trigger or on-demand.
  • Routing (next section) answers who → choose team and subagent.

Decision Flowchart

flowchart TD
    Q{What activates this work?}
    Q -->|User asks now| OD["on-demand<br/>query_team / query_teams"]
    Q -->|Recurring clock| SCH["schedule trigger"]
    Q -->|Inbound user message| MSG["message trigger"]
    Q -->|Keyword in channel| KW["keyword trigger"]
    Q -->|Continuous watch in a window| WIN["window trigger - ADR-42"]

    WIN --> Cur["persist memory cursor per tick<br/>(namespaced: subagent:cursor_name)"]
    WIN --> NoOp["return no-op marker when nothing changed"]
    WIN --> Wake{Event detected?}
    Wake -->|work handoff| EPT["enqueue_parent_task - ADR-43"]
    Wake -->|info only| Esc["escalate"]

Rule of thumb: if a task runs on a clock, in response to an event, or as a continuous watch, it belongs in the trigger engine (ADR-7). On-demand is for work the user is asking for right now.
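The flowchart's first branch can be read as a simple lookup. The sketch below is illustrative only: the type and function names (`ActivationSource`, `chooseActivation`) are hypothetical and not part of OpenHive, while the trigger kinds and tool names come from the flowchart.

```typescript
// Hypothetical labels for the five flowchart branches.
type ActivationSource =
  | "user_asks_now"
  | "recurring_clock"
  | "inbound_user_message"
  | "keyword_in_channel"
  | "continuous_watch";

type Activation =
  | { mode: "on-demand"; tools: string[] }
  | { mode: "trigger"; trigger: "schedule" | "message" | "keyword" | "window" };

// Maps what activates the work to the activation mode named in the flowchart.
function chooseActivation(source: ActivationSource): Activation {
  switch (source) {
    case "user_asks_now":
      return { mode: "on-demand", tools: ["query_team", "query_teams"] };
    case "recurring_clock":
      return { mode: "trigger", trigger: "schedule" };
    case "inbound_user_message":
      return { mode: "trigger", trigger: "message" };
    case "keyword_in_channel":
      return { mode: "trigger", trigger: "keyword" };
    case "continuous_watch":
      return { mode: "trigger", trigger: "window" };
  }
}
```

Routing (who does the work) is deliberately absent from this function; it is a separate decision made afterwards.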

Why window ticks feel long-running

Literal long-running sessions are architecturally impossible on the Vercel AI SDK: streamText has no pause/resume primitive, and Anthropic times out idle streams at ~10 min. The window trigger does not pretend otherwise. Instead it delivers functional continuity — the user-facing experience of "this team is on duty" — through four mechanisms working together:

Security guard on duty 9am–5pm (real-world analogy)
  ✗ One uninterrupted 8-hour conscious state → impossible (humans blink)
  ✓ Periodic rounds every N minutes          → watch_window + tick_interval_ms
  ✓ Logbook between rounds                   → memory cursors (ADR-29)
  ✓ Silent when nothing happens              → no-op return (Hermes [SILENT])
  ✓ Shift handoff                            → window close/open boundary

The LLM experiences: "I'm on watch from 9:30 to 16:00; I wake every 30 s to check; my cursor tells me what I already saw; I stay silent when nothing changed." Each tick is a fresh disposable session per Architecture-Decisions#ADR-10 Disposable Sessions with Durable Workflow State in SQLite — continuity lives in memory, not in process state.

No-op Tick Contract

A window-tick subagent MUST return a structured no-op marker when its scan finds nothing actionable:

{ "action": "noop", "reason": "<why nothing to do>" }
  • This shape is canonical. Any other no-activity signal (empty string, null, "nothing to report") is treated as a real result.
  • The trigger engine treats a no-op return as success with empty output: no downstream notification, no parent-queue insertion, no memory mutation beyond the cursor update.
  • Rationale: prevents tick churn from polluting parent queues and notification channels; makes the continuous-watch experience cheap.
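A minimal sketch of how a consumer might recognize the canonical marker, assuming the result arrives as an arbitrary parsed value. The names `NoopMarker` and `isNoop` are hypothetical; only the `{ action, reason }` shape comes from the contract above.

```typescript
interface NoopMarker {
  action: "noop";
  reason: string;
}

// Returns true only for the canonical no-op shape.
// Empty strings, null, and free-text such as "nothing to report" are real results.
function isNoop(result: unknown): result is NoopMarker {
  if (typeof result !== "object" || result === null) return false;
  const candidate = result as Record<string, unknown>;
  return candidate.action === "noop" && typeof candidate.reason === "string";
}
```

Anything that fails this check would flow through the normal result path, which is why subagents should not improvise other "nothing happened" signals.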

See Subagents#Window-Trigger Subagents for the enforceable Responsibilities template.

Cursor Discipline

Window-tick subagents MUST persist progress keys in memory so repeat ticks do not duplicate work:

  • Typical keys: last_scan_cursor, last_event_id, window_start_summary.
  • Each disposable session reads cursors at tick start, writes updated cursors at tick end (via memory_save; see Memory-System).
  • Namespacing is required: <subagent_name>:<cursor_name> (e.g., news-scanner:last_event_id). Memory supersede (ADR-29) uses subject_key; the namespace prevents collision when multiple subagents share a team.
  • The LLM experiences continuity through memory recall, not through process state.
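The read-at-start, write-at-end pattern can be sketched as follows. `CursorStore` is a hypothetical in-memory stand-in for the memory system (the real persistence goes through memory_save, which is not modeled here), and `runTick` and `FeedEvent` are illustrative names.

```typescript
// Hypothetical in-memory stand-in; it only models the key discipline.
class CursorStore {
  private readonly entries = new Map<string, string>();

  // Builds the required <subagent_name>:<cursor_name> key.
  static key(subagentName: string, cursorName: string): string {
    return `${subagentName}:${cursorName}`;
  }

  read(subagentName: string, cursorName: string): string | undefined {
    return this.entries.get(CursorStore.key(subagentName, cursorName));
  }

  write(subagentName: string, cursorName: string, value: string): void {
    this.entries.set(CursorStore.key(subagentName, cursorName), value);
  }
}

interface FeedEvent {
  id: number;
}

// One disposable tick: read the cursor, keep only unseen events, write the cursor back.
function runTick(store: CursorStore, subagentName: string, feed: FeedEvent[]): number[] {
  const last = Number(store.read(subagentName, "last_event_id") ?? "0");
  const fresh = feed.filter((e) => e.id > last).map((e) => e.id);
  if (fresh.length > 0) {
    store.write(subagentName, "last_event_id", String(Math.max(...fresh)));
  }
  return fresh;
}
```

Because the key includes the subagent name, two subagents on the same team can each track `last_event_id` without overwriting one another.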

Task Routing Decision Framework

When you receive a task, work through these steps in priority order. Stop at the first match.

| Priority | Condition | Action |
|---|---|---|
| 1 | Main agent: Status check, clarification, or quick lookup | Use org tools (list_teams, get_status, query_team) directly. The main agent has no subagents and no skills; it routes and delegates only. |
| 1 | Team orchestrator: The task matches one of your subagents | Delegate to the appropriate subagent. Orchestrators always delegate to subagents and never invoke skills directly (ADR-40). |
| 2 | The task matches a child team's scope | Call list_teams() to confirm available children, then delegate_task() to the best match. |
| 3 | No existing child team matches, but a new team would be appropriate | Consider spawn_team() with appropriate config. See Structural Change Guidance first. |
| 4 | The task is outside your scope entirely | Call escalate() to your parent with a clear explanation of why you cannot handle it. |
| 5 | You need a quick, synchronous answer from a child | Call query_team() for an immediate response (no task queue, no notification). |

Do not skip steps. If you are an orchestrator and a subagent fits (Priority 1), delegate to it. If a child team exists for the work (Priority 2), do not spawn a duplicate.
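The "stop at the first match" rule amounts to an ordered chain of checks. The sketch below assumes the conditions have already been evaluated into booleans; `TaskFacts`, `RoutingAction`, and `routeTask` are hypothetical names used only for illustration.

```typescript
type Role = "main_agent" | "team_orchestrator";

// Hypothetical summary of the judgments an agent has already made about a task.
interface TaskFacts {
  role: Role;
  isStatusOrQuickLookup: boolean;
  matchesOwnSubagent: boolean;
  matchesChildTeam: boolean;
  newTeamAppropriate: boolean;
  outsideScope: boolean;
  needsSyncAnswerFromChild: boolean;
}

type RoutingAction =
  | "use_org_tools_directly"
  | "delegate_to_subagent"
  | "list_teams_then_delegate_task"
  | "consider_spawn_team"
  | "escalate"
  | "query_team"
  | "no_match";

// Checks run in priority order; the first satisfied condition wins.
function routeTask(facts: TaskFacts): RoutingAction {
  if (facts.role === "main_agent" && facts.isStatusOrQuickLookup) return "use_org_tools_directly";
  if (facts.role === "team_orchestrator" && facts.matchesOwnSubagent) return "delegate_to_subagent";
  if (facts.matchesChildTeam) return "list_teams_then_delegate_task";
  if (facts.newTeamAppropriate) return "consider_spawn_team";
  if (facts.outsideScope) return "escalate";
  if (facts.needsSyncAnswerFromChild) return "query_team";
  return "no_match";
}
```

Because an existing child team is checked before spawning, a task that fits both conditions is delegated rather than given a duplicate team.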


Hybrid Decisions

The LLM decides what to do. Code enforces how it happens.

| Operation | LLM Decides (judgment) | Code Enforces (invariants) |
|---|---|---|
| Spawning a team | Whether to spawn, team name, description, scope keywords, init_context | Directory scaffolding, org tree registration, config validation, bootstrap task enqueue |
| Shutting down a team | Whether the team is no longer needed, whether to cascade | Session termination, DB row deletion, workspace directory removal |
| Escalating | When to escalate, urgency framing, context summary | Parent-child validation, correlation tracking, notification routing |
| Delegating | Which child team fits, task description, priority level | Parent-child validation, task queue insertion, channel threading |
| Rule assembly | Working within the assembled rules | Cascade loading order, conflict detection, override validation |

You make the judgment calls. The code ensures those calls execute safely within system invariants. See Organization-Tools#Guardrails for the specific enforcement checks.


Per-Tool-Category Guidance

Organization Tools

Use list_teams() before every routing decision -- do not assume you know what children exist. Prefer delegate_task() for work that takes time; use query_team() only for quick lookups that the child can answer immediately. Use get_status() to monitor progress on delegated work.

Peer fan-out rule: when querying two or more peers with independent inputs, use query_teams() (plural). Never describe the fan-out as a numbered list of sequential steps in a prompt — the LLM will execute it sequentially. query_teams() makes parallelism explicit and bounds wall-clock to max(child_duration) instead of sum(...). See Organization-Tools#query_teams — Parallel Fan-out.

Work-handoff rule: use escalate() for "FYI, parent" (no parent queue mutation). Use enqueue_parent_task() for "parent, please do Y now" (priority admission into parent's queue, preserving ADR-40 hierarchy). See Organization-Tools#enqueue_parent_task — Work Handoff.

query_teams Partial Failure

query_teams returns {team, ok, result_or_error}[]. The orchestrator decides how to handle partial results:

  • Proceed with partial results when the missing child's answer is non-critical for the current decision.
  • Retry the failed subset when the decision requires all children. Wrap the retry in a fresh query_teams call targeting only the failed teams; do not retry silently inside a loop without a bound.
  • Abort and escalate when too many children fail or a specific critical child is missing.

Do not treat {ok: false} as a fatal error for the whole fan-out unless the orchestrator's routing logic requires it.
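One way to make the three outcomes explicit is a small decision function over the returned array. This is a sketch under assumptions: the field types in `TeamResult` are guessed from the shape above, and `FanOutPolicy` with its thresholds is a hypothetical construct, since the wiki leaves the policy to the orchestrator.

```typescript
interface TeamResult {
  team: string;
  ok: boolean;
  result_or_error: string; // assumed to be a string for this sketch
}

// Hypothetical policy knobs chosen by the orchestrator.
interface FanOutPolicy {
  requireAll: boolean;     // the decision needs every child
  criticalTeams: string[]; // children whose answer cannot be skipped
  maxFailures: number;     // more failures than this means abort
  maxRetries: number;      // bound on retry rounds
}

type FanOutDecision =
  | { kind: "proceed"; usable: TeamResult[] }
  | { kind: "retry"; teams: string[] }
  | { kind: "abort_and_escalate"; failed: string[] };

function decideFanOut(results: TeamResult[], policy: FanOutPolicy, retriesSoFar: number): FanOutDecision {
  const usable = results.filter((r) => r.ok);
  const failed = results.filter((r) => !r.ok).map((r) => r.team);

  if (failed.length === 0) return { kind: "proceed", usable };
  if (failed.length > policy.maxFailures) return { kind: "abort_and_escalate", failed };

  // A failure blocks the decision only if all children are required or a critical one is missing.
  const blocking = policy.requireAll || failed.some((t) => policy.criticalTeams.includes(t));
  if (!blocking) return { kind: "proceed", usable };

  // Retry targets only the failed subset, and only while under the bound.
  if (retriesSoFar < policy.maxRetries) return { kind: "retry", teams: failed };
  return { kind: "abort_and_escalate", failed };
}
```

A `retry` decision would translate into a fresh query_teams call for `teams` only, with `retriesSoFar` incremented by the caller so the loop stays bounded.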

For tool definitions and parameters, see Organization-Tools#org-tools.ts.

Trust Tools

Trust tools manage the sender allowlist. These are main-agent-only tools — child teams cannot modify trust policy.

  • add_trusted_sender — grant a sender access to the system. Use when the operator explicitly approves a new sender or channel.
  • revoke_sender_trust — remove or deny a sender's access. Use when a sender should no longer be permitted, or to block a specific sender preemptively.
  • list_trusted_senders — audit the current trust list. Use before granting or revoking to understand the current state.

Always confirm trust changes with the operator before executing. Cross-ref: Organization-Tools and Architecture-Decisions#ADR-30.

Trigger Tools

Triggers create recurring or event-driven work. Always create triggers in pending state first, verify the configuration with test_trigger(), then enable_trigger(). Never create and enable in a single step.
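The create, test, enable ordering can be pictured as a tiny state machine. This sketch is not the trigger engine: `TriggerDraft` is a hypothetical class, and the intermediate "tested" state is an assumption introduced here (the wiki only names the pending state).

```typescript
// "tested" is a hypothetical intermediate state for this sketch.
type TriggerState = "pending" | "tested" | "enabled";

class TriggerDraft {
  private state: TriggerState = "pending";

  constructor(readonly name: string) {}

  get currentState(): TriggerState {
    return this.state;
  }

  // Stand-in for test_trigger(): records whether verification passed.
  recordTest(passed: boolean): void {
    if (this.state === "enabled") {
      throw new Error(`trigger ${this.name} is already enabled`);
    }
    this.state = passed ? "tested" : "pending";
  }

  // Stand-in for enable_trigger(): refuses unless a test has passed.
  enable(): void {
    if (this.state !== "tested") {
      throw new Error(`trigger ${this.name} must pass a test before it is enabled`);
    }
    this.state = "enabled";
  }
}
```

Modeling the rule this way makes "create and enable in a single step" unrepresentable: a new draft always starts in pending and `enable()` fails until a test has passed.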

For trigger tool definitions, see Organization-Tools#trigger-tools.ts. For trigger engine internals (circuit breaker, failure handling), see Triggers.

Browser Tools

Browser tools require the team to have browser: config. Use browser_navigate() + browser_snapshot() as the primary browsing pattern -- snapshots return structured accessibility data that is more reliable than screenshots for interaction. Use browser_screenshot() when visual context is needed.

For SSRF protection and domain allowlists, see Browser-Proxy.

Skill Repository

When creating a new skill or trigger, search the Vercel skills ecosystem first:

  1. Understand what the user needs.
  2. Call search_skill_repository.
  3. Present matches ≥60% with trust signals (install counts, source reputation).
  4. The user picks a match or says "create from scratch".
  5. Download the SKILL.md.
  6. Tailor it to the OpenHive format and the user's specific needs.
  7. Test.

If skills.sh is unreachable, create from scratch without blocking.

For the full adoption flow and trust signals, see Skill-Repository.

Plugin Tools

Plugin tools are team-local TypeScript tools registered at runtime. They provide executable logic (API calls, data parsing) that skills orchestrate.

Registration: Call register_plugin_tool with a tool_name and source_code. Returns success with the tool object, or failure with a diagnostic message.

Naming rules:

  • snake_case only (e.g., fetch_logs, classify_entries)
  • No reserved names — built-in tool names (spawn_team, delegate_task, vault_set, etc.) are rejected
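A minimal sketch of the two naming checks, under stated assumptions: the regular expression is one plausible reading of "snake_case only", and the reserved set contains only the three built-in names mentioned above (the real list is longer). `checkPluginToolName` is a hypothetical helper, not the registration code.

```typescript
// Only the built-in names the wiki mentions; the real reserved list is longer.
const RESERVED_TOOL_NAMES = new Set(["spawn_team", "delegate_task", "vault_set"]);

// Assumed reading of "snake_case only": lowercase words joined by single underscores.
const SNAKE_CASE = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$/;

type NameCheck = { ok: true } | { ok: false; reason: string };

function checkPluginToolName(name: string): NameCheck {
  if (!SNAKE_CASE.test(name)) return { ok: false, reason: "not snake_case" };
  if (RESERVED_TOOL_NAMES.has(name)) return { ok: false, reason: "reserved built-in name" };
  return { ok: true };
}
```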

Security:

  • Source code is scanned for forbidden patterns (shell injection, dynamic code evaluation, unsandboxed network requests)
  • Secret detection via regex + entropy analysis rejects hardcoded credentials
  • Tools that fail the security scan are not registered
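To illustrate what "regex + entropy analysis" can mean in practice, the sketch below finds quoted literals with a regular expression and flags those whose Shannon entropy is high. It is not the actual scanner: the pattern, the length threshold, and the entropy threshold are all assumptions chosen for illustration.

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  const counts = new Map<string, number>();
  for (const ch of chars) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Assumed thresholds, chosen only for illustration.
const MIN_LITERAL_LENGTH = 20;
const MIN_ENTROPY_BITS = 3.5;

// Returns quoted literals that look like hardcoded credentials.
function findSuspectLiterals(source: string): string[] {
  const literal = /["'`]([A-Za-z0-9+\/=_-]+)["'`]/g;
  const suspects: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = literal.exec(source)) !== null) {
    const value = match[1] ?? "";
    if (value.length >= MIN_LITERAL_LENGTH && shannonEntropy(value) >= MIN_ENTROPY_BITS) {
      suspects.push(value);
    }
  }
  return suspects;
}
```

The entropy step is what separates a long but repetitive literal from a random-looking token; the regex alone would flag both.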

Namespace:

  • At runtime, plugin tools are namespaced as {teamName}.{toolName} (e.g., engineering.fetch_logs)
  • allowed_tools must reference the namespaced key (engineering.fetch_logs or engineering.*); bare fetch_logs does not match
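The matching rule can be sketched as below. `isPluginToolAllowed` is a hypothetical helper that assumes allowed_tools is a flat list of strings and that only the two forms shown above (exact namespaced key, or team wildcard) grant access.

```typescript
// Builds the runtime key {teamName}.{toolName}.
function namespacedToolKey(teamName: string, toolName: string): string {
  return `${teamName}.${toolName}`;
}

// A plugin tool is allowed only via its namespaced key or its team wildcard.
function isPluginToolAllowed(allowedTools: string[], teamName: string, toolName: string): boolean {
  const exact = namespacedToolKey(teamName, toolName);
  const wildcard = `${teamName}.*`;
  return allowedTools.some((entry) => entry === exact || entry === wildcard);
}
```

For example, `isPluginToolAllowed(["fetch_logs"], "engineering", "fetch_logs")` is false, because the bare name never matches the namespaced key.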

Skills integration:

  • Declare required plugin tools in the skill's ## Required Tools section
  • Only tools listed in ## Required Tools are candidates for loading into the session when the skill is active
  • See Skills#Skill + Plugin Workflow for the full creation workflow

Communication Patterns

  • Parent to child: delegate_task(), query_team(), send_message()
  • Child to parent: escalate(), send_message()
  • Status checking: get_status(), list_teams()
  • Never skip hierarchy. You cannot message a grandchild directly. Delegate to the intermediate child and let it route further.
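The hierarchy rule can be illustrated with a parent map. This is a sketch, not the enforcement code: `OrgTree`, `canMessageDirectly`, and `nextHop` are hypothetical names, and the team names in the usage note are made up.

```typescript
// Maps each team to its parent; the root maps to null.
type OrgTree = Record<string, string | null>;

// Direct parent-child links are the only legal message paths.
function canMessageDirectly(tree: OrgTree, from: string, to: string): boolean {
  return tree[to] === from || tree[from] === to;
}

// For a deeper descendant, returns the direct child of `from` that should route further.
function nextHop(tree: OrgTree, from: string, target: string): string | undefined {
  const seen = new Set<string>();
  let current: string | null | undefined = target;
  // Walk upward from the target until we reach a node whose parent is `from`.
  while (current != null && !seen.has(current)) {
    seen.add(current);
    const parent: string | null | undefined = tree[current];
    if (parent === from) return current;
    current = parent;
  }
  return undefined;
}
```

With a tree of `main` → `engineering` → `backend`, `canMessageDirectly(tree, "main", "backend")` is false and `nextHop(tree, "main", "backend")` returns `"engineering"`, the intermediate child to delegate to.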

Structural Change Guidance

Structural changes are operations that alter the org tree, create recurring work, or modify rules. These require extra care because they are difficult to undo and affect other agents.

Structural changes include: spawn_team, shutdown_team (especially with cascade: true), create_trigger, update_trigger, and org-rule modifications.

Before executing a structural change:

  1. Discover -- understand the current state (list_teams, list_triggers, get_status)
  2. Plan -- formulate what you intend to do and why
  3. Present -- communicate the plan to the user via channel and wait for explicit confirmation
  4. Execute -- proceed only after confirmation

Do not spawn teams, shut down teams, or create triggers without presenting the plan first. Quick, reversible operations (delegate, query, send_message) do not require this confirmation step.
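The confirmation gate can be sketched as a predicate evaluated just before execution. The names `PlannedCall`, `Confirmation`, and `mayExecute` are hypothetical; the tool names come from the list above. Org-rule modifications are also structural, but no tool name is given for them here, so they are not modeled.

```typescript
// Structural tools named in the wiki.
const STRUCTURAL_TOOLS = new Set(["spawn_team", "shutdown_team", "create_trigger", "update_trigger"]);

// Hypothetical states of the user confirmation for a presented plan.
type Confirmation = "not_requested" | "awaiting_user" | "confirmed" | "rejected";

interface PlannedCall {
  tool: string;
  plan: string; // what will be done and why, as presented to the user
  confirmation: Confirmation;
}

// Quick, reversible operations pass straight through;
// structural changes need a non-empty plan and explicit confirmation.
function mayExecute(call: PlannedCall): boolean {
  if (!STRUCTURAL_TOOLS.has(call.tool)) return true;
  return call.plan.trim().length > 0 && call.confirmation === "confirmed";
}
```

Treating "awaiting_user" the same as "rejected" reflects the requirement to wait for explicit confirmation rather than proceeding on silence.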

For the full task workflow that wraps these decisions, see Task-Workflow.
