Logging - Z-M-Huang/openhive GitHub Wiki
Logging
Requirement: Verbose structured logging for everything. All operations, at all levels, with full context. Logs are the primary debugging tool for an AI-driven system where behavior is non-deterministic.
Log Levels
| Level | Purpose | Examples |
|---|---|---|
trace |
Fine-grained execution detail | Tool input/output, rule loading, prompt assembly, MCP message payloads |
debug |
Internal system decisions | Scope matching results, queue operations, session lifecycle events |
info |
Operational events | Team spawns/shutdowns, task delegation, escalation, trigger fires |
warn |
Recoverable problems | Task retry, session re-spawn, dedup collision, scope ambiguity |
error |
Failures requiring attention | Task permanently failed, session crash, config validation error, hook failure |
Default level: info. Configurable globally via log_level in /data/config/config.yaml.
What Is Logged (Verbose by Default)
Every operation produces structured log entries:
- Team lifecycle: spawn, idle, shutdown, crash, re-spawn (with session duration, task count)
- Task flow: delegation, queue insertion, dispatch, completion, failure, re-queue (with correlation IDs)
- Escalation: escalation sent, received, resolved, timeout (with full escalation chain)
- Tool calls: tool name, input summary, duration, result summary, audit decisions (via withAudit() wrapper)
- Routing: list_teams queries, delegation target selection
- Triggers: fire, dedup hit, dedup miss, rate limit, dispatch target
- Rule loading: which rules loaded, cascade order, any conflicts detected
- Organization tool operations: tool calls, server connections/disconnections
- Errors: full stack traces, context, affected team/task/agent
- Trust evaluation decisions: TrustGate logs allowed/denied decisions with sender_id, channel_type, and reason to the
trust_audit_logtable
Vault Operation Events
All vault tool calls produce structured log entries via the withAudit() wrapper:
vault_set-- key logged, value scrubbed. If the operation is rejected because the key hasis_secret=1, the rejection is logged with reason ("Secret entries are system-managed").vault_get-- key logged. Secret values (is_secret=1) are scrubbed from audit output; only the key and access metadata appear in logs.vault_list-- prefix (if any) logged. Values foris_secret=1entries are omitted from audit output.vault_delete-- key logged. If the operation is rejected because the key hasis_secret=1, the rejection is logged with reason.
Learning Cycle Events
Learning cycle operations (triggered by the learning-cycle.md system skill) produce structured log entries:
- Learning session start -- logged at
infolevel with team name, trigger source, and topics to explore. - Finding stored -- logged at
infolevel when a new memory entry is created from a learning finding. Includes topic, source URL, confidence level, and memory type (lessonorreference). - Journal updated -- logged at
debuglevel when the learning progression journal is written back to the vault viavault_set("learning:{team}:{subagent}:journal", ...). Includes topics covered count and next focus topic.
Reflection Cycle Events
Reflection cycle operations (triggered by the reflection-cycle system skill) produce structured log entries:
- Reflection session start -- logged at
infolevel with team name and trigger source. - Evidence gathered -- logged at
infolevel when completed tasks are queried for outcome patterns. Includes task count and date range. - Diagnosis identified -- logged at
infolevel when the highest-impact issue is identified. Includes issue description and evidence count. - Change applied -- logged at
infolevel when a skill/rule change is applied via evolution flow. Includes before/after summary. - Change skipped -- logged at
infolevel when no actionable issue is found or cooldown is active. - Reflection complete -- logged at
infolevel with duration, diagnosis summary, and outcome. See Self-Evolution#Self-Reflection.
Stall Detection Events
Stall detection operations (emitted by the task queue stall detector) produce structured log entries:
- Stalled task warning -- logged at
warnlevel when apendingtask exceeds 1 hour. Includes task ID, team name, and age. - Stalled task error -- logged at
errorlevel when apendingorrunningtask exceeds 24 hours. Includes task ID, team name, age, and escalation target. - Stall detection scan clean -- logged at
debuglevel when a scan finds no stalled tasks. See Durability-Recovery#Stall Detection.
Transports (Dual Output)
Logs go to two destinations simultaneously:
- stdout -- Structured JSON via Pino. Default sink. Always active. Enables
docker logs, log aggregation, and piping to external services. - SQLite -- All log entries at
infolevel and above persisted to.run/openhive.dblog_entriestable. Schema:(id, level, message, context, duration_ms, created_at). Context is a JSON string containing structured metadata (team, taskId, tool name, etc.).
The logging configuration schema (loadLogging() in config/loader.ts) supports additional transports and per-team overrides, but these are not yet wired into bootstrap. Currently only the global log_level from config.yaml is used.
Credential Scrubbing (Hard Requirement)
All log output is scrubbed before reaching ANY transport. Secret values are inline API keys resolved by the provider-resolver at spawn time; they are never read from .env files.
SecretStringclass -- wraps all secret values (including inline API keys from provider-resolver);toString()returns[REDACTED]. TheSecretString[]array of active credentials is registered at spawn time as the scrub list. First line of defense.- Team credential scrubbing -- The
withAudit()wrapper (tool-audit.ts) receives team credential values (fromteam_vaultwhereis_secret=1) asrawSecretsand passes them toscrubSecrets(). This ensures team API keys are redacted from tool input/output. Values shorter than 8 characters are excluded to prevent over-redaction (aligned withcredential-scrubber.tsthreshold). Credential extraction is centralized viaextractStringCredentials()fromdomain/credential-utils.ts. - Dynamic credential extraction -- The
withAudit()wrapper inspectstool_input.credentialsat call time and augments the scrub list with any credential values found. This catches credentials passed dynamically (e.g., viaspawn_teamwith new credentials). Extraction is tool-agnostic. - SDK
stderrcallback -- intercepts SDK process output, scrubs all registeredSecretStringvalues before forwarding to Pino. - Inline scrubbing layers --
scrubSecrets()utility applied at multiple points: audit wrappers, handlers, task consumer. Scrubs all registered secret values before any output reaches log sinks. - Scrub-before-truncate --
withAudit()scrubs tool results before truncating to 200 characters, preventing partial secret leakage at the cutoff boundary. - Agent response scrubbing --
handleMessageandTaskConsumerscrub team credential values from SDK responses before returning or storing results.
No secret value ever appears in any log transport.
Interaction Audit Store (Separate from Logging)
In addition to the structured log system above, OpenHive maintains a separate interaction audit store in the channel_interactions SQLite table. This is not a logger transport — it is written directly via interactionStore.log() in channel-handler-factory.ts and task-consumer.ts.
The interaction store records inbound user messages, outbound agent replies, and async task notifications with channel/team attribution. With conversation threading, each record also includes a topic_id column for per-topic history filtering (see Conversation-Threading). It has a 24-hour retention policy enforced by a periodic cleanup interval (cleanOlderThan() in index.ts). Content snippets are capped at 2000 characters — this is not a full transcript store.
These records are queried at session start to inject recent conversation history into the system prompt, giving agents awareness of prior channel activity.