Self Evolution - Z-M-Huang/openhive GitHub Wiki
Agents can propose changes to their own rules, skills, and subagent definitions. This is the mechanism by which the system improves over time without human intervention for every change.
This page covers rule governance and the self-evolution protocol. For the overall rule system, see Rules-Architecture. For skill files, see Skills. For subagent definitions, see Subagents.
Not all rules are equal. Different rule types have different modification policies.
| Rule Type | Who Can Modify | Approval Required |
|---|---|---|
| `/app/system-rules/*.md` | Nobody (baked into Docker image) | Rebuild image required |
| `/data/rules/*.md` | Admin only (volume mount) | Admin approval (out-of-band) |
| `.run/teams/main/org-rules/*.md` | Main agent | Tool guards allow (own org-rules); audited |
| `.run/teams/{name}/org-rules/*.md` | Team orchestrator | Tool guards allow (own org-rules); audited; affects all descendants |
| `.run/teams/{name}/team-rules/*.md` | Team orchestrator | Tool guards allow (own team-rules); audited; team-scoped impact |
| `.run/teams/{name}/skills/*.md` | Team orchestrator | Tool guards allow (own directory); audited |
| `.run/teams/{name}/subagents/*.md` | Team orchestrator | Tool guards allow (own directory); audited |
Rule governance is enforced by inline tool guards (tool-guards.ts, 229 lines) and audit wrappers (tool-audit.ts, 117 lines). When an agent attempts to write to a rule file, the guard:
- Identifies the file type based on its path (global, org-rule, team-rule, skill, subagent).
- Checks authorization. Is this agent allowed to modify this type of file?
- Validates scope. A team can only modify files in its own directories. It cannot modify a parent's org-rules or a sibling's team-rules.
- Logs the change. Every rule modification is logged with the team name, file path, and a summary of the change.
- Blocks or allows the write based on the above checks.
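The guard logic above can be sketched as follows. This is a minimal illustration, not the actual tool-guards.ts API: the function names and the exact path matching are assumptions.

```typescript
// Illustrative sketch of an inline rule-file guard.
// Names and path handling are assumptions, not the real tool-guards.ts code.
type RuleFileType = "global" | "org-rule" | "team-rule" | "skill" | "subagent";

function classifyRuleFile(path: string): RuleFileType | null {
  // Both admin-only tiers are treated as "global" here: agents may write neither.
  if (path.startsWith("/app/system-rules/") || path.startsWith("/data/rules/")) return "global";
  if (/^\.run\/teams\/[^/]+\/org-rules\//.test(path)) return "org-rule";
  if (/^\.run\/teams\/[^/]+\/team-rules\//.test(path)) return "team-rule";
  if (/^\.run\/teams\/[^/]+\/skills\//.test(path)) return "skill";
  if (/^\.run\/teams\/[^/]+\/subagents\//.test(path)) return "subagent";
  return null; // not a governed rule file
}

function assertWriteAllowed(team: string, path: string): void {
  const type = classifyRuleFile(path);
  if (type === null) return; // ordinary file, no governance check
  if (type === "global") {
    throw new Error(`Global rules are admin-only: ${path}`);
  }
  // Scope check: the path must sit under the writing team's own directory.
  if (!path.startsWith(`.run/teams/${team}/`)) {
    throw new Error(`${team} may not modify another team's rules: ${path}`);
  }
  // Audit: every allowed rule write is logged.
  console.log(`[audit] ${team} modified ${type} file ${path}`);
}
```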
- A child team cannot modify its parent's org-rules (those cascade down and affect other teams)
- No team can modify global rules (admin-only)
- A team cannot modify another team's rules (scope boundary)
- All rule modifications are audited
- A team can modify its own `team-rules/`, `skills/`, and `subagents/` (with logging)
- A team can modify its own `org-rules/` (with logging -- this affects descendants)
- Agents can propose modifications through the self-evolution protocol (see below)
Self-evolution begins at team creation. When a team is spawned, it runs a bootstrap task that creates its initial skills, memory, and configuration. This is the "seed" of the evolution cycle — the team starts with a baseline it creates itself, then refines those skills through the detect-propose-validate-apply-monitor cycle as it encounters real tasks.
| Phase | Action |
|---|---|
| Detect | Subagent encounters a problem or inefficiency during normal work |
| Propose | Subagent drafts a specific change: which file, current content, proposed content, and rationale |
| Escalate | Subagent escalates the proposal to the orchestrator for confirmation (ADR-40) |
| Validate | Orchestrator reviews proposal. Tool guards check authorization and scope (see #Governance Validation Layers) |
| Apply | Orchestrator applies the approved change via the standard Edit tool with audit logging. Subagents do NOT apply changes directly. |
| Monitor | Subagent observes outcomes; proposes revert via escalation if results degrade |
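A proposal drafted in the Propose phase might be represented like this. The field names are illustrative: the wiki specifies the proposal's content (file, current content, proposed content, rationale), not its schema.

```typescript
// Illustrative shape of an evolution proposal; the schema is an assumption.
interface EvolutionProposal {
  file: string;            // which file to modify
  currentContent: string;  // the relevant section as it exists today
  proposedContent: string; // the replacement text
  rationale: string;       // why the change is needed
}

// Render a proposal in a diff-like form for the orchestrator to review.
function formatProposal(p: EvolutionProposal): string {
  return [
    `File: ${p.file}`,
    `Rationale: ${p.rationale}`,
    "--- current",
    p.currentContent,
    "+++ proposed",
    p.proposedContent,
  ].join("\n");
}
```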
```mermaid
sequenceDiagram
    participant SA as Subagent
    participant Orch as Orchestrator
    SA->>SA: Detect issue in own skill/subagent/plugin file
    SA->>SA: Draft proposal (file, current, proposed, rationale)
    SA->>Orch: escalate(proposal)
    Note over Orch: Reviews proposal against boundaries + team goals
    alt Approved
        Orch->>Orch: Apply change via Edit tool (governance guards still enforce)
        Orch-->>SA: "Change applied"
    else Rejected
        Orch-->>SA: "Change rejected: [reason]"
    end
```
Subagents can identify issues in their own subagent.md, skills/*.md, and plugins/*.ts files. They propose changes but escalate to the orchestrator for confirmation before any write. The orchestrator applies the change (governance guards still enforce scope/authorization).
1. Detect. The agent notices a problem during normal work. Examples:
- A skill's steps are outdated (a command no longer works)
- A rule conflicts with observed requirements
- A procedure is missing a step that the agent keeps adding manually
2. Propose. The agent formulates a specific change. The proposal includes:
- Which file to modify
- The current content (relevant section)
- The proposed new content
- Why the change is needed
3. Validate. The tool guard checks:
- Is this agent authorized to modify this file type?
- Is the file within this agent's scope?
- Is the change logged for audit?
Org-rules changes (which affect descendants) are allowed for the team's own org-rules directory. Changes to other teams' directories are blocked by assertGovernanceAllowed().
4. Apply. The orchestrator applies the approved change via the standard Edit tool. The withAudit() wrapper logs the modification. Subagents do NOT apply changes directly — they escalate to the orchestrator for confirmation (see #Evolution Flow).
5. Monitor. After applying, the subagent observes whether the change improves outcomes. If results degrade, the subagent can propose a revert via escalation and log what went wrong in its memory.
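The audit step can be pictured as a higher-order wrapper in the spirit of withAudit(). The real tool-audit.ts signature is not shown on this page, so the shape below is an assumption (and synchronous for brevity, where the real tools are presumably async).

```typescript
// Hypothetical sketch of an audit wrapper; not the actual tool-audit.ts API.
// It logs who changed what, and when, before delegating to the wrapped tool.
function withAudit<A extends { path: string }, R>(
  team: string,
  toolName: string,
  execute: (args: A) => R,
): (args: A) => R {
  return (args: A): R => {
    console.log(
      `[audit] team=${team} tool=${toolName} path=${args.path} at=${new Date().toISOString()}`,
    );
    return execute(args);
  };
}
```

Wrapped once at tool-registration time, a pattern like this logs every rule-file write regardless of which agent invoked the tool.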
Skills evolve through the same detect-propose-validate-apply-monitor cycle. An agent using a skill notices a problem (step fails, output is wrong), proposes a fix to the skill file, and the tool guard validates the modification.
Because skills are separate from agent identity, a skill revision benefits all agents that reference it without requiring changes to their subagent definitions.
When revising a skill, search the skill repository for updated or alternative patterns that may have been contributed to the Vercel skills ecosystem since the skill was last modified. See Skill-Repository.
- Agents cannot bypass governance guards. The guards are code-enforced (TypeScript inline in each tool's execute()), not rule-enforced.
- System rules cannot be modified by anyone at runtime (baked into the Docker image). Admin org rules (`/data/rules/`) can only be changed by admins via the volume mount.
- Self-evolution changes are always logged. The audit trail shows who changed what, when, and why.
- If an agent's proposed change is rejected by the governance guard, the agent receives an error explaining why. It can escalate to its parent if it believes the change is necessary.
The following is an example of the sdk-capabilities.md global rule. Its purpose is to document what the session engine provides out of the box, so agents do not reinvent built-in features or request capabilities they already have.
This rule lives at /app/system-rules/sdk-capabilities.md (baked into image) and is loaded into every agent's systemPrompt.
This rule documents the tools and features available through the AI SDK — built-in tools, subagents, organization tools, file system boundaries, session lifecycle, and communication patterns. It prevents agents from reinventing built-in features. For the actual tool definitions and capabilities, see Organization-Tools, Skills, and SDK-Integration.
Autonomous learning is a scheduled self-improvement mechanism. Where the evolution flow above is reactive (a subagent encounters a problem and proposes a fix), autonomous learning is proactive: a subagent periodically seeks out new knowledge relevant to its domain and integrates validated findings into its skills and rules.
Learning runs at the subagent level (ADR-40). The parent creates and manages learning triggers per subagent — subagents cannot create their own learning triggers. Each trigger specifies the target subagent explicitly for deterministic routing. See Triggers for the trigger configuration.
```mermaid
sequenceDiagram
    participant TE as Trigger Engine
    participant TQ as Task Queue
    participant Orch as Team Orchestrator
    participant SA as learner subagent
    participant SK as learning-cycle skill
    Note over TE: 2:00 AM + jitter
    TE->>TQ: delegateTask("ops-team", task, subagent="learner", skill="learning-cycle")
    TQ->>Orch: dequeue
    Orch->>SA: invoke learner (deterministic routing)
    SA->>SK: 6-phase learning cycle
    Note over SK: Journal Read → Topic Analysis →<br/>Web Discovery → Validation →<br/>Storage → Journal Update
    SK-->>SA: cycle complete
    SA-->>Orch: "Learning complete. 2 findings stored."
    alt Significant finding
        SA->>Orch: escalate("Critical CVE found in loggly SDK")
        Orch->>Orch: evaluate significance
    end
```
Main agent has NO learning trigger, NO reflection trigger, NO subagents. It only routes.
Each learning cycle follows a fixed sequence:
| Phase | Purpose |
|---|---|
| Journal Read | Load prior learning state from vault to build on previous cycles |
| Topic Analysis | Identify knowledge gaps from recent tasks and current skills |
| Web Discovery | Search external sources scoped to the subagent's domain |
| Validation | Cross-reference findings against multiple independent sources |
| Storage | Persist validated findings as typed memory entries |
| Journal Update | Record topics explored, findings, deprioritized sources, next priorities |
Duration checkpoints: Elapsed time is checked before each Phase 2 topic iteration and each Phase 3 URL fetch. If max_duration_minutes (default 30) is exceeded, the in-progress operation completes, then the cycle skips to Phase 6 (JOURNAL UPDATE) to persist state and exits gracefully. A session at 29:50 starting a fetch will complete that fetch, write the journal, and then exit — total elapsed time may slightly exceed the budget.
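The checkpoint behavior can be sketched as follows. Function names are illustrative; the point is that the budget is checked before each unit of work, never mid-operation.

```typescript
// Sketch of the duration checkpoint (names are assumptions).
const MAX_DURATION_MINUTES = 30; // the documented max_duration_minutes default

function overBudget(startedAtMs: number, nowMs: number): boolean {
  return (nowMs - startedAtMs) / 60_000 >= MAX_DURATION_MINUTES;
}

// Phase 3 loop: the check runs BEFORE each fetch, so a fetch started at 29:50
// completes even though total elapsed time then exceeds the budget.
async function runDiscovery(
  urls: string[],
  startedAtMs: number,
  fetchOne: (url: string) => Promise<void>,
): Promise<"done" | "budget-exceeded"> {
  for (const url of urls) {
    if (overBudget(startedAtMs, Date.now())) {
      return "budget-exceeded"; // caller skips ahead to Journal Update
    }
    await fetchOne(url); // in-progress operation always completes
  }
  return "done";
}
```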
- **Journal Read** loads the subagent's prior learning context so cycles build on each other rather than repeating work.
- **Topic Analysis** examines recent task history and existing skills to find gaps — what questions came up that the team could not answer confidently?
- **Web Discovery** searches external sources scoped to the subagent's domain.
- **Validation** determines confidence through cross-domain corroboration — findings are cross-referenced against sources from different root domains, with near-duplicate content (mirrors, syndication) counted as a single source. Sources on the deprioritized list are skipped. Confidence maps directly to independent source count: 3+ different root domains → high, 2 → medium, 1 → low.
- **Storage** persists validated findings as memory entries, with storage type determined by corroboration: 2+ independent root domains qualify a finding as a lesson, while single-source findings are stored as reference only.
- **Journal Update** records what was explored and what was found, updates the deprioritized sources list if contradictions were found, prunes expired entries, and sets next priorities.
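The confidence and storage-type mapping can be expressed directly. This is a sketch: `extractRootDomain` here is a simplified stand-in (a real implementation would use a public-suffix list rather than taking the last two host labels).

```typescript
// Confidence from independent-source count, as described above.
// extractRootDomain is a simplified assumption, not production-grade.
function extractRootDomain(url: string): string {
  const host = new URL(url).hostname;
  return host.split(".").slice(-2).join(".");
}

function confidenceFor(sourceUrls: string[]): "high" | "medium" | "low" {
  const roots = new Set(sourceUrls.map(extractRootDomain));
  if (roots.size >= 3) return "high";   // 3+ different root domains
  if (roots.size === 2) return "medium";
  return "low";                          // single source (or mirrors of one)
}

// Storage type: 2+ independent root domains -> lesson, else reference only.
function storageTypeFor(sourceUrls: string[]): "lesson" | "reference" {
  return new Set(sourceUrls.map(extractRootDomain)).size >= 2 ? "lesson" : "reference";
}
```

Note how mirrors collapse: two URLs on `a.example.com` and `b.example.com` share the root domain `example.com` and count as one source.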
Learning progression is tracked in the subagent's vault journal, not in memory. Journal keys are namespaced per team and subagent (e.g., learning:{team}:{subagent}:journal) to make keys self-documenting — even though the team_vault table already scopes by team, the explicit team prefix ensures keys are unambiguous when inspecting the store directly. The vault journal is operational state — it records what topics have been explored, what findings were made, and what the next priorities are. This separation keeps memory focused on knowledge (facts, lessons, references) while the journal handles the learning process itself.
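The key scheme as a tiny helper (the helper name itself is an assumption; only the `learning:{team}:{subagent}:journal` format comes from the page):

```typescript
// Namespaced vault key for a subagent's learning journal.
function journalKey(team: string, subagent: string): string {
  return `learning:${team}:${subagent}:journal`;
}
```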
The journal enables continuity across sessions. When a new learning cycle starts, it reads the journal to understand where the last cycle left off, avoiding redundant exploration and building incrementally on prior work.
Bootstrap creates active learning-cycle-{subagent} triggers per subagent, with readiness gates checked at runtime (see Architecture-Decisions#ADR-35). The per-subagent naming (e.g., learning-cycle-learner) avoids collision with the trigger name uniqueness constraint (unique per team). Subagents cannot create, enable, disable, or modify their own learning triggers. All trigger management flows through the parent:
- `create_trigger(team, "learning-cycle-learner", ..., subagent="learner", skill="learning-cycle")` — creates a learning trigger targeting a specific subagent
- `enable_trigger(team, "learning-cycle-learner")` — activates the nightly learning cycle
- `disable_trigger(team, "learning-cycle-learner")` — stops future firings (current session completes)
This ensures:
- Learning schedules are coordinated across the team hierarchy
- Subagents cannot increase their own resource consumption by adjusting learning frequency
- Parents maintain oversight of what their subagents are learning and how often
If a subagent determines that its learning cycle needs adjustment (different schedule, different focus areas), it uses escalate() to request the change from the orchestrator. The orchestrator evaluates the request and updates the trigger configuration if appropriate.
If the team has an active window trigger (ADR-42) and a learning trigger fires while the window is open, the learning cycle is deferred to the next nightly firing. Rationale:
- The `window` trigger holds the team's on-duty semantics (e.g., market hours). Displacing its tick cadence with a learning cycle would break the "continuous watch" contract.
- Learning is a background activity (nightly, off-hours by default); its findings do not depend on real-time window state.
- The deferral is a no-op run: the learning skill checks `trigger_configs` for an open `window` on the same team and exits gracefully with a `deferred: window open` log entry. No circuit-breaker increment.
If the team has no active window trigger, learning runs as scheduled. Teams whose watch_window overlaps the default 2 AM learning time should either adjust their watch_window or accept that learning will only run on nights when the window is closed.
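The deferral check might look like this. The `trigger_configs` record shape is an assumption; only the deferral semantics come from the page.

```typescript
// Sketch of the window-deferral check (record shape is an assumption).
interface TriggerConfig {
  team: string;
  type: string;        // e.g. "window", "learning-cycle"
  windowOpen?: boolean; // only meaningful for window triggers
}

function shouldDefer(team: string, triggers: TriggerConfig[]): boolean {
  return triggers.some(
    (t) => t.team === team && t.type === "window" && t.windowOpen === true,
  );
}

function runLearningCycle(team: string, triggers: TriggerConfig[]): string {
  if (shouldDefer(team, triggers)) {
    // No-op run: exit gracefully, no circuit-breaker increment.
    return "deferred: window open";
  }
  return "cycle started";
}
```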
When a learning-cycle-{subagent} trigger fires, the skill checks that all 6 tools in the required tool bundle are present in the team's allowed_tools before executing:
- `web_fetch`
- `vault_set`
- `vault_get`
- `memory_save`
- `memory_search`
- `memory_list`
If any tool is missing, the skill logs a warning naming the missing tool(s) and exits without error. The trigger remains enabled — tools may be added to the team later, and the next nightly firing will re-check. This runtime gate means triggers can be created at bootstrap regardless of whether the team has the required tools yet.
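A sketch of this gate, using the tool names from the bundle above (the function names are illustrative):

```typescript
// Runtime tool-bundle gate: all 6 tools must be present, or the skill logs a
// warning naming the missing tools and exits without error.
const REQUIRED_TOOLS = [
  "web_fetch", "vault_set", "vault_get",
  "memory_save", "memory_search", "memory_list",
] as const;

function missingTools(allowedTools: string[]): string[] {
  const allowed = new Set(allowedTools);
  return REQUIRED_TOOLS.filter((t) => !allowed.has(t));
}

function gateLearningCycle(allowedTools: string[]): "run" | "skip" {
  const missing = missingTools(allowedTools);
  if (missing.length > 0) {
    console.warn(`learning-cycle skipped; missing tools: ${missing.join(", ")}`);
    return "skip"; // trigger stays enabled; next nightly firing re-checks
  }
  return "run";
}
```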
Self-reflection is a scheduled introspective mechanism. Where learning looks outward (external knowledge), reflection looks inward: a subagent reviews its own task outcomes to identify and fix systematic inefficiencies. Reflection runs at the subagent level (ADR-40). See Architecture-Decisions#ADR-37.
| Phase | Purpose |
|---|---|
| Journal Read | Load reflection journal from vault |
| Evidence Gather | Query completed tasks via list_completed_tasks for outcome patterns |
| Diagnose | Identify the single highest-impact issue |
| Propose | Draft one skill or rule change (before/after) |
| Apply | Escalate proposal to orchestrator for confirmation, then apply via the evolution flow with governance enforcement |
| Journal Update | Record diagnosis, proposal, outcome, and next focus |
One change per cycle. At most one modification per session. Changes target accuracy or efficiency only. Cooldown: after applying a change, no further reflection-originated changes until the next cycle.
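The one-change constraint as a sketch (the state shape is an assumption; only the at-most-one-modification rule comes from the page):

```typescript
// Cooldown: after one reflection-originated change, further changes are
// blocked until the next cycle resets the state.
interface ReflectionState {
  changeAppliedThisCycle: boolean;
}

function tryApplyChange(state: ReflectionState): ReflectionState {
  if (state.changeAppliedThisCycle) {
    throw new Error("cooldown: at most one reflection change per cycle");
  }
  return { changeAppliedThisCycle: true }; // reset when the next cycle starts
}
```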
Duration budget: Max 15 minutes. Duration checkpoint matches learning — in-progress ops complete, then skip to JOURNAL UPDATE.
Required tools: `vault_get`, `vault_set` (journal); `memory_save`, `memory_search`, `memory_list` (context); `list_completed_tasks` (evidence). If any tool is missing: log a warning and exit. The trigger remains active per the ADR-35 gate pattern. See Triggers#Reflection Trigger.