Escalation - Azure/az-prototype GitHub Wiki

Escalation

Overview

The escalation system implements a 4-level chain for handling blocked tasks during prototype generation and deployment. When an agent encounters a blocker it cannot resolve, the system tracks the issue, attempts automated resolution through progressively broader strategies, and ultimately flags the issue for human intervention if all automated approaches fail.

A core principle: agents must NOT proceed with workarounds without human approval. The system documents blockers and attempted solutions at every level, giving humans full context when intervention is required.
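As a rough illustration of the chain, the four levels can be modeled as an ordered enum where each escalation moves one step up and Level 4 is terminal. This is a minimal sketch; EscalationLevel and next_level are hypothetical names, not the extension's actual API.

```python
from enum import IntEnum


class EscalationLevel(IntEnum):
    """Hypothetical model of the four escalation levels, in order."""

    DOCUMENTED = 1
    AGENT = 2
    WEB_SEARCH = 3
    HUMAN = 4


def next_level(level: int) -> EscalationLevel:
    """Advance one step along the chain; Level 4 (human) is terminal."""
    if level >= EscalationLevel.HUMAN:
        # No further automated escalation once a human is involved
        return EscalationLevel.HUMAN
    return EscalationLevel(level + 1)
```

Using an IntEnum keeps comparisons such as "below level 4" as plain integer comparisons while still giving each level a readable name.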

Escalation Levels

Level 1: Documented

The blocker is recorded with initial context. The EscalationEntry captures:

  • Task description
  • Blocker details
  • Source agent and stage
  • Timestamp

At this level, the issue is simply logged. The agent that encountered the blocker continues to attempt resolution within its own capabilities.
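A minimal sketch of such an entry as a Python dataclass follows. The field names mirror the keys in the persisted escalation.yaml file; the defaults, and the choice to stamp last_escalated_at with the creation time, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now_iso() -> str:
    # ISO 8601 with an explicit UTC offset, like the persisted timestamps
    return datetime.now(timezone.utc).isoformat()


@dataclass
class EscalationEntry:
    """Hypothetical sketch of one tracked blocker; field names mirror escalation.yaml."""

    task_description: str
    blocker: str
    source_agent: str
    source_stage: str
    attempted_solutions: list[str] = field(default_factory=list)
    escalation_level: int = 1  # 1 = Documented, 2 = Agent, 3 = Web Search, 4 = Human
    created_at: str = field(default_factory=_utc_now_iso)
    last_escalated_at: str = ""
    resolved: bool = False
    resolution: str = ""

    def __post_init__(self) -> None:
        # Assumption: a newly documented blocker counts as escalated at creation time
        if not self.last_escalated_at:
            self.last_escalated_at = self.created_at
```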

Level 2: Agent Escalation

The blocker is escalated to a specialized agent based on the nature of the issue:

  • Technical blockers route to cloud-architect -- architecture conflicts, service incompatibilities, infrastructure issues
  • Scope blockers route to project-manager -- requirement ambiguities, prioritization conflicts, stakeholder concerns

The system automatically classifies the blocker by checking for scope-related keywords (scope, requirement, backlog, story, feature, stakeholder, priority, sprint). If any match, the blocker goes to the project-manager; otherwise it goes to the cloud-architect.
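The keyword check can be sketched in a few lines. The keyword list comes from this page; the helper name route_blocker and the case-insensitive substring matching are assumptions about how the check might be written.

```python
# Scope-related keywords listed on this page; anything else is treated as technical
SCOPE_KEYWORDS = (
    "scope", "requirement", "backlog", "story",
    "feature", "stakeholder", "priority", "sprint",
)


def route_blocker(blocker: str) -> str:
    """Return the agent a blocker is escalated to at Level 2 (hypothetical helper)."""
    text = blocker.lower()  # assumption: matching is case-insensitive
    if any(keyword in text for keyword in SCOPE_KEYWORDS):
        return "project-manager"
    return "cloud-architect"
```

Plain substring matching also catches forms such as "requirements" or "features", at the cost of occasional false positives (for example "telescope" contains "scope"); a word-boundary regular expression would be stricter.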

The escalation agent receives the full context: task description, blocker details, and all previously attempted solutions.

Level 3: Web Search

When agent escalation does not resolve the issue, the system expands to external documentation search. It queries Microsoft Learn and web sources using the blocker description combined with the Azure service context.

This reuses the knowledge system's web search infrastructure (search_and_fetch()) that agents rely on during normal operation, but targets the query specifically at the blocker.
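The following sketch shows one way the query could be assembled and handed to a search function. The exact query format is an assumption, and because the real signature of search_and_fetch() is not documented on this page, the search step is injected as a plain callable rather than called directly.

```python
from typing import Callable, Optional


def build_search_query(blocker: str, service: Optional[str] = None) -> str:
    """Combine the blocker text with the Azure service it concerns (hypothetical format)."""
    parts = ["Azure"]
    if service:
        parts.append(service)
    parts.append(blocker.strip())
    return " ".join(parts)


def web_search_level(
    blocker: str,
    service: Optional[str],
    search: Callable[[str], str],
) -> Optional[str]:
    """Run one targeted search; return the fetched text, or None if nothing came back.

    `search` stands in for the knowledge system's search_and_fetch(), whose real
    signature is not assumed here.
    """
    result = search(build_search_query(blocker, service))
    return result or None
```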

Level 4: Human Intervention

If all automated approaches fail, the system flags the blocker for manual resolution. A prominent message is displayed:

*** HUMAN INTERVENTION REQUIRED ***
Task: <task description>
Blocker: <blocker details>
Source: <agent> (<stage>)
Attempted solutions:
  - <solution 1>
  - <solution 2>
Please resolve this blocker manually and resume.

No further automated escalation occurs. The session waits for the user to address the issue.
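A small helper that renders the Level 4 message in the layout shown above might look like this. The function name and parameters are hypothetical; only the message lines themselves come from this page.

```python
def format_human_intervention(
    task: str,
    blocker: str,
    source_agent: str,
    source_stage: str,
    attempted_solutions: list[str],
) -> str:
    """Render the Level 4 banner (hypothetical helper)."""
    lines = [
        "*** HUMAN INTERVENTION REQUIRED ***",
        f"Task: {task}",
        f"Blocker: {blocker}",
        f"Source: {source_agent} ({source_stage})",
        "Attempted solutions:",
    ]
    # One indented bullet per solution tried at earlier levels
    lines.extend(f"  - {solution}" for solution in attempted_solutions)
    lines.append("Please resolve this blocker manually and resume.")
    return "\n".join(lines)
```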

Auto-Escalation

The should_auto_escalate() method checks whether a blocker should be automatically escalated based on elapsed time. The default timeout is 120 seconds from the last escalation timestamp.

Conditions for auto-escalation:

  • The blocker is not yet resolved
  • The current escalation level is below 4 (human)
  • At least 120 seconds (the default timeout) have elapsed since the last escalation

This prevents blockers from stalling a session indefinitely without progressing through the escalation chain.
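The three conditions translate directly into code. The sketch below is a standalone function rather than the tracker's actual method, and its signature (including the injectable `now` for testing) is an assumption. It also assumes timestamps carry a UTC offset, as in the persisted file; a naive timestamp would raise a TypeError when subtracted from an aware datetime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HUMAN_LEVEL = 4
DEFAULT_TIMEOUT = timedelta(seconds=120)


def should_auto_escalate(
    resolved: bool,
    escalation_level: int,
    last_escalated_at: str,
    now: Optional[datetime] = None,
    timeout: timedelta = DEFAULT_TIMEOUT,
) -> bool:
    """Standalone sketch of the three auto-escalation conditions."""
    if resolved or escalation_level >= HUMAN_LEVEL:
        return False
    now = now or datetime.now(timezone.utc)
    # Stored as ISO 8601 strings with a UTC offset, e.g. "2026-03-10T14:35:00+00:00"
    last = datetime.fromisoformat(last_escalated_at)
    return now - last >= timeout
```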

Persistence

Escalation state persists to .prototype/state/escalation.yaml so blockers survive session restarts. The YAML file contains all escalation entries with their full history:

entries:
  - task_description: "Deploy Azure Cosmos DB"
    blocker: "Subscription quota exceeded for Cosmos DB accounts"
    attempted_solutions:
      - "Checked alternative regions"
      - "Verified subscription limits via az account show"
    escalation_level: 3
    source_agent: "terraform-agent"
    source_stage: "deploy"
    created_at: "2026-03-10T14:30:00+00:00"
    last_escalated_at: "2026-03-10T14:35:00+00:00"
    resolved: false
    resolution: ""

The tracker loads existing state on session start if the file exists, and saves after every state change (new blocker, attempted solution, escalation, resolution).
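The load-on-start, save-on-change pattern can be sketched as follows. To keep the example free of third-party dependencies it uses the standard library's json module as a stand-in for the YAML serializer; in practice a YAML library such as PyYAML (yaml.safe_dump / yaml.safe_load) would fill that role. The class and method names are hypothetical, and the temp-file-then-rename write is a design choice of this sketch, not something this page documents.

```python
import json
import os
import tempfile
from pathlib import Path


class EscalationStore:
    """Hypothetical load/save sketch for .prototype/state/escalation.yaml."""

    def __init__(self, project_dir: str) -> None:
        self.path = Path(project_dir) / ".prototype" / "state" / "escalation.yaml"
        self.entries: list[dict] = []
        if self.path.exists():  # resume state left by a previous session
            self.entries = json.loads(self.path.read_text())["entries"]

    def _save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file, then rename, so a crash never leaves a half-written file
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump({"entries": self.entries}, handle, indent=2)
        os.replace(tmp, self.path)

    # Every mutating method persists immediately after changing state
    def record_blocker(self, entry: dict) -> None:
        self.entries.append(entry)
        self._save()

    def add_attempted_solution(self, index: int, solution: str) -> None:
        self.entries[index]["attempted_solutions"].append(solution)
        self._save()

    def resolve(self, index: int, resolution: str) -> None:
        self.entries[index]["resolved"] = True
        self.entries[index]["resolution"] = resolution
        self._save()
```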

Integration

The EscalationTracker is created in the __init__ of three session types:

  • Build session (stages/build_session.py) -- handles blockers during code generation and QA remediation
  • Deploy session (stages/deploy_session.py) -- handles deployment failures and preflight issues
  • Backlog session (stages/backlog_session.py) -- handles blockers during backlog generation and push

The tracker integrates with QA-first error routing: when the QA agent cannot diagnose an issue and an escalation tracker has been provided, the blocker is recorded in the tracker automatically.
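A sketch of that optional hook follows. The function route_error, the constant QA_AGENT_NAME, and the record_blocker signature are all hypothetical; the point is only that the tracker is an optional dependency consulted when the QA agent returns no diagnosis.

```python
from typing import Optional, Protocol

QA_AGENT_NAME = "qa-agent"  # hypothetical source-agent label


class BlockerRecorder(Protocol):
    """Anything with a record_blocker() method, such as an escalation tracker."""

    def record_blocker(self, task: str, blocker: str, source_agent: str, source_stage: str) -> None: ...


def route_error(
    task: str,
    error: str,
    stage: str,
    qa_diagnosis: Optional[str],
    tracker: Optional[BlockerRecorder] = None,
) -> Optional[str]:
    """Hypothetical QA-first routing: prefer the QA diagnosis, else document a blocker."""
    if qa_diagnosis:
        return qa_diagnosis  # QA explained the failure; no blocker needed
    if tracker is not None:
        # QA could not diagnose the issue, so record it at Level 1 (Documented)
        tracker.record_blocker(task, error, QA_AGENT_NAME, stage)
    return None
```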

Escalation Report

The format_escalation_report() method produces a human-readable summary of all tracked blockers:

  • Active blockers: listed with their current escalation level label (Documented, Agent, Web Search, Human), task description, blocker details, and number of attempted solutions
  • Resolved blockers: listed with their task description and resolution
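A minimal sketch of a report with that structure is shown below. The level labels come from this page; the Entry type, the indentation, and the exact wording of each line are assumptions rather than the real method's output.

```python
from dataclasses import dataclass, field

LEVEL_LABELS = {1: "Documented", 2: "Agent", 3: "Web Search", 4: "Human"}


@dataclass
class Entry:
    """Hypothetical subset of an escalation entry needed for the report."""

    task_description: str
    blocker: str
    escalation_level: int = 1
    attempted_solutions: list[str] = field(default_factory=list)
    resolved: bool = False
    resolution: str = ""


def format_escalation_report(entries: list[Entry]) -> str:
    """Sketch of the report layout; exact wording is an assumption."""
    active = [e for e in entries if not e.resolved]
    resolved = [e for e in entries if e.resolved]
    lines: list[str] = []
    if active:
        lines.append("Active blockers:")
        for e in active:
            label = LEVEL_LABELS.get(e.escalation_level, "Unknown")
            lines.append(f"  [{label}] {e.task_description}")
            lines.append(f"    Blocker: {e.blocker}")
            lines.append(f"    Attempted solutions: {len(e.attempted_solutions)}")
    if resolved:
        lines.append("Resolved blockers:")
        for e in resolved:
            lines.append(f"  {e.task_description}: {e.resolution}")
    return "\n".join(lines) if lines else "No blockers tracked."
```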

