Monitoring and Observability - AGI-Corporation/frontier-os-app-builder GitHub Wiki

Monitoring and Observability

Overview

Frontier OS provides built-in tools for monitoring the health of agents, pipelines, and the underlying node infrastructure, so that users and developers can track performance and debug failures in real time.

Agent Telemetry

Agents emit the following telemetry data through the sandbox runtime:

  • Resource Usage: CPU, memory, and network I/O, tracked per task.
  • Execution Logs: Standard output and error streams from the Agent Runtime and Sandbox.
  • Custom Metrics: Agents can define business-level metrics (e.g., "Genes Processed") via the SDK.
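
As a rough illustration of the telemetry listed above, the sketch below wraps a task, measures its CPU time and peak Python memory with the standard library, and collects custom counters. `TaskMetrics`, `increment`, and `run_with_telemetry` are hypothetical names invented for this example; they are not the Frontier OS SDK API, and network I/O is not measured here.

```python
import time
import tracemalloc
from collections import defaultdict


class TaskMetrics:
    """Hypothetical stand-in for an SDK metrics handle (not the real API)."""

    def __init__(self) -> None:
        self.counters: dict[str, float] = defaultdict(float)

    def increment(self, name: str, amount: float = 1.0) -> None:
        # Add to a named business-level counter, e.g. "genes_processed".
        self.counters[name] += amount


def run_with_telemetry(task, metrics: TaskMetrics) -> dict:
    """Run task(metrics) and report CPU time, peak Python memory, and counters."""
    tracemalloc.start()
    cpu_start = time.process_time()
    try:
        task(metrics)
    finally:
        # Collect measurements even if the task raises.
        cpu_seconds = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return {
        "cpu_seconds": cpu_seconds,
        "peak_memory_bytes": peak_bytes,
        "custom": dict(metrics.counters),
    }


def process_genes(metrics: TaskMetrics) -> None:
    # Example task: pretend to process three genes.
    for _ in range(3):
        metrics.increment("genes_processed")


report = run_with_telemetry(process_genes, TaskMetrics())
print(report["custom"])
```

A real sandbox would more likely read resource usage from the virtual machine or container boundary rather than from inside the agent process; the in-process measurement here only keeps the example self-contained.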

Pipeline Health

Pipeline owners can monitor their pipelines through the Pipeline Registry dashboard, which reports:

  • Success Rate: Percentage of tasks completed without errors.
  • Latency: Average execution time per task.
  • Throughput: Number of active dispatches in the last 24 hours.
  • Error Breakdown: Categorization of failures (e.g., Timeout, Out of Memory, Logic Error).
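
To make the metric definitions above concrete, here is a minimal sketch of how success rate, average latency, and the error breakdown could be derived from a batch of task records. The `TaskRecord` shape and the `summarize` function are assumptions made for illustration; the dashboard's actual schema and aggregation are not documented here. Throughput is omitted because it would also need dispatch timestamps.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    # Hypothetical record shape; the real registry schema may differ.
    task_id: str
    duration_ms: float
    error: Optional[str] = None  # e.g. "Timeout", "Out of Memory", "Logic Error"


def summarize(records: list[TaskRecord]) -> dict:
    """Compute success rate (%), average latency (ms), and error counts."""
    total = len(records)
    if total == 0:
        # Nothing to aggregate; avoid dividing by zero.
        return {"success_rate": None, "avg_latency_ms": None, "errors": {}}
    failures = [r for r in records if r.error is not None]
    return {
        "success_rate": 100.0 * (total - len(failures)) / total,
        "avg_latency_ms": sum(r.duration_ms for r in records) / total,
        "errors": dict(Counter(r.error for r in failures)),
    }


sample = [
    TaskRecord("task-1", 400),
    TaskRecord("task-2", 500),
    TaskRecord("task-3", 300, error="Timeout"),
    TaskRecord("task-4", 800, error="Timeout"),
]
print(summarize(sample))
```

Note that this averages latency over all tasks, including failed ones; a dashboard might equally report it over successful tasks only.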

Node Monitoring

Node operators can track the following through the integrated Prometheus exporter:

  • Queue Depth: Number of pending tasks in the local buffer.
  • Sandbox Health: Number of active virtual machines.
  • FND Earnings: Real-time tracking of task fees earned by the node.
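
For orientation, the sketch below renders the three node metrics in the Prometheus text exposition format using only the standard library. The metric names (`frontier_node_queue_depth` and so on) and the choice of gauge versus counter are assumptions for this example; the names the integrated exporter actually exposes may differ.

```python
def render_metrics(queue_depth: int, active_sandboxes: int, fnd_earned: float) -> str:
    """Render node metrics in the Prometheus text exposition format."""
    # (name, type, help text, value); all names are hypothetical.
    metrics = [
        ("frontier_node_queue_depth", "gauge",
         "Pending tasks in the local buffer.", queue_depth),
        ("frontier_node_active_sandboxes", "gauge",
         "Active sandbox virtual machines.", active_sandboxes),
        ("frontier_node_fnd_earned_total", "counter",
         "Task fees earned by the node, in FND.", fnd_earned),
    ]
    lines = []
    for name, metric_type, help_text, value in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metrics(queue_depth=3, active_sandboxes=2, fnd_earned=1.5))
```

Earnings are modeled as a counter on the assumption that they only increase, while queue depth and sandbox count can go up or down and so are gauges.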

Logging and Tracing

We use a structured logging format for all platform events.

  • Task ID: Correlates events across the sandbox, gateway, and registry.
  • Span Tracing: Visualizes the path of a task from dispatch to result.

Example Log Entry

{
  "timestamp": "2026-05-20T14:30:05Z",
  "level": "INFO",
  "taskId": "task-8892-x",
  "component": "sandbox-exec",
  "message": "Agent completed successfully",
  "durationMs": 450
}
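
The following sketch shows one way entries of this shape could be produced and then correlated by `taskId`. The helper names `log_event` and `group_by_task`, and the `"gateway"` component label, are hypothetical; only the field names are taken from the example entry above.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def log_event(task_id: str, component: str, message: str,
              level: str = "INFO", **fields) -> str:
    """Serialize one platform event as a single JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
        "taskId": task_id,
        "component": component,
        "message": message,
        **fields,  # optional extras such as durationMs
    }
    return json.dumps(entry)


def group_by_task(lines: list[str]) -> dict[str, list[dict]]:
    """Group parsed entries by taskId, ordered by timestamp within each task."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        entry = json.loads(line)
        grouped[entry["taskId"]].append(entry)
    for entries in grouped.values():
        # ISO 8601 UTC timestamps sort correctly as strings; the sort is stable.
        entries.sort(key=lambda e: e["timestamp"])
    return dict(grouped)


lines = [
    log_event("task-8892-x", "gateway", "Task dispatched"),
    log_event("task-0001-a", "gateway", "Task dispatched"),
    log_event("task-8892-x", "sandbox-exec", "Agent completed successfully",
              durationMs=450),
]
for entry in group_by_task(lines)["task-8892-x"]:
    print(entry["component"], "-", entry["message"])
```

Grouping by `taskId` gives a flat timeline per task; full span tracing would additionally need parent and child span identifiers, which the example entry does not include.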

Related Pages