Component Diagram - antimetal/system-agent GitHub Wiki

Component Diagram

This page provides visual representations of the Antimetal System Agent architecture, showing how components interact and data flows through the system.

High-Level System Architecture

graph TB
    Platform["Antimetal Platform<br/>(Intake Service)"]

    subgraph SystemAgent["System Agent"]
        subgraph MainController["Main Controller"]
            K8sController["K8s<br/>Controller"]
            IntakeWorker["Intake<br/>Worker"]
            PerformanceManager["Performance<br/>Manager"]
        end

        subgraph ResourceStore["Resource Store (BadgerDB)"]
            Resources["Resources"]
            Relationships["Relationships"]
            EventRouter["Event Router"]
        end

        subgraph CloudProvider["Cloud Provider Abstraction"]
            EKS["EKS"]
            GKE["GKE"]
            AKS["AKS"]
            KIND["KIND"]
        end
    end

    K8sAPI["Kubernetes<br/>API Server"]
    LinuxKernel["Linux Kernel<br/>/proc, /sys, eBPF"]

    IntakeWorker -->|gRPC Stream| Platform
    K8sController --> ResourceStore
    ResourceStore --> IntakeWorker
    PerformanceManager --> ResourceStore

    SystemAgent --> K8sAPI
    SystemAgent --> LinuxKernel

Component Interaction Flow

1. Kubernetes Resource Collection

graph LR
    K8sAPI["Kubernetes<br/>API Server"] --> Informers["Informers<br/>(Watchers)"]
    Informers --> Controllers["Controller<br/>Reconcilers"]
    Controllers --> ResourceStore["Resource<br/>Store"]

    K8sAPI -.-> Resources1["Pods, Nodes<br/>Services<br/>ConfigMaps"]
    Informers -.-> Resources2["Deployments<br/>StatefulSets<br/>DaemonSets"]
    Controllers -.-> Events["Events<br/>Published"]
    ResourceStore -.-> StoredObjects["Stored<br/>Objects"]
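
The informer-to-store flow above ends in a reconcile step that compares what the cluster reports with what the store holds. The sketch below illustrates that idea with the standard library only. `Object`, `Store`, `Op`, and `Reconcile` are hypothetical names chosen for this example, not the agent's actual types, and the real controllers are built on the Controller Runtime framework rather than a plain map.

```go
package main

import "fmt"

// Op names the store operation a reconcile produced. The names are illustrative.
type Op string

const (
	OpNone   Op = "none"
	OpAdd    Op = "add"
	OpUpdate Op = "update"
	OpDelete Op = "delete"
)

// Object is a simplified stand-in for a normalized Kubernetes resource.
type Object struct {
	Kind            string
	Name            string
	ResourceVersion string
}

// Store keeps the last observed version of each object, keyed by "Kind/Name".
type Store struct {
	objects map[string]Object
}

func NewStore() *Store { return &Store{objects: make(map[string]Object)} }

// Reconcile drives the stored state toward the observed state.
// observed == nil means the object no longer exists in the cluster.
func (s *Store) Reconcile(key string, observed *Object) Op {
	current, exists := s.objects[key]
	switch {
	case observed == nil && exists:
		delete(s.objects, key)
		return OpDelete
	case observed == nil:
		return OpNone
	case !exists:
		s.objects[key] = *observed
		return OpAdd
	case current.ResourceVersion != observed.ResourceVersion:
		s.objects[key] = *observed
		return OpUpdate
	default:
		return OpNone
	}
}

func main() {
	s := NewStore()
	pod := Object{Kind: "Pod", Name: "web-0", ResourceVersion: "1"}
	fmt.Println(s.Reconcile("Pod/web-0", &pod)) // add
	pod.ResourceVersion = "2"
	fmt.Println(s.Reconcile("Pod/web-0", &pod)) // update
	fmt.Println(s.Reconcile("Pod/web-0", nil))  // delete
}
```

Because the function compares state instead of replaying individual watch events, a repeated or missed notification for the same object does no harm: reconciling an unchanged object returns `OpNone`.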

2. Performance Metrics Collection

graph LR
    LinuxKernel["Linux<br/>Kernel"] --> Collectors["Collectors<br/>(14 types)"]
    Collectors --> PerfManager["Performance<br/>Manager"]
    PerfManager --> MetricsStore["Metrics<br/>Store"]

    LinuxKernel -.-> Sources["/proc/stat<br/>/proc/meminfo<br/>/sys/block<br/>eBPF progs"]
    Collectors -.-> Types["CPU, Memory<br/>Network<br/>Disk, NUMA<br/>Execsnoop"]
    PerfManager -.-> Collection["Continuous<br/>& Point<br/>Collectors"]
    MetricsStore -.-> Upload["Batched<br/>Upload to<br/>Platform"]
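
As a concrete illustration of one of the sources listed above, the following sketch parses the aggregate `cpu` line of `/proc/stat`. It is an assumption-laden example, not the agent's CPU collector: `CPUTimes` and `ParseCPULine` are hypothetical names, and only the first five counters (user, nice, system, idle, iowait) are read.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CPUTimes holds the first five counters of the aggregate "cpu" line in
// /proc/stat, measured in clock ticks.
type CPUTimes struct {
	User, Nice, System, Idle, IOWait uint64
}

// Busy returns the ticks counted as user, nice, and system time.
func (c CPUTimes) Busy() uint64 { return c.User + c.Nice + c.System }

// ParseCPULine parses a line such as "cpu  100 5 50 800 20 0 3 0 0 0".
// Counters after iowait are ignored in this sketch.
func ParseCPULine(line string) (CPUTimes, error) {
	fields := strings.Fields(line)
	if len(fields) < 6 || fields[0] != "cpu" {
		return CPUTimes{}, fmt.Errorf("not an aggregate cpu line: %q", line)
	}
	var vals [5]uint64
	for i := range vals {
		v, err := strconv.ParseUint(fields[i+1], 10, 64)
		if err != nil {
			return CPUTimes{}, fmt.Errorf("field %d: %w", i+1, err)
		}
		vals[i] = v
	}
	return CPUTimes{
		User:   vals[0],
		Nice:   vals[1],
		System: vals[2],
		Idle:   vals[3],
		IOWait: vals[4],
	}, nil
}

func main() {
	ct, err := ParseCPULine("cpu  100 5 50 800 20 0 3 0 0 0")
	if err != nil {
		panic(err)
	}
	fmt.Println(ct.Busy(), ct.Idle)
}
```

Taking the line as a string keeps the parser testable without a Linux host. A collector running in the agent container would presumably read the file from the mounted host path rather than the container's own `/proc`.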

3. Data Upload Pipeline

graph LR
    ResourceStore["Resource<br/>Store<br/>Events"] --> EventRouter["Event<br/>Router"]
    EventRouter --> IntakeWorker["Intake<br/>Worker"]
    IntakeWorker --> Platform["Antimetal<br/>Platform"]

    ResourceStore -.-> Operations["Add/Update/<br/>Delete<br/>Operations"]
    EventRouter -.-> Filtering["Type<br/>Filtering<br/>Subscribers"]
    IntakeWorker -.-> Batching["Batch Queue<br/>(time/size)<br/>Retry Logic"]
    Platform -.-> Stream["gRPC Stream<br/>with<br/>Backoff"]
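
The "Batch Queue (time/size)" step in the pipeline above can be sketched as a queue that flushes when either limit is hit. This is a minimal illustration under assumptions: `Batcher`, `Add`, and `Tick` are hypothetical names, items are plain strings, and the caller supplies the clock so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// Batcher queues items and releases them when the batch is full or the
// oldest queued item has waited long enough.
type Batcher struct {
	maxSize int
	maxAge  time.Duration
	items   []string
	first   time.Time // arrival time of the oldest queued item
}

func NewBatcher(maxSize int, maxAge time.Duration) *Batcher {
	return &Batcher{maxSize: maxSize, maxAge: maxAge}
}

// Add queues an item. It returns the full batch once the size limit is
// reached and nil otherwise.
func (b *Batcher) Add(item string, now time.Time) []string {
	if len(b.items) == 0 {
		b.first = now
	}
	b.items = append(b.items, item)
	if len(b.items) >= b.maxSize {
		return b.flush()
	}
	return nil
}

// Tick returns the pending batch if the oldest item has waited at least
// maxAge, and nil otherwise.
func (b *Batcher) Tick(now time.Time) []string {
	if len(b.items) > 0 && now.Sub(b.first) >= b.maxAge {
		return b.flush()
	}
	return nil
}

func (b *Batcher) flush() []string {
	out := b.items
	b.items = nil
	return out
}

// demo runs a fixed scenario and returns every batch that was flushed.
func demo() [][]string {
	start := time.Unix(0, 0)
	b := NewBatcher(2, time.Second)
	var flushed [][]string
	record := func(batch []string) {
		if batch != nil {
			flushed = append(flushed, batch)
		}
	}
	record(b.Add("a", start))
	record(b.Add("b", start))                           // size limit reached
	record(b.Add("c", start.Add(100*time.Millisecond))) // starts a new batch
	record(b.Tick(start.Add(500 * time.Millisecond)))   // too early to flush
	record(b.Tick(start.Add(2 * time.Second)))          // age limit reached
	return flushed
}

func main() {
	fmt.Println(demo()) // prints [[a b] [c]]
}
```

In a running worker, `Tick` would typically be driven by a `time.Ticker` in the same goroutine that receives events, so the queue needs no lock.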

Component Details

Core Components

Main Controller

  • Role: Orchestrator and lifecycle manager
  • Responsibilities:
    • Parse configuration and CLI flags
    • Initialize all subsystems
    • Handle graceful shutdown
    • Coordinate between components

Kubernetes Controller

  • Role: K8s resource monitoring
  • Pattern: Controller Runtime framework
  • Features: Leader election, concurrent reconciliation
  • Watches: Nodes, Pods, Services, Deployments, StatefulSets, DaemonSets, ReplicaSets, PVs, PVCs, Jobs

Resource Store

  • Role: Central data hub
  • Technology: BadgerDB (embedded key-value store, run in in-memory mode)
  • Components:
    • Resources: Normalized K8s/cloud objects
    • Relationships: RDF triples for object relations
    • Event Router: Pub/sub for component communication
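
To make the pub/sub role of the Event Router more concrete, here is a minimal fan-out sketch with per-type filtering. `Router`, `Event`, `Subscribe`, and `Publish` are hypothetical names for this example and do not describe the store's real API. Dropping events for a full subscriber is a design choice made here for brevity; the agent may handle slow consumers differently.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType distinguishes store operations. The names are illustrative.
type EventType int

const (
	EventAdd EventType = iota
	EventUpdate
	EventDelete
)

// Event is a hypothetical change notification emitted by the store.
type Event struct {
	Type         EventType
	ResourceKind string // for example "Pod" or "Node"
	Name         string
}

// Router fans events out to subscribers, filtered by resource kind.
type Router struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewRouter() *Router {
	return &Router{subs: make(map[string][]chan Event)}
}

// Subscribe returns a buffered channel that receives events for one kind.
func (r *Router) Subscribe(kind string, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	r.mu.Lock()
	r.subs[kind] = append(r.subs[kind], ch)
	r.mu.Unlock()
	return ch
}

// Publish offers ev to every subscriber of its kind and reports how many
// accepted it. A subscriber whose buffer is full is skipped, not waited on.
func (r *Router) Publish(ev Event) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	delivered := 0
	for _, ch := range r.subs[ev.ResourceKind] {
		select {
		case ch <- ev:
			delivered++
		default: // buffer full: drop for this subscriber
		}
	}
	return delivered
}

func main() {
	r := NewRouter()
	pods := r.Subscribe("Pod", 4)
	r.Publish(Event{Type: EventAdd, ResourceKind: "Pod", Name: "web-0"})
	r.Publish(Event{Type: EventAdd, ResourceKind: "Node", Name: "node-a"}) // no subscriber
	fmt.Println((<-pods).Name)
}
```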

Intake Worker

  • Role: Data streaming to platform
  • Protocol: gRPC with protobuf
  • Features: Batching, exponential backoff, health monitoring
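
Exponential backoff, listed among the features above, usually means doubling the wait after each failed attempt up to a ceiling. The function below is a generic sketch of that schedule. The name `Backoff` and the one-second base and thirty-second cap used in `main` are assumptions for illustration, not values taken from the agent.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff returns the delay before retry number attempt (0-based):
// base doubled attempt times, capped at max.
func Backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d > max/2 {
			// Doubling again would exceed max, so stop here. This also
			// avoids overflow for large attempt counts.
			return max
		}
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, Backoff(attempt, time.Second, 30*time.Second))
	}
}
```

Production retry loops often add random jitter to each delay so that many clients do not reconnect at the same moment. That is omitted here to keep the schedule easy to read.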

Performance Manager

  • Role: System metrics collection
  • Architecture: Pluggable collector system
  • Patterns: PointCollector (one-shot) and ContinuousCollector (streaming)
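
The two collector patterns can be pictured as two small interfaces. The sketch below is an assumption about their shape: this page names `PointCollector` and `ContinuousCollector` but does not show their method sets, so `Collect`, `Start`, the `Poll` adapter, and `constCollector` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// PointCollector gathers a single sample on demand (one-shot).
type PointCollector interface {
	Collect(ctx context.Context) (any, error)
}

// ContinuousCollector streams samples until its context is cancelled.
type ContinuousCollector interface {
	Start(ctx context.Context) (<-chan any, error)
}

// constCollector is a toy PointCollector that returns a fixed value.
type constCollector struct{ value float64 }

func (c constCollector) Collect(ctx context.Context) (any, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return c.value, nil
}

// Poll adapts a PointCollector into a ContinuousCollector by sampling it
// on a fixed interval.
type Poll struct {
	Inner    PointCollector
	Interval time.Duration
}

func (p Poll) Start(ctx context.Context) (<-chan any, error) {
	out := make(chan any)
	go func() {
		defer close(out)
		ticker := time.NewTicker(p.Interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				v, err := p.Inner.Collect(ctx)
				if err != nil {
					return
				}
				select {
				case out <- v:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out, nil
}

// firstN reads n samples from a continuous collector, then cancels it.
func firstN(c ContinuousCollector, n int) ([]any, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, err := c.Start(ctx)
	if err != nil {
		return nil, err
	}
	var got []any
	for v := range ch {
		got = append(got, v)
		if len(got) == n {
			break
		}
	}
	return got, nil
}

func main() {
	samples, err := firstN(Poll{Inner: constCollector{value: 42}, Interval: time.Millisecond}, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(samples))
}
```

An adapter like `Poll` lets a manager treat every collector as a stream while the simpler sources stay one-shot functions.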

Cloud Provider Abstraction

  • Role: Multi-cloud metadata discovery
  • Interface: Name(), ClusterName(), Region()
  • Implementations: EKS (full), KIND (local), GKE/AKS (planned)

Concurrency Model

Goroutine Organization

graph TB
    Main["Main<br/>Orchestration & lifecycle"]

    Main --> Controllers["Controller<br/>Workers"]
    Main --> IntakeWorker["Intake<br/>Worker"]
    Main --> Collectors["Collectors<br/>(14 types)"]
    Main --> EventRouter["Event<br/>Router"]

    Controllers -.-> ParallelK8s["Parallel K8s<br/>reconciliation"]
    IntakeWorker -.-> DedicatedgRPC["Dedicated gRPC<br/>streaming"]
    Collectors -.-> Independent["Independent<br/>metric collection"]
    EventRouter -.-> FanOut["Fan-out event<br/>distribution"]

Communication Patterns

  • Channels: Primary inter-component communication
  • Mutexes: Minimal shared state protection
  • Context: Cancellation and deadline propagation
  • WaitGroups: Coordinated shutdown
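
The four primitives above commonly appear together. The sketch below is a generic illustration, not code from the agent: a channel carries work, a context signals cancellation, a mutex guards the one piece of shared state, and a WaitGroup makes the caller wait until every worker has exited. `runWorkers` and `sumWithWorkers` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runWorkers starts n workers that consume jobs until the channel is closed
// or ctx is cancelled, waits for all of them to exit, and returns the sum of
// the jobs they processed.
func runWorkers(ctx context.Context, n int, jobs <-chan int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex // guards total, the only shared state
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case j, ok := <-jobs:
					if !ok {
						return
					}
					mu.Lock()
					total += j
					mu.Unlock()
				}
			}
		}()
	}
	wg.Wait() // coordinated shutdown: return only after every worker is done
	return total
}

// sumWithWorkers feeds values to a worker pool and returns their sum.
func sumWithWorkers(values []int, workers int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	jobs := make(chan int)
	go func() {
		defer close(jobs)
		for _, v := range values {
			select {
			case jobs <- v:
			case <-ctx.Done():
				return
			}
		}
	}()
	return runWorkers(ctx, workers, jobs)
}

func main() {
	fmt.Println(sumWithWorkers([]int{1, 2, 3, 4}, 3)) // prints 10
}
```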

Extension Points

Adding New Collectors

func init() {
    performance.Register(MetricTypeCustom, NewCustomCollector)
}
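
For readers unfamiliar with the registration pattern used above, the following self-contained sketch shows how an `init`-time registry of this kind might work. It is an assumption, not the `performance` package's implementation: the real `Register` signature is not shown on this page, and `Factory`, `Collector`, `New`, and `customCollector` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// MetricType identifies a kind of collector.
type MetricType string

// Collector is a placeholder interface for this sketch.
type Collector interface {
	Name() string
}

// Factory builds a collector on demand.
type Factory func() (Collector, error)

var (
	mu       sync.RWMutex
	registry = map[MetricType]Factory{}
)

// Register associates a factory with a metric type. Registering the same
// type twice is reported as an error.
func Register(t MetricType, f Factory) error {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[t]; dup {
		return fmt.Errorf("collector %q already registered", t)
	}
	registry[t] = f
	return nil
}

// New builds the collector registered for t.
func New(t MetricType) (Collector, error) {
	mu.RLock()
	f, ok := registry[t]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no collector registered for %q", t)
	}
	return f()
}

const MetricTypeCustom MetricType = "custom"

type customCollector struct{}

func (customCollector) Name() string { return "custom" }

func NewCustomCollector() (Collector, error) { return customCollector{}, nil }

// init runs before main, so the collector is available as soon as the
// package is linked into the binary.
func init() {
	if err := Register(MetricTypeCustom, NewCustomCollector); err != nil {
		panic(err)
	}
}

func main() {
	c, err := New(MetricTypeCustom)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Name())
}
```

Registering from `init` means a new collector only has to be imported to become available. No central list needs editing.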

Adding Cloud Providers

type CustomProvider struct{}
func (p *CustomProvider) Name() string { return "custom" }
func (p *CustomProvider) ClusterName(ctx context.Context) (string, error) { ... }
func (p *CustomProvider) Region(ctx context.Context) (string, error) { ... }
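
The method bodies above are elided. As a runnable illustration of the same three-method shape, here is a sketch of a provider that serves fixed values. `Provider`, `StaticProvider`, and `describe` are hypothetical names for this example. Only the method names and signatures come from this page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider mirrors the three methods listed on this page.
type Provider interface {
	Name() string
	ClusterName(ctx context.Context) (string, error)
	Region(ctx context.Context) (string, error)
}

// StaticProvider is a hypothetical provider that returns configured values
// instead of querying a cloud metadata service.
type StaticProvider struct {
	Cluster    string
	RegionName string
}

// Compile-time check that StaticProvider satisfies Provider.
var _ Provider = (*StaticProvider)(nil)

func (p *StaticProvider) Name() string { return "static" }

func (p *StaticProvider) ClusterName(ctx context.Context) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	if p.Cluster == "" {
		return "", errors.New("cluster name not configured")
	}
	return p.Cluster, nil
}

func (p *StaticProvider) Region(ctx context.Context) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	if p.RegionName == "" {
		return "", errors.New("region not configured")
	}
	return p.RegionName, nil
}

// describe formats provider metadata as "name/region/cluster".
func describe(p Provider) (string, error) {
	ctx := context.Background()
	cluster, err := p.ClusterName(ctx)
	if err != nil {
		return "", err
	}
	region, err := p.Region(ctx)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s/%s/%s", p.Name(), region, cluster), nil
}

func main() {
	s, err := describe(&StaticProvider{Cluster: "dev", RegionName: "local"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints static/local/dev
}
```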

Deployment Architecture

Container Structure

graph TB
    subgraph KubernetesCluster["Kubernetes Cluster"]
        subgraph AntimetalNamespace["antimetal-system namespace"]
            subgraph SystemAgentPod["system-agent pod"]
                subgraph Container["Container"]
                    Features["• Distroless base image<br/>• Non-root user (65532)<br/>• TLS for external connections<br/>• Volume mounts:<br/>  /host/proc --> /proc<br/>  /host/sys  --> /sys"]
                end
            end
        end
    end

RBAC Permissions

graph LR
    ClusterRole["ClusterRole<br/>• get pods<br/>• list nodes<br/>• watch svc"] --> ServiceAccount["ServiceAccount<br/>antimetal-agent"]
    ServiceAccount --> SystemAgent["System<br/>Agent<br/>Process"]


These diagrams reflect the system architecture as documented. For the latest implementation details, see the source code.
