Hardware Discovery - antimetal/system-agent GitHub Wiki

Hardware Discovery

COMPLETE: Hardware discovery is fully implemented and operational. The system collects comprehensive hardware information and builds a complete graph representation with all relationship types.

Implementation Status:

  • ✅ Hardware graph builder
  • ✅ Protobuf data models
  • ✅ Resource store integration
  • ✅ Performance collector integration
  • ✅ Actual hardware data collection
  • ✅ All relationship types (Contains, SharesSocket, NUMAAffinity, NUMADistance, ConnectedTo)

Overview

The Hardware Graph feature adds hardware configuration discovery to the Antimetal Agent, representing physical and virtual hardware resources as nodes and relationships in "The Graph" alongside Kubernetes and cloud resources.

Architecture

Components

┌─────────────────────────────────────────────────────────────┐
│                     Performance Collectors                  │
│  (CPUInfo, MemoryInfo, DiskInfo, NetworkInfo)               │
└──────────────────────┬──────────────────────────────────────┘
                       │ Collect hardware data
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    Hardware Manager                         │
│  - Periodic collection orchestration                        │
│  - Snapshot aggregation                                     │
└──────────────────────┬──────────────────────────────────────┘
                       │ Hardware snapshot
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                Hardware Graph Builder                       │
│  - Converts collector data to graph nodes                   │
│  - Creates RDF triplet relationships                        │
└──────────────────────┬──────────────────────────────────────┘
                       │ Resources & Relationships
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    Resource Store                           │
│   (BadgerDB - stores nodes and relationships)               │
└─────────────────────────────────────────────────────────────┘

Data Flow

  1. Performance collectors read from /proc and /sys filesystems
  2. Hardware Manager orchestrates periodic collection (default: 5 minutes; see the sketch after this list)
  3. Graph Builder transforms raw data into graph nodes and relationships
  4. Resource Store persists the hardware graph using RDF triplets
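
A minimal sketch of this loop (step 2 above), assuming hypothetical Collector and Snapshot types and a build callback standing in for the Graph Builder; the agent's actual interfaces may differ:

```go
package main

import (
	"context"
	"log"
	"time"
)

// Snapshot aggregates the raw data returned by each collector.
type Snapshot map[string]any

// Collector reads one hardware domain (CPU, memory, disk, network).
type Collector interface {
	Name() string
	Collect(ctx context.Context) (any, error)
}

// runHardwareManager orchestrates periodic collection and hands each
// snapshot to the graph-builder callback.
func runHardwareManager(ctx context.Context, collectors []Collector,
	build func(Snapshot) error, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		snap := Snapshot{}
		for _, c := range collectors {
			data, err := c.Collect(ctx)
			if err != nil {
				log.Printf("collector %s failed: %v", c.Name(), err)
				continue // a partial snapshot is still useful
			}
			snap[c.Name()] = data
		}
		if err := build(snap); err != nil {
			log.Printf("graph build failed: %v", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	runHardwareManager(ctx, nil, func(s Snapshot) error {
		log.Printf("built graph from %d collector snapshots", len(s))
		return nil
	}, 300*time.Millisecond)
}
```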

Hardware Ontology

Complete Hardware Graph Diagram

```mermaid
graph TB
    %% Node Style Definitions
    classDef systemNode fill:#e1f5fe,stroke:#01579b,stroke-width:3px
    classDef cpuNode fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef memoryNode fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
    classDef storageNode fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef networkNode fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef numaNode fill:#f1f8e9,stroke:#33691e,stroke-width:2px
    
    %% System Root
    SYS["SystemNode<br/>📟 hostname<br/>🏗️ x86_64<br/>⏰ boot_time<br/>🐧 Linux 6.8"]
    
    %% CPU Topology
    PKG0["CPUPackageNode<br/>🔧 socket-0<br/>⚡ Intel Xeon<br/>💾 36MB Cache<br/>🧮 8C/16T"]
    PKG1["CPUPackageNode<br/>🔧 socket-1<br/>⚡ Intel Xeon<br/>💾 36MB Cache<br/>🧮 8C/16T"]
    
    CORE0["CPUCoreNode<br/>🎯 core-0<br/>📊 3.2GHz"]
    CORE1["CPUCoreNode<br/>🎯 core-1<br/>📊 3.2GHz"]
    CORE8["CPUCoreNode<br/>🎯 core-8<br/>📊 3.2GHz"]
    CORE9["CPUCoreNode<br/>🎯 core-9<br/>📊 3.2GHz"]
    
    %% Memory Topology
    MEM["MemoryModuleNode<br/>💾 64GB Total<br/>🧠 NUMA Enabled<br/>⚖️ Balancing: Yes"]
    
    NUMA0["NUMANode<br/>🏷️ node-0<br/>💾 32GB<br/>🧮 CPUs: 0-7<br/>📏 [10,20]"]
    NUMA1["NUMANode<br/>🏷️ node-1<br/>💾 32GB<br/>🧮 CPUs: 8-15<br/>📏 [20,10]"]
    
    %% Storage Topology
    NVME["DiskDeviceNode<br/>💿 nvme0n1<br/>📦 1TB Samsung<br/>⚡ SSD (NVMe)<br/>🎯 4KB blocks"]
    SATA["DiskDeviceNode<br/>💿 sda<br/>📦 4TB Seagate<br/>🔄 HDD (SATA)<br/>🎯 512B blocks"]
    
    NVME_P1["DiskPartitionNode<br/>📁 nvme0n1p1<br/>📊 100GB<br/>🎯 sector 2048"]
    NVME_P2["DiskPartitionNode<br/>📁 nvme0n1p2<br/>📊 900GB<br/>🎯 sector 204800"]
    SATA_P1["DiskPartitionNode<br/>📁 sda1<br/>📊 4TB<br/>🎯 sector 2048"]
    
    %% Network Topology
    ETH0["NetworkInterfaceNode<br/>🌐 eth0<br/>🔗 10Gbps<br/>📶 Full Duplex<br/>🚀 ena driver"]
    ETH1["NetworkInterfaceNode<br/>🌐 eth1<br/>🔗 10Gbps<br/>📶 Full Duplex<br/>🚀 ena driver"]
    
    %% Containment Relationships (Contains)
    SYS -->|"Contains<br/>(physical)"| PKG0
    SYS -->|"Contains<br/>(physical)"| PKG1
    SYS -->|"Contains<br/>(physical)"| MEM
    SYS -->|"Contains<br/>(logical)"| NUMA0
    SYS -->|"Contains<br/>(logical)"| NUMA1
    SYS -->|"Contains<br/>(physical)"| NVME
    SYS -->|"Contains<br/>(physical)"| SATA
    SYS -->|"Contains<br/>(physical)"| ETH0
    SYS -->|"Contains<br/>(physical)"| ETH1
    
    PKG0 -->|"Contains<br/>(physical)"| CORE0
    PKG0 -->|"Contains<br/>(physical)"| CORE1
    PKG1 -->|"Contains<br/>(physical)"| CORE8
    PKG1 -->|"Contains<br/>(physical)"| CORE9
    
    NVME -->|"Contains<br/>(partition)"| NVME_P1
    NVME -->|"Contains<br/>(partition)"| NVME_P2
    SATA -->|"Contains<br/>(partition)"| SATA_P1
    
    %% NUMA Affinity Relationships
    MEM -.->|"NUMAAffinity<br/>node-0"| NUMA0
    MEM -.->|"NUMAAffinity<br/>node-1"| NUMA1
    
    %% Socket Sharing Relationships (CPU cores on same socket)
    CORE0 <-.->|"SharesSocket<br/>socket-0"| CORE1
    CORE8 <-.->|"SharesSocket<br/>socket-1"| CORE9
    
    %% NUMA Distance Relationships (between NUMA nodes)
    NUMA0 <-.->|"NUMADistance<br/>local: 10<br/>remote: 20"| NUMA1
    
    %% Bus Connection Relationships
    NVME -.->|"ConnectedTo<br/>NVMe bus"| SYS
    SATA -.->|"ConnectedTo<br/>SATA bus"| SYS
    ETH0 -.->|"ConnectedTo<br/>PCI bus"| SYS
    ETH1 -.->|"ConnectedTo<br/>PCI bus"| SYS
    
    %% Apply Styles
    class SYS systemNode
    class PKG0,PKG1,CORE0,CORE1,CORE8,CORE9 cpuNode
    class MEM,NUMA0,NUMA1 memoryNode
    class NVME,SATA,NVME_P1,NVME_P2,SATA_P1 storageNode
    class ETH0,ETH1 networkNode
```

Relationship Types in the Diagram

| Relationship Type | Visual Style | Description | Examples |
|---|---|---|---|
| Contains | Solid arrow | Hierarchical containment relationships | System → CPU Package → CPU Core; Disk Device → Partition |
| NUMAAffinity | Dotted arrow | Memory/CPU affinity to NUMA nodes | Memory Module → NUMA Node |
| SharesSocket | Bidirectional dotted | CPU cores sharing physical sockets | Core-0 ↔ Core-1 (same socket) |
| NUMADistance | Bidirectional dotted | Distance metrics between NUMA nodes | NUMA-0 ↔ NUMA-1 (distance: 20) |
| ConnectedTo | Dotted arrow | Hardware bus connections | Disk → System (via NVMe bus); Network → System (via PCI bus) |

Node Types

SystemNode

Root node representing the physical or virtual machine.

Properties:

  • hostname: System hostname
  • architecture: CPU architecture (x86_64, arm64)
  • boot_time: System boot timestamp
  • kernel_version: Kernel version string
  • os_info: Operating system information
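
These properties come from standard Linux sources. A minimal collection sketch, using an illustrative SystemNode struct rather than the agent's actual protobuf type:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// SystemNode here is illustrative, not the agent's protobuf model.
type SystemNode struct {
	Hostname      string
	Architecture  string
	KernelVersion string
}

func collectSystemNode() (SystemNode, error) {
	host, err := os.Hostname()
	if err != nil {
		return SystemNode{}, err
	}
	// /proc/sys/kernel/osrelease holds the kernel release string,
	// e.g. "6.8.0-45-generic".
	rel, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		return SystemNode{}, err
	}
	return SystemNode{
		Hostname:      host,
		Architecture:  runtime.GOARCH, // "amd64" on x86_64 hosts
		KernelVersion: strings.TrimSpace(string(rel)),
	}, nil
}

func main() {
	n, err := collectSystemNode()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", n)
}
```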

CPUPackageNode

Represents a physical CPU socket/package.

Properties:

  • socket_id: Physical package ID
  • vendor_id: CPU vendor (GenuineIntel, AuthenticAMD)
  • model_name: Full CPU model name
  • cpu_family: CPU family number
  • model: Model number
  • stepping: Stepping revision
  • microcode: Microcode version
  • cache_size: Cache size string
  • physical_cores: Number of physical cores
  • logical_cores: Number of logical cores (with hyperthreading)

CPUCoreNode

Individual CPU core within a package.

Properties:

  • processor_id: Logical CPU number
  • core_id: Physical core ID
  • physical_id: Parent package ID
  • frequency_mhz: Current frequency
  • siblings: Number of sibling threads
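
These fields correspond directly to /proc/cpuinfo keys ("processor", "core id", "physical id"). A minimal parsing sketch:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cpuCore mirrors the CPUCoreNode properties parsed from /proc/cpuinfo.
type cpuCore struct {
	ProcessorID string // "processor"   - logical CPU number
	CoreID      string // "core id"     - physical core ID
	PhysicalID  string // "physical id" - parent package (socket) ID
}

func readCores() ([]cpuCore, error) {
	f, err := os.Open("/proc/cpuinfo")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var cores []cpuCore
	var cur cpuCore
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if line == "" { // a blank line ends each logical CPU's block
			if cur.ProcessorID != "" {
				cores = append(cores, cur)
			}
			cur = cpuCore{}
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "processor":
			cur.ProcessorID = strings.TrimSpace(val)
		case "core id":
			cur.CoreID = strings.TrimSpace(val)
		case "physical id":
			cur.PhysicalID = strings.TrimSpace(val)
		}
	}
	if cur.ProcessorID != "" { // file may not end with a blank line
		cores = append(cores, cur)
	}
	return cores, sc.Err()
}

func main() {
	cores, err := readCores()
	if err != nil {
		panic(err)
	}
	for _, c := range cores {
		fmt.Printf("%+v\n", c)
	}
}
```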

MemoryModuleNode

System memory configuration.

Properties:

  • total_bytes: Total system memory
  • numa_enabled: NUMA support status
  • numa_balancing_available: NUMA balancing availability
  • numa_node_count: Number of NUMA nodes

NUMANode

NUMA memory node for systems with non-uniform memory access.

Properties:

  • node_id: NUMA node identifier
  • total_bytes: Memory in this NUMA node
  • cpus: CPU cores assigned to this node
  • distances: Distance metrics to other nodes
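
The distance metrics come from sysfs: each node exposes /sys/devices/system/node/node&lt;N&gt;/distance, holding one space-separated value per node. A minimal reading sketch:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// nodeDistances reads /sys/devices/system/node/node<id>/distance, which
// holds one space-separated distance per NUMA node (e.g. "10 20").
func nodeDistances(nodeID int) ([]int64, error) {
	path := fmt.Sprintf("/sys/devices/system/node/node%d/distance", nodeID)
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var dists []int64
	for _, field := range strings.Fields(string(raw)) {
		d, err := strconv.ParseInt(field, 10, 64)
		if err != nil {
			return nil, err
		}
		dists = append(dists, d)
	}
	return dists, nil
}

func main() {
	d, err := nodeDistances(0)
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // e.g. [10 20] on a two-node system
}
```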

DiskDeviceNode

Physical storage device.

Properties:

  • device: Device name (sda, nvme0n1)
  • model: Model identifier
  • vendor: Manufacturer
  • size_bytes: Total capacity
  • rotational: HDD (true) or SSD (false)
  • block_size: Logical block size
  • physical_block_size: Physical block size
  • scheduler: I/O scheduler
  • queue_depth: Queue depth
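
Most of these attributes are exposed under /sys/block/&lt;device&gt;. A minimal sketch; note the kernel reports "size" in 512-byte sectors regardless of the device's logical block size (the device name below is an assumption):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// sysBlock reads one attribute from /sys/block/<dev>/.
func sysBlock(dev, attr string) (string, error) {
	b, err := os.ReadFile("/sys/block/" + dev + "/" + attr)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	dev := "nvme0n1" // assumed device name
	rot, err := sysBlock(dev, "queue/rotational") // "1" = HDD, "0" = SSD
	if err != nil {
		panic(err)
	}
	sectors, err := sysBlock(dev, "size") // reported in 512-byte sectors
	if err != nil {
		panic(err)
	}
	n, err := strconv.ParseInt(sectors, 10, 64)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: rotational=%s size=%d bytes\n", dev, rot, n*512)
}
```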

DiskPartitionNode

Disk partition on a storage device.

Properties:

  • name: Partition name (sda1, nvme0n1p1)
  • parent_device: Parent disk device
  • size_bytes: Partition size
  • start_sector: Starting sector

NetworkInterfaceNode

Network adapter/interface.

Properties:

  • interface: Interface name (eth0, wlan0)
  • mac_address: Hardware MAC address
  • speed: Link speed in Mbps
  • duplex: Duplex mode (full/half)
  • mtu: Maximum transmission unit
  • driver: Driver name
  • type: Interface type (ethernet, wireless, loopback)
  • oper_state: Operational state
  • carrier: Carrier detection status
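
These attributes are exposed under /sys/class/net/&lt;interface&gt;. A minimal sketch; reading "speed" can fail (or report -1) on interfaces without an established link, so missing attributes are tolerated:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// netAttr reads one attribute from /sys/class/net/<iface>/.
func netAttr(iface, attr string) string {
	b, err := os.ReadFile("/sys/class/net/" + iface + "/" + attr)
	if err != nil {
		return "" // attribute may be absent on virtual or downed interfaces
	}
	return strings.TrimSpace(string(b))
}

func main() {
	iface := "eth0" // assumed interface name
	fmt.Println("mac:", netAttr(iface, "address"))
	fmt.Println("speed (Mbps):", netAttr(iface, "speed"))
	fmt.Println("duplex:", netAttr(iface, "duplex"))
	fmt.Println("mtu:", netAttr(iface, "mtu"))
	fmt.Println("operstate:", netAttr(iface, "operstate"))
	fmt.Println("carrier:", netAttr(iface, "carrier"))
}
```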

Relationship Types

ContainsPredicate

Hierarchical containment relationship.

Properties:

  • type: Containment type (physical, logical, partition)

Usage:

  • System → CPU Package (physical)
  • CPU Package → CPU Core (physical)
  • System → Memory Module (physical)
  • System → Disk Device (physical)
  • Disk Device → Partition (partition)
  • System → Network Interface (physical)
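
A sketch of how the graph builder might emit one of these triplets; the Relationship struct and contains helper are illustrative, not the resource store's actual API:

```go
package main

import "fmt"

// Relationship is a subject-predicate-object triplet, matching the RDF
// triplet model used by the resource store.
type Relationship struct {
	Subject   string            // e.g. "system/node-01"
	Predicate string            // e.g. "Contains"
	Object    string            // e.g. "cpu-package/socket-0"
	Props     map[string]string // predicate properties
}

// contains builds a Contains triplet with its containment type property.
func contains(parent, child, kind string) Relationship {
	return Relationship{
		Subject:   parent,
		Predicate: "Contains",
		Object:    child,
		Props:     map[string]string{"type": kind}, // physical|logical|partition
	}
}

func main() {
	r := contains("system/node-01", "cpu-package/socket-0", "physical")
	fmt.Printf("%+v\n", r)
}
```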

NUMAAffinityPredicate

NUMA node affinity relationships.

Properties:

  • node_id: NUMA node identifier
  • distance: Distance metric (optional)

Usage:

  • Memory Module → NUMA Node
  • CPU Core → NUMA Node

SocketSharingPredicate

CPU cores sharing a physical socket.

Properties:

  • physical_id: Physical package ID
  • socket_id: Socket identifier

Usage:

  • CPU Core ↔ CPU Core (same socket)

BusConnectionPredicate

Hardware bus connections between devices and the system.

Properties:

  • bus_type: Bus type (pci, usb, sata, nvme)
  • bus_address: Bus address (optional)

Usage:

  • Disk Device → System (via NVMe or SATA bus)
  • Network Interface → System (via PCI bus)

Example Graph Structure

SystemNode (node-01.example.com)
├── [Contains:physical] → CPUPackageNode (socket-0)
│   ├── [Contains:physical] → CPUCoreNode (core-0)
│   ├── [Contains:physical] → CPUCoreNode (core-1)
│   ├── [Contains:physical] → CPUCoreNode (core-2)
│   └── [Contains:physical] → CPUCoreNode (core-3)
├── [Contains:physical] → CPUPackageNode (socket-1)
│   ├── [Contains:physical] → CPUCoreNode (core-4)
│   ├── [Contains:physical] → CPUCoreNode (core-5)
│   ├── [Contains:physical] → CPUCoreNode (core-6)
│   └── [Contains:physical] → CPUCoreNode (core-7)
├── [Contains:physical] → MemoryModuleNode (64GB)
│   ├── [NUMAAffinity:node-0] → NUMANode (node-0, 32GB)
│   └── [NUMAAffinity:node-1] → NUMANode (node-1, 32GB)
├── [Contains:logical] → NUMANode (node-0)
├── [Contains:logical] → NUMANode (node-1)
├── [Contains:physical] → DiskDeviceNode (nvme0n1, 1TB)
│   ├── [Contains:partition] → DiskPartitionNode (nvme0n1p1, 100GB)
│   └── [Contains:partition] → DiskPartitionNode (nvme0n1p2, 900GB)
├── [Contains:physical] → DiskDeviceNode (sda, 4TB)
│   └── [Contains:partition] → DiskPartitionNode (sda1, 4TB)
├── [Contains:physical] → NetworkInterfaceNode (eth0, 10Gbps)
└── [Contains:physical] → NetworkInterfaceNode (eth1, 10Gbps)

Performance Considerations

Collection Overhead

  • Hardware discovery reads from /proc and /sys filesystems
  • Typical collection time: <100ms on modern systems
  • Update interval configurable (default: 5 minutes)

Storage Impact

  • Each hardware node: ~200-500 bytes
  • Typical system: 50-200 nodes total
  • Total storage: <100KB per system

Scalable Storage Architecture for Million-Host Deployments

The Deduplication Opportunity

At scale, hardware configurations are highly repetitive. Analysis of large fleets shows:

  • ~100 unique CPU models across millions of servers
  • ~20 common memory configurations (16GB, 32GB, 64GB, 128GB, etc.)
  • ~50 unique disk models
  • Result: ~1,000 unique hardware profiles serve 99% of hosts

Proposed Approach

Instead of storing complete hardware graphs for each host, we use a profile catalog pattern:

  1. Hardware profiles are deduplicated - Each unique hardware configuration is stored once
  2. Hosts reference profiles - Each host points to its hardware profile ID
  3. Profile hashing - Agents compute a hash of their hardware locally for fast deduplication (see the sketch after this list)
  4. Differential storage - Only host-specific data (hostname, serial numbers) stored per-host
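
A sketch of step 3, assuming the hardware snapshot can be serialized deterministically; JSON (whose map keys encoding/json sorts) stands in for whatever canonical encoding the agent actually uses:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// profileHash hashes the shared hardware description. Host-specific fields
// (hostname, serial numbers) must be excluded before hashing so that
// identical configurations collapse to a single profile.
func profileHash(profile map[string]any) (string, error) {
	// encoding/json sorts map keys, giving a stable byte stream for
	// equal profiles.
	b, err := json.Marshal(profile)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, err := profileHash(map[string]any{
		"cpu_model": "Intel Xeon", "sockets": 2, "memory_gb": 64,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // identical hardware yields an identical profile ID
}
```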

Storage Impact

For 1 million hosts:

  • Without deduplication: 100KB × 1M = 100GB
  • With profile catalog:
    • Unique profiles: 1,000 × 100KB = 100MB
    • Host mappings: 1M × 100 bytes = 100MB
    • Total: 200MB (500x reduction)

The approach scales with the number of unique hardware configurations, not the number of hosts, making it ideal for large standardized fleets.

Memory Usage

  • Snapshot data held temporarily during collection
  • Graph builder processes incrementally
  • No persistent memory cache required

Future Enhancements

Cross-Linking with Runtime and Kubernetes

Link hardware nodes to runtime and Kubernetes nodes:

K8s Node → [RunsOn] → SystemNode
K8s Pod → [ScheduledOn] → CPUCoreNode
ContainerNode → [RunsOn] → CPUCoreNode (via cpuset.cpus)
ContainerNode → [AllocatedTo] → NUMANode (via cpuset.mems)

See Runtime Discovery for complete container and process topology integration.

Extended Hardware Support

  • GPU devices and topology
  • InfiniBand/RDMA adapters
  • Hardware accelerators (TPU, FPGA)
  • Power management states
  • Thermal sensors

Performance Metrics Integration

  • Attach real-time metrics to hardware nodes
  • CPU utilization per core
  • Memory bandwidth per NUMA node
  • Disk I/O per device
  • Network throughput per interface

Advanced Relationships

  • PCIe bus topology
  • Memory channel configuration
  • CPU cache hierarchy
  • Interrupt affinity

This document was migrated from the repository docs. Last updated: 2025-01-19
