Memory Info Collector - antimetal/system-agent GitHub Wiki

Memory Info Collector

Overview

The Memory Info Collector is a hardware inventory collector that discovers and reports static memory configuration and NUMA (Non-Uniform Memory Access) topology for the Antimetal System Agent. Unlike the Memory Collector which monitors runtime memory usage, this collector focuses on hardware configuration and physical memory architecture.

What It Monitors

  • Total system memory capacity
  • NUMA node topology and memory distribution
  • CPU-to-NUMA node affinity mapping
  • Hardware memory configuration for capacity planning

Why It's Important

  • Hardware Inventory: Tracks physical memory assets for infrastructure management
  • NUMA-Aware Scheduling: Enables optimal workload placement based on memory locality
  • Capacity Planning: Provides accurate memory capacity information for resource allocation
  • Performance Optimization: Understanding NUMA topology is critical for memory-intensive applications
  • Architecture Discovery: Identifies memory hardware layout for system optimization

Technical Details

Property         Value
MetricType       MetricTypeMemoryInfo ("memory_info")
Collection Mode  One-shot (runs once at startup)
Data Sources     /proc/meminfo, /sys/devices/system/node/

Data Sources

The collector reads from kernel-guaranteed data sources that provide standardized formats across all Linux distributions:

  1. Total System Memory

    • Source: /proc/meminfo (MemTotal field only)
    • Format: MemTotal: XXXXX kB
    • Always available on Linux systems
  2. NUMA Topology

    • Source: /sys/devices/system/node/node[0-9]*/
    • Per-node memory: /sys/devices/system/node/nodeX/meminfo
    • CPU affinity: /sys/devices/system/node/nodeX/cpulist
    • Gracefully degrades on non-NUMA systems
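As a sketch of the first step, the MemTotal line can be parsed and converted to bytes like this (parseMemTotal is an illustrative helper, not the collector's actual function):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemTotal extracts MemTotal from /proc/meminfo-style content
// and converts it from kB to bytes.
func parseMemTotal(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected line shape: "MemTotal:       16384256 kB"
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("read error:", err) // non-Linux or restricted environment
		return
	}
	total, err := parseMemTotal(string(data))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("TotalBytes: %d\n", total)
}
```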

Capabilities

SupportsOneShot:    true
SupportsContinuous: false  (runs once at startup)
RequiresRoot:       false
RequiresEBPF:       false
MinKernelVersion:   2.6.0

Collection Mode

This collector runs once at startup using PartialNewOnceContinuousCollector, as hardware configuration is static and doesn't change during runtime.

Collected Metrics

Metric               Description                        Source                     Example Value
TotalBytes           Total system memory in bytes       /proc/meminfo (MemTotal)   17179869184 (16GB)
NUMANodes[]          Array of NUMA node configurations  /sys/devices/system/node/  See below
NUMANode.NodeID      NUMA node identifier               Directory name parsing     0, 1, 2
NUMANode.TotalBytes  Memory capacity of NUMA node       nodeX/meminfo              8589934592 (8GB)
NUMANode.CPUs[]      CPU cores affiliated with node     nodeX/cpulist              [0,1,2,3]

NUMA Node CPU List Format

The collector parses various CPU list formats:

  • Individual CPUs: "0,1,2,3"
  • Ranges: "0-3"
  • Mixed: "0-3,8-11,15"
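Expanding these formats into individual CPU numbers can be sketched as follows (parseCPUList is a hypothetical helper for illustration, not the collector's actual code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a kernel cpulist string such as "0-3,8-11,15"
// into the individual CPU numbers it covers.
func parseCPUList(s string) ([]int32, error) {
	var cpus []int32
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		if part == "" {
			continue
		}
		if lo, hi, ok := strings.Cut(part, "-"); ok {
			// Range form, e.g. "8-11".
			start, err := strconv.ParseInt(lo, 10, 32)
			if err != nil {
				return nil, err
			}
			end, err := strconv.ParseInt(hi, 10, 32)
			if err != nil {
				return nil, err
			}
			for c := start; c <= end; c++ {
				cpus = append(cpus, int32(c))
			}
		} else {
			// Single CPU form, e.g. "15".
			c, err := strconv.ParseInt(part, 10, 32)
			if err != nil {
				return nil, err
			}
			cpus = append(cpus, int32(c))
		}
	}
	return cpus, nil
}

func main() {
	cpus, _ := parseCPUList("0-3,8-11,15")
	fmt.Println(cpus)
}
```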

Data Structure

The collector returns a MemoryInfo struct defined in the performance types:

type MemoryInfo struct {
    TotalBytes uint64      // Total memory from /proc/meminfo
    NUMANodes  []NUMANode  // NUMA configuration
}

type NUMANode struct {
    NodeID     int32    // NUMA node identifier
    TotalBytes uint64   // Memory in this node
    CPUs       []int32  // CPU cores in this node
}

Source code: pkg/performance/collectors/memory_info.go
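A quick consistency check over the returned struct — summing per-node capacity against the system total — can look like this (nodeSum is an illustrative helper; in practice per-node meminfo totals may differ slightly from MemTotal because of kernel reservations):

```go
package main

import "fmt"

type NUMANode struct {
	NodeID     int32
	TotalBytes uint64
	CPUs       []int32
}

type MemoryInfo struct {
	TotalBytes uint64
	NUMANodes  []NUMANode
}

// nodeSum adds up the capacity reported by each NUMA node.
func nodeSum(m MemoryInfo) uint64 {
	var sum uint64
	for _, n := range m.NUMANodes {
		sum += n.TotalBytes
	}
	return sum
}

func main() {
	m := MemoryInfo{
		TotalBytes: 17179869184,
		NUMANodes: []NUMANode{
			{NodeID: 0, TotalBytes: 8589934592, CPUs: []int32{0, 1}},
			{NodeID: 1, TotalBytes: 8589934592, CPUs: []int32{2, 3}},
		},
	}
	fmt.Println(nodeSum(m) == m.TotalBytes)
}
```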

Configuration

The collector requires standard host paths:

config := performance.CollectionConfig{
    HostProcPath: "/proc",  // Or custom path for containers
    HostSysPath:  "/sys",   // Or custom path for containers
}

Container Environments

When running in containers, mount the host filesystems:

volumes:
  - /proc:/host/proc:ro
  - /sys:/host/sys:ro
env:
  - HOST_PROC=/host/proc
  - HOST_SYS=/host/sys

Platform Considerations

Linux Kernel Requirements

  • Minimum Version: 2.6.0 (NUMA support introduced)
  • NUMA Support: Available since kernel 2.6, mature in 3.x+
  • Required Kernel Features:
    • CONFIG_NUMA=y (for NUMA systems)
    • Memory management subsystem (always present)

Hardware Considerations

  1. NUMA Systems:

    • Multi-socket servers typically have NUMA
    • Each CPU socket has local memory
    • Cross-socket memory access incurs latency
  2. UMA Systems (Uniform Memory Access):

    • Single socket systems
    • Collector creates synthetic single-node configuration
    • All CPUs have equal memory access
  3. Virtual Machines:

    • May not expose real NUMA topology
    • Memory reported is VM allocation, not physical
    • NUMA topology may be synthetic or missing
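The synthetic single-node fallback mentioned above for UMA systems can be sketched as follows (syntheticNode is a hypothetical helper, not the collector's actual code):

```go
package main

import "fmt"

type NUMANode struct {
	NodeID     int32
	TotalBytes uint64
	CPUs       []int32
}

// syntheticNode builds the single-node fallback used when no NUMA
// topology is exposed (UMA hardware, NUMA disabled in BIOS, or a VM):
// all memory and every online CPU are assigned to node 0.
func syntheticNode(totalBytes uint64, numCPUs int) NUMANode {
	cpus := make([]int32, numCPUs)
	for i := range cpus {
		cpus[i] = int32(i)
	}
	return NUMANode{NodeID: 0, TotalBytes: totalBytes, CPUs: cpus}
}

func main() {
	node := syntheticNode(8589934592, 4)
	fmt.Printf("%+v\n", node)
}
```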

Data Reliability

  • Kernel-Guaranteed: /proc/meminfo, NUMA sysfs structure
  • Hardware-Dependent: Detailed memory specifications (not collected)
  • Always Available: Total memory, basic topology
  • May Be Missing: NUMA info on UMA systems, VMs

Common Issues

Troubleshooting

  1. No NUMA Nodes Found

    • Symptom: Single synthetic node created with all CPUs
    • Cause: UMA system, NUMA disabled in BIOS, or VM
    • Resolution: Expected behavior for non-NUMA systems
  2. Memory Total Mismatch

    • Symptom: Reported memory less than installed
    • Cause: Hardware reservations (firmware, graphics)
    • Resolution: Normal - kernel reports usable memory only
  3. Non-Contiguous Node IDs

    • Symptom: Node IDs like 0, 2, 4 (missing 1, 3)
    • Cause: Hardware configuration or CPU hotplug
    • Resolution: Normal - collector handles gaps correctly
  4. Empty CPU Lists

    • Symptom: NUMA node with no CPUs
    • Cause: Memory-only nodes (rare) or sysfs issues
    • Resolution: Check hardware config, kernel logs
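The gap-tolerant node discovery referenced in issue 3 can be sketched as follows (parseNodeID and nodeIDs are illustrative helpers):

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// parseNodeID extracts the numeric ID from a sysfs directory name
// such as "node2"; it reports false for names that are not nodeN.
func parseNodeID(name string) (int32, bool) {
	if !strings.HasPrefix(name, "node") {
		return 0, false
	}
	id, err := strconv.ParseInt(strings.TrimPrefix(name, "node"), 10, 32)
	if err != nil {
		return 0, false
	}
	return int32(id), true
}

// nodeIDs lists NUMA node IDs by parsing directory names, so gaps
// (e.g. node0, node2, node4) are preserved rather than assumed away.
func nodeIDs(sysPath string) ([]int32, error) {
	dirs, err := filepath.Glob(filepath.Join(sysPath, "devices/system/node/node[0-9]*"))
	if err != nil {
		return nil, err
	}
	var ids []int32
	for _, dir := range dirs {
		if id, ok := parseNodeID(filepath.Base(dir)); ok {
			ids = append(ids, id)
		}
	}
	// Glob returns lexicographic order (node10 before node2), so sort numerically.
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids, nil
}

func main() {
	ids, err := nodeIDs("/sys")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("NUMA node IDs:", ids)
}
```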

Debug Commands

# Check total memory
grep MemTotal /proc/meminfo

# List NUMA nodes
ls -la /sys/devices/system/node/

# Check NUMA node memory
cat /sys/devices/system/node/node*/meminfo | grep MemTotal

# Check CPU affinity
cat /sys/devices/system/node/node*/cpulist

Examples

Sample Output

Dual-Socket NUMA System

{
  "TotalBytes": 34359738368,  // 32GB total
  "NUMANodes": [
    {
      "NodeID": 0,
      "TotalBytes": 17179869184,  // 16GB
      "CPUs": [0, 1, 2, 3, 4, 5, 6, 7]
    },
    {
      "NodeID": 1,
      "TotalBytes": 17179869184,  // 16GB
      "CPUs": [8, 9, 10, 11, 12, 13, 14, 15]
    }
  ]
}

Single-Socket System (UMA)

{
  "TotalBytes": 8589934592,  // 8GB total
  "NUMANodes": [
    {
      "NodeID": 0,
      "TotalBytes": 8589934592,  // All memory in one node
      "CPUs": [0, 1, 2, 3]
    }
  ]
}

Using the Data

NUMA-Aware Application Deployment

// Find the NUMA node with the most memory for a memory-intensive app.
// Index into the slice rather than taking the loop variable's address,
// which would alias a single variable on Go versions before 1.22.
var bestNode *NUMANode
for i := range memInfo.NUMANodes {
    if bestNode == nil || memInfo.NUMANodes[i].TotalBytes > bestNode.TotalBytes {
        bestNode = &memInfo.NUMANodes[i]
    }
}
// Pin application to bestNode.CPUs

Capacity Planning

// Check if system has enough memory for workload
requiredMemory := uint64(16) * 1024 * 1024 * 1024  // 16GB; uint64 to match TotalBytes
if memInfo.TotalBytes < requiredMemory {
    return fmt.Errorf("insufficient memory: have %d, need %d",
        memInfo.TotalBytes, requiredMemory)
}

Performance Impact

The Memory Info Collector has minimal performance impact as it runs only once at startup:

  • CPU Usage: Negligible (one-time collection)
  • Memory Usage: < 1KB for data structures
  • I/O Operations: One read of /proc/meminfo plus NUMA discovery
  • Execution Time: < 5ms typically (1ms for UMA, 2-5ms for NUMA)
  • Frequency: Once at startup only
