NUMA Collector - antimetal/system-agent GitHub Wiki

NUMA Memory Collector

Overview

The NUMA (Non-Uniform Memory Access) Memory Collector continuously monitors memory access patterns and allocation statistics on NUMA-enabled systems. NUMA is a memory design in which access time depends on the memory's location relative to the processor: each processor reaches its own local memory faster than memory attached to other processors, with remote access typically 2-3x slower.

This collector is essential for:

  • Performance optimization: Identifying memory locality issues that can significantly impact application performance
  • Resource allocation: Understanding CPU and memory topology for better workload placement
  • Troubleshooting: Detecting cross-node memory access patterns that indicate suboptimal configurations
  • Capacity planning: Monitoring per-node memory usage and allocation patterns in real-time

Unlike other hardware info collectors, which take a one-shot snapshot, the NUMA collector monitors runtime statistics continuously because memory allocation patterns change dynamically during system operation.

Technical Details

Property         Value
MetricType       MetricTypeNUMA ("numa")
Collection Mode  Continuous (periodic)
Data Sources     /sys/devices/system/node/, /proc/sys/kernel/

Data Sources

  • /sys/devices/system/node/node*/numastat - Per-node allocation statistics (runtime)
  • /sys/devices/system/node/node*/meminfo - Per-node memory usage (runtime)
  • /sys/devices/system/node/node*/cpulist - CPUs assigned to each node (static)
  • /sys/devices/system/node/node*/distance - Distance matrix between nodes (static)
  • /proc/sys/kernel/numa_balancing - Auto-balancing configuration
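The numastat and numa_balancing files are plain text with simple `key value` lines, so reading them amounts to splitting fields and parsing integers. A minimal parsing sketch (the helper name parseNumastat is illustrative, not the collector's actual code):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNumastat parses the key/value pairs found in
// /sys/devices/system/node/node*/numastat (e.g. "numa_hit 1234567").
// Malformed lines are skipped rather than treated as errors.
func parseNumastat(contents string) map[string]uint64 {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[0]] = v
	}
	return stats
}

func main() {
	// Sample contents in the format exposed by sysfs.
	sample := "numa_hit 1234567890\nnuma_miss 12345\nnuma_foreign 54321\n"
	stats := parseNumastat(sample)
	fmt.Println(stats["numa_hit"], stats["numa_miss"])
}
```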

Capabilities

SupportsOneShot:    true
SupportsContinuous: true  (runs periodically)
RequiresRoot:       false
RequiresEBPF:       false
MinKernelVersion:   2.6.7

Source Code

Primary implementation: pkg/performance/collectors/numa.go

Collected Metrics

System-Level Metrics

Metric       Type  Description
Enabled      bool  Whether NUMA is enabled on this system
NodeCount    int   Number of NUMA nodes in the system
AutoBalance  bool  Whether automatic NUMA balancing is enabled

Per-Node Metrics

Metric         Type    Description
ID             int     Node ID (0-based)
CPUs           []int   List of CPU cores assigned to this node

Memory Metrics
MemTotal       uint64  Total memory on this node (bytes)
MemFree        uint64  Free memory on this node (bytes)
MemUsed        uint64  Used memory on this node (bytes)
FilePages      uint64  File-backed pages/page cache (bytes)
AnonPages      uint64  Anonymous pages/process memory (bytes)

Allocation Counters
NumaHit        uint64  Memory successfully allocated on the intended node
NumaMiss       uint64  Memory allocated on this node although another node was preferred
NumaForeign    uint64  Memory intended for this node but allocated elsewhere
InterleaveHit  uint64  Interleaved memory successfully allocated on this node
LocalNode      uint64  Memory allocated here while the process was running on this node
OtherNode      uint64  Memory allocated here while the process was running on another node

Topology
Distances      []int   Distance to each node (10 = local, 20+ = remote)

Key Performance Indicators

  • High NumaHit + Low NumaMiss: Indicates good NUMA locality
  • High NumaForeign: Suggests memory pressure causing cross-node allocations
  • High OtherNode: Processes frequently accessing memory from remote nodes
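As a rough health check, a locality ratio can be derived from the allocation counters. A sketch under the assumption that NumaHit and NumaMiss are sampled together (localityRatio is a hypothetical helper, not part of the collector):

```go
package main

import "fmt"

// localityRatio returns the fraction of allocation requests that were
// satisfied on the intended node. Values near 1.0 indicate good NUMA
// locality; a falling ratio suggests cross-node allocation pressure.
func localityRatio(numaHit, numaMiss uint64) float64 {
	total := numaHit + numaMiss
	if total == 0 {
		return 1.0 // no allocations recorded yet; nothing to report
	}
	return float64(numaHit) / float64(total)
}

func main() {
	// Counter values taken from node 0 of the sample output below.
	fmt.Printf("node0 locality: %.6f\n", localityRatio(1234567890, 12345))
}
```

Note that these counters are cumulative since boot, so trending the ratio over collection intervals (deltas, not absolutes) gives a clearer picture of current behavior.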

Data Structure

type NUMAStats struct {
    Enabled     bool
    NodeCount   int
    Nodes       []NUMANodeStats
    AutoBalance bool
}

type NUMANodeStats struct {
    ID            int
    CPUs          []int
    MemTotal      uint64
    MemFree       uint64
    MemUsed       uint64
    FilePages     uint64
    AnonPages     uint64
    NumaHit       uint64
    NumaMiss      uint64
    NumaForeign   uint64
    InterleaveHit uint64
    LocalNode     uint64
    OtherNode     uint64
    Distances     []int
}
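A sketch of how per-node figures might be aggregated into system-wide totals, with the struct abbreviated to the fields used; MemUsed is derived as MemTotal - MemFree, which is consistent with the sample output in the Examples section:

```go
package main

import "fmt"

// NUMANodeStats is abbreviated here to the fields this sketch uses.
type NUMANodeStats struct {
	ID       int
	MemTotal uint64
	MemFree  uint64
	MemUsed  uint64
}

// systemMemory derives MemUsed for each node as MemTotal - MemFree and
// sums the per-node figures into system-wide totals.
func systemMemory(nodes []NUMANodeStats) (total, used uint64) {
	for i := range nodes {
		nodes[i].MemUsed = nodes[i].MemTotal - nodes[i].MemFree
		total += nodes[i].MemTotal
		used += nodes[i].MemUsed
	}
	return total, used
}

func main() {
	// Per-node values from the sample output in the Examples section.
	nodes := []NUMANodeStats{
		{ID: 0, MemTotal: 68719476736, MemFree: 12636856320},
		{ID: 1, MemTotal: 68719476736, MemFree: 32954277888},
	}
	total, used := systemMemory(nodes)
	fmt.Println(total, used)
}
```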

Configuration

Enabling the Collector

The NUMA collector runs continuously at the configured interval:

performance:
  enabled: true
  interval: "60s"  # Collection interval
  collectors:
    - numa

For programmatic configuration:

config := performance.CollectionConfig{
    EnabledCollectors: map[performance.MetricType]bool{
        performance.MetricTypeNUMA: true,
    },
    HostProcPath: "/proc",
    HostSysPath:  "/sys",
}

Required Paths

  • HostProcPath: Must be an absolute path (default: /proc)
  • HostSysPath: Must be an absolute path (default: /sys)

Container Environments

When running in containers, mount the host filesystems:

volumes:
  - /proc:/host/proc:ro
  - /sys:/host/sys:ro

Then configure:

config.HostProcPath = "/host/proc"
config.HostSysPath = "/host/sys"

Platform Considerations

Linux Kernel Requirements

  • Minimum kernel version: 2.6.7 (NUMA support in /sys)
  • NUMA hardware must be present
  • NUMA support must be enabled in kernel

Non-NUMA Systems

On systems without NUMA or with only one node:

  • Enabled will be false
  • NodeCount will be 0 or 1
  • Nodes array will be empty

Container Considerations

  • Requires read access to /sys/devices/system/node/
  • NUMA topology is system-wide, not container-specific
  • Container CPU/memory limits don't affect NUMA topology visibility

Common Issues

Issue: Collector reports NUMA disabled

Symptoms: Enabled: false even on NUMA hardware

Possible causes:

  1. Single socket system (not NUMA)
  2. NUMA disabled in BIOS
  3. Kernel compiled without NUMA support
  4. Missing /sys/devices/system/node/ directory

Resolution: Check BIOS settings and kernel configuration

Issue: Missing or incomplete node data

Symptoms: Some nodes have zero values for memory metrics

Possible causes:

  1. Partial /sys mount in container
  2. Insufficient permissions
  3. Memory hot-plug operations

Resolution: Ensure full /sys filesystem is mounted with read permissions

Issue: High NUMA miss rates

Symptoms: High NumaMiss or NumaForeign values

Possible causes:

  1. Poor process/thread affinity
  2. Memory pressure on specific nodes
  3. Suboptimal memory allocation policies

Resolution: Review application NUMA policies and consider enabling auto-balancing

Examples

Sample Output

{
  "Enabled": true,
  "NodeCount": 2,
  "AutoBalance": true,
  "Nodes": [
    {
      "ID": 0,
      "CPUs": [0, 1, 2, 3, 4, 5, 6, 7],
      "MemTotal": 68719476736,
      "MemFree": 12636856320,
      "MemUsed": 56082620416,
      "FilePages": 22829453312,
      "AnonPages": 33244332032,
      "NumaHit": 1234567890,
      "NumaMiss": 12345,
      "NumaForeign": 54321,
      "InterleaveHit": 9876,
      "LocalNode": 1234567000,
      "OtherNode": 890,
      "Distances": [10, 21]
    },
    {
      "ID": 1,
      "CPUs": [8, 9, 10, 11, 12, 13, 14, 15],
      "MemTotal": 68719476736,
      "MemFree": 32954277888,
      "MemUsed": 35765198848,
      "FilePages": 12636856320,
      "AnonPages": 23128342528,
      "NumaHit": 987654321,
      "NumaMiss": 54321,
      "NumaForeign": 12345,
      "InterleaveHit": 5432,
      "LocalNode": 987600000,
      "OtherNode": 54321,
      "Distances": [21, 10]
    }
  ]
}

Performance Impact

Since this collector runs continuously (unlike other hardware info collectors), consider the ongoing resource usage:

  • CPU Usage: Negligible - only reads text files from sysfs
  • Memory Usage: Small - proportional to number of NUMA nodes (typically < 1KB)
  • I/O Operations: Few file reads per collection interval (5-10 files per node)
  • Collection Time: Fast - typically < 1ms for 2-4 node systems
  • Frequency Impact: Default 60s interval has minimal impact; can be adjusted based on monitoring needs
