Memory Collector - antimetal/system-agent GitHub Wiki

Memory Collector

Overview

The Memory Collector is a runtime memory statistics collector that monitors system memory usage and performance by reading data from /proc/meminfo. It provides real-time insights into memory allocation, usage patterns, and system memory health, making it essential for operational monitoring, alerting, and performance analysis.

Why It's Important

  • Memory Pressure Monitoring: Detect when systems are running low on available memory
  • Performance Analysis: Track memory usage patterns and identify inefficiencies
  • Leak Detection: Monitor for gradual memory consumption that indicates memory leaks
  • Capacity Planning: Understand memory utilization trends for resource allocation
  • Swap Usage Tracking: Monitor swap activity which can indicate memory pressure
  • Kernel Memory Monitoring: Track kernel-specific memory usage (slab, page tables, etc.)

Technical Details

| Property | Value |
|----------|-------|
| MetricType | MetricTypeMemory |
| Data Source | /proc/meminfo |
| Supported Modes | One-shot collection only |
| Requires Root | No |
| Requires eBPF | No |
| Min Kernel Version | 2.6.0 |
| Collection Interval | Configurable (when wrapped as ContinuousCollector) |

Key Differences from MemoryInfoCollector

  • Memory Collector: Provides runtime statistics (dynamic, changes constantly)
  • MemoryInfo Collector: Provides hardware configuration (static NUMA topology)
  • Use Case: Memory Collector is for monitoring; MemoryInfo Collector is for inventory

Collected Metrics

The collector gathers 30 memory statistics fields from /proc/meminfo. All values are converted from kilobytes (as reported by the kernel) to bytes for consistency.

| Metric | Description | Unit |
|--------|-------------|------|
| MemTotal | Total usable RAM | Bytes |
| MemFree | Free memory currently available | Bytes |
| MemAvailable | Memory available for starting new applications without swapping | Bytes |
| Buffers | Memory used by kernel buffers | Bytes |
| Cached | Memory used by page cache (excluding SwapCached) | Bytes |
| SwapCached | Memory that was swapped out and is now back in RAM | Bytes |
| Active | Memory that has been used recently and usually not reclaimed | Bytes |
| Inactive | Memory that hasn't been used recently (candidate for eviction) | Bytes |
| SwapTotal | Total swap space available | Bytes |
| SwapFree | Unused swap space | Bytes |
| Dirty | Memory waiting to be written back to disk | Bytes |
| Writeback | Memory actively being written back to disk | Bytes |
| AnonPages | Non-file backed pages mapped into userspace | Bytes |
| Mapped | Files which have been mapped into memory (mmap) | Bytes |
| Shmem | Total shared memory (tmpfs, shared anonymous mappings) | Bytes |
| Slab | Total slab allocator memory | Bytes |
| SReclaimable | Reclaimable slab memory (caches) | Bytes |
| SUnreclaim | Unreclaimable slab memory | Bytes |
| KernelStack | Memory used by kernel stacks | Bytes |
| PageTables | Memory used by page tables | Bytes |
| CommitLimit | Total amount of memory that can be allocated | Bytes |
| CommittedAS | Total committed memory (may exceed physical memory) | Bytes |
| VmallocTotal | Total size of vmalloc virtual address space | Bytes |
| VmallocUsed | Used vmalloc area | Bytes |
| HugePages_Total | Total huge pages memory (count × page size) | Bytes |
| HugePages_Free | Free huge pages memory (count × page size) | Bytes |
| HugePages_Rsvd | Reserved huge pages memory (count × page size) | Bytes |
| HugePages_Surp | Surplus huge pages memory (count × page size) | Bytes |
| HugePagesize | Default huge page size | Bytes |
| Hugetlb | Total memory consumed by huge pages of all sizes | Bytes |
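The kilobyte-to-byte conversion applied to these fields can be sketched in Go. The `parseMeminfoLine` helper below is illustrative only (it is not the collector's actual code), but it shows the shape of each /proc/meminfo line and the conversion:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfoLine parses a single /proc/meminfo line such as
// "MemTotal:       16384 kB" and returns the field name and the
// value converted from kilobytes to bytes.
func parseMeminfoLine(line string) (string, uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", 0, fmt.Errorf("malformed meminfo line: %q", line)
	}
	name := strings.TrimSuffix(fields[0], ":")
	kb, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	// The kernel reports most fields in kB; convert to bytes.
	return name, kb * 1024, nil
}

func main() {
	name, bytes, _ := parseMeminfoLine("MemTotal:       16384 kB")
	fmt.Println(name, bytes) // MemTotal 16777216
}
```

Note that the HugePages_* count fields do not carry a "kB" suffix and need the different treatment described below.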

Special Handling for HugePages

HugePages fields require special conversion:

  • HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp are page counts in /proc/meminfo
  • These counts are multiplied by Hugepagesize to calculate total bytes
  • Hugepagesize itself is converted from kB to bytes
  • Hugetlb is already a memory amount in kB and is simply converted to bytes
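The page-count conversion above can be sketched as follows (the function name and signature are illustrative, not the collector's actual implementation):

```go
package main

import "fmt"

// hugePagesBytes converts a HugePages_* page count from /proc/meminfo
// into bytes, given the default huge page size in kB (Hugepagesize).
// Fields already expressed in kB, such as Hugetlb, are instead simply
// multiplied by 1024.
func hugePagesBytes(pageCount, hugepagesizeKB uint64) uint64 {
	hugepagesizeBytes := hugepagesizeKB * 1024 // kB -> bytes
	return pageCount * hugepagesizeBytes       // pages -> bytes
}

func main() {
	// 512 huge pages with the common 2 MB (2048 kB) default size:
	fmt.Println(hugePagesBytes(512, 2048)) // 1073741824 (1 GB)
}
```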

Data Structure

The collector returns a performance.MemoryStats struct containing all of the fields listed above.

Configuration

The Memory Collector is configured through the performance.CollectionConfig:

```go
config := performance.CollectionConfig{
    HostProcPath: "/proc",  // Path to proc filesystem (required)
    HostSysPath:  "/sys",   // Path to sys filesystem (not used by this collector)
}

collector, err := collectors.NewMemoryCollector(logger, config)
```

Container Environments

When running in containers, you may need to mount the host's proc filesystem:

```yaml
volumes:
  - name: host-proc
    hostPath:
      path: /proc
      type: Directory
volumeMounts:
  - name: host-proc
    mountPath: /host/proc
    readOnly: true
```

Then configure the collector with:

```go
config.HostProcPath = "/host/proc"
```

Platform Considerations

Linux Kernel Requirements

  • Minimum Version: 2.6.0 (when /proc/meminfo was standardized)
  • MemAvailable: Added in kernel 3.14; will be 0 on older kernels
  • HugePages: Available since 2.6.16
  • Transparent Huge Pages: Additional fields may appear in newer kernels

Container Considerations

  1. Memory Limits: Container memory limits don't affect /proc/meminfo values, which show host memory
  2. cgroups: For container-specific memory limits, use cgroup memory statistics instead
  3. Permissions: No special permissions required; /proc/meminfo is world-readable

Virtual Environments

  • Values reflect the VM's view of memory, not the hypervisor's
  • Memory ballooning may cause MemTotal to change during runtime

Common Issues

Issue: All Values Are Zero

Symptom: Collector returns all zeros for memory statistics

Causes:

  • Incorrect HostProcPath configuration
  • /proc/meminfo file is missing or empty
  • Permissions issue (rare, as file is typically world-readable)

Resolution:

```shell
# Verify the file exists and has content
cat /proc/meminfo

# Check the collector configuration:
# ensure HostProcPath points to the correct location
```

Issue: Missing MemAvailable

Symptom: MemAvailable is always 0

Cause: Kernel version older than 3.14

Resolution: Upgrade the kernel, or compute a rough estimate manually:

MemAvailable ≈ MemFree + Buffers + Cached

This is only an approximation and tends to overestimate: Cached includes Shmem (which cannot be reclaimed), and the kernel's own MemAvailable calculation also accounts for reclaimable slab memory and zone low watermarks.
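The fallback estimate above can be expressed as a small helper (hypothetical; not part of the collector):

```go
package main

import "fmt"

// estimateAvailable approximates MemAvailable on kernels older than 3.14
// using the rough formula MemFree + Buffers + Cached. The kernel's real
// calculation also considers reclaimable slab and low watermarks, so this
// tends to overestimate. All inputs and the result are in bytes.
func estimateAvailable(memFree, buffers, cached uint64) uint64 {
	return memFree + buffers + cached
}

func main() {
	// 2 GB free + 512 MB buffers + 4 GB cached:
	fmt.Println(estimateAvailable(2147483648, 536870912, 4294967296)) // 6979321856 (~6.5 GB)
}
```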

Issue: HugePages Values Incorrect

Symptom: HugePages values seem too large or incorrect

Cause: Misunderstanding of unit conversion

Resolution: Remember that HugePages fields are converted:

  • Raw value in /proc/meminfo: Page count
  • Collector output: Page count × Hugepagesize (in bytes)

Issue: Container Shows Host Memory

Symptom: Container reports host's total memory, not container limit

Cause: /proc/meminfo always shows host values

Resolution: Use cgroup memory statistics for container-specific limits:

```shell
# cgroup v1
cat /sys/fs/cgroup/memory/memory.stat
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# cgroup v2 (unified hierarchy)
cat /sys/fs/cgroup/memory.stat
cat /sys/fs/cgroup/memory.max
```

Examples

Sample Output

```json
{
  "MemTotal": 17179869184,       // 16 GB
  "MemFree": 2147483648,         // 2 GB
  "MemAvailable": 8589934592,    // 8 GB
  "Buffers": 536870912,          // 512 MB
  "Cached": 4294967296,          // 4 GB
  "SwapCached": 268435456,       // 256 MB
  "Active": 6442450944,          // 6 GB
  "Inactive": 4294967296,        // 4 GB
  "SwapTotal": 8589934592,       // 8 GB
  "SwapFree": 6442450944,        // 6 GB
  "Dirty": 33554432,             // 32 MB
  "Writeback": 0,                // 0
  "AnonPages": 3221225472,       // 3 GB
  "Mapped": 1073741824,          // 1 GB
  "Shmem": 268435456,            // 256 MB
  "Slab": 536870912,             // 512 MB
  "SReclaimable": 268435456,     // 256 MB
  "SUnreclaim": 268435456,       // 256 MB
  "KernelStack": 33554432,       // 32 MB
  "PageTables": 67108864,        // 64 MB
  "CommitLimit": 17179869184,    // 16 GB
  "CommittedAS": 10737418240,    // 10 GB
  "VmallocTotal": 35184372087808,// ~32 TB (virtual)
  "VmallocUsed": 1073741824,     // 1 GB
  "HugePages_Total": 0,
  "HugePages_Free": 0,
  "HugePages_Rsvd": 0,
  "HugePages_Surp": 0,
  "HugePagesize": 2097152,       // 2 MB
  "Hugetlb": 0
}
```

Common Use Cases

  1. Memory Leak Detection: Monitor gradual increase in anonymous pages over time

  2. Cache Efficiency: Analyze cache hit ratios and buffer utilization

  3. Swap Monitoring: Track swap usage patterns and thrashing indicators

  4. Memory Pressure: Identify low-memory conditions before OOM events

  5. Swap Thrashing Detection (PromQL-style pseudocode):

     ```
     # High swap activity
     rate(swap_cached[5m]) > threshold
     ```
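The pressure and swap checks above can be combined into a simple derived-metrics sketch. The struct fields mirror a subset of the collector's output; the method names and alert threshold are illustrative, not part of the agent:

```go
package main

import "fmt"

// MemSnapshot holds the handful of byte-valued fields needed for
// simple pressure checks (a subset of what the collector returns).
type MemSnapshot struct {
	MemTotal, MemAvailable uint64
	SwapTotal, SwapFree    uint64
}

// availablePct returns available memory as a percentage of total RAM.
func (m MemSnapshot) availablePct() float64 {
	if m.MemTotal == 0 {
		return 0
	}
	return 100 * float64(m.MemAvailable) / float64(m.MemTotal)
}

// swapUsedPct returns used swap as a percentage of total swap.
func (m MemSnapshot) swapUsedPct() float64 {
	if m.SwapTotal == 0 {
		return 0
	}
	return 100 * float64(m.SwapTotal-m.SwapFree) / float64(m.SwapTotal)
}

func main() {
	// Values taken from the sample output in this document:
	// 16 GB total, 8 GB available, 8 GB swap with 6 GB free.
	m := MemSnapshot{
		MemTotal:     17179869184,
		MemAvailable: 8589934592,
		SwapTotal:    8589934592,
		SwapFree:     6442450944,
	}
	fmt.Printf("available: %.0f%%, swap used: %.0f%%\n", m.availablePct(), m.swapUsedPct())
	// A typical (illustrative) alert rule: availablePct() < 10 sustained
	// across several consecutive samples.
}
```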

Performance Impact

The Memory Collector has minimal performance impact:

  • CPU Usage: Negligible (simple file read and parse)
  • Memory Usage: ~10 KB for temporary buffers
  • I/O Operations: One read of /proc/meminfo (~2-3 KB)
  • Execution Time: <1ms typically
  • Kernel Overhead: None (reading pre-computed values)

Collection Frequency Recommendations

  • Normal Monitoring: Every 10-30 seconds
  • High-Resolution Monitoring: Every 1-5 seconds
  • Resource-Constrained Systems: Every 60 seconds

Related Collectors

Memory Info Collector

  • Purpose: Hardware inventory and NUMA topology
  • Link: Memory-Info-Collector
  • Relationship: Provides static memory configuration vs runtime statistics

Process Memory Collector

  • Purpose: Per-process memory usage
  • Link: Process-Collector
  • Relationship: Process-level detail vs system-wide view

cgroup Memory Collector

  • Purpose: Container/cgroup memory statistics
  • Link: Cgroup-Collector
  • Relationship: Container-specific vs host-wide memory

VMStat Collector

  • Purpose: Virtual memory statistics and paging activity
  • Link: VMStat-Collector
  • Relationship: Complementary VM subsystem metrics
