Testing Methodology - antimetal/system-agent GitHub Wiki

Testing Methodology

⚠️ DRAFT/WIP: Documentation for in-development feature on mem_monitor branch

Overview

This page describes the testing methodology for validating the multi-detector memory growth monitoring system. The test suite includes synthetic leak generators, false positive tests, and validation procedures.

Test Environment Setup

Prerequisites

# Verify kernel support
uname -r  # Need 5.8+
ls /sys/kernel/btf/vmlinux  # BTF support

# Check tracepoint
sudo ls /sys/kernel/debug/tracing/events/kmem/rss_stat/

# Install tools
sudo apt-get install linux-tools-generic bpftool

Building Test Suite

cd test/memory-leak-simulators

# Build all test programs
make build-tests

# Verify builds
ls -la bin/

Test Simulators

Detector-Specific Tests

| Simulator | Target | Detection Method | Expected Result |
|---|---|---|---|
| linear_growth.c | Linear Regression | Steady 1MB/s leak | R² > 0.90, 30-60s detection |
| anon_ratio.c | RSS Ratio | 90%+ heap allocations | Anon > 85%, <30s detection |
| vsz_divergence.c | Threshold | mmap without touch | VSZ/RSS > 2.5, <60s detection |
| monotonic_growth.c | Threshold | No decreases for 5+ min | Monotonic flag, 5-6 min detection |

Combined Tests

| Simulator | Purpose | Expected Result |
|---|---|---|
| combined_leak.c | Trigger all detectors | Confidence > 95 |
| accelerating_leak.c | Exponential growth | Fast detection, high confidence |
| oscillating_leak.c | Sawtooth pattern | Low confidence (correct) |

False Positive Tests

| Simulator | Pattern | Expected Result |
|---|---|---|
| cache_growth.c | File cache expansion | No detection ✅ |
| startup_spike.c | Initial burst then stable | No detection ✅ |
| gc_pattern.c | Garbage collection sawtooth | No detection ✅ |
| mmap_database.c | Memory-mapped file growth | Low confidence ✅ |

Running Tests

Individual Test Execution

# Run a specific test
./bin/vsz_divergence

# With parameters
./bin/monotonic_growth --leak-rate=1MB --duration=600

# Run in the background and capture the PID for monitoring
./bin/combined_leak &
PID=$!

Monitoring eBPF Output

# Terminal 1: Run test
./bin/linear_growth

# Terminal 2: Monitor trace pipe
sudo cat /sys/kernel/debug/tracing/trace_pipe | grep memgrowth

# Terminal 3: Check BPF maps
watch -n 1 'sudo bpftool map dump name process_states | head -20'

Automated Test Suite

# Run all tests with validation
./run-tests.sh

# Quick smoke tests
./run-quick-tests.sh

# Output format:
# [PASS] linear_growth: Detected in 45s, confidence 82
# [PASS] cache_growth: No false positive
# [FAIL] vsz_divergence: Not detected (expected detection)
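The result lines shown above are regular enough to post-process. The following is a minimal sketch, assuming the `[PASS]`/`[FAIL] name: message` shape from the sample output is what `run-tests.sh` prints; the function names `parse_results` and `failed_tests` are hypothetical and not part of the test suite.

```python
import re

# Line shape assumed from the sample output: "[PASS] name: message"
RESULT_RE = re.compile(r"^\[(PASS|FAIL)\]\s+(\w+):\s*(.*)$")


def parse_results(text):
    """Return a list of (status, test_name, message) tuples."""
    results = []
    for line in text.splitlines():
        # The sample lines are shown as shell comments, so drop a leading "# ".
        match = RESULT_RE.match(line.lstrip("# ").strip())
        if match:
            results.append(match.groups())
    return results


def failed_tests(results):
    """Names of the tests whose status is FAIL."""
    return [name for status, name, _ in results if status == "FAIL"]


sample = """\
[PASS] linear_growth: Detected in 45s, confidence 82
[PASS] cache_growth: No false positive
[FAIL] vsz_divergence: Not detected (expected detection)
"""
results = parse_results(sample)
print(failed_tests(results))  # ['vsz_divergence']
```

A wrapper like this could turn the script output into an exit code or a summary for CI, but the real script may already do so.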

Validation Criteria

Detection Accuracy

| Metric | Target | Acceptable Range |
|---|---|---|
| True Positive Rate | >95% | 90-100% |
| False Positive Rate | <5% | 0-10% |
| Detection Latency | <60s for fast leaks | 30-120s |
| Confidence Accuracy | ±10 points | 50-100 |
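The true and false positive rates can be computed from a list of test outcomes. This is a minimal sketch using the standard definitions (TPR = TP / (TP + FN), FPR = FP / (FP + TN)); the function names and the example run counts are illustrative assumptions, not measured results.

```python
def detection_rates(outcomes):
    """outcomes: iterable of (is_real_leak, was_detected) booleans.

    Returns (true_positive_rate, false_positive_rate) as fractions.
    """
    tp = fn = fp = tn = 0
    for is_leak, detected in outcomes:
        if is_leak and detected:
            tp += 1
        elif is_leak:
            fn += 1
        elif detected:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def within_acceptable_range(tpr, fpr):
    # Acceptable ranges from the table: TPR 90-100%, FPR 0-10%.
    return tpr >= 0.90 and fpr <= 0.10


# Made-up example: 20 leak runs (19 detected), 20 benign runs (1 flagged).
outcomes = (
    [(True, True)] * 19 + [(True, False)]
    + [(False, True)] + [(False, False)] * 19
)
tpr, fpr = detection_rates(outcomes)
print(f"TPR={tpr:.0%} FPR={fpr:.0%} ok={within_acceptable_range(tpr, fpr)}")
```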

Per-Detector Validation

Linear Regression

  • R² value matches expected range
  • Slope calculation within 10% of actual
  • History buffer doesn't overflow
  • Coalescing works correctly
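To check the first two items, a reference fit can be computed offline from sampled RSS values and compared with what the detector reports. The sketch below uses ordinary least squares; it is an assumption that the detector's regression is equivalent, and `linear_fit` is a hypothetical helper. The synthetic series uses 1 MiB/s with a small deterministic jitter to stand in for the steady leak simulator.

```python
def linear_fit(times, values):
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    syy = sum((v - mean_v) ** 2 for v in values)
    if sxx == 0:
        raise ValueError("all timestamps are identical")
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    # A flat series has no variance to explain; treat it as "no trend".
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


MIB = 1024 * 1024
times = list(range(60))  # one sample per second for 60 s
rss = [100 * MIB + t * MIB + ((t * 7) % 5 - 2) * 16 * 1024 for t in times]

slope, intercept, r2 = linear_fit(times, rss)
slope_error = abs(slope - MIB) / MIB
print(f"slope={slope / MIB:.3f} MiB/s  R^2={r2:.4f}  error={slope_error:.2%}")
```

For this series the slope error should be well under the 10% tolerance and R² above the 0.90 target.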

RSS Ratio

  • Anonymous ratio calculated correctly
  • Growth rates track accurately
  • Swap detection functions
  • Component separation works
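One way to cross-check the anonymous ratio is to compute it independently from `/proc/<pid>/status`, which reports `RssAnon`, `RssFile`, and `RssShmem` in kB on Linux 4.5 and later. The sketch below parses that text; whether the detector uses exactly this formula (anonymous pages divided by total RSS) is an assumption, and the sample values are invented for illustration.

```python
def parse_status_kb(status_text):
    """Parse 'Key:   123 kB' lines from /proc/<pid>/status into {key: kB}."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def anon_ratio(fields):
    """Anonymous share of RSS as a fraction in [0, 1]."""
    rss = fields["RssAnon"] + fields["RssFile"] + fields["RssShmem"]
    return fields["RssAnon"] / rss if rss else 0.0


# Invented sample; on a live system read /proc/<pid>/status instead.
sample_status = """\
Name:\tanon_ratio
VmSize:\t  204800 kB
VmRSS:\t  102400 kB
RssAnon:\t   92160 kB
RssFile:\t    8192 kB
RssShmem:\t    2048 kB
"""
fields = parse_status_kb(sample_status)
ratio = anon_ratio(fields)
print(f"anon ratio = {ratio:.2f}")  # anon ratio = 0.90
```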

Threshold

  • VSZ/RSS ratio accurate
  • Monotonic timer resets properly
  • Weights sum correctly
  • All thresholds trigger independently
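The first two checks can be validated against a small reference model. This is a sketch under stated assumptions: the 2.5 ratio and 5-minute window come from the simulator table on this page, while the reset-on-any-decrease rule and the helper names are assumptions about how the detector behaves.

```python
VSZ_RSS_THRESHOLD = 2.5      # from the vsz_divergence expected result
MONOTONIC_WINDOW_S = 5 * 60  # "no decreases for 5+ min"


def vsz_rss_ratio(vsz_kb, rss_kb):
    """VSZ/RSS ratio, or None for a zero-RSS process."""
    return vsz_kb / rss_kb if rss_kb else None


def monotonic_duration(samples):
    """Seconds since RSS last decreased.

    samples: list of (timestamp_s, rss_kb) in time order. The timer is
    assumed to reset whenever a sample is lower than the previous one.
    """
    if not samples:
        return 0.0
    start = samples[0][0]
    for (_, prev_rss), (ts, rss) in zip(samples, samples[1:]):
        if rss < prev_rss:
            start = ts
    return samples[-1][0] - start


# mmap without touch: 3 GiB mapped, 1 GiB resident.
ratio = vsz_rss_ratio(3 * 1024 * 1024, 1024 * 1024)
print(ratio, ratio > VSZ_RSS_THRESHOLD)  # 3.0 True

# One sample per minute; RSS dips at t=120, so the timer restarts there.
samples = [(0, 100), (60, 110), (120, 105), (180, 120),
           (240, 130), (300, 140), (360, 150), (420, 160)]
elapsed = monotonic_duration(samples)
print(elapsed, elapsed >= MONOTONIC_WINDOW_S)  # 300 True
```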

Performance Testing

CPU Overhead

# Baseline CPU usage
top -p $(pgrep test_program) -b -n 1

# With monitoring enabled
sudo ./load_ebpf_programs.sh
top -p $(pgrep test_program) -b -n 1

# Calculate overhead
# Should be <0.1% difference

Memory Usage

# Check BPF map sizes
sudo bpftool map show | grep memgrowth

# Per-process state
echo "10000 processes * 164 bytes = $(( 10000 * 164 / 1024 ))KB"

# Ring buffer usage
sudo bpftool map dump name events | wc -l

Event Rate

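# Count events per second: divide run_cnt in the output by the 10s duration
# (bpftool prog profile requires at least one metric, e.g. cycles)
sudo bpftool prog profile name trace_rss_stat duration 10 cycles instructions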

# Check for drops
sudo bpftool map dump name events | grep -c dropped

Stress Testing

High Process Count

# Spawn 1000 processes
for i in {1..1000}; do
    ./bin/linear_growth --rate=10KB &
done

# Monitor system impact
vmstat 1

Rapid RSS Changes

# Burst allocations
./bin/burst_test --burst-size=100MB --interval=100ms

# Verify coalescing
sudo cat /sys/kernel/debug/tracing/trace | grep coalesce

Edge Cases

# Zero RSS process
./bin/empty_process

# Massive allocation
./bin/huge_alloc --size=10GB

# Rapid fork/exit
./bin/fork_bomb --max-procs=100

Debugging Failed Tests

Check eBPF Programs

# Verify programs loaded
sudo bpftool prog list | grep memgrowth

# Check attachment
sudo bpftool link list

# View trace output (bpf_trace_printk); verifier errors are printed by the loader at load time
sudo cat /sys/kernel/debug/tracing/trace

Inspect Process State

# Dump process state for PID
sudo bpftool map lookup name process_states key 0x00 0x10 0x00 0x00

# Decode state
python3 decode_state.py <hex_output>
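The key bytes in the lookup command above are easy to get wrong by hand. The sketch below converts between a PID and the byte list that `bpftool map lookup ... key` expects, assuming the `process_states` key is a 32-bit PID stored in little-endian byte order (true on x86_64 and typical arm64 hosts, but not confirmed by this page). The helper names are hypothetical.

```python
import struct


def pid_to_bpftool_key(pid):
    """Format a PID as four little-endian hex bytes for bpftool's key argument."""
    return " ".join(f"0x{b:02x}" for b in struct.pack("<I", pid))


def bpftool_key_to_pid(key):
    """Inverse of pid_to_bpftool_key: parse '0x00 0x10 0x00 0x00' into a PID."""
    raw = bytes(int(part, 16) for part in key.split())
    return struct.unpack("<I", raw)[0]


print(pid_to_bpftool_key(4096))                   # 0x00 0x10 0x00 0x00
print(bpftool_key_to_pid("0x00 0x10 0x00 0x00"))  # 4096
```

Under that assumption, the key shown in the example command corresponds to PID 4096.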

Ring Buffer Issues

# Check for full buffer
sudo bpftool map dump name events | tail -5

# Monitor event rate
watch -n 1 'sudo bpftool map dump name events | wc -l'

Regression Testing

Test Matrix

| Kernel Version | Architecture | Expected Result |
|---|---|---|
| 5.8 | x86_64 | Full support |
| 5.10 | x86_64 | Full support |
| 5.15 | x86_64 | Full support |
| 6.0+ | x86_64 | Full support |
| 5.8 | arm64 | Full support |

Compatibility Tests

# Test on different kernels (using VMs)
for kernel in 5.8 5.10 5.15 6.0; do
    echo "Testing kernel $kernel"
    ssh vm-kernel-$kernel './run-tests.sh'
done

Continuous Integration

GitHub Actions Workflow

# .github/workflows/memory-monitor-test.yml
name: Memory Monitor Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Build eBPF programs
      run: make build-ebpf
    - name: Build test suite
      run: cd test/memory-leak-simulators && make build-tests
    - name: Run tests
      run: sudo ./test/memory-leak-simulators/run-tests.sh

Test Results Archive

Expected Baselines

| Test | Detection Time | Confidence | R² | Anon% | VSZ/RSS |
|---|---|---|---|---|---|
| linear_growth | 30-60s | 70-85 | >0.90 | 60-70 | 1.1-1.3 |
| anon_ratio | <30s | 85-95 | N/A | >90 | 1.0-1.2 |
| vsz_divergence | <60s | 60-70 | N/A | 50-60 | >2.5 |
| monotonic_growth | 5-6 min | 70-80 | >0.85 | 70-75 | 1.2-1.5 |
| combined_leak | <5 min | 95-100 | >0.95 | >85 | >2.0 |
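A run can be compared against these baselines mechanically. The sketch below encodes two rows as (low, high) ranges and reports which metrics fall outside them. The dictionary layout, metric names, and `check_baseline` helper are hypothetical, bounds are treated as inclusive for simplicity, and the measured VSZ/RSS value in the example is illustrative.

```python
# Baseline ranges transcribed from the table; None means unbounded on that side.
BASELINES = {
    "linear_growth": {
        "detection_s": (30, 60),
        "confidence": (70, 85),
        "vsz_rss": (1.1, 1.3),
    },
    "anon_ratio": {
        "detection_s": (None, 30),
        "confidence": (85, 95),
        "vsz_rss": (1.0, 1.2),
    },
}


def check_baseline(test, measured):
    """Return the names of metrics that are missing or outside the baseline."""
    out_of_range = []
    for metric, (low, high) in BASELINES[test].items():
        value = measured.get(metric)
        if value is None:
            out_of_range.append(metric)
        elif (low is not None and value < low) or (high is not None and value > high):
            out_of_range.append(metric)
    return out_of_range


# Detection time and confidence taken from the sample output line
# "[PASS] linear_growth: Detected in 45s, confidence 82".
run = {"detection_s": 45, "confidence": 82, "vsz_rss": 1.2}
print(check_baseline("linear_growth", run))  # []
```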

Known Issues

  1. JVM Applications: May trigger false positives due to heap pre-allocation
  2. Database Caches: Need higher thresholds for cache-heavy workloads
  3. Container Environments: cgroup accounting may affect RSS calculations

Last updated: 2025-01-19 | Branch: mem_monitor | Status: DRAFT
