Linear Regression Detector - antimetal/system-agent GitHub Wiki

Linear Regression Detector

⚠️ DRAFT/WIP: Documentation for in-development feature on mem_monitor branch

← Back to Memory Monitoring

Overview

The Linear Regression Detector (ebpf/src/memgrowth.bpf.c) uses statistical trend analysis to identify steady memory growth patterns. It implements integer-only linear regression directly in eBPF kernel space to detect slow, consistent memory leaks with minimal overhead.

Key Innovation: Event Coalescing

The detector's most important optimization is 5-second event coalescing, which prevents burst events from overwhelming the 16-element circular buffer:

const __u32 COALESCE_THRESHOLD_DS = 50;  // 5 seconds

if (time_since_last < COALESCE_THRESHOLD_DS) {
    // UPDATE last entry instead of adding new
    state->rss_history_mb[last_idx] = new_rss_mb;
    return;  // Don't advance buffer or recalculate
}

Benefits

  • 16 slots × 5 seconds = 80+ seconds minimum coverage
  • Handles kernel RSS batching gracefully
  • 95% reduction in regression calculations

Technical Design

Circular Buffer Architecture

#define RSS_HISTORY_SIZE 16  // Power of 2 for efficient masking
#define RSS_HISTORY_MASK 15  // Binary AND for wrap-around

struct process_memory_state {
    __u32 rss_history_mb[RSS_HISTORY_SIZE];  // RSS in MB
    __u32 time_history_ds[RSS_HISTORY_SIZE];  // Deciseconds
    __u8 history_head;   // Next write position (0-15)
    __u8 history_count;  // Valid entries (max 16)
};

MB Resolution Storage

Why megabyte resolution?

  • Natural noise filtering - ignores <1MB changes
  • Smaller integers - avoids arithmetic overflow
  • Meaningful threshold - 1MB matters in production

Adaptive Sampling

The detector adjusts sampling based on process behavior:

Process State Growth Rate Sampling Interval Rationale
Idle 0 bytes/s 10 seconds Minimize overhead
Slow Growth <100KB/s 5 seconds Balance coverage
Active Leak >100KB/s 1 second Tight monitoring

Linear Regression Algorithm

Integer-Only Implementation

// No floating point in eBPF - use integer arithmetic
for (i = 0; i < n; i++) {
    x = (time_history_ds[i] - t0) / 10;  // Convert to seconds
    y = rss_history_mb[i];               // Already in MB
    
    sum_x += x;
    sum_y += y;
    sum_xy += x * y;
    sum_x2 += x * x;
}

// Calculate slope in MB/s, convert to bytes/s
slope_mb_per_s = (n * sum_xy - sum_x * sum_y) /
                 (n * sum_x2 - sum_x * sum_x);
state->trend_slope = slope_mb_per_s * 1024 * 1024;

R² Correlation Quality

The detector calculates R² to measure trend quality:

  • R² > 0.95: Strong linear correlation (high confidence)
  • R² > 0.90: Good correlation (medium confidence)
  • R² < 0.80: Poor correlation (low confidence)

Confidence Scoring

The detector uses a multi-factor scoring system:

Factor Weight Criteria
Growth Rate 0-25 points >10MB/s: 25, >1MB/s: 20, >100KB/s: 15
Pattern Quality 0-35 points R²>0.95: 35, R²>0.90: 25, R²>0.80: 15
Duration 0-25 points 15+ samples: 25, 10+: 20, 6+: 10
Relative Growth 0-15 points 2x initial: 15, 1.5x: 10, 1x: 5

Total Score: 0-100 (threshold typically set at 60)

Detection Examples

Slow Steady Leak

Time (s):  0    10   20   30   40   50   60   70   80
RSS (MB):  100  105  110  115  120  125  130  135  140
           └─────────────────────────────────────────┘
                  Slope: 0.5 MB/s
                  R² = 0.99
                  Confidence: 85/100 ✅

Normal Process (No Leak)

Time (s):  0    10   20   30   40   50   60   70   80
RSS (MB):  100  120  118  122  119  121  120  119  121
           └─────────────────────────────────────────┘
                  No clear trend
                  R² = 0.12
                  Confidence: 5/100 ✅

Performance Characteristics

Metric Value Notes
Event Rate 0.1-0.2/sec per process After coalescing
CPU per Event <100 instructions Highly optimized
Memory per Process 164 bytes Compact state
Total Memory (10K procs) ~1.6MB Scales linearly
Regression Calculations Every 5+ seconds Only on buffer advance

Advantages

Statistical robustness - Filters noise through regression
Extended history - 80+ seconds with coalescing
Adaptive behavior - Responds to process patterns
Low overhead - <0.05% CPU for 1000 processes
Accurate measurement - Direct RSS from kernel

Limitations

Minimum samples - Needs 6+ data points
Linear assumption - May miss exponential growth initially
MB resolution - Won't detect <1MB/min leaks
Single pattern - Only detects linear trends

Configuration

struct linear_regression_config {
    __u64 min_rss_threshold;      // Default: 10MB
    __u64 mb_change_threshold;    // Default: 1MB
    __u32 steady_interval_ds;     // Default: 100 (10s)
    __u32 active_interval_ds;     // Default: 10 (1s)
    __u32 min_samples;           // Default: 6
    __u32 r2_threshold;          // Default: 900 (0.90)
    __u8 confidence_threshold;    // Default: 60
};

Testing

Test with test/memory-leak-simulators/linear_growth.c:

// Simulates 1MB/s steady leak
for (i = 0; i < 60; i++) {
    void* leak = malloc(1024 * 1024);
    memset(leak, 0, 1024 * 1024);  // Touch pages
    sleep(1);
}

Expected detection: 30-60 seconds, R² > 0.90, Confidence 70-85

Integration with Other Detectors

The Linear Regression Detector provides:

  • Trend validation for threshold-based detections
  • Long-term perspective complementing short-term spikes
  • Statistical confidence for decision making

See Also


Last updated: 2025-01-19 | Branch: mem_monitor | Status: DRAFT

⚠️ **GitHub.com Fallback** ⚠️