RSS Ratio Detector - antimetal/system-agent GitHub Wiki
⚠️ DRAFT/WIP: Documentation for an in-development feature on the mem_monitor branch
The RSS Component Ratio Detector (ebpf/src/memgrowth_rss_ratio.bpf.c) analyzes the composition of process memory to identify heap-based memory leaks. By tracking the ratio of anonymous memory to file-backed memory, it distinguishes genuine leaks from normal cache growth.
Memory leaks manifest differently than cache growth:
Memory Type | Normal App | Leaking App | Cache-Heavy App |
---|---|---|---|
Anonymous (heap/stack) | 40-75% | 85-95% | 20-40% |
File-backed (code/libs) | 25-60% | 5-15% | 60-80% |
This detector uses these composition patterns to separate heap leaks from cache growth.
The detector monitors four RSS components from the kmem:rss_stat tracepoint:
struct process_memory_state {
    // RSS components
    __u64 rss_anon;          // Anonymous pages (heap, stack)
    __u64 rss_file;          // File-backed pages (code, libs, mmap)
    __u64 rss_swap;          // Swap entries
    __u64 rss_shmem;         // Shared memory pages
    // Growth rates by type
    __s64 anon_growth_rate;  // bytes/sec (can be negative)
    __s64 file_growth_rate;  // bytes/sec (can be negative)
    // Memory ratios (% * 10 for precision)
    __u16 anon_ratio;        // Anonymous % * 10 (0-1000)
    __u16 swap_ratio;        // Swap % * 10 (0-1000)
};
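The growth-rate fields can be derived from two consecutive samples of an RSS component. The following is a minimal sketch in plain userspace C, not the detector's actual eBPF code: `growth_rate_bps` and its parameters are hypothetical names, and it uses `<stdint.h>` types in place of `__u64`/`__s64` so that it compiles on its own.

```c
#include <stdint.h>

/* Hypothetical helper: signed growth rate in bytes/sec between two samples.
 * prev_bytes/cur_bytes are sizes of one RSS component; timestamps are in
 * nanoseconds (the unit reported by bpf_ktime_get_ns()). */
static int64_t growth_rate_bps(uint64_t prev_bytes, uint64_t cur_bytes,
                               uint64_t prev_ns, uint64_t cur_ns)
{
    if (cur_ns <= prev_ns)
        return 0; /* no elapsed time: report no growth instead of dividing by zero */

    int64_t delta_bytes = (int64_t)cur_bytes - (int64_t)prev_bytes;
    int64_t elapsed_ms = (int64_t)((cur_ns - prev_ns) / 1000000ULL);
    if (elapsed_ms == 0)
        return 0; /* window shorter than 1ms: too short for a meaningful rate */

    /* Scale to a per-second rate; working in milliseconds keeps the
     * intermediate product far from the int64_t limit. */
    return delta_bytes * 1000 / elapsed_ms;
}
```

For example, a component that grows from 100MB to 150MB over a 5000ms window yields 10485760 bytes/sec (10MB/s), and the same change in reverse yields the negative of that value.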
#define ANON_RATIO_THRESHOLD 800 // 80% anonymous memory
// Criterion 1: High anonymous memory ratio
if (state->anon_ratio > ANON_RATIO_THRESHOLD) {
    confidence += 40;
}
// Criterion 2: Anonymous growth exceeds file growth
if (state->anon_growth_rate > state->file_growth_rate * 2) {
    confidence += 30;
}
// Criterion 3: Swap pressure increasing
// (growing is a flag computed elsewhere; not shown here)
if (state->swap_ratio > 100 && growing) {
    confidence += 20;
}
// Criterion 4: File memory stable while anon grows
if (state->file_growth_rate <= 0 &&
    state->anon_growth_rate > 100 * 1024) { // 100KB/s, in bytes/sec
    confidence += 10;
}
// Calculate with precision (% * 10)
__u64 total = rss_anon + rss_file + rss_shmem;
anon_ratio = (rss_anon * 1000) / total;
// Example: 850 = 85.0% anonymous memory
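As a self-contained illustration of the ratio calculation above, here is a small sketch that adds a guard for an empty total. The function name `anon_ratio_permille` is hypothetical, and `<stdint.h>` types stand in for the kernel's `__u64`/`__u16`.

```c
#include <stdint.h>

/* Returns the anonymous share of RSS in tenths of a percent (0-1000).
 * Inputs may be bytes or pages, as long as all three use the same unit. */
static uint16_t anon_ratio_permille(uint64_t rss_anon, uint64_t rss_file,
                                    uint64_t rss_shmem)
{
    uint64_t total = rss_anon + rss_file + rss_shmem;
    if (total == 0)
        return 0; /* nothing resident yet: avoid dividing by zero */
    return (uint16_t)((rss_anon * 1000) / total);
}
```

For 300MB of anonymous memory and 52MB of file-backed memory this returns 852, i.e. 85.2% anonymous. Integer division truncates, so 100MB against 50MB gives 666 rather than 667.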
Time: T0 T1 T2 T3 T4
Anon: 100MB 150MB 200MB 250MB 300MB ↗️
File: 50MB 52MB 51MB 53MB 52MB →
Ratio: 66% 74% 79% 82% 85%
Action: Monitor Monitor Warn Alert Critical
Time: T0 T1 T2 T3 T4
Anon: 100MB 102MB 105MB 103MB 104MB →
File: 50MB 100MB 150MB 200MB 250MB ↗️
Ratio: 66% 50% 41% 34% 29%
Action: Normal Normal Normal Normal Normal
Time: T0 T1 T2 T3 T4
Anon: 100MB 200MB 300MB 250MB 200MB
File: 50MB 50MB 50MB 30MB 20MB
Swap: 0MB 0MB 50MB 100MB 150MB ↗️
Ratio: 66% 80% 85% 89% 91%
Action: Normal Warn Alert Critical OOM-Risk
Component | Weight | Scoring Criteria |
---|---|---|
Anonymous Ratio | 0-40 | >90%: 40, >85%: 35, >80%: 30, >75%: 20 |
Growth Differential | 0-30 | Anon-File >10MB/s: 30, >1MB/s: 25, >100KB/s: 20 |
Swap Pressure | 0-20 | >20%: 20, >10%: 15, >5%: 10 |
Duration | 0-10 | Sustained pattern: 10 |
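One way to read the scoring table above as code is sketched below. This is an illustration under stated assumptions, not the detector's implementation: all function and macro names are hypothetical, 1KB is taken as 1024 bytes, ratios use the `% * 10` encoding, and `sustained` is a caller-supplied flag because the table does not say how a sustained pattern is measured.

```c
#include <stdint.h>

#define BYTES_PER_KB 1024LL
#define BYTES_PER_MB (1024LL * 1024LL)

/* Anonymous Ratio component (0-40); anon_ratio is % * 10. */
static int score_anon_ratio(uint16_t anon_ratio)
{
    if (anon_ratio > 900) return 40;
    if (anon_ratio > 850) return 35;
    if (anon_ratio > 800) return 30;
    if (anon_ratio > 750) return 20;
    return 0;
}

/* Growth Differential component (0-30); rates are in bytes/sec. */
static int score_growth_diff(int64_t anon_rate, int64_t file_rate)
{
    int64_t diff = anon_rate - file_rate;
    if (diff > 10 * BYTES_PER_MB) return 30;
    if (diff > 1 * BYTES_PER_MB) return 25;
    if (diff > 100 * BYTES_PER_KB) return 20;
    return 0;
}

/* Swap Pressure component (0-20); swap_ratio is % * 10. */
static int score_swap(uint16_t swap_ratio)
{
    if (swap_ratio > 200) return 20;
    if (swap_ratio > 100) return 15;
    if (swap_ratio > 50) return 10;
    return 0;
}

/* Sum of all four components, 0-100. */
static int leak_confidence(uint16_t anon_ratio, uint16_t swap_ratio,
                           int64_t anon_rate, int64_t file_rate,
                           int sustained)
{
    return score_anon_ratio(anon_ratio)
         + score_growth_diff(anon_rate, file_rate)
         + score_swap(swap_ratio)
         + (sustained ? 10 : 0);
}
```

With an anonymous ratio of 85.2%, a 12MB/s growth differential, no swap, and a sustained pattern, the components add up to 35 + 30 + 0 + 10 = 75. A cache-heavy process whose file memory grows faster than its anonymous memory scores 0.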
Scenario | Total RSS | Anon Ratio | Detection |
---|---|---|---|
Heap leak | Growing | High (>80%) | ✅ Detected |
Cache growth | Growing | Low (<40%) | ✅ Correctly ignored |
Mixed growth | Growing | Stable (~60%) | ✅ Low confidence |
- Detects pattern changes before significant growth
- Identifies leaks in small processes (<100MB)
- Scales automatically with process size
- Detects memory pressure before OOM
- Identifies thrashing patterns
- Correlates swap growth with leak probability
Metric | Value | Impact |
---|---|---|
Ratio Calculation | ~50 instructions | Per RSS update |
Growth Rate Calc | ~100 instructions | Per RSS update |
Confidence Scoring | ~200 instructions | On ratio change |
Total CPU | ~350 instructions | <0.03% overhead |
Memory per Process | 52 bytes additional | 520KB for 10K procs |
struct rss_ratio_config {
__u16 anon_ratio_threshold; // Default: 800 (80%)
__u16 swap_ratio_threshold; // Default: 100 (10%)
__u64 min_growth_differential; // Default: 100KB/s
__u32 sample_window_ms; // Default: 5000ms
__u8 enable_swap_detection; // Default: 1
__u8 confidence_threshold; // Default: 60
};
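The defaults listed in the comments above could be collected in an initializer, as in the sketch below. The helper names `rss_ratio_default_config` and `rss_ratio_config_valid` are hypothetical, the struct is repeated with `<stdint.h>` types so the example compiles on its own, and 100KB/s is assumed to mean 100 * 1024 bytes/sec. How the configuration reaches the eBPF program is not covered here.

```c
#include <stdint.h>

/* Mirror of rss_ratio_config using portable fixed-width types. */
struct rss_ratio_config {
    uint16_t anon_ratio_threshold;    /* % * 10 */
    uint16_t swap_ratio_threshold;    /* % * 10 */
    uint64_t min_growth_differential; /* bytes/sec */
    uint32_t sample_window_ms;
    uint8_t  enable_swap_detection;
    uint8_t  confidence_threshold;    /* 0-100 */
};

/* Returns a config populated with the documented defaults. */
static struct rss_ratio_config rss_ratio_default_config(void)
{
    struct rss_ratio_config cfg = {
        .anon_ratio_threshold    = 800,        /* 80% */
        .swap_ratio_threshold    = 100,        /* 10% */
        .min_growth_differential = 100 * 1024, /* 100KB/s (assumes 1KB = 1024 bytes) */
        .sample_window_ms        = 5000,
        .enable_swap_detection   = 1,
        .confidence_threshold    = 60,
    };
    return cfg;
}

/* Basic range checks: ratios fit the 0-1000 encoding, the confidence
 * threshold fits the 0-100 score range, and the window is non-zero. */
static int rss_ratio_config_valid(const struct rss_ratio_config *cfg)
{
    return cfg->anon_ratio_threshold <= 1000
        && cfg->swap_ratio_threshold <= 1000
        && cfg->confidence_threshold <= 100
        && cfg->sample_window_ms > 0;
}
```

Validating before use catches values that cannot occur in the `% * 10` encoding, such as a threshold of 1001.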
Test with test/memory-leak-simulators/anon_ratio.c:
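// Allocate primarily anonymous memory (intentionally never freed)
// fd: descriptor of a readable file opened earlier (not shown)
for (int i = 0; i < 100; i++) {
    void *heap = malloc(10 * 1024 * 1024);  // 10MB heap
    if (heap == NULL) {
        break;                              // Stop once allocation fails
    }
    memset(heap, i, 10 * 1024 * 1024);      // Touch pages so they count toward RSS
    // Minimal file mapping for comparison
    if (i % 10 == 0) {
        mmap(NULL, 1024 * 1024, PROT_READ, MAP_PRIVATE, fd, 0);
    }
    sleep(1);
}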
Expected: Anonymous ratio >90%, Detection <30 seconds, Confidence 85-95
The detector includes safeguards:
✅ Balanced growth recognition - Won't trigger if file and anon grow proportionally
✅ Cache detection - Low confidence when file memory dominates
✅ Startup grace period - Ignores initial allocation bursts
✅ Minimum threshold - Only tracks processes >10MB RSS
❌ Language-specific patterns - JVM has unique memory patterns
❌ Shared memory - May misclassify heavy shmem usage
❌ Memory-mapped databases - Could trigger false positives
❌ Container overhead - cgroup accounting differences
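The minimum-threshold and startup-grace-period safeguards listed above amount to a gate applied before any scoring. A minimal sketch follows, assuming the 10MB figure from the list and a caller-supplied grace period, since no duration is documented here. `should_evaluate` and `MIN_TRACKED_RSS_BYTES` are hypothetical names.

```c
#include <stdint.h>

/* "Only tracks processes >10MB RSS" from the safeguards list. */
#define MIN_TRACKED_RSS_BYTES (10ULL * 1024 * 1024)

/* Returns 1 if a process should be scored, 0 if it should be skipped.
 * grace_period_ms is an assumption: the length of the startup window
 * during which initial allocation bursts are ignored. */
static int should_evaluate(uint64_t total_rss_bytes, uint64_t process_age_ms,
                           uint64_t grace_period_ms)
{
    if (total_rss_bytes <= MIN_TRACKED_RSS_BYTES)
        return 0; /* too small to track */
    if (process_age_ms < grace_period_ms)
        return 0; /* still inside the startup grace period */
    return 1;
}
```

Running the cheap checks first means small or newly started processes never reach the ratio and scoring logic.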
RSS Ratio provides unique insights:
- WHAT is growing (heap vs cache) - This detector
- HOW it's growing (trend) - Linear Regression
- WHETHER it matches patterns - Threshold Detector
Combined, they provide comprehensive leak detection.
- Linear Regression Detector - Trend analysis
- Threshold Detector - Scientific heuristics
- Testing Methodology - Validation procedures
Last updated: 2025-01-19 | Branch: mem_monitor | Status: DRAFT