Exploratory Matrix - antimetal/system-agent GitHub Wiki
Memory Leak Detection Technologies Comparison Matrix
Overview
This matrix compares all researched memory leak detection approaches across multiple dimensions including performance overhead, accuracy, deployment complexity, and production readiness.
Scoring Legend
- Overhead: Percentage performance impact (lower is better)
- Accuracy: 🟢 High (>90%) | 🟡 Medium (60-90%) | 🔴 Low (<60%)
- False Positives: 🟢 Low (<10%) | 🟡 Medium (10-30%) | 🔴 High (>30%)
- Setup Complexity: 🟢 Easy | 🟡 Moderate | 🔴 Complex
- Production Ready: ✅ Yes | ⚠️ Limited | ❌ No
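The accuracy and false-positive bands in the legend can be expressed as a small classifier. This is a minimal sketch; the function names are hypothetical, and treating the boundary values (exactly 90%, 60%, 10%, 30%) as "Medium" is an assumption based on how the legend's ranges are written.

```python
def rate_accuracy(percent: float) -> str:
    """Map an accuracy percentage onto the legend's three bands."""
    if percent > 90:
        return "High"
    if percent >= 60:
        return "Medium"
    return "Low"


def rate_false_positives(percent: float) -> str:
    """Map a false-positive percentage onto the legend's three bands."""
    if percent < 10:
        return "Low"
    if percent <= 30:
        return "Medium"
    return "High"
```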
Comprehensive Comparison Matrix
| Technology | Overhead | Accuracy | False Positives | Granularity | Setup | Prod Ready | Restart Required | Stack Traces | Platform Requirements | Key Limitations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Page Fault Tracing | <1% | 🟡 Medium | 🟢 Low | Coarse | 🟢 Easy | ✅ Yes | No | Yes* | Linux 4.14+, Frame pointers | Indirect detection only |
| jemalloc Profiling | ~4% | 🟢 High | 🟢 Low | Fine | 🟢 Easy | ✅ Yes | No** | Yes | LD_PRELOAD support | Sampling may miss small leaks |
| tcmalloc Profiling | ~5% | 🟢 High | 🟢 Low | Fine | 🟢 Easy | ✅ Yes | No** | Yes | LD_PRELOAD support | Google ecosystem focused |
| mimalloc Profiling | ~2% | 🟡 Medium | 🟢 Low | Medium | 🟢 Easy | ✅ Yes | Yes | Limited | Windows/Linux/macOS | Limited profiling features |
| PSI + Metrics | 0% | 🔴 Low | 🔴 High | Very Coarse | 🟢 Easy | ✅ Yes | No | No | Linux 4.20+ | Detection only, no root cause |
| Hardware PMCs | 0% | 🟡 Medium | 🟡 Medium | Coarse | 🔴 Complex | ⚠️ Limited | No | No | Intel/AMD CPUs, root access | Requires expertise to interpret |
| SWAT (Statistical) | <5% | 🟢 High | 🟢 Low | Fine | 🟡 Moderate | ✅ Yes | No | Yes | Windows/Linux | Requires baseline period |
| Precog (ML) | ~1% | 🟡 Medium | 🟡 Medium | Coarse | 🟡 Moderate | ⚠️ Limited | No | No | Training data required | Needs historical data |
| BCC memleak (sampled) | 10-30% | 🟢 High | 🟢 Low | Fine | 🟡 Moderate | ⚠️ Limited | No | Yes | Linux 4.6+, BCC tools | Still significant overhead |
| BCC memleak (full) | 30-400% | 🟢 High | 🟢 Low | Very Fine | 🟡 Moderate | ❌ No | No | Yes | Linux 4.6+, BCC tools | Unsuitable for production |
| ByteHound | ~20% | 🟢 High | 🟢 Low | Very Fine | 🟡 Moderate | ⚠️ Limited | Yes | Yes | Linux, Rust runtime | Requires process restart |
| Parca | 1-2% | 🟡 Medium | 🟢 Low | Fine | 🔴 Complex | ✅ Yes | No | Yes | Kubernetes, eBPF | Additional infrastructure |
| Pixie | 1-2% | 🟡 Medium | 🟢 Low | Fine | 🔴 Complex | ✅ Yes | No | Yes | Kubernetes only | K8s specific |
| Pyroscope | 1-2% | 🟡 Medium | 🟢 Low | Fine | 🟡 Moderate | ✅ Yes | No | Yes | Multi-platform | Server infrastructure needed |
| Valgrind/Massif | 2000% | 🟢 High | 🟢 Low | Very Fine | 🟢 Easy | ❌ No | Yes | Yes | Linux/macOS | Dev only, serializes threads |
| Heaptrack | 50-100% | 🟢 High | 🟢 Low | Very Fine | 🟢 Easy | ❌ No | Yes | Yes | Linux | Dev/debug only |
| ASAN | 200-300% | 🟢 High | 🟢 Low | Very Fine | 🟡 Moderate | ❌ No | Rebuild | Yes | Compiler support | Requires recompilation |
| LSAN | 150-200% | 🟢 High | 🟢 Low | Very Fine | 🟡 Moderate | ❌ No | Rebuild | Yes | LLVM/GCC | Requires recompilation |
| brk/mmap Tracing | <1% | 🔴 Low | 🔴 High | Very Coarse | 🟢 Easy | ✅ Yes | No | Yes | Linux, eBPF | Only heap expansion |
| LeakGuard | 5-10% | 🟢 High | 🟢 Low | Fine | 🟡 Moderate | ⚠️ Limited | No | Yes | Research prototype | Not widely available |
| GenCount | 5-15% | 🟡 Medium | 🟡 Medium | Fine | 🟡 Moderate | ⚠️ Limited | No | Yes | Research prototype | Academic tool |
| Sleigh | 10-20% | 🟡 Medium | 🟡 Medium | Fine | 🟡 Moderate | ⚠️ Limited | No | Yes | Research prototype | Limited deployment |
*Requires frame pointers to be enabled (-fno-omit-frame-pointer)
**Can be enabled at runtime with mallctl() or environment variables
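One way to use the matrix is to filter it by an overhead budget. The sketch below encodes a subset of the rows and returns the tools that fit a given budget. The `Tool` type and `within_budget` helper are hypothetical names. Using the upper bound of each quoted range (for example `<1%` as 1.0 and `10-30%` as 30.0) is a conservative assumption, not something the matrix specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    max_overhead_pct: float  # assumed: upper bound of the range quoted in the matrix
    prod_ready: str          # "yes", "limited", or "no", from the Prod Ready column


# A subset of the matrix rows, transcribed by hand for illustration.
TOOLS = [
    Tool("Page Fault Tracing", 1.0, "yes"),
    Tool("jemalloc Profiling", 4.0, "yes"),
    Tool("tcmalloc Profiling", 5.0, "yes"),
    Tool("PSI + Metrics", 0.0, "yes"),
    Tool("Hardware PMCs", 0.0, "limited"),
    Tool("BCC memleak (sampled)", 30.0, "limited"),
    Tool("BCC memleak (full)", 400.0, "no"),
    Tool("ByteHound", 20.0, "limited"),
    Tool("Valgrind/Massif", 2000.0, "no"),
]


def within_budget(tools, budget_pct, require_prod_ready=True):
    """Return names of tools whose worst-case overhead fits the budget."""
    return [
        t.name
        for t in tools
        if t.max_overhead_pct <= budget_pct
        and (not require_prod_ready or t.prod_ready == "yes")
    ]
```

With a 5% budget, `within_budget(TOOLS, 5.0)` keeps page fault tracing, the two allocator profilers, and PSI. With a 1% budget, only page fault tracing and PSI remain.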
Detailed Capability Matrix
| Technology | Detects Slow Leaks | Detects Fast Leaks | Kernel Memory | User Memory | Language Agnostic | Real-time Detection | Historical Analysis | Root Cause Analysis |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Page Fault Tracing | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ⚠️ |
| jemalloc Profiling | ⚠️ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| tcmalloc Profiling | ⚠️ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| mimalloc Profiling | ⚠️ | ✅ | ❌ | ✅ | ✅ | ✅ | ⚠️ | ⚠️ |
| PSI + Metrics | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Hardware PMCs | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| SWAT (Statistical) | ✅ | ⚠️ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Precog (ML) | ✅ | ⚠️ | ✅ | ✅ | ✅ | ⚠️ | ✅ | ❌ |
| BCC memleak | ✅ | ✅ | ✅* | ✅ | ✅ | ✅ | ❌ | ✅ |
| ByteHound | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Parca | ✅ | ✅ | ❌ | ✅ | ⚠️ | ✅ | ✅ | ✅ |
| Pixie | ✅ | ✅ | ❌ | ✅ | ⚠️ | ✅ | ✅ | ✅ |
| Pyroscope | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Valgrind/Massif | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Heaptrack | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ASAN/LSAN | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ |
*BCC memleak can trace kernel allocations (kmalloc/kfree)
Use Case Recommendations
By Deployment Scenario
| Scenario | Primary Choice | Secondary Choice | Avoid |
| --- | --- | --- | --- |
| Always-on Production | PSI + Page Faults | jemalloc (4% acceptable) | Valgrind, ASAN, Full memleak |
| Kubernetes | Pixie | Parca | Non-container aware tools |
| High-Performance Systems | Hardware PMCs | Page Fault Tracing | Any allocator instrumentation |
| Development/Testing | Valgrind/ASAN | Heaptrack | - |
| Quick Investigation | jemalloc profiling | Sampled BCC memleak | Full tracing |
| Deep Root Cause Analysis | ByteHound | Full BCC memleak | Surface-level metrics |
| Java Applications | JVM Native Tools | - | C/C++ specific tools |
| Embedded Systems | Custom lightweight | mimalloc | Heavy profilers |
By Leak Characteristics
| Leak Type | Best Tools | Why |
| --- | --- | --- |
| Slow, gradual leaks | PSI + Metrics, Page Faults | Low overhead for long-term monitoring |
| Fast, obvious leaks | jemalloc, tcmalloc | Quick detection with stack traces |
| Small, intermittent | ByteHound, Full memleak | Need complete tracking |
| Unknown source | SWAT, Statistical approaches | Pattern recognition helps |
| Container escapes | Pixie, Parca | Container-aware |
| Kernel memory | BCC memleak (kernel mode) | Specialized for kernel |
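For slow, gradual leaks, the metrics-based approach amounts to watching for a sustained upward trend in memory usage over a long window. A minimal sketch follows, assuming the caller already collects `(timestamp_seconds, rss_bytes)` samples. The function names and the 1 MB/hour threshold are hypothetical, and this is one possible heuristic rather than what any listed tool does. A growing cache produces the same signal as a leak, which fits the high false-positive rating the matrix gives metrics-only detection.

```python
def growth_rate(samples):
    """Least-squares slope of (timestamp_seconds, rss_bytes) pairs, in bytes/second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def looks_like_slow_leak(samples, min_bytes_per_hour=1_000_000):
    """Flag a sustained upward trend; the threshold is an illustrative default."""
    return growth_rate(samples) * 3600 >= min_bytes_per_hour
```

A positive result here only justifies escalating to a tool with stack traces; it does not identify the leaking code.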
Quantitative Performance Comparison
| Metric | Best Performers | Acceptable | Poor |
| --- | --- | --- | --- |
| CPU Overhead | PMCs (0%), PSI (0%) | Page Faults (<1%), Parca (1-2%) | Valgrind (2000%) |
| Memory Overhead | PSI, PMCs, Page Faults | jemalloc (~10%) | ASAN (2-3x) |
| Latency Impact | PMCs, PSI | jemalloc (+10% P99) | Valgrind (serialization) |
| Detection Speed | Direct tracing | Statistical (minutes) | ML approaches (hours) |
| Accuracy | Valgrind, ASAN, ByteHound | jemalloc, BCC | PSI, Metrics only |
Implementation Effort Comparison
| Approach | Lines of Code | Dependencies | Maintenance | Expertise Required |
| --- | --- | --- | --- | --- |
| PSI Monitoring | ~50 | /proc/pressure | Low | Low |
| Page Fault eBPF | ~200 | BCC/bpftrace | Medium | Medium |
| jemalloc Integration | ~100 | jemalloc lib | Low | Low |
| PMC Analysis | ~500 | perf, PAPI | High | High |
| ML Detection | ~1000+ | sklearn, data pipeline | High | High |
| ByteHound | ~50 (usage) | ByteHound binary | Medium | Medium |
| Parca/Pixie | ~200 | K8s, operators | High | Medium |
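The small size of the PSI entry reflects how simple the interface is: `/proc/pressure/memory` exposes two lines (`some` and `full`), each with `avg10`, `avg60`, `avg300`, and `total` fields. A minimal parsing sketch is shown below. The function names and the returned dictionary layout are hypothetical, and alerting thresholds are left out because the source does not specify any.

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file into nested dicts."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            "avg10": float(values["avg10"]),    # % of time stalled, 10 s window
            "avg60": float(values["avg60"]),    # 60 s window
            "avg300": float(values["avg300"]),  # 300 s window
            "total_us": int(values["total"]),   # cumulative stall time, microseconds
        }
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse memory pressure; requires a kernel with PSI enabled."""
    with open(path) as f:
        return parse_psi(f.read())
```

On a host with PSI enabled, `read_memory_pressure()["some"]["avg60"]` gives the share of the last minute in which at least one task was stalled on memory.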
Cost-Benefit Analysis
High Value (Low Cost, High Benefit)
- Page Fault Tracing: <1% overhead, good detection
- PSI Monitoring: 0% overhead, early warning
- jemalloc (existing users): If jemalloc is already the allocator, enabling profiling needs no migration
Medium Value
- Allocator Switch: ~4% overhead, but requires migrating to a new allocator
- Sampled BCC: 10-30% overhead, periodic use only
- Statistical Approaches: Need tuning and baseline
Low Value (High Cost, Limited Benefit)
- Full malloc/free tracing: 30-400% overhead
- Valgrind in production: 2000% overhead
- Custom ML solutions: High development cost
Decision Tree
Is this production?
├─ Yes
│  ├─ Can tolerate 4% overhead?
│  │  ├─ Yes → jemalloc/tcmalloc profiling
│  │  └─ No → Page fault tracing + PSI
│  └─ Kubernetes environment?
│     ├─ Yes → Pixie or Parca
│     └─ No → Standard Linux tools
└─ No (Development)
   ├─ Need complete accuracy?
   │  ├─ Yes → Valgrind or ASAN
   │  └─ No → Heaptrack or ByteHound
   └─ Quick check only?
      └─ Yes → jemalloc one-time profile
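The decision tree can also be written as a function, which makes the branches easy to test or embed in tooling. This sketch rests on one assumption: sibling questions under the same branch (for example the overhead question and the Kubernetes question) are independent, so each contributes its own recommendation. The function name and parameters are hypothetical.

```python
def recommend(
    production,
    tolerates_4pct=False,
    kubernetes=False,
    need_complete_accuracy=False,
    quick_check=False,
):
    """Return the tree's recommendations for the given answers, in tree order."""
    picks = []
    if production:
        # First production question: overhead tolerance.
        picks.append(
            "jemalloc/tcmalloc profiling" if tolerates_4pct
            else "Page fault tracing + PSI"
        )
        # Second production question: platform.
        picks.append("Pixie or Parca" if kubernetes else "Standard Linux tools")
    else:
        # First development question: accuracy requirement.
        picks.append(
            "Valgrind or ASAN" if need_complete_accuracy
            else "Heaptrack or ByteHound"
        )
        # Second development question has only a "Yes" leaf in the tree.
        if quick_check:
            picks.append("jemalloc one-time profile")
    return picks
```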
Key Insights from Matrix
- No Silver Bullet: No single tool excels at all dimensions
- Overhead vs Accuracy Trade-off: Universal across all approaches
- Production Viability Threshold: ~5% overhead is the practical limit
- Layered Approach Optimal: Combine low-overhead detection with targeted deep analysis
- Platform Matters: Kubernetes environments have specialized, superior tools
- Frame Pointers Critical: Most eBPF tools rely on frame pointers for stack traces, but optimized builds typically omit them unless compiled with -fno-omit-frame-pointer
- Statistical Sampling Works: 4% overhead for 90%+ accuracy is achievable
- Hardware Counters Underutilized: Zero overhead but require expertise