Memory Technologies - antimetal/system-agent GitHub Wiki

Memory Leak Detection Technologies Documentation

This directory documents 23 memory leak detection technologies. Each page includes an implementation plan for system-agent integration, production examples, and academic references.

Quick Navigation

🟢 Production-Ready (Low Overhead)

🟡 Production-Limited (Moderate Overhead)

🔴 Development-Only (High Overhead)

🔬 Research Prototypes

☁️ Platform-Specific

Technology Categories

By Detection Method

Direct Allocation Tracking:

  • BCC memleak (full and sampled)
  • ByteHound
  • Valgrind/Massif
  • Heaptrack
  • ASAN/LSAN

Statistical Profiling:

  • jemalloc
  • tcmalloc
  • mimalloc
  • SWAT
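
The idea behind statistical profiling can be sketched in a few lines: instead of recording every allocation, the profiler samples allocations with a probability that depends on their size, then scales each sample back up to estimate the true total. The sketch below assumes a byte-based Poisson sampling scheme, which is one common design; the function names and the 512 KiB mean interval are illustrative assumptions, not the exact behavior of any allocator listed above.

```python
import math
import random


def sample_probability(size: int, mean_interval: int) -> float:
    """Chance that an allocation of `size` bytes is sampled when samples
    arrive, on average, once every `mean_interval` allocated bytes."""
    return 1.0 - math.exp(-size / mean_interval)


def estimate_total_bytes(sizes, mean_interval: int, rng: random.Random) -> float:
    """Sample allocations, then weight each sampled one by 1/p so the
    expected value of the estimate equals the true total."""
    estimate = 0.0
    for size in sizes:
        p = sample_probability(size, mean_interval)
        if rng.random() < p:
            estimate += size / p
    return estimate


# Hypothetical workload: 200,000 allocations of 4 KiB each.
sizes = [4096] * 200_000
actual = sum(sizes)
estimate = estimate_total_bytes(sizes, 512 * 1024, random.Random(0))
print(f"actual={actual} estimated={estimate:.0f}")
```

Only a small fraction of allocations pay the bookkeeping cost, which is where the low overhead comes from; the price is sampling error that shrinks as more bytes are allocated.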

Indirect Signals:

  • Page fault tracing
  • PSI metrics
  • Hardware PMCs
  • brk/mmap tracing
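
As an illustration of how cheap an indirect signal can be to collect, the sketch below parses the kernel's memory pressure file. On Linux 4.20 and later with PSI enabled, `/proc/pressure/memory` exposes a `some` line and a `full` line with `avg10`, `avg60`, `avg300`, and `total` fields. The function names are hypothetical, and the sample text is made up for the example.

```python
def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        fields = {}
        for pair in pairs:
            key, value = pair.split("=", 1)
            # "total" is cumulative stall time in microseconds; the rest are percentages.
            fields[key] = int(value) if key == "total" else float(value)
        result[kind] = fields
    return result


def read_memory_psi(path: str = "/proc/pressure/memory") -> dict:
    """Read and parse the live PSI file (requires a PSI-enabled Linux kernel)."""
    with open(path) as f:
        return parse_psi(f.read())


# Made-up sample in the kernel's format, so the example runs anywhere.
SAMPLE = """some avg10=1.50 avg60=0.75 avg300=0.10 total=123456
full avg10=0.25 avg60=0.00 avg300=0.00 total=7890"""

psi = parse_psi(SAMPLE)
print(psi["some"]["avg10"], psi["full"]["total"])
```

Rising pressure does not prove a leak on its own; it only indicates that tasks are stalling on memory, which is why this signal is grouped under indirect methods.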

Pattern Analysis:

  • Time series analysis
  • Precog ML
  • LeakGuard
  • GenCount
  • Sleigh
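
The simplest form of pattern analysis is a trend test on memory samples: fit a line to resident-set-size readings and flag sustained growth. The sketch below is a minimal least-squares version under that assumption; the names `growth_slope` and `looks_like_leak` and the threshold are hypothetical, and the tools listed above use considerably more sophisticated models.

```python
def growth_slope(samples) -> float:
    """Least-squares slope of (timestamp, value) points, in value units per time unit."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("timestamps must not all be equal")
    return covariance / variance


def looks_like_leak(samples, min_slope: float) -> bool:
    """Flag a series whose fitted growth rate meets or exceeds `min_slope`."""
    return growth_slope(samples) >= min_slope


# Hypothetical RSS readings: (seconds since start, MiB).
rss_mib = [(0, 100.0), (60, 106.0), (120, 112.0), (180, 118.0)]
slope = growth_slope(rss_mib)
print(f"growth: {slope:.3f} MiB/s, leak suspected: {looks_like_leak(rss_mib, 0.05)}")
```

A plain slope test cannot tell a leak from a cache that is still warming up, so in practice it is best treated as a trigger for further investigation rather than a verdict.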

By Deployment Model

Always-On Monitoring:

  • PSI + Metrics
  • Page fault tracing
  • Hardware PMCs
  • brk/mmap tracing

Triggered Profiling:

  • jemalloc/tcmalloc
  • BCC memleak (sampled)
  • ByteHound

Continuous Profiling Platforms:

  • Parca
  • Pixie
  • Pyroscope

Development/CI Only:

  • Valgrind
  • ASAN/LSAN
  • Heaptrack

Three-Layer Detection Strategy

Based on the research collected in this directory, the recommended approach combines multiple technologies in three layers:

Layer 1: Continuous Monitoring (Always On)

Layer 2: Triggered Investigation (On Anomaly)

Layer 3: Deep Analysis (Critical Issues)
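
The escalation between layers can be sketched as a small decision function. Everything concrete here is an assumption made for illustration: the `anomaly_score` input, its 0 to 1 range, and both thresholds are hypothetical, and the sketch deliberately does not assign specific tools to each layer.

```python
from enum import Enum


class Layer(Enum):
    CONTINUOUS = 1  # Layer 1: always on
    TRIGGERED = 2   # Layer 2: on anomaly
    DEEP = 3        # Layer 3: critical issues


def select_layer(
    anomaly_score: float,
    anomaly_threshold: float = 0.5,   # hypothetical value
    critical_threshold: float = 0.9,  # hypothetical value
) -> Layer:
    """Pick the most intensive layer that the current anomaly score justifies."""
    if anomaly_score >= critical_threshold:
        return Layer.DEEP
    if anomaly_score >= anomaly_threshold:
        return Layer.TRIGGERED
    return Layer.CONTINUOUS


for score in (0.1, 0.6, 0.95):
    print(score, select_layer(score).name)
```

Keeping the decision in one place makes it straightforward to tune thresholds per workload without touching the collectors themselves.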

Selection Guide

For Different Scenarios

| If you need... | Use this technology | Why |
|---|---|---|
| Near-zero overhead monitoring | Memory-Technologies-Production-Ready-Hardware-PMC or Memory-Technologies-Production-Ready-PSI-Metrics | Hardware-native or kernel tracking |
| Production profiling | Memory-Technologies-Production-Ready-Jemalloc-Profiling | Best overhead/accuracy trade-off |
| Kubernetes monitoring | Memory-Technologies-Platform-Specific-Pixie or Memory-Technologies-Platform-Specific-Parca | Native K8s integration |
| Development debugging | Memory-Technologies-Development-Only-Valgrind-Massif | Most comprehensive |
| Quick leak check | Memory-Technologies-Production-Ready-Page-Fault-Tracing | Low overhead, good signals |
| Statistical analysis | Memory-Technologies-Production-Limited-SWAT-Statistical | Proven at Microsoft |
| Complete tracking | Memory-Technologies-Production-Limited-Bytehound | 100% allocation coverage |

Performance vs Accuracy Trade-offs

High Accuracy, High Overhead:
  Valgrind > ASAN/LSAN > Heaptrack > ByteHound > BCC full

Balanced (Production-suitable):
  jemalloc ≈ tcmalloc > SWAT > Parca/Pixie > BCC sampled

Low Overhead, Lower Accuracy:
  Hardware PMCs ≈ PSI < Page faults < brk/mmap < Time series

Implementation Status

| Status | Count | Technologies |
|---|---|---|
| ✅ Production Ready | 10 | PSI, Page faults, jemalloc, tcmalloc, mimalloc, PMCs, SWAT, brk/mmap, Time series, Continuous platforms |
| ⚠️ Limited Production | 6 | BCC sampled, ByteHound, Precog, Research prototypes |
| ❌ Development Only | 7 | Valgrind, Heaptrack, ASAN, LSAN, BCC full |

Key Findings

  1. Page fault tracing is underutilized - Provides excellent signals at <1% overhead
  2. Modern allocators are production-ready - jemalloc/tcmalloc's 4-5% overhead is acceptable
  3. Hardware PMCs offer near-zero overhead - But require expertise to interpret
  4. Statistical approaches work - SWAT proves <5% overhead is achievable
  5. Layered approach is optimal - No single tool solves all cases
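
To make the first finding concrete, the sketch below reads a process's cumulative page fault counters from `/proc/<pid>/stat`, where `minflt` is field 10 and `majflt` is field 12. Note that this polls kernel counters rather than tracing individual faults, so it is a cheaper stand-in for the tracing approach, not an implementation of it. The function names and the sample line are hypothetical.

```python
def parse_fault_counters(stat_line: str) -> tuple:
    """Return (minor_faults, major_faults) from one /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the last ')' and index from field 3 (state) onward.
    rest = stat_line.rsplit(")", 1)[1].split()
    minflt = int(rest[7])  # field 10
    majflt = int(rest[9])  # field 12
    return minflt, majflt


def read_fault_counters(pid: int) -> tuple:
    """Read the live counters for `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_fault_counters(f.read())


def fault_rate(before: int, after: int, interval_s: float) -> float:
    """Faults per second between two counter readings."""
    return (after - before) / interval_s


# Made-up stat line (truncated after the fields used here) so the example runs anywhere.
SAMPLE = "1234 (my app) S 1 1234 1234 0 -1 4194304 5000 0 12 0 10 5 0 0 20 0 1 0"

minor, major = parse_fault_counters(SAMPLE)
print(minor, major, fault_rate(minor, minor + 600, 60.0))
```

A steadily climbing minor fault rate in a process whose workload is flat is one of the signals that can justify escalating to a heavier profiler.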

Contributing

When adding new technologies:

  1. Follow the established document template
  2. Include production examples where available
  3. Provide implementation code for system-agent
  4. Add academic references and benchmarks
  5. Update this index and the comparison matrix