Memory Technologies Development Only BCC Memleak Full - antimetal/system-agent GitHub Wiki
BCC memleak with full tracing is among the most thorough memory leak detection options on Linux. Full tracing means running memleak without its sampling option (-s): the tool records every allocation and deallocation through eBPF uprobes attached to the allocator functions, including malloc, free, calloc, and realloc.
Key Characteristics:
- Complete malloc/free/calloc/realloc tracing via eBPF
- Tracks every allocation and deallocation without sampling
- 30-400% overhead makes it unsuitable for production environments
- Most accurate leak detection available with minimal false positives
- Provides complete stack traces for every allocation
- Can trace both user-space and kernel allocations
- CPU Overhead: 30-400% depending on application allocation patterns
- Memory Overhead: Significant - stores metadata for every active allocation
- Latency Impact: 100-400% increase in allocation/deallocation latency
- Throughput Impact: Measured examples:
  - MySQL: 33% throughput reduction
  - High-frequency trading apps: 200-400% latency increase
  - Web servers: 100-200% request latency increase
- Accuracy: Extremely high - tracks every allocation
- False Positives: Low, but the report lists all outstanding allocations, so long-lived objects that are freed later (caches, pools) can appear alongside real leaks
- False Negatives: Minimal - may miss some complex allocation patterns
- Stack Trace Quality: Complete, limited by available debug symbols
- Linux Kernel: 4.6+ (BPF stack trace support required)
- BCC Framework: Required for eBPF compilation and execution
- Debug Symbols: Recommended for meaningful stack traces
- Production Ready: No - development and debugging only
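The kernel requirement listed above can be checked before any probe is attached. The following is a minimal sketch, assuming the release string starts with the usual `major.minor` prefix; `parse_kernel_version` and `kernel_supports_memleak` are hypothetical helper names, not part of BCC, and the check says nothing about whether BCC itself is installed.

```python
import platform
import re

MIN_KERNEL = (4, 6)  # minimum version stated in the requirements list


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_memleak(release=None):
    """Return True if the given (or currently running) kernel meets the minimum."""
    if release is None:
        release = platform.release()
    return parse_kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    print(platform.release(), kernel_supports_memleak())
```

Comparing tuples rather than strings avoids the classic mistake of ranking "4.10" below "4.6".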
BCC memleak full tracing should never be used for continuous monitoring in production environments. Implementation should be strictly limited to:
- Development Environment Usage
  - Pre-production testing environments
  - Local development debugging
  - CI/CD pipeline leak detection
- Brief Diagnostic Runs
  - Maximum 5-10 minute runs
  - Scheduled during low-traffic periods
  - Immediate termination after data collection
- Emergency Debugging Scenarios
  - Critical memory leak investigation
  - Reproduction of specific leak patterns
  - Root cause analysis for known issues
- Never for Continuous Monitoring
  - No 24/7 deployment
  - No automated recurring execution
  - No production environment usage
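One way to keep runs within the limits above is to build the command line in code and refuse durations beyond the cap. This is a sketch under stated assumptions: `build_memleak_command` and `MAX_SECONDS` are hypothetical names, and the duration is passed through memleak's positional `interval` and `count` arguments so that the tool prints one report and exits on its own.

```python
MEMLEAK = "/usr/share/bcc/tools/memleak"  # install path used throughout this page
MAX_SECONDS = 600  # upper bound taken from the 5-10 minute guideline


def build_memleak_command(pid, seconds, max_seconds=MAX_SECONDS):
    """Return an argv list that traces `pid` for `seconds`, reports once, and exits."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    if not 0 < seconds <= max_seconds:
        raise ValueError(f"duration must be between 1 and {max_seconds} seconds")
    # Positional arguments: report interval in seconds, then number of reports.
    return ["sudo", MEMLEAK, "-p", str(pid), str(seconds), "1"]


if __name__ == "__main__":
    print(" ".join(build_memleak_command(1234, 300)))
```

The returned list can be handed to `subprocess.run`; keeping validation separate from execution makes the guard easy to unit test.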
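# Trace all user-space allocations in a process (reports every 5 seconds until Ctrl-C)
sudo /usr/share/bcc/tools/memleak -p <pid>
# Trace for 300 seconds, then print one report (positional arguments: interval, count)
sudo /usr/share/bcc/tools/memleak -p <pid> 300 1
# Trace allocator functions in a specific library instead of libc
sudo /usr/share/bcc/tools/memleak -p <pid> -O <object>
# Trace kernel allocations (kmalloc/kfree); this is the mode when no -p is given
sudo /usr/share/bcc/tools/memleak
# User and kernel tracing are separate modes; run one instance of each to capture both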
- Complete call stack for every allocation
- Symbol resolution when debug info available
- Stack depth bounded by the kernel's BPF stack trace limit (memleak has no depth flag)
- Aggregation of identical allocation patterns
- Real-time tracking of unmatched malloc/free pairs
- Memory usage growth detection
- Leak rate calculation
- Allocation site ranking by leaked bytes
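Leak rate and site ranking reduce to simple arithmetic over successive reports. The sketch below assumes you have already extracted outstanding byte totals from two or more reports; `leak_rate` and `rank_sites` are hypothetical helpers and the sample values are illustrative, not measurements.

```python
def leak_rate(samples):
    """Return growth in bytes/second between the first and last sample.

    `samples` is a list of (timestamp_seconds, outstanding_bytes) tuples.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (b1 - b0) / (t1 - t0)


def rank_sites(sites):
    """Sort {site_name: outstanding_bytes} by bytes, largest first."""
    return sorted(sites.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers: 600 kB of growth over 60 seconds.
    print(leak_rate([(0, 1_000_000), (60, 1_600_000)]))
    print(rank_sites({"parse_request": 4096, "cache_insert": 1_048_576}))
```

A rate that stays positive across several intervals is a stronger leak signal than a single large outstanding total.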
# Run comprehensive leak detection on staging (one report after 600 seconds)
sudo /usr/share/bcc/tools/memleak -p "$(pidof -s myapp)" 600 1 > leak_report.txt
- Controlled environment reproduction
- Specific code path exercising
- Correlation with application logs
- Validation of leak fix effectiveness
- Complete allocation history
- Stack trace analysis for leak origins
- Pattern identification across multiple runs
- Integration with application profiling data
- Understanding application memory usage patterns
- Identifying hot allocation paths
- Memory usage optimization opportunities
- Allocation frequency analysis
- Repository: https://github.com/iovisor/bcc
- Documentation: https://github.com/iovisor/bcc/blob/master/tools/memleak.py
- Examples: https://github.com/iovisor/bcc/blob/master/tools/memleak_example.txt
- eBPF Program: Uprobe attachment to malloc/free family functions
- Data Collection: Hash table storage of allocation metadata
- Stack Walking: Kernel and user-space stack unwinding
- Symbol Resolution: Integration with system symbol tables
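The bookkeeping described above can be modeled in a few lines of user-space code. This is a simplified illustration of the idea (a table keyed by address, with entries removed on free), not the eBPF program that memleak loads; `AllocationTracker` is a hypothetical class.

```python
from collections import defaultdict


class AllocationTracker:
    """User-space model of allocation tracking keyed by address."""

    def __init__(self):
        self.live = {}  # address -> (size, stack)

    def on_alloc(self, address, size, stack):
        """Record an allocation returned by malloc/calloc/realloc."""
        self.live[address] = (size, tuple(stack))

    def on_free(self, address):
        """Drop the entry; frees of untracked addresses are ignored."""
        self.live.pop(address, None)

    def outstanding_by_stack(self):
        """Aggregate live allocations into {stack: [total_bytes, count]}."""
        totals = defaultdict(lambda: [0, 0])
        for size, stack in self.live.values():
            totals[stack][0] += size
            totals[stack][1] += 1
        return dict(totals)


if __name__ == "__main__":
    tracker = AllocationTracker()
    tracker.on_alloc(0x1000, 64, ["main", "handler"])
    tracker.on_alloc(0x2000, 64, ["main", "handler"])
    tracker.on_free(0x2000)
    print(tracker.outstanding_by_stack())
```

Whatever remains in the table at report time is "outstanding", which is why allocations made before tracing started never show up and long-lived objects do.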
#!/bin/bash
# comprehensive-leak-detection.sh

# Configuration
PID=$1
DURATION=${2:-300} # 5 minutes default
OUTPUT_FILE="leak_report_$(date +%Y%m%d_%H%M%S).txt"
MEMLEAK=/usr/share/bcc/tools/memleak

# Pre-checks
if [ ! -x "$MEMLEAK" ]; then
    echo "ERROR: BCC tools not installed" >&2
    exit 1
fi
if [ -z "$PID" ]; then
    echo "Usage: $0 <pid> [duration_seconds]" >&2
    exit 1
fi

# Run full tracing. memleak takes positional [interval] [count] arguments,
# so this prints a single report after $DURATION seconds and then exits.
echo "Starting full memory leak detection for PID $PID"
echo "Duration: $DURATION seconds"
echo "Output: $OUTPUT_FILE"
sudo "$MEMLEAK" \
    -p "$PID" \
    "$DURATION" 1 \
    > "$OUTPUT_FILE" 2>&1
echo "Leak detection complete. Results in $OUTPUT_FILE"
#!/usr/bin/env python3
# analyze_leak_report.py
import re
import sys

# Header line printed for each allocation site, for example:
#     64 bytes in 4 allocations from stack
HEADER_RE = re.compile(r'^\s*(\d+) bytes in (\d+) allocations from stack')


def parse_leak_report(filename):
    """Parse BCC memleak output and extract key information.

    Expects a file holding a single report; if the file contains several
    interval reports, the same stack is counted once per report.
    """
    leaks = []
    current = None
    with open(filename, 'r') as f:
        for line in f:
            match = HEADER_RE.match(line)
            if match:
                size, count = int(match.group(1)), int(match.group(2))
                current = {
                    'bytes': size,
                    'count': count,
                    'stack': [],
                    'avg_size': size // count if count else 0,
                }
                leaks.append(current)
            elif current is not None and line.strip() and line[0].isspace():
                # Indented line after a header: one stack frame
                current['stack'].append(line.strip())
            else:
                # Blank line or unindented banner ends the current entry
                current = None
    return leaks


def analyze_leaks(leaks):
    """Analyze leak patterns and generate insights."""
    if not leaks:
        return "No leaks detected"
    # Sort by bytes leaked
    leaks.sort(key=lambda x: x['bytes'], reverse=True)
    total_leaked = sum(leak['bytes'] for leak in leaks)
    total_allocs = sum(leak['count'] for leak in leaks)
    avg_leak = total_leaked // total_allocs if total_allocs else 0
    analysis = f"""
Leak Analysis Summary:
=====================
Total Leaked: {total_leaked:,} bytes ({total_leaked/1024/1024:.2f} MB)
Total Leak Sites: {len(leaks)}
Total Leaked Allocations: {total_allocs:,}
Average Leak Size: {avg_leak} bytes
Top 5 Leak Sources:
"""
    for i, leak in enumerate(leaks[:5], 1):
        analysis += f"\n{i}. {leak['bytes']:,} bytes in {leak['count']} allocations"
        analysis += f" (avg: {leak['avg_size']} bytes)\n"
        # Show the first few frames of the stack trace
        for frame in leak['stack'][:3]:
            analysis += f"   {frame}\n"
    return analysis


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: analyze_leak_report.py <leak_report.txt>")
        sys.exit(1)
    leaks = parse_leak_report(sys.argv[1])
    print(analyze_leaks(leaks))
# Focus on specific allocation functions (one report after 120 seconds)
sudo /usr/share/bcc/tools/memleak -p "$PID" 120 1 | grep -A 10 "malloc\|calloc\|realloc"
# Filter by minimum leak size (1MB+); report lines are indented, so match on fields
sudo /usr/share/bcc/tools/memleak -p "$PID" 180 1 | awk '$2 == "bytes" && $1 >= 1048576'
# Comprehensive reporting with timestamps
{
    echo "=== Memory Leak Report ==="
    echo "Timestamp: $(date)"
    echo "PID: $PID"
    echo "Command: $(ps -p "$PID" -o comm=)"
    echo "=========================="
    echo
    sudo /usr/share/bcc/tools/memleak -p "$PID" 300 1
    echo
    echo "=== Process Memory Info ==="
    grep -E "VmSize|VmRSS|VmData|VmHWM" "/proc/$PID/status"
} > "comprehensive_leak_report_$(date +%Y%m%d_%H%M%S).txt"
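The process memory fields captured at the end of the report can also be read programmatically, which makes it easier to compare values before and after a run. A minimal sketch, assuming the standard `/proc/<pid>/status` layout of `Name:<whitespace>value kB`; `parse_vm_fields` and `read_vm_fields` are hypothetical helper names.

```python
VM_FIELDS = ("VmSize", "VmRSS", "VmData", "VmHWM")


def parse_vm_fields(status_text, fields=VM_FIELDS):
    """Return {field: kilobytes} for the requested Vm* lines."""
    result = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields and rest.split():
            # Lines look like "VmRSS:      524288 kB"
            result[name] = int(rest.split()[0])
    return result


def read_vm_fields(pid):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())


if __name__ == "__main__":
    sample = "Name:\tmyapp\nVmHWM:\t  600000 kB\nVmRSS:\t  524288 kB\n"
    print(parse_vm_fields(sample))
```

Taking one reading before tracing and one after gives the RSS growth that the leak report should account for.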
Configuration: MySQL 8.0, TPC-C benchmark
Environment: 16-core server, 64GB RAM
Baseline Performance:
- Transactions/sec: 15,247
- Average Latency: 12.3ms
- CPU Usage: 45%
With BCC memleak Full Tracing:
- Transactions/sec: 10,215 (-33%)
- Average Latency: 18.9ms (+54%)
- CPU Usage: 78% (+73%)
Memory Overhead:
- Additional RSS: 2.1GB
- Allocation metadata: ~150 bytes per tracked allocation
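The per-allocation figure above allows a rough sizing estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes the ~150 byte figure holds for every entry and, for the reverse calculation, that all of the additional RSS is tracking metadata, which the measurements do not establish. The function names are hypothetical.

```python
BYTES_PER_ENTRY = 150  # approximate metadata per tracked allocation, from the figures above


def metadata_bytes(outstanding_allocations, bytes_per_entry=BYTES_PER_ENTRY):
    """Estimate tracking memory for a given number of live allocations."""
    return outstanding_allocations * bytes_per_entry


def allocations_for_overhead(overhead_bytes, bytes_per_entry=BYTES_PER_ENTRY):
    """Estimate how many live allocations a given overhead corresponds to."""
    return overhead_bytes // bytes_per_entry


if __name__ == "__main__":
    print(metadata_bytes(1_000_000))                # bytes for one million live allocations
    print(allocations_for_overhead(2_100_000_000))  # entries implied by 2.1 GB (decimal)
```

Under those assumptions, 2.1 GB of overhead corresponds to roughly 14 million tracked allocations, which indicates how quickly a large heap inflates the tracer's footprint.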
Application: Node.js Express API
Load: 1000 req/sec sustained
Normal Operation:
- Response Time P95: 45ms
- Memory Usage: 512MB
- CPU Usage: 30%
With Full Tracing:
- Response Time P95: 180ms (+300%)
- Memory Usage: 1.2GB (+135%)
- CPU Usage: 85% (+183%)
Allocation Rate Impact:
- Normal: ~50K allocs/sec
- With tracing: ~12K effective allocs/sec
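The percentages quoted in these measurements are relative changes against the untraced baseline. A small helper makes the arithmetic explicit and can be reused when recording your own before-and-after numbers; `percent_change` is a hypothetical name.

```python
def percent_change(baseline, traced):
    """Return the relative change from baseline to traced, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (traced - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Figures from the Node.js measurements above.
    print(round(percent_change(45, 180)))         # P95 response time, ms
    print(round(percent_change(30, 85)))          # CPU usage, percent
    print(round(percent_change(50_000, 12_000)))  # effective allocations per second
```

Note that a +300% latency change means the traced value is four times the baseline, not three.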
- Every Allocation Intercepted
  - Uprobe fires for every malloc/free call
  - Trap into the kernel (user/kernel mode switch) on each probe hit
  - eBPF program execution time
- Metadata Storage
  - Stack trace collection and storage
  - Hash table operations for tracking
  - Memory overhead for tracking structures
- Stack Walking
  - Complete stack unwinding for each allocation
  - Symbol resolution overhead
  - Debug information processing
- Synchronization Overhead
  - eBPF map operations require synchronization
  - Potential lock contention in high-concurrency scenarios
  - Memory barriers and cache invalidation
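Because every cost listed above is paid per probe hit, total overhead scales roughly linearly with the allocation rate. The model below is a deliberately crude sketch: the per-event cost is an input you would have to measure yourself, and the 2 microsecond value in the example is purely illustrative, not a measured figure.

```python
def probe_cpu_fraction(events_per_second, cost_per_event_us):
    """Return the fraction of one CPU core consumed by probe handling.

    `events_per_second` counts every probe hit (allocations plus frees);
    `cost_per_event_us` is the assumed average cost per hit in microseconds.
    """
    return events_per_second * cost_per_event_us / 1_000_000


if __name__ == "__main__":
    # Hypothetical: 50K allocations/s plus matching frees, 2 us per probe hit.
    print(probe_cpu_fraction(100_000, 2.0))
```

The linear relationship is why allocation-heavy workloads sit at the upper end of the overhead range while allocation-light ones barely notice the tracer.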
- Development Environment Only
  - Local development debugging
  - Unit test memory validation
  - Integration test leak detection
- Known Leak Reproduction
  - Reproducing reported memory leaks
  - Validating leak fixes
  - Understanding leak patterns
- Last Resort Debugging
  - When other tools provide insufficient data
  - Critical production issue investigation (very brief runs)
  - Memory corruption investigation
- Very Brief Periods
  - Maximum 5-10 minutes in development
  - Maximum 1-2 minutes in staging
  - Never continuous operation
- Production environments (continuous monitoring)
- Performance-critical applications during normal operation
- High-frequency allocation patterns (millions of allocs/sec)
- Memory-constrained systems (overhead too high)
- 24/7 monitoring (use sampling approaches instead)
# Use sampled version instead
sudo /usr/share/bcc/tools/memleak -p $PID -s 1000 # Sample 1 in 1000
# Or use dedicated profilers
export MALLOC_CONF="prof:true,prof_leak:true"
# Run application with jemalloc profiling
# Enable jemalloc heap profiling
export MALLOC_CONF="prof:true,prof_active:false,prof_leak:true"
# Lower overhead, production-suitable
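# Monitor memory growth via page faults (much lower overhead)
# The kernel function is do_anonymous_page; if it is inlined on your kernel, probe handle_mm_fault instead
sudo /usr/share/bcc/tools/trace 'p::do_anonymous_page'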
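To make the trade-off of the sampled invocation (`-s 1000`) concrete, the following sketch models sampling as "track every N-th allocation" and scales the observed total back up. This is a simplified model for intuition, not a description of how memleak selects samples internally; both function names are hypothetical.

```python
def sampled_outstanding(events, sample_every):
    """Track only every N-th allocation and return the observed outstanding bytes.

    `events` is a list of ("alloc", address, size) or ("free", address) tuples.
    """
    live = {}
    seen = 0
    for event in events:
        if event[0] == "alloc":
            _, address, size = event
            if seen % sample_every == 0:
                live[address] = size
            seen += 1
        else:
            # Frees of unsampled addresses simply find nothing to remove.
            live.pop(event[1], None)
    return sum(live.values())


def estimate_total(observed_bytes, sample_every):
    """Scale the sampled total back up to an estimate of the real total."""
    return observed_bytes * sample_every


if __name__ == "__main__":
    events = [("alloc", addr, 100) for addr in range(1000)]
    observed = sampled_outstanding(events, 10)
    print(observed, estimate_total(observed, 10))
```

The estimate is only reliable for sites that allocate often; a leak made of a handful of large allocations can be missed entirely, which is the accuracy that full tracing buys back.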
- Valgrind: More comprehensive but even higher overhead
- AddressSanitizer: Compile-time instrumentation option
- jemalloc profiling: Lower overhead, production-suitable
- SystemTap: Alternative dynamic tracing framework
- Confirm non-production environment
- Set maximum runtime limits (5-10 minutes)
- Ensure sufficient disk space for output
- Verify debug symbols availability
- Plan for application performance degradation
- Have rollback plan ready
- Start with short runs (30-60 seconds)
- Focus on specific allocation patterns
- Combine with application profiling data
- Document environmental conditions
- Save complete output for analysis
- Focus on largest leaks first
- Group similar stack traces
- Correlate with application behavior
- Validate findings with controlled tests
- Document leak patterns for future reference
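Grouping similar stack traces, as recommended above, can be approximated by keying each entry on its first few frames with the instruction offsets removed. A minimal sketch, assuming frames are formatted like `main+0x6d [allocs]`; `strip_offset` and `group_by_top_frames` are hypothetical helpers and the input structure is an assumption.

```python
from collections import defaultdict


def strip_offset(frame):
    """Reduce 'main+0x6d [allocs]' to 'main' so nearby call sites compare equal."""
    return frame.split("+", 1)[0].strip()


def group_by_top_frames(leaks, depth=3):
    """Merge leak entries whose first `depth` frames match.

    `leaks` is a list of dicts with 'bytes', 'count', and 'stack' (list of frames).
    Returns (key, totals) pairs sorted by leaked bytes, largest first.
    """
    groups = defaultdict(lambda: {"bytes": 0, "count": 0})
    for leak in leaks:
        key = tuple(strip_offset(frame) for frame in leak["stack"][:depth])
        groups[key]["bytes"] += leak["bytes"]
        groups[key]["count"] += leak["count"]
    return sorted(groups.items(), key=lambda item: item[1]["bytes"], reverse=True)


if __name__ == "__main__":
    leaks = [
        {"bytes": 64, "count": 4, "stack": ["main+0x6d [allocs]"]},
        {"bytes": 32, "count": 2, "stack": ["main+0x80 [allocs]"]},
    ]
    print(group_by_top_frames(leaks))
```

A small `depth` merges aggressively and surfaces the leaking subsystem; a larger one keeps distinct call paths apart at the cost of a longer list.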
BCC memleak with full tracing is a powerful but heavyweight tool for memory leak detection. Tracking every allocation comes at the cost of significant performance overhead, which limits it to development environments and brief diagnostic runs.
The tool provides complete, accurate leak detection with few false positives, but the 30-400% overhead makes it unsuitable for production monitoring. For production environments, consider sampling-based approaches, allocator-level profiling such as jemalloc's heap profiler, or lightweight page fault tracing.
Used in controlled environments, BCC memleak full tracing gives detailed visibility into memory allocation patterns and leak sources, which makes it well suited to debugging complex memory management issues.