Memory Technologies Development Only BCC Memleak Full - antimetal/system-agent GitHub Wiki

BCC memleak (Full Tracing)

Overview

BCC memleak with full tracing (no sampling) records every allocation and deallocation a process makes, using eBPF probes attached to the allocator entry points: malloc, free, calloc, realloc, and related functions. Unlike sampling-based approaches, nothing is skipped, which gives the most complete view of outstanding allocations that eBPF-based tooling offers, at a correspondingly high runtime cost.

Key Characteristics:

  • Complete malloc/free/calloc/realloc tracing via eBPF
  • Tracks every allocation and deallocation without sampling
  • 30-400% overhead makes it unsuitable for production environments
  • Most accurate leak detection available with minimal false positives
  • Provides complete stack traces for every allocation
  • Can trace either user-space allocations (with -p) or kernel allocations (without -p), one mode per run

Performance Characteristics

Overhead Analysis

  • CPU Overhead: 30-400% depending on application allocation patterns
  • Memory Overhead: Significant - stores metadata for every active allocation
  • Latency Impact: 100-400% increase in allocation/deallocation latency
  • Throughput Impact: Measured examples:
    • MySQL: 33% throughput reduction
    • High-frequency trading apps: 200-400% latency increase
    • Web servers: 100-200% request latency increase

Accuracy Metrics

  • Accuracy: Extremely high - tracks every allocation
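  • False Positives: Low for true leaks, but the report lists every allocation without a matching free, so long-lived legitimate allocations (caches, pools) also appear
  • False Negatives: Minimal - allocations made before tracing started, or through allocators that are not probed, are not seen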
  • Stack Trace Quality: Complete, limited by available debug symbols

Platform Requirements

  • Linux Kernel: 4.6+ (needed for eBPF stack trace maps; uprobe attachment itself arrived in 4.3)
  • BCC Framework: Required for eBPF compilation and execution
  • Debug Symbols: Recommended for meaningful stack traces
  • Production Ready: No - development and debugging only

System-Agent Implementation Plan

Deployment Strategy

BCC memleak full tracing should never be used for continuous monitoring in production environments. Implementation should be strictly limited to:

  1. Development Environment Usage

    • Pre-production testing environments
    • Local development debugging
    • CI/CD pipeline leak detection
  2. Brief Diagnostic Runs

    • Maximum 5-10 minute runs
    • Scheduled during low-traffic periods
    • Immediate termination after data collection
  3. Emergency Debugging Scenarios

    • Critical memory leak investigation
    • Reproduction of specific leak patterns
    • Root cause analysis for known issues
  4. Never for Continuous Monitoring

    • No 24/7 deployment
    • No automated recurring execution
    • No production environment usage
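
The run-length limits above can be enforced in whatever wrapper launches the tool. The following is a minimal sketch, not part of the system-agent codebase: `build_memleak_command` and `MAX_DURATION_SECONDS` are hypothetical names, and the 600-second cap is simply the upper end of the 5-10 minute guidance. It relies on memleak's positional `interval` and `count` arguments, so a single report is printed after the requested duration and the tool then exits by itself.

```python
MAX_DURATION_SECONDS = 600  # assumed cap, taken from the "5-10 minute" guidance
MEMLEAK_PATH = "/usr/share/bcc/tools/memleak"


def build_memleak_command(pid, duration_seconds):
    """Return an argv list for one bounded memleak run against `pid`."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    # Positional arguments are [interval] [count]: one report after
    # `duration_seconds`, then memleak terminates without outside help.
    return ["sudo", MEMLEAK_PATH, "-p", str(pid), str(duration_seconds), "1"]
```

The returned list can be handed to `subprocess.run(cmd, timeout=duration_seconds + 30)` so that a hung run is still terminated shortly after its deadline.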

Tool Capabilities

User-Space Function Tracing

# Trace all user-space allocations
sudo /usr/share/bcc/tools/memleak -p <pid>

# Trace allocations made through a specific allocator library (-O takes the object path)
sudo /usr/share/bcc/tools/memleak -p <pid> -O <path-to-allocator-library>

Kernel Allocation Tracing

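# Trace kernel allocations (kmalloc/kfree); this is the default mode when no -p is given
sudo /usr/share/bcc/tools/memleak

# Combined user and kernel tracing is not available in a single invocation:
# run the per-process command (-p <pid>) and the kernel command as two separate runs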

Stack Trace Collection

  • Complete call stack for every allocation
  • Symbol resolution when debug info available
  • Stack depth bounded by the kernel's stack-capture limit (kernel.perf_event_max_stack, 127 frames by default); memleak has no stack-depth flag
  • Aggregation of identical allocation patterns

Outstanding Allocation Tracking

  • Real-time tracking of unmatched malloc/free pairs
  • Memory usage growth detection
  • Leak rate estimation by comparing successive interval reports
  • Allocation site ranking by leaked bytes
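
The bookkeeping behind these points can be modelled in a few lines of ordinary Python. This is a user-space illustration of the idea only, not memleak's eBPF code: `OutstandingTracker` and the sample function names are hypothetical. Each allocation is stored under its address, a free removes it, and whatever remains is aggregated by stack and ranked by outstanding bytes.

```python
from collections import defaultdict


class OutstandingTracker:
    """Toy model of address -> (size, stack) tracking for live allocations."""

    def __init__(self):
        self.live = {}  # address -> (size, stack tuple)

    def on_alloc(self, address, size, stack):
        self.live[address] = (size, tuple(stack))

    def on_free(self, address):
        # Frees for addresses never seen (allocated before tracing) are ignored.
        self.live.pop(address, None)

    def ranked_sites(self):
        """Aggregate live allocations by stack, largest outstanding bytes first."""
        totals = defaultdict(lambda: [0, 0])  # stack -> [bytes, count]
        for size, stack in self.live.values():
            totals[stack][0] += size
            totals[stack][1] += 1
        rows = [(stack, b, c) for stack, (b, c) in totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)


tracker = OutstandingTracker()
tracker.on_alloc(0x1000, 64, ["parse_request", "main"])
tracker.on_alloc(0x2000, 64, ["parse_request", "main"])
tracker.on_alloc(0x3000, 4096, ["load_config", "main"])
tracker.on_free(0x3000)  # matched free: this site disappears from the report
print(tracker.ranked_sites())  # [(('parse_request', 'main'), 128, 2)]
```

Note that an entry appearing in the ranking only means it has not been freed yet; whether it is a leak still depends on how long it stays there.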

Development Use Cases

Pre-Production Testing

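# Run comprehensive leak detection on staging: one report after 600 seconds, then exit
# (memleak takes positional [interval] [count]; -T sets the number of stacks shown, not a duration)
sudo /usr/share/bcc/tools/memleak -p $(pidof myapp) 600 1 > leak_report.txt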

Reproduction of Known Leaks

  • Controlled environment reproduction
  • Specific code path exercising
  • Correlation with application logs
  • Validation of leak fix effectiveness
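
Validating a fix usually comes down to comparing outstanding bytes per allocation site before and after the change. A minimal sketch of that comparison follows, assuming each run has already been reduced to a mapping from stack text to outstanding bytes; `compare_sites` is a hypothetical helper name, and the two runs are assumed to use the same duration and workload so the numbers are comparable.

```python
def compare_sites(before, after):
    """Return (stack, bytes_before, bytes_after, delta) for every changed site.

    `before` and `after` map a stack string to outstanding bytes.
    """
    changes = []
    for stack in set(before) | set(after):
        old = before.get(stack, 0)
        new = after.get(stack, 0)
        if new != old:
            changes.append((stack, old, new, new - old))
    # Largest reductions first, so fixed sites show up at the top.
    return sorted(changes, key=lambda row: (row[3], row[0]))
```

A site that drops to zero supports the fix; a site that only appears in the second run deserves a closer look before the fix is declared effective.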

Root Cause Analysis

  • Complete allocation history
  • Stack trace analysis for leak origins
  • Pattern identification across multiple runs
  • Integration with application profiling data

Allocation Pattern Studies

  • Understanding application memory usage patterns
  • Identifying hot allocation paths
  • Memory usage optimization opportunities
  • Allocation frequency analysis
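
One way to study size patterns is to bucket the per-allocation lines that memleak prints when run with `-a`. The sketch below assumes those lines look like `addr = 948cd0 size = 16`, which should be checked against the installed version; `size_histogram` is a hypothetical helper. It covers outstanding allocations only, since freed allocations never reach the report.

```python
import re
from collections import Counter

# Assumed line shape from `memleak -a`: "addr = <hex> size = <bytes>"
ADDR_LINE = re.compile(r"addr = ([0-9a-fA-F]+) size = (\d+)")


def size_histogram(lines):
    """Count outstanding allocations per power-of-two size bucket.

    The dictionary key is the bucket's upper bound in bytes.
    """
    buckets = Counter()
    for line in lines:
        match = ADDR_LINE.search(line)
        if not match:
            continue  # banners, stack frames, and other lines are skipped
        size = int(match.group(2))
        upper = (1 << (size - 1).bit_length()) if size > 0 else 0
        buckets[upper] += 1
    return dict(sorted(buckets.items()))
```

Feeding it the lines of a saved report gives a quick view of which size classes dominate the outstanding set.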

Repository & Documentation

Official Sources

Implementation Details

  • eBPF Program: Uprobe attachment to malloc/free family functions
  • Data Collection: Hash table storage of allocation metadata
  • Stack Walking: Kernel and user-space stack unwinding
  • Symbol Resolution: Integration with system symbol tables

Code Examples

Full Tracing Setup

#!/bin/bash
# comprehensive-leak-detection.sh

# Configuration
PID=$1
DURATION=${2:-300}  # 5 minutes default
OUTPUT_FILE="leak_report_$(date +%Y%m%d_%H%M%S).txt"

# Pre-checks
if ! command -v /usr/share/bcc/tools/memleak &> /dev/null; then
    echo "ERROR: BCC tools not installed"
    exit 1
fi

if [ -z "$PID" ]; then
    echo "Usage: $0 <pid> [duration_seconds]"
    exit 1
fi

# Run full tracing
echo "Starting full memory leak detection for PID $PID"
echo "Duration: $DURATION seconds"
echo "Output: $OUTPUT_FILE"

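# memleak takes positional [interval] [count]: print one report after
# $DURATION seconds, then exit. It has no stack-depth option.
sudo /usr/share/bcc/tools/memleak \
    -p "$PID" \
    "$DURATION" 1 \
    > "$OUTPUT_FILE" 2>&1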

echo "Leak detection complete. Results in $OUTPUT_FILE"

Output Analysis Script

#!/usr/bin/env python3
# analyze_leak_report.py

import re
import sys
from collections import defaultdict

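# An aggregated entry starts with "<N> bytes in <M> allocations from stack";
# the indented frame lines that follow it belong to that entry.
LEAK_HEADER = re.compile(r'^\s*(\d+) bytes in (\d+) allocations from stack')

def parse_leak_report(filename):
    """Parse BCC memleak output and extract key information."""
    leaks = []
    current = None  # entry whose stack frames are being collected

    with open(filename, 'r') as f:
        for raw_line in f:
            line = raw_line.strip()
            header = LEAK_HEADER.match(line)
            if header:
                bytes_leaked = int(header.group(1))
                alloc_count = int(header.group(2))
                current = {
                    'bytes': bytes_leaked,
                    'count': alloc_count,
                    'stack': [],
                    'avg_size': bytes_leaked // alloc_count if alloc_count else 0
                }
                leaks.append(current)
            elif not line or line.startswith('['):
                # A blank line or a "[HH:MM:SS] Top N stacks..." banner ends the entry
                current = None
            elif current is not None:
                current['stack'].append(line)

    # Store each stack as newline-joined text, which analyze_leaks() expects
    for leak in leaks:
        leak['stack'] = '\n'.join(leak['stack'])

    return leaks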

def analyze_leaks(leaks):
    """Analyze leak patterns and generate insights."""
    if not leaks:
        return "No leaks detected"
    
    # Sort by bytes leaked
    leaks.sort(key=lambda x: x['bytes'], reverse=True)
    
    total_leaked = sum(leak['bytes'] for leak in leaks)
    total_allocs = sum(leak['count'] for leak in leaks)
    
    analysis = f"""
Leak Analysis Summary:
=====================
Total Leaked: {total_leaked:,} bytes ({total_leaked/1024/1024:.2f} MB)
Total Leak Sites: {len(leaks)}
Total Leaked Allocations: {total_allocs:,}
Average Leak Size: {total_leaked // total_allocs} bytes

Top 5 Leak Sources:
"""
    
    for i, leak in enumerate(leaks[:5], 1):
        analysis += f"\n{i}. {leak['bytes']:,} bytes in {leak['count']} allocations"
        analysis += f" (avg: {leak['avg_size']} bytes)\n"
        # Show first few lines of stack trace
        stack_lines = leak['stack'].split('\n')[:3]
        for line in stack_lines:
            analysis += f"   {line}\n"
    
    return analysis

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: analyze_leak_report.py <leak_report.txt>")
        sys.exit(1)
    
    leaks = parse_leak_report(sys.argv[1])
    print(analyze_leaks(leaks))

Allocation Site Identification

# Focus on specific allocation functions (one report after 120 seconds)
sudo /usr/share/bcc/tools/memleak -p "$PID" 120 1 | grep -A 10 "malloc\|calloc\|realloc"

# Filter by minimum leak size (1MB+); report lines are indented, so match on fields rather than line start
sudo /usr/share/bcc/tools/memleak -p "$PID" 180 1 | awk '$2 == "bytes" && $1 >= 1048576'

Leak Report Generation

# Comprehensive reporting with timestamps
{
    echo "=== Memory Leak Report ==="
    echo "Timestamp: $(date)"
    echo "PID: $PID"
    echo "Command: $(ps -p $PID -o comm=)"
    echo "=========================="
    echo
    
    sudo /usr/share/bcc/tools/memleak -p "$PID" 300 1
    
    echo
    echo "=== Process Memory Info ==="
    grep -E "VmSize|VmRSS|VmData|VmHWM" "/proc/$PID/status"
} > "comprehensive_leak_report_$(date +%Y%m%d_%H%M%S).txt"

Performance Impact Studies

MySQL Benchmark Results

Configuration: MySQL 8.0, TPC-C benchmark
Environment: 16-core server, 64GB RAM

Baseline Performance:
- Transactions/sec: 15,247
- Average Latency: 12.3ms
- CPU Usage: 45%

With BCC memleak Full Tracing:
- Transactions/sec: 10,215 (-33%)
- Average Latency: 18.9ms (+54%)
- CPU Usage: 78% (+73%)

Memory Overhead:
- Additional RSS: 2.1GB
- Allocation metadata: ~150 bytes per tracked allocation
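
The per-allocation metadata figure lends itself to a quick back-of-the-envelope check. The sketch below is arithmetic only and measures nothing: the function names are hypothetical, the 150-byte figure is the approximate value quoted above, and 2.1GB is read as decimal gigabytes. If all of the additional RSS were tracking metadata, it would correspond to roughly 14 million simultaneously tracked allocations.

```python
BYTES_PER_TRACKED_ALLOCATION = 150  # approximate figure quoted above


def metadata_bytes(live_allocations, per_entry=BYTES_PER_TRACKED_ALLOCATION):
    """Estimated tracking memory for a given number of live allocations."""
    return live_allocations * per_entry


def implied_live_allocations(extra_rss_bytes, per_entry=BYTES_PER_TRACKED_ALLOCATION):
    """Number of tracked allocations that would account for the extra RSS."""
    return extra_rss_bytes // per_entry


print(implied_live_allocations(2_100_000_000))  # 14000000
print(metadata_bytes(1_000_000))                # 150000000
```

In practice part of the extra RSS will come from other sources, so the result is an upper bound on tracked allocations rather than a measurement.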

Web Server Impact Analysis

Application: Node.js Express API
Load: 1000 req/sec sustained

Normal Operation:
- Response Time P95: 45ms
- Memory Usage: 512MB
- CPU Usage: 30%

With Full Tracing:
- Response Time P95: 180ms (+300%)
- Memory Usage: 1.2GB (+135%)
- CPU Usage: 85% (+183%)

Allocation Rate Impact:
- Normal: ~50K allocs/sec
- With tracing: ~12K effective allocs/sec

Why Overhead is So High

  1. Every Allocation Intercepted

    • Uprobe fires for every malloc/free call
    • Each probe hit traps into the kernel, adding a user/kernel mode switch per call
    • eBPF program execution time
  2. Metadata Storage

    • Stack trace collection and storage
    • Hash table operations for tracking
    • Memory overhead for tracking structures
  3. Stack Walking

    • Complete stack unwinding for each allocation
    • Symbol resolution in user space at report time, a cost per report rather than per allocation
    • Debug information processing during that resolution step
  4. Synchronization Overhead

    • eBPF map operations require synchronization
    • Potential lock contention in high-concurrency scenarios
    • Memory barriers and cache invalidation
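
The factors above combine roughly multiplicatively with the allocation rate, which a small model makes concrete. This is a sketch under stated assumptions, not a measurement: `tracing_cpu_cores` is a hypothetical helper, the 2-microsecond per-probe cost is an illustrative placeholder, and three probe hits per allocate/free pair assumes an entry probe and a return probe on the allocation plus an entry probe on the free.

```python
def tracing_cpu_cores(allocs_per_second, probe_cost_seconds, probes_per_pair=3):
    """Estimate CPU time (in cores) spent inside probes.

    probes_per_pair=3 assumes: allocation entry, allocation return, free entry.
    """
    return allocs_per_second * probes_per_pair * probe_cost_seconds


# Hypothetical inputs: 50,000 alloc/free pairs per second, 2 microseconds per probe hit.
extra_cores = tracing_cpu_cores(50_000, 2e-6)
print(f"{extra_cores:.2f} cores")  # prints "0.30 cores"
```

Because the cost scales linearly with allocation rate, an application allocating ten times as often would, under the same assumptions, spend about three full cores on probes, which is why allocation-heavy workloads sit at the upper end of the overhead range.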

When to Use

Appropriate Scenarios

  1. Development Environment Only

    • Local development debugging
    • Unit test memory validation
    • Integration test leak detection
  2. Known Leak Reproduction

    • Reproducing reported memory leaks
    • Validating leak fixes
    • Understanding leak patterns
  3. Last Resort Debugging

    • When other tools provide insufficient data
    • Critical production issue investigation (very brief runs)
    • Memory corruption investigation
  4. Very Brief Periods

    • Maximum 5-10 minutes in development
    • Maximum 1-2 minutes in staging
    • Never continuous operation

When NOT to Use

  • Production environments (continuous monitoring)
  • Performance-critical applications during normal operation
  • High-frequency allocation patterns (millions of allocs/sec)
  • Memory-constrained systems (overhead too high)
  • 24/7 monitoring (use sampling approaches instead)

Alternatives

For Production Monitoring

# Use sampled version instead
sudo /usr/share/bcc/tools/memleak -p $PID -s 1000  # Sample 1 in 1000

# Or use dedicated profilers
export MALLOC_CONF="prof:true,prof_leak:true"
# Run application with jemalloc profiling
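
To see why the sampled memleak mode shown above is so much cheaper, and how to read its numbers, it helps to simulate it. The sketch below keeps every n-th allocation, which is a deterministic simplification of what the tool does, and both function names are hypothetical. It also assumes the report covers only the sampled allocations, so totals must be scaled by the sample rate to approximate the real figure; that assumption is worth verifying against the installed version.

```python
def sample_allocations(events, sample_every_n):
    """Keep every n-th allocation event (deterministic stand-in for sampling)."""
    if sample_every_n < 1:
        raise ValueError("sample_every_n must be >= 1")
    return [event for i, event in enumerate(events) if i % sample_every_n == 0]


def estimate_total_bytes(sampled_sizes, sample_every_n):
    """Scale sampled outstanding bytes back up to an estimate of the true total."""
    return sum(sampled_sizes) * sample_every_n


sizes = [64] * 10_000                       # 10,000 leaked 64-byte allocations
sampled = sample_allocations(sizes, 1000)   # only 10 events are tracked
print(len(sampled), estimate_total_bytes(sampled, 1000))  # 10 640000
```

The trade-off is visible in the same numbers: tracking drops by a factor of 1000, but a leak made of only a handful of allocations can be missed entirely.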

Consider jemalloc Profiling

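# Enable jemalloc heap profiling (requires a jemalloc build with profiling support)
export MALLOC_CONF="prof:true,prof_active:false,prof_leak:true"
# prof_active:false starts with sampling switched off; enable it at runtime through the prof.active mallctl
# Lower overhead, production-suitable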

Page Fault Tracing for Production

# Monitor memory growth via page faults (much lower overhead); this tracepoint is x86-specific
sudo /usr/share/bcc/tools/trace -p "$PID" 't:exceptions:page_fault_user'

Alternative Tools

  • Valgrind: More comprehensive but even higher overhead
  • AddressSanitizer: Compile-time instrumentation option
  • jemalloc profiling: Lower overhead, production-suitable
  • SystemTap: Alternative dynamic tracing framework (kernel-module based, predates eBPF)

Best Practices

Pre-Deployment Checklist

  • Confirm non-production environment
  • Set maximum runtime limits (5-10 minutes)
  • Ensure sufficient disk space for output
  • Verify debug symbols availability
  • Plan for application performance degradation
  • Have rollback plan ready
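
Some of the checklist items can be automated before a run starts. The following is a minimal sketch covering only the mechanical checks (kernel version, tool presence, free disk space); `kernel_at_least` and `preflight` are hypothetical helpers, and the 100 MiB free-space threshold is an arbitrary placeholder. Confirming that the host is non-production and that debug symbols are installed still needs a human or environment-specific logic.

```python
import os
import re
import shutil


def kernel_at_least(release, minimum=(4, 6)):
    """Compare a `uname -r` style string such as '5.15.0-91-generic' to `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def preflight(memleak_path="/usr/share/bcc/tools/memleak",
              output_dir=".",
              min_free_bytes=100 * 1024 * 1024):
    """Return a list of human-readable problems; an empty list means go."""
    problems = []
    if not kernel_at_least(os.uname().release):
        problems.append("kernel is older than 4.6")
    if not (os.path.isfile(memleak_path) and os.access(memleak_path, os.X_OK)):
        problems.append(f"memleak not found or not executable at {memleak_path}")
    if shutil.disk_usage(output_dir).free < min_free_bytes:
        problems.append(f"less than {min_free_bytes} bytes free in {output_dir}")
    return problems
```

A launcher can refuse to start tracing whenever `preflight()` returns a non-empty list and print the problems instead.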

Data Collection Guidelines

  • Start with short runs (30-60 seconds)
  • Focus on specific allocation patterns
  • Combine with application profiling data
  • Document environmental conditions
  • Save complete output for analysis

Analysis Recommendations

  • Focus on largest leaks first
  • Group similar stack traces
  • Correlate with application behavior
  • Validate findings with controlled tests
  • Document leak patterns for future reference
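
Grouping similar stack traces can be done by normalising frames and keying on the innermost few. The sketch below assumes leak entries shaped like those produced by `parse_leak_report` in the analysis script (keys `bytes`, `count`, and a newline-joined `stack`), and that frames look like `main+0x6d [allocs]`; `group_by_top_frames` is a hypothetical helper. Instruction offsets are stripped so that two call sites inside the same function fall into one group.

```python
import re
from collections import defaultdict

OFFSET = re.compile(r"\+0x[0-9a-fA-F]+")  # e.g. the "+0x6d" in "main+0x6d [allocs]"


def group_by_top_frames(leaks, depth=3):
    """Merge leak entries whose first `depth` frames match after offset removal."""
    groups = defaultdict(lambda: {"bytes": 0, "count": 0, "sites": 0})
    for leak in leaks:
        frames = [OFFSET.sub("", frame.strip()) for frame in leak["stack"].splitlines()]
        group = groups[tuple(frames[:depth])]
        group["bytes"] += leak["bytes"]
        group["count"] += leak["count"]
        group["sites"] += 1
    # Largest groups first, matching the "largest leaks first" recommendation.
    return sorted(groups.items(), key=lambda item: item[1]["bytes"], reverse=True)
```

A smaller `depth` merges more aggressively; starting at 2 or 3 frames and increasing it when unrelated leaks end up in one group tends to work well.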

Conclusion

BCC memleak with full tracing is an extremely powerful but heavyweight tool for memory leak detection. Its comprehensive tracking capabilities come at the cost of significant performance overhead, making it suitable only for development environments and brief diagnostic runs.

The tool excels at providing complete, accurate leak detection with minimal false positives, but the 30-400% overhead makes it unsuitable for production monitoring. For production environments, consider sampling-based approaches, allocator-level heap profiling such as jemalloc's, or lightweight page fault tracing methods.

When used appropriately in controlled environments, BCC memleak full tracing provides unmatched visibility into memory allocation patterns and leak sources, making it an invaluable tool for debugging complex memory management issues.
