ByteHound
Overview
ByteHound is a memory profiler for Linux, written in Rust. It tracks every memory allocation made by a native application and provides tools for analyzing the recorded data, with a focus on detecting memory leaks and understanding allocation patterns.
Key Characteristics:
- Rust-based memory profiler for Linux
- Complete allocation tracking via LD_PRELOAD interception
- Custom stack unwinding implementation for performance
- Web-based visualization UI
- Production-tested by the author in specific scenarios
- Multi-architecture support (AMD64, ARM, AArch64, MIPS64)
Performance Characteristics
Overhead Analysis
- Performance Impact: Variable depending on application - can be significant for performance-critical systems
- Optimization Focus: Uses custom stack unwinding implementation that is "potentially up to orders of magnitude faster" than traditional profiling tools
- Production Suitability: Mixed - suitable for investigation periods but may be prohibitive for always-on monitoring
- Accuracy: High (100% allocation coverage, no sampling)
- False Positives: Low due to complete tracking
Performance Comparison Context
- vs Valgrind: Significantly lower overhead than Valgrind's memory tools
- vs jemalloc profiling: Higher overhead than statistical profiling but provides complete coverage
- vs TCMalloc profiling: Higher overhead than allocator-based profiling; for comparison, TCMalloc/jemalloc heap profiling costs roughly a 4% OPS drop and a 10% P99 latency increase
- Production Reality: TiKV team found overhead too high for their production database workload
Key Features
Core Capabilities
- Complete Allocation Tracking: Captures every malloc, free, mmap, munmap, and related calls
- Full Stack Traces: Call stack for every allocation and deallocation
- Dynamic Culling: Runtime filtering of temporary allocations to reduce data volume
- Backtrace Deduplication: Runtime backtrace cache with deduplication for reduced data generation
- No Sampling: 100% coverage of memory operations
- Real-time Analysis: Can stream profiling data to another machine
- Multiple Export Formats: JSON, Heaptrack, flamegraph formats
Advanced Features
- Embedded Scripting: Supports Rhai DSL for custom analysis
- jemalloc Integration: Special support for jemalloc profiling (AMD64 only)
- Programmatic Control: Can be started and stopped programmatically within applications
- Shadow Stack Unwinding: Supported on stable Rust, enabled by default
- Rust Symbol Demangling: Native support for Rust symbol demangling
System-Agent Implementation Plan
Layer 3 Integration Strategy
ByteHound is positioned as a Layer 3 deep analysis tool for critical memory leak investigations:
Triggered Profiling Approach
# 5-10 minute investigation window
export MEMORY_PROFILER_LOG=warn
export MEMORY_PROFILER_OUTPUT=/tmp/memory-profiling.dat
# Set LD_PRELOAD via env so only the target is profiled, not timeout itself
timeout 600 env LD_PRELOAD=./libbytehound.so target-process
Implementation Requirements
- Process Restart: Applications must be restarted with LD_PRELOAD
- Duration Limits: 5-10 minute profiling windows to manage overhead
- Data Collection: Automated collection and transfer of profiling data
- Analysis Pipeline: Automated report generation from profiling data
Integration Workflow
- Trigger Detection: Memory growth alerts or leak suspicion
- Process Preparation: Restart target process with ByteHound attached
- Data Collection: Gather profiling data over investigation period
- Analysis: Generate comprehensive leak analysis reports
- Cleanup: Return process to normal operation
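The trigger-detection step of this workflow can be sketched as a growth check over periodic RSS samples. This is a minimal illustration, not part of ByteHound or the system agent: the function names, the sample format, and the 1 MiB/minute threshold are all assumptions chosen for the example.

```python
from statistics import mean


def rss_growth_rate(samples):
    """Least-squares slope of RSS over time, in bytes per second.

    `samples` is a list of (timestamp_seconds, rss_bytes) pairs
    (a hypothetical format for this sketch).
    """
    if len(samples) < 2:
        return 0.0
    t_mean = mean(t for t, _ in samples)
    r_mean = mean(r for _, r in samples)
    num = sum((t - t_mean) * (r - r_mean) for t, r in samples)
    den = sum((t - t_mean) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def should_trigger_profiling(samples, threshold_bytes_per_sec=2**20 / 60):
    """Return True when sustained growth exceeds the assumed threshold."""
    return rss_growth_rate(samples) > threshold_bytes_per_sec


# One sample per minute for five minutes.
leaky = [(60 * i, 100 * 2**20 + i * 2 * 2**20) for i in range(6)]   # +2 MiB/min
steady = [(60 * i, 100 * 2**20) for i in range(6)]                  # flat

print(should_trigger_profiling(leaky))
print(should_trigger_profiling(steady))
```

Fitting a slope over a window, rather than comparing two readings, makes the trigger less sensitive to short spikes; the window length and threshold would need tuning per workload.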
Architecture
Interception Mechanism
ByteHound uses LD_PRELOAD to intercept memory allocation functions:
Intercepted Functions
- malloc, calloc, realloc, free
- mmap, munmap, mmap64
- posix_memalign, aligned_alloc
- memalign, valloc, pvalloc
Data Structures
- Allocation Tracking: Hash maps for active allocations
- Stack Trace Cache: Deduplicated backtrace storage
- Temporal Filtering: Dynamic culling of short-lived allocations
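To make the idea of a hash map of active allocations concrete, here is a small sketch. It is an illustration under assumed names (`AllocationTracker`, `on_alloc`, `on_free`), not ByteHound's actual Rust data structures.

```python
class AllocationTracker:
    """Tracks live allocations keyed by address (illustrative only)."""

    def __init__(self):
        self.live = {}        # address -> (size, backtrace_id)
        self.live_bytes = 0
        self.peak_bytes = 0

    def on_alloc(self, address, size, backtrace_id):
        self.live[address] = (size, backtrace_id)
        self.live_bytes += size
        self.peak_bytes = max(self.peak_bytes, self.live_bytes)

    def on_free(self, address):
        # Frees of unknown addresses (e.g. allocated before tracking began)
        # are ignored rather than treated as errors.
        entry = self.live.pop(address, None)
        if entry is not None:
            self.live_bytes -= entry[0]


tracker = AllocationTracker()
tracker.on_alloc(0x1000, 64, backtrace_id=0)
tracker.on_alloc(0x2000, 32, backtrace_id=1)
tracker.on_free(0x1000)
tracker.on_free(0xDEAD)  # unknown address, ignored
print(tracker.live_bytes, tracker.peak_bytes)  # 32 96
```

Whatever remains in `live` at the end of a session is the set of allocations that were never freed, which is the starting point for leak analysis.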
Custom Stack Unwinding
ByteHound's performance advantage comes from its custom stack unwinding implementation:
- DWARF Preprocessing: DWARF debugging information is preprocessed at startup for faster lookups
- Dynamic Patching: Return addresses are dynamically patched to reduce traversal overhead
- Optimized Lookups: Stack trace generation optimized for minimal runtime impact
Output Format
- Continuous Updates: Data file is continuously updated rather than generated at specific intervals
- Streaming Support: Can stream data to remote machines for analysis
- Multiple Formats: Native format plus exports to JSON, Heaptrack, and flamegraph
Production Deployments
Author's Production Experience
The ByteHound author has used the tool in production environments, demonstrating its practical viability for specific use cases.
Success Stories
- Memory Leak Resolution: Users report successfully identifying and fixing "nasty leaks" using ByteHound
- Allocation Pattern Analysis: Effective for understanding complex allocation behaviors
- Development Integration: Used during development cycles for memory optimization
Production Constraints
- Performance-Critical Systems: May be unsuitable for high-throughput, latency-sensitive applications
- Limited Always-On Use: Best suited for investigation periods rather than continuous monitoring
- Resource Requirements: Additional RAM needed for profiling data storage
Layer 3 Tool Positioning
ByteHound is most effective as a specialized investigation tool:
- Triggered Usage: Deploy when memory issues are suspected
- Deep Analysis: Comprehensive data for complex leak scenarios
- Root Cause Analysis: Complete allocation history for thorough investigation
Installation & Setup
Building from Source
# Requirements
# - Rust nightly (version 1.62+)
# - Full GCC toolchain
# - Yarn package manager
# Clone repository
git clone https://github.com/koute/bytehound.git
cd bytehound
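# Build the preload library (produces target/release/libbytehound.so)
cargo build --release -p bytehound-preload
# Build the CLI and GUI (produces target/release/bytehound; requires Yarn)
cargo build --release -p bytehound-cli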
Prebuilt Binaries
Download from GitHub releases: https://github.com/koute/bytehound/releases
Configuration Setup
# Basic profiling setup
export MEMORY_PROFILER_LOG=info
export MEMORY_PROFILER_OUTPUT=memory-profiling.dat
# Advanced configuration
export MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1
export MEMORY_PROFILER_GRAB_BACKTRACES=1
Web UI Setup
# Start analysis server (file name matches MEMORY_PROFILER_OUTPUT above)
./bytehound server memory-profiling.dat
# Access GUI at http://localhost:8080
Code Examples
Integration Wrapper Script
#!/bin/bash
# bytehound-profile.sh - Automated ByteHound profiling wrapper
PROFILE_DURATION=${1:-300} # Default 5 minutes (used when the first argument is empty)
TARGET_PROCESS=${2}
OUTPUT_DIR="/tmp/bytehound-profiles"
if [ -z "$TARGET_PROCESS" ]; then
    echo "Usage: $0 <duration_seconds> <target_process>"
    exit 1
fi
# Setup
mkdir -p "$OUTPUT_DIR"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
PROFILE_FILE="$OUTPUT_DIR/memory-profile_${TIMESTAMP}.dat"
# Configure ByteHound
export MEMORY_PROFILER_LOG=warn
export MEMORY_PROFILER_OUTPUT="$PROFILE_FILE"
export MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1
echo "Starting ByteHound profiling for $PROFILE_DURATION seconds..."
# Start profiling. LD_PRELOAD is passed through env: timeout would otherwise
# try to execute the "LD_PRELOAD=..." assignment as the command name.
timeout "$PROFILE_DURATION" env LD_PRELOAD=./libbytehound.so "$TARGET_PROCESS" &
PROFILE_PID=$!
# Wait for completion
wait "$PROFILE_PID"
echo "Profiling complete. Data saved to: $PROFILE_FILE"
echo "Start analysis with: ./bytehound server $PROFILE_FILE"
Automated Analysis Setup
#!/usr/bin/env python3
# bytehound-analyzer.py - Automated ByteHound analysis
import subprocess
import json
import sys
from pathlib import Path


class ByteHoundAnalyzer:
    def __init__(self, profile_file):
        self.profile_file = Path(profile_file)
        self.server_proc = None

    def start_server(self):
        """Start ByteHound analysis server"""
        cmd = ["./bytehound", "server", str(self.profile_file)]
        self.server_proc = subprocess.Popen(cmd)

    def export_data(self, format_type="json"):
        """Export profiling data in specified format"""
        output_file = self.profile_file.with_suffix(f".{format_type}")
        # NOTE: the export subcommand and --format flag are unverified here;
        # check `./bytehound --help` for the subcommands your build provides.
        cmd = ["./bytehound", "export", str(self.profile_file), "--format", format_type]
        with open(output_file, "w") as f:
            subprocess.run(cmd, stdout=f, check=True)
        return output_file

    def generate_report(self):
        """Generate summary report from profiling data"""
        # Export as JSON for analysis
        json_file = self.export_data("json")
        # Parse and analyze (placeholder for actual analysis logic)
        with open(json_file) as f:
            data = json.load(f)
        # Generate summary report
        report = {
            "total_allocations": len(data.get("allocations", [])),
            "peak_memory": self.calculate_peak_memory(data),
            "potential_leaks": self.identify_leaks(data),
        }
        return report

    def calculate_peak_memory(self, data):
        # Placeholder implementation
        return "N/A"

    def identify_leaks(self, data):
        # Placeholder implementation
        return []


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 bytehound-analyzer.py <profile_file>")
        sys.exit(1)
    analyzer = ByteHoundAnalyzer(sys.argv[1])
    report = analyzer.generate_report()
    print(json.dumps(report, indent=2))
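The two placeholder methods in the analyzer could be filled in along the following lines. This is a sketch only: the record fields `allocated_at`, `deallocated_at`, and `size` are a hypothetical schema invented for the example, and the real exported data would need to be mapped onto it.

```python
def calculate_peak_memory(allocations):
    """Peak number of live bytes, from a list of allocation records."""
    events = []
    for a in allocations:
        events.append((a["allocated_at"], a["size"]))
        if a.get("deallocated_at") is not None:
            events.append((a["deallocated_at"], -a["size"]))
    # Sort by time; at equal timestamps the negative deltas (frees) sort
    # first, so the peak is not overstated.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def identify_leaks(allocations):
    """Records never freed during the session (candidates, not proof)."""
    return [a for a in allocations if a.get("deallocated_at") is None]


allocations = [
    {"allocated_at": 0, "size": 100, "deallocated_at": 5},
    {"allocated_at": 1, "size": 50, "deallocated_at": None},
    {"allocated_at": 5, "size": 80, "deallocated_at": 9},
]
print(calculate_peak_memory(allocations))  # 150
print(len(identify_leaks(allocations)))    # 1
```

An allocation that is still live when profiling stops is not necessarily a leak (caches and long-lived state look the same), so the second function yields suspects for review rather than a verdict.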
System Agent Integration Code
// bytehound-integration.go - System agent ByteHound integration
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

type ByteHoundProfiler struct {
	ProfileDuration time.Duration
	OutputDir       string
	LibraryPath     string
}

func NewByteHoundProfiler() *ByteHoundProfiler {
	return &ByteHoundProfiler{
		ProfileDuration: 5 * time.Minute,
		OutputDir:       "/tmp/bytehound-profiles",
		LibraryPath:     "./libbytehound.so",
	}
}

func (bp *ByteHoundProfiler) ProfileProcess(ctx context.Context, processName string, args []string) error {
	// Ensure output directory exists
	if err := os.MkdirAll(bp.OutputDir, 0755); err != nil {
		return fmt.Errorf("failed to create output directory: %w", err)
	}
	// Generate unique profile filename
	timestamp := time.Now().Format("20060102_150405")
	profileFile := filepath.Join(bp.OutputDir, fmt.Sprintf("memory-profile_%s.dat", timestamp))
	// Setup environment
	env := os.Environ()
	env = append(env, "MEMORY_PROFILER_LOG=warn")
	env = append(env, fmt.Sprintf("MEMORY_PROFILER_OUTPUT=%s", profileFile))
	env = append(env, "MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1")
	env = append(env, fmt.Sprintf("LD_PRELOAD=%s", bp.LibraryPath))
	// Create command with timeout
	ctx, cancel := context.WithTimeout(ctx, bp.ProfileDuration)
	defer cancel()
	cmd := exec.CommandContext(ctx, processName, args...)
	cmd.Env = env
	// Start profiling
	if err := cmd.Run(); err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			// Expected timeout - profiling complete
			fmt.Printf("ByteHound profiling completed. Data saved to: %s\n", profileFile)
			return nil
		}
		return fmt.Errorf("profiling failed: %w", err)
	}
	return nil
}

func (bp *ByteHoundProfiler) AnalyzeProfile(profileFile string) error {
	// Start analysis server
	cmd := exec.Command("./bytehound", "server", profileFile)
	return cmd.Start()
}
Optimization Techniques
Backtrace Deduplication
ByteHound implements runtime backtrace caching to reduce data volume:
- Hash-based Deduplication: Identical stack traces are stored once and referenced
- Dynamic Cache Management: Cache is managed to balance memory usage and lookup performance
- Reduced Storage: Significantly reduces profile data size for applications with repetitive allocation patterns
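Hash-based deduplication of stack traces is essentially interning: each distinct trace is stored once and every allocation record refers to it by a small integer ID. The sketch below shows the general technique with a hypothetical `BacktraceCache` class; it does not reproduce ByteHound's internal cache.

```python
class BacktraceCache:
    """Interns stack traces so identical traces share one ID (illustrative)."""

    def __init__(self):
        self._ids = {}      # tuple of frame addresses -> id
        self._frames = []   # id -> tuple of frame addresses

    def intern(self, frames):
        key = tuple(frames)
        bt_id = self._ids.get(key)
        if bt_id is None:
            bt_id = len(self._frames)
            self._ids[key] = bt_id
            self._frames.append(key)
        return bt_id

    def lookup(self, bt_id):
        return self._frames[bt_id]

    def __len__(self):
        return len(self._frames)


cache = BacktraceCache()
a = cache.intern([0x401000, 0x401200, 0x401a00])
b = cache.intern([0x401000, 0x401200, 0x401a00])  # same trace, same ID
c = cache.intern([0x401000, 0x401300])
print(a == b, a != c, len(cache))  # True True 2
```

With this layout an allocation record carries only an ID, so a hot allocation site called millions of times costs one stored trace rather than millions.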
Temporary Allocation Filtering
Dynamic culling of short-lived allocations:
- Lifetime Thresholds: Configurable thresholds for allocation lifetime
- Runtime Decision Making: Decisions made during profiling to avoid post-processing overhead
- Long-term Profiling: Enables extended profiling sessions without excessive data accumulation
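One way to picture runtime culling is a holding buffer: an allocation is written to the profile only after it has survived longer than the threshold, and if it is freed before then, neither event is recorded. The class name, the `emit` callback, and the one-second threshold below are assumptions for illustration, not ByteHound's implementation.

```python
class TemporaryAllocationCuller:
    """Drops allocations freed before `threshold_seconds` (illustrative)."""

    def __init__(self, threshold_seconds, emit):
        self.threshold = threshold_seconds
        self.emit = emit          # callback receiving records worth keeping
        self.pending = {}         # address -> (alloc_time, size)

    def flush(self, now):
        # Promote allocations that have outlived the threshold.
        expired = [a for a, (t, _) in self.pending.items()
                   if now - t >= self.threshold]
        for address in expired:
            t, size = self.pending.pop(address)
            self.emit(("alloc", address, size, t))

    def on_alloc(self, address, size, now):
        self.flush(now)
        self.pending[address] = (now, size)

    def on_free(self, address, now):
        self.flush(now)
        if self.pending.pop(address, None) is not None:
            return  # short-lived: culled, nothing is emitted
        self.emit(("free", address, now))


records = []
culler = TemporaryAllocationCuller(1.0, records.append)
culler.on_alloc(0xA, 16, now=0.0)
culler.on_free(0xA, now=0.1)      # lived 0.1 s, culled
culler.on_alloc(0xB, 64, now=0.2)
culler.on_alloc(0xC, 8, now=2.0)  # flush promotes 0xB
culler.on_free(0xB, now=2.5)
print(records)
```

Only the long-lived allocation at `0xB` reaches the output; the temporary at `0xA` never does, which is what keeps data volume manageable over long sessions.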
Custom Stack Unwinding
The core performance optimization in ByteHound:
- DWARF Preprocessing: Debug information is preprocessed at startup for faster lookups
- Return Address Patching: Dynamic modification of return addresses for efficient stack traversal
- Minimal Runtime Overhead: Stack trace generation optimized to minimize impact on target application
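The benefit of preprocessing unwind information can be illustrated with a generic technique: parse the address ranges once, sort them, and answer each lookup with a binary search instead of re-reading debug data. The sketch below uses that approach with made-up rule strings; it is not a description of ByteHound's actual unwinder, which is considerably more involved.

```python
import bisect


class UnwindTable:
    """Sorted address ranges with per-range unwind rules (illustrative)."""

    def __init__(self, entries):
        # entries: iterable of (start, end, rule), with end exclusive.
        self._entries = sorted(entries, key=lambda e: e[0])
        self._starts = [e[0] for e in self._entries]

    def lookup(self, pc):
        """Return the rule covering program counter `pc`, or None."""
        i = bisect.bisect_right(self._starts, pc) - 1
        if i >= 0:
            start, end, rule = self._entries[i]
            if start <= pc < end:
                return rule
        return None


# Hypothetical rules; real unwind info encodes register recovery programs.
table = UnwindTable([
    (0x1040, 0x10A0, "cfa=rbp+16"),
    (0x1000, 0x1040, "cfa=rsp+8"),
])
print(table.lookup(0x1010))  # cfa=rsp+8
print(table.lookup(0x1040))  # cfa=rbp+16
print(table.lookup(0x2000))  # None
```

The one-time sort moves the cost to startup, so each frame visited during an allocation pays only a logarithmic lookup.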
Data Compression Strategies
- Incremental Updates: Continuous file updates rather than periodic dumps
- Structured Storage: Efficient binary format for profiling data
- Stream Processing: Real-time data streaming to reduce local storage requirements
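Incremental updates to a structured binary file amount to appending fixed-layout records as events occur, so a reader can consume whatever has been written so far. The record layout below (`kind`, `address`, `size`, `backtrace_id`) is invented for this sketch and is not ByteHound's on-disk format.

```python
import io
import struct

# Little-endian: 1-byte kind, 8-byte address, 8-byte size, 4-byte backtrace ID.
RECORD = struct.Struct("<BQQI")
KIND_ALLOC, KIND_FREE = 1, 2


def append_record(stream, kind, address, size, backtrace_id):
    """Append one event and flush so readers see it immediately."""
    stream.write(RECORD.pack(kind, address, size, backtrace_id))
    stream.flush()


def read_records(stream):
    """Yield complete records; a truncated trailing record is ignored."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        yield RECORD.unpack(chunk)


buf = io.BytesIO()  # stands in for a file or socket
append_record(buf, KIND_ALLOC, 0x1000, 64, 0)
append_record(buf, KIND_FREE, 0x1000, 0, 0)
buf.write(b"\x01\x02")  # simulate a record that is still being written
buf.seek(0)
decoded = list(read_records(buf))
print(decoded)
```

Because the reader tolerates a partial final record, the file can be inspected while profiling is still running, and the same writer works unchanged over a network stream.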
Comparison with Alternatives
vs Valgrind Memcheck
Aspect | ByteHound | Valgrind Memcheck |
---|---|---|
Overhead | Moderate (custom unwinding) | Very High (~20x slowdown) |
Coverage | 100% allocations | 100% allocations + bounds checking |
Production Use | Limited investigation periods | Development only |
Setup Complexity | LD_PRELOAD | Command wrapper |
Real-time Analysis | Yes (web UI) | Post-mortem analysis |
vs jemalloc Profiling
Aspect | ByteHound | jemalloc Profiling |
---|---|---|
Overhead | Higher | Lower (~4% OPS impact) |
Coverage | Complete tracking | Statistical sampling |
Restart Required | Yes (LD_PRELOAD) | No (runtime toggle) |
Analysis Depth | Full allocation history | Aggregate statistics |
Integration | External tool | Built-in allocator feature |
vs BCC memleak
Aspect | ByteHound | BCC memleak |
---|---|---|
Overhead | Moderate | Lower (eBPF) |
Restart Required | Yes | No |
Kernel Requirements | Standard Linux | BPF-enabled kernel |
Analysis UI | Web-based GUI | Command-line output |
Historical Data | Complete timeline | Point-in-time snapshots |
vs TCMalloc Profiling
Aspect | ByteHound | TCMalloc Profiling |
---|---|---|
Overhead | Higher | Lower (~4% OPS, ~10% P99 latency) |
Coverage | All allocations | Sampled allocations |
Configuration | Environment variables | Runtime heap profiling |
Visualization | Web UI | pprof integration |
Production Deployment | Investigation periods | Continuous profiling possible |
Sweet Spot Analysis
ByteHound is optimal for:
- Critical Leak Investigations: When complete allocation history is needed
- Complex Memory Pattern Analysis: Understanding intricate allocation behaviors
- Development Phase Profiling: Deep analysis during development cycles
- One-off Investigations: Thorough analysis of specific memory issues
Not optimal for:
- Always-on Production Monitoring: Overhead may be prohibitive
- High-frequency Profiling: Better suited for focused investigation periods
- Performance-critical Systems: May impact application performance too significantly
Web UI Features
Timeline Visualization
- Memory Usage Timeline: Visual representation of memory usage over time
- Allocation Rate Charts: Graphs showing allocation and deallocation rates
- Interactive Timeline: Zoom and pan through profiling session timeline
- Event Correlation: Correlate allocation events with application behavior
Allocation Site Ranking
- Top Allocators: Ranked list of allocation sites by total memory allocated
- Leak Suspects: Allocation sites with high net allocation (allocated - deallocated)
- Frequency Analysis: Most frequently called allocation sites
- Size Distribution: Analysis of allocation size patterns
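The rankings listed above reduce to a per-site aggregation. The following sketch computes total, freed, and net bytes per allocation site from a list of events; the event tuple format and site names are assumptions for the example and do not reflect how the web UI is implemented.

```python
from collections import defaultdict


def rank_allocation_sites(events):
    """Rank sites by net bytes (allocated minus freed), largest first.

    `events` is an iterable of (kind, site, size) tuples where kind is
    "alloc" or "free" (a hypothetical format for this sketch).
    """
    stats = defaultdict(lambda: {"allocated": 0, "freed": 0, "count": 0})
    for kind, site, size in events:
        entry = stats[site]
        if kind == "alloc":
            entry["allocated"] += size
            entry["count"] += 1
        elif kind == "free":
            entry["freed"] += size
    ranked = [
        {"site": site, "net": s["allocated"] - s["freed"], **s}
        for site, s in stats.items()
    ]
    ranked.sort(key=lambda r: r["net"], reverse=True)
    return ranked


events = (
    [("alloc", "parse_request", 100)] * 3
    + [("free", "parse_request", 100)] * 3
    + [("alloc", "cache_insert", 256)] * 2
)
ranking = rank_allocation_sites(events)
for row in ranking:
    print(row["site"], row["net"], row["count"])
```

Sorting by `net` surfaces leak suspects, while sorting the same rows by `allocated` or `count` gives the top-allocator and frequency views.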
Memory Growth Analysis
- Growth Trend Analysis: Identification of memory growth patterns
- Leak Detection: Automatic identification of potential memory leaks
- Allocation Lifetime Analysis: Understanding of allocation lifecycle patterns
- Memory Map Visualization: Visual representation of memory layout
Interactive Exploration
- Stack Trace Navigation: Click-through navigation of allocation call stacks
- Source Code Integration: Jump to source code locations (when debug info available)
- Filtering and Search: Advanced filtering capabilities for specific allocation patterns
- Export Capabilities: Export analysis results in various formats
Advanced Analysis Features
- Allocation Clustering: Group similar allocations for pattern analysis
- Temporal Analysis: Understand how allocation patterns change over time
- Memory Fragmentation Analysis: Identify memory fragmentation issues
- Custom Queries: Use embedded scripting for custom analysis queries
Conclusion
ByteHound offers complete allocation tracking with a stack unwinding implementation optimized for low overhead. That overhead still makes it unsuitable for always-on production monitoring, but it works well as a Layer 3 investigation tool for complex memory leak analysis.
The tool's custom stack unwinding implementation and comprehensive feature set make it particularly valuable for:
- Deep memory leak investigations
- Understanding complex allocation patterns
- Development-phase memory optimization
- Root cause analysis of memory-related performance issues
Organizations should evaluate ByteHound's overhead in their specific environment and use it strategically for focused investigation periods when comprehensive memory analysis is required.
See Also
- BCC memleak Full Analysis - eBPF-based alternative with lower overhead
- jemalloc Profiling - Allocator-based profiling with statistical sampling
- TCMalloc Profiling - Google's allocator profiling capabilities
- Memory Leak Detection Comparison Matrix - Comprehensive tool comparison