ByteHound

Overview

ByteHound is a sophisticated memory profiler specifically designed for Linux systems, written in Rust. It provides comprehensive memory allocation tracking and analysis capabilities for native applications, with a focus on detecting memory leaks and understanding allocation patterns.

Key Characteristics:

  • Rust-based memory profiler for Linux
  • Complete allocation tracking via LD_PRELOAD interception
  • Custom stack unwinding implementation for performance
  • Web-based visualization UI
  • Production-tested by the author in specific scenarios
  • Multi-architecture support (AMD64, ARM, AArch64, MIPS64)

Performance Characteristics

Overhead Analysis

  • Performance Impact: Variable depending on application - can be significant for performance-critical systems
  • Optimization Focus: Custom stack unwinding implementation that the author describes as potentially orders of magnitude faster than traditional unwinding approaches
  • Production Suitability: Mixed - suitable for investigation periods but may be prohibitive for always-on monitoring
  • Accuracy: High (100% allocation coverage, no sampling)
  • False Positives: Low due to complete tracking

Performance Comparison Context

  • vs Valgrind: Significantly lower overhead than Valgrind's memory tools
  • vs jemalloc profiling: Higher overhead than statistical profiling but provides complete coverage
  • vs TCMalloc profiling: Higher overhead than allocator-based profiling (~4% OPS drop, ~10% P99 latency increase for TCMalloc/jemalloc)
  • Production Reality: TiKV team found overhead too high for their production database workload

Key Features

Core Capabilities

  • Complete Allocation Tracking: Captures every malloc, free, mmap, munmap, and related calls
  • Full Stack Traces: Call stack for every allocation and deallocation
  • Dynamic Culling: Runtime filtering of temporary allocations to reduce data volume
  • Backtrace Deduplication: Runtime backtrace cache with deduplication for reduced data generation
  • No Sampling: 100% coverage of memory operations
  • Real-time Analysis: Can stream profiling data to another machine
  • Multiple Export Formats: JSON, Heaptrack, flamegraph formats

Advanced Features

  • Embedded Scripting: Supports Rhai DSL for custom analysis
  • jemalloc Integration: Special support for jemalloc profiling (AMD64 only)
  • Programmatic Control: Profiling can be started and stopped programmatically from within the application (for external control from an agent, see the hedged signal-based sketch after this list)
  • Shadow Stack Unwinding: Supported on stable Rust, enabled by default
  • Rust Symbol Demangling: Native support for Rust symbol demangling
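
For the system-agent, the simplest way to exercise start/stop control from outside the target is a signal-based toggle, assuming the profiler was launched with such a toggle enabled. The sketch below is hedged: the environment variable named in the comment (MEMORY_PROFILER_REGISTER_SIGUSR1) and the choice of SIGUSR1 are assumptions that should be verified against the upstream ByteHound documentation.

// toggle_profiling.go - hedged sketch: toggle ByteHound profiling via a signal.
// Assumes the target was started with ByteHound preloaded and a signal-based
// toggle enabled (e.g. MEMORY_PROFILER_REGISTER_SIGUSR1=1 -- verify the exact
// variable and signal against the upstream README for your ByteHound version).
package main

import (
    "fmt"
    "os"
    "strconv"
    "syscall"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintln(os.Stderr, "usage: toggle_profiling <pid>")
        os.Exit(1)
    }
    pid, err := strconv.Atoi(os.Args[1])
    if err != nil {
        fmt.Fprintf(os.Stderr, "invalid pid: %v\n", err)
        os.Exit(1)
    }
    // Send SIGUSR1 to the target process to toggle profiling on or off.
    if err := syscall.Kill(pid, syscall.SIGUSR1); err != nil {
        fmt.Fprintf(os.Stderr, "failed to signal process %d: %v\n", pid, err)
        os.Exit(1)
    }
    fmt.Printf("sent SIGUSR1 to pid %d\n", pid)
}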

System-Agent Implementation Plan

Layer 3 Integration Strategy

ByteHound is positioned as a Layer 3 deep analysis tool for critical memory leak investigations:

Triggered Profiling Approach

# 5-10 minute investigation window
export MEMORY_PROFILER_LOG=warn
export MEMORY_PROFILER_OUTPUT=/tmp/memory-profiling.dat
timeout 600 env LD_PRELOAD=./libbytehound.so target-process  # env scopes the preload to the target, not to timeout itself

Implementation Requirements

  • Process Restart: Applications must be restarted with LD_PRELOAD (a verification sketch follows this list)
  • Duration Limits: 5-10 minute profiling windows to manage overhead
  • Data Collection: Automated collection and transfer of profiling data
  • Analysis Pipeline: Automated report generation from profiling data
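
Because profiling only takes effect for processes launched with the library preloaded, the agent should verify that the restart actually picked up ByteHound. A minimal sketch (the PID and library filename are illustrative) that checks /proc/<pid>/maps:

// preload_check.go - sketch: confirm libbytehound.so is mapped into a process.
package main

import (
    "fmt"
    "os"
    "strings"
)

// byteHoundLoaded reports whether the shared library appears in the
// process's memory map, i.e. whether the LD_PRELOAD restart took effect.
func byteHoundLoaded(pid int) (bool, error) {
    data, err := os.ReadFile(fmt.Sprintf("/proc/%d/maps", pid))
    if err != nil {
        return false, err
    }
    return strings.Contains(string(data), "libbytehound.so"), nil
}

func main() {
    // Example: check PID 1234 (replace with the restarted process's PID).
    loaded, err := byteHoundLoaded(1234)
    if err != nil {
        fmt.Fprintln(os.Stderr, "check failed:", err)
        os.Exit(1)
    }
    fmt.Println("ByteHound preloaded:", loaded)
}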

Integration Workflow

  1. Trigger Detection: Memory growth alerts or leak suspicion (see the RSS-growth sketch after this list)
  2. Process Preparation: Restart target process with ByteHound attached
  3. Data Collection: Gather profiling data over investigation period
  4. Analysis: Generate comprehensive leak analysis reports
  5. Cleanup: Return process to normal operation
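
For step 1, one straightforward trigger the agent can implement is sustained RSS growth of the target process. The sketch below is illustrative only; the PID, polling interval, and growth threshold are placeholders:

// rss_trigger.go - sketch: flag sustained RSS growth as a profiling trigger.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
    "time"
)

// rssKB returns the resident set size of a process in kilobytes,
// parsed from the VmRSS line of /proc/<pid>/status.
func rssKB(pid int) (int64, error) {
    f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
    if err != nil {
        return 0, err
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "VmRSS:") {
            fields := strings.Fields(line)
            if len(fields) >= 2 {
                return strconv.ParseInt(fields[1], 10, 64)
            }
        }
    }
    return 0, fmt.Errorf("VmRSS not found for pid %d", pid)
}

func main() {
    const pid = 1234                      // illustrative target PID
    const growthThresholdKB = 512 * 1024  // flag after ~512 MiB of growth
    baseline, err := rssKB(pid)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    for {
        time.Sleep(30 * time.Second)
        current, err := rssKB(pid)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        if current-baseline > growthThresholdKB {
            fmt.Printf("RSS grew from %d kB to %d kB: trigger ByteHound investigation\n", baseline, current)
            return
        }
    }
}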

Architecture

Interception Mechanism

ByteHound uses LD_PRELOAD to intercept memory allocation functions:

Intercepted Functions

  • malloc, calloc, realloc, free
  • mmap, munmap, mmap64
  • posix_memalign, aligned_alloc
  • memalign, valloc, pvalloc

Data Structures

  • Allocation Tracking: Hash maps for active allocations (a conceptual sketch follows this list)
  • Stack Trace Cache: Deduplicated backtrace storage
  • Temporal Filtering: Dynamic culling of short-lived allocations
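
ByteHound's internal data structures are an implementation detail; the following is only a conceptual sketch of the approach described above, keeping live allocations in a map keyed by address with each entry referencing a deduplicated backtrace ID:

// allocation_tracking.go - conceptual sketch only; not ByteHound's actual code.
package main

import "fmt"

// BacktraceID refers to an entry in a deduplicated backtrace cache.
type BacktraceID uint64

// Allocation records the metadata kept for each live allocation.
type Allocation struct {
    Size      uint64
    Backtrace BacktraceID
    Timestamp uint64 // monotonic time of the allocation
}

// Tracker holds live allocations keyed by address, mirroring the
// "hash maps for active allocations" idea described above.
type Tracker struct {
    live map[uintptr]Allocation
}

func NewTracker() *Tracker { return &Tracker{live: make(map[uintptr]Allocation)} }

// OnAlloc is called when an intercepted malloc/mmap returns.
func (t *Tracker) OnAlloc(addr uintptr, size uint64, bt BacktraceID, now uint64) {
    t.live[addr] = Allocation{Size: size, Backtrace: bt, Timestamp: now}
}

// OnFree is called when an intercepted free/munmap runs; anything still
// in the map at the end of the session is a leak candidate.
func (t *Tracker) OnFree(addr uintptr) {
    delete(t.live, addr)
}

func main() {
    t := NewTracker()
    t.OnAlloc(0x1000, 64, 1, 10)
    t.OnAlloc(0x2000, 128, 2, 12)
    t.OnFree(0x1000)
    fmt.Println("live allocations:", len(t.live)) // 1 -> leak candidate
}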

Custom Stack Unwinding

ByteHound's performance advantage comes from its custom stack unwinding implementation:

  • DWARF Preprocessing: DWARF debugging information is preprocessed at startup for faster lookups
  • Dynamic Patching: Return addresses are dynamically patched to reduce traversal overhead
  • Optimized Lookups: Stack trace generation optimized for minimal runtime impact

Output Format

  • Continuous Updates: Data file is continuously updated rather than generated at specific intervals (a size-watching sketch follows this list)
  • Streaming Support: Can stream data to remote machines for analysis
  • Multiple Formats: Native format plus exports to JSON, Heaptrack, and flamegraph
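
Since the data file grows for as long as the session runs, a supervising agent may want to watch its size and stop the investigation once a disk budget is exceeded. A small illustrative sketch (the path and budget are placeholders):

// profile_size_watch.go - sketch: watch a growing ByteHound data file.
package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    const profilePath = "/tmp/memory-profiling.dat" // illustrative path
    const maxBytes = 2 << 30                        // ~2 GiB budget
    for {
        info, err := os.Stat(profilePath)
        if err == nil && info.Size() > maxBytes {
            fmt.Printf("profile %s exceeded %d bytes; consider stopping the session\n",
                profilePath, maxBytes)
            return
        }
        time.Sleep(10 * time.Second)
    }
}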

Production Deployments

Author's Production Experience

The ByteHound author has used the tool in production environments, demonstrating its practical viability for specific use cases.

Success Stories

  • Memory Leak Resolution: Users report successfully identifying and fixing "nasty leaks" using ByteHound
  • Allocation Pattern Analysis: Effective for understanding complex allocation behaviors
  • Development Integration: Used during development cycles for memory optimization

Production Constraints

  • Performance-Critical Systems: May be unsuitable for high-throughput, latency-sensitive applications
  • Limited Always-On Use: Best suited for investigation periods rather than continuous monitoring
  • Resource Requirements: Additional RAM needed for profiling data storage

Layer 3 Tool Positioning

ByteHound is most effective as a specialized investigation tool:

  • Triggered Usage: Deploy when memory issues are suspected
  • Deep Analysis: Comprehensive data for complex leak scenarios
  • Root Cause Analysis: Complete allocation history for thorough investigation

Installation & Setup

Building from Source

# Requirements
# - Rust nightly (version 1.62+)
# - Full GCC toolchain
# - Yarn package manager

# Clone repository
git clone https://github.com/koute/bytehound.git
cd bytehound

# Build profiler
cargo build --release

# Build GUI (requires Yarn)
cd gui
yarn install
yarn build
cd ..

Prebuilt Binaries

Download from GitHub releases: https://github.com/koute/bytehound/releases

Configuration Setup

# Basic profiling setup
export MEMORY_PROFILER_LOG=info
export MEMORY_PROFILER_OUTPUT=memory-profiling.dat

# Advanced configuration
export MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1
export MEMORY_PROFILER_GRAB_BACKTRACES=1

Web UI Setup

# Start analysis server
./bytehound server memory-profiling_*.dat

# Access GUI at http://localhost:8080
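
For automation, the agent can start the analysis server itself and wait until the UI responds on the port noted above (8080 is assumed here to be the default; adjust if your build differs). A minimal sketch:

// serve_profile.go - sketch: start the ByteHound analysis server and wait for the UI.
package main

import (
    "fmt"
    "net/http"
    "os"
    "os/exec"
    "time"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintln(os.Stderr, "usage: serve_profile <profile.dat>")
        os.Exit(1)
    }
    // Start `bytehound server` on the captured profile.
    cmd := exec.Command("./bytehound", "server", os.Args[1])
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    if err := cmd.Start(); err != nil {
        fmt.Fprintln(os.Stderr, "failed to start server:", err)
        os.Exit(1)
    }
    // Poll the web UI until it responds (assumes the default port 8080).
    for i := 0; i < 30; i++ {
        resp, err := http.Get("http://localhost:8080")
        if err == nil {
            resp.Body.Close()
            fmt.Println("ByteHound UI is up at http://localhost:8080")
            break
        }
        time.Sleep(time.Second)
    }
    // Keep serving until the process exits (Ctrl+C to stop).
    _ = cmd.Wait()
}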

Code Examples

Integration Wrapper Script

#!/bin/bash
# bytehound-profile.sh - Automated ByteHound profiling wrapper

TARGET_PROCESS=${1}
PROFILE_DURATION=${2:-300}  # Default 5 minutes
OUTPUT_DIR="/tmp/bytehound-profiles"

if [ -z "$TARGET_PROCESS" ]; then
    echo "Usage: $0 <target_process> [duration_seconds]"
    exit 1
fi

# Setup
mkdir -p "$OUTPUT_DIR"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
PROFILE_FILE="$OUTPUT_DIR/memory-profile_${TIMESTAMP}.dat"

# Configure ByteHound
export MEMORY_PROFILER_LOG=warn
export MEMORY_PROFILER_OUTPUT="$PROFILE_FILE"
export MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1

echo "Starting ByteHound profiling for $PROFILE_DURATION seconds..."

# Start profiling (env scopes LD_PRELOAD to the target process, not to timeout itself)
timeout "$PROFILE_DURATION" env LD_PRELOAD=./libbytehound.so "$TARGET_PROCESS" &
PROFILE_PID=$!

# Wait for completion
wait $PROFILE_PID

echo "Profiling complete. Data saved to: $PROFILE_FILE"
echo "Start analysis with: ./bytehound server $PROFILE_FILE"

Automated Analysis Setup

#!/usr/bin/env python3
# bytehound-analyzer.py - Automated ByteHound analysis

import subprocess
import json
import sys
from pathlib import Path

class ByteHoundAnalyzer:
    def __init__(self, profile_file):
        self.profile_file = Path(profile_file)
        
    def start_server(self):
        """Start ByteHound analysis server"""
        cmd = ["./bytehound", "server", str(self.profile_file)]
        self.server_proc = subprocess.Popen(cmd)
        
    def export_data(self, format_type="json"):
        """Export profiling data in specified format"""
        output_file = self.profile_file.with_suffix(f".{format_type}")
        cmd = ["./bytehound", "export", str(self.profile_file), "--format", format_type]
        
        with open(output_file, 'w') as f:
            subprocess.run(cmd, stdout=f, check=True)
            
        return output_file
        
    def generate_report(self):
        """Generate summary report from profiling data"""
        # Export as JSON for analysis
        json_file = self.export_data("json")
        
        # Parse and analyze (placeholder for actual analysis logic)
        with open(json_file) as f:
            data = json.load(f)
            
        # Generate summary report
        report = {
            "total_allocations": len(data.get("allocations", [])),
            "peak_memory": self.calculate_peak_memory(data),
            "potential_leaks": self.identify_leaks(data)
        }
        
        return report
        
    def calculate_peak_memory(self, data):
        # Placeholder implementation
        return "N/A"
        
    def identify_leaks(self, data):
        # Placeholder implementation
        return []

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 bytehound-analyzer.py <profile_file>")
        sys.exit(1)
        
    analyzer = ByteHoundAnalyzer(sys.argv[1])
    report = analyzer.generate_report()
    
    print(json.dumps(report, indent=2))

System Agent Integration Code

// bytehound-integration.go - System agent ByteHound integration
package main

import (
    "context"
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "time"
)

type ByteHoundProfiler struct {
    ProfileDuration time.Duration
    OutputDir       string
    LibraryPath     string
}

func NewByteHoundProfiler() *ByteHoundProfiler {
    return &ByteHoundProfiler{
        ProfileDuration: 5 * time.Minute,
        OutputDir:       "/tmp/bytehound-profiles",
        LibraryPath:     "./libbytehound.so",
    }
}

func (bp *ByteHoundProfiler) ProfileProcess(ctx context.Context, processName string, args []string) error {
    // Ensure output directory exists
    if err := os.MkdirAll(bp.OutputDir, 0755); err != nil {
        return fmt.Errorf("failed to create output directory: %w", err)
    }
    
    // Generate unique profile filename
    timestamp := time.Now().Format("20060102_150405")
    profileFile := filepath.Join(bp.OutputDir, fmt.Sprintf("memory-profile_%s.dat", timestamp))
    
    // Setup environment
    env := os.Environ()
    env = append(env, "MEMORY_PROFILER_LOG=warn")
    env = append(env, fmt.Sprintf("MEMORY_PROFILER_OUTPUT=%s", profileFile))
    env = append(env, "MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1")
    env = append(env, fmt.Sprintf("LD_PRELOAD=%s", bp.LibraryPath))
    
    // Create command with timeout
    ctx, cancel := context.WithTimeout(ctx, bp.ProfileDuration)
    defer cancel()
    
    cmd := exec.CommandContext(ctx, processName, args...)
    cmd.Env = env
    
    // Start profiling
    if err := cmd.Run(); err != nil {
        if ctx.Err() == context.DeadlineExceeded {
            // Expected timeout - profiling complete
            fmt.Printf("ByteHound profiling completed. Data saved to: %s\n", profileFile)
            return nil
        }
        return fmt.Errorf("profiling failed: %w", err)
    }
    
    return nil
}

func (bp *ByteHoundProfiler) AnalyzeProfile(profileFile string) error {
    // Start analysis server
    cmd := exec.Command("./bytehound", "server", profileFile)
    return cmd.Start()
}
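
A usage sketch for the profiler type above, assuming it is wired into the agent's entry point (the target binary path is illustrative, and the imports are those already declared in the block above):

// Example invocation of the ByteHoundProfiler defined above (illustrative).
func main() {
    profiler := NewByteHoundProfiler()
    // Profile a hypothetical service binary for the configured duration.
    if err := profiler.ProfileProcess(context.Background(), "/usr/local/bin/target-service", nil); err != nil {
        fmt.Fprintf(os.Stderr, "bytehound profiling failed: %v\n", err)
        os.Exit(1)
    }
}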

Optimization Techniques

Backtrace Deduplication

ByteHound implements runtime backtrace caching to reduce data volume; a conceptual sketch follows the list below:

  • Hash-based Deduplication: Identical stack traces are stored once and referenced
  • Dynamic Cache Management: Cache is managed to balance memory usage and lookup performance
  • Reduced Storage: Significantly reduces profile data size for applications with repetitive allocation patterns
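
The following is only a conceptual sketch of hash-based backtrace interning, not ByteHound's actual implementation; a production version would also compare the frames themselves on hash collisions:

// backtrace_cache.go - conceptual sketch of backtrace deduplication.
package main

import (
    "fmt"
    "hash/fnv"
)

// BacktraceCache assigns a stable ID to each distinct backtrace so that
// repeated allocations from the same call site reference one stored trace.
type BacktraceCache struct {
    ids    map[uint64]uint32    // hash of frames -> backtrace ID
    traces map[uint32][]uintptr // backtrace ID -> stored frames
    next   uint32
}

func NewBacktraceCache() *BacktraceCache {
    return &BacktraceCache{ids: map[uint64]uint32{}, traces: map[uint32][]uintptr{}}
}

// Intern returns the ID for a backtrace, storing it only the first time it is seen.
func (c *BacktraceCache) Intern(frames []uintptr) uint32 {
    h := fnv.New64a()
    for _, pc := range frames {
        var buf [8]byte
        for i := 0; i < 8; i++ {
            buf[i] = byte(pc >> (8 * i))
        }
        h.Write(buf[:])
    }
    key := h.Sum64()
    if id, ok := c.ids[key]; ok {
        return id // already stored: only the small ID needs to be emitted
    }
    c.next++
    c.ids[key] = c.next
    c.traces[c.next] = append([]uintptr(nil), frames...)
    return c.next
}

func main() {
    cache := NewBacktraceCache()
    a := cache.Intern([]uintptr{0x401000, 0x402000})
    b := cache.Intern([]uintptr{0x401000, 0x402000}) // duplicate -> same ID
    fmt.Println(a == b, "distinct traces stored:", len(cache.traces))
}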

Temporary Allocation Filtering

Dynamic culling of short-lived allocations; a conceptual sketch follows the list below:

  • Lifetime Thresholds: Configurable thresholds for allocation lifetime
  • Runtime Decision Making: Decisions made during profiling to avoid post-processing overhead
  • Long-term Profiling: Enables extended profiling sessions without excessive data accumulation
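
Again as a conceptual sketch rather than ByteHound's actual logic, the culling rule can be pictured as: if an allocation is freed before a configurable lifetime threshold elapses, both events are dropped from the output:

// temporary_culling.go - conceptual sketch of lifetime-based culling.
package main

import (
    "fmt"
    "time"
)

// pendingAlloc holds allocations whose fate (temporary vs. long-lived) is unknown.
type pendingAlloc struct {
    allocatedAt time.Time
}

type Culler struct {
    threshold time.Duration // configurable lifetime threshold
    pending   map[uintptr]pendingAlloc
    recorded  int // allocations that outlived the threshold
    culled    int // short-lived allocations dropped from the output
}

func NewCuller(threshold time.Duration) *Culler {
    return &Culler{threshold: threshold, pending: map[uintptr]pendingAlloc{}}
}

func (c *Culler) OnAlloc(addr uintptr, now time.Time) {
    c.pending[addr] = pendingAlloc{allocatedAt: now}
}

// OnFree decides at runtime whether the allocation/free pair is worth keeping.
func (c *Culler) OnFree(addr uintptr, now time.Time) {
    p, ok := c.pending[addr]
    if !ok {
        return
    }
    delete(c.pending, addr)
    if now.Sub(p.allocatedAt) < c.threshold {
        c.culled++ // temporary allocation: drop both events
        return
    }
    c.recorded++ // long-lived allocation: keep it in the profile
}

func main() {
    c := NewCuller(100 * time.Millisecond)
    start := time.Now()
    c.OnAlloc(0x1000, start)
    c.OnFree(0x1000, start.Add(5*time.Millisecond)) // culled
    c.OnAlloc(0x2000, start)
    c.OnFree(0x2000, start.Add(2*time.Second)) // recorded
    fmt.Println("recorded:", c.recorded, "culled:", c.culled)
}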

Custom Stack Unwinding

The core performance optimization in ByteHound:

  • DWARF Preprocessing: Debug information is preprocessed at startup for O(1) lookups
  • Return Address Patching: Dynamic modification of return addresses for efficient stack traversal
  • Minimal Runtime Overhead: Stack trace generation optimized to minimize impact on target application

Data Compression Strategies

  • Incremental Updates: Continuous file updates rather than periodic dumps
  • Structured Storage: Efficient binary format for profiling data
  • Stream Processing: Real-time data streaming to reduce local storage requirements

Comparison with Alternatives

vs Valgrind Memcheck

Aspect             | ByteHound                      | Valgrind Memcheck
Overhead           | Moderate (custom unwinding)    | Very high (~20x slowdown)
Coverage           | 100% of allocations            | 100% of allocations + bounds checking
Production Use     | Limited investigation periods  | Development only
Setup Complexity   | LD_PRELOAD                     | Command wrapper
Real-time Analysis | Yes (web UI)                   | Post-mortem analysis

vs jemalloc Profiling

Aspect           | ByteHound               | jemalloc Profiling
Overhead         | Higher                  | Lower (~4% OPS impact)
Coverage         | Complete tracking       | Statistical sampling
Restart Required | Yes (LD_PRELOAD)        | No (runtime toggle)
Analysis Depth   | Full allocation history | Aggregate statistics
Integration      | External tool           | Built-in allocator feature

vs BCC memleak

Aspect              | ByteHound         | BCC memleak
Overhead            | Moderate          | Lower (eBPF)
Restart Required    | Yes               | No
Kernel Requirements | Standard Linux    | BPF-enabled kernel
Analysis UI         | Web-based GUI     | Command-line output
Historical Data     | Complete timeline | Point-in-time snapshots

vs TCMalloc Profiling

Aspect                | ByteHound             | TCMalloc Profiling
Overhead              | Higher                | Lower (~4% OPS, ~10% P99 latency)
Coverage              | All allocations       | Sampled allocations
Configuration         | Environment variables | Runtime heap profiling
Visualization         | Web UI                | pprof integration
Production Deployment | Investigation periods | Continuous profiling possible

Sweet Spot Analysis

ByteHound is optimal for:

  • Critical Leak Investigations: When complete allocation history is needed
  • Complex Memory Pattern Analysis: Understanding intricate allocation behaviors
  • Development Phase Profiling: Deep analysis during development cycles
  • One-off Investigations: Thorough analysis of specific memory issues

Not optimal for:

  • Always-on Production Monitoring: Overhead may be prohibitive
  • High-frequency Profiling: Better suited for focused investigation periods
  • Performance-critical Systems: May impact application performance too significantly

Web UI Features

Timeline Visualization

  • Memory Usage Timeline: Visual representation of memory usage over time
  • Allocation Rate Charts: Graphs showing allocation and deallocation rates
  • Interactive Timeline: Zoom and pan through profiling session timeline
  • Event Correlation: Correlate allocation events with application behavior

Allocation Site Ranking

  • Top Allocators: Ranked list of allocation sites by total memory allocated
  • Leak Suspects: Allocation sites with high net allocation (allocated minus deallocated)
  • Frequency Analysis: Most frequently called allocation sites
  • Size Distribution: Analysis of allocation size patterns

Memory Growth Analysis

  • Growth Trend Analysis: Identification of memory growth patterns
  • Leak Detection: Automatic identification of potential memory leaks
  • Allocation Lifetime Analysis: Understanding of allocation lifecycle patterns
  • Memory Map Visualization: Visual representation of memory layout

Interactive Exploration

  • Stack Trace Navigation: Click-through navigation of allocation call stacks
  • Source Code Integration: Jump to source code locations (when debug info available)
  • Filtering and Search: Advanced filtering capabilities for specific allocation patterns
  • Export Capabilities: Export analysis results in various formats

Advanced Analysis Features

  • Allocation Clustering: Group similar allocations for pattern analysis
  • Temporal Analysis: Understand how allocation patterns change over time
  • Memory Fragmentation Analysis: Identify memory fragmentation issues
  • Custom Queries: Use embedded scripting for custom analysis queries

Conclusion

ByteHound represents a sophisticated approach to memory profiling, offering complete allocation tracking with optimized performance characteristics. While its overhead makes it unsuitable for always-on production monitoring, it excels as a Layer 3 investigation tool for complex memory leak analysis.

The tool's custom stack unwinding implementation and comprehensive feature set make it particularly valuable for:

  • Deep memory leak investigations
  • Understanding complex allocation patterns
  • Development-phase memory optimization
  • Root cause analysis of memory-related performance issues

Organizations should evaluate ByteHound's overhead in their specific environment and use it strategically for focused investigation periods when comprehensive memory analysis is required.

See Also