Profiler Collector

The ProfilerCollector provides CPU profiling using eBPF with ring buffer streaming for real-time performance analysis. It uses CO-RE (Compile Once - Run Everywhere) technology for portability across kernel versions 4.18+.

Overview

The profiler collector captures CPU sampling data using hardware Performance Monitoring Unit (PMU) events or software timers. It streams profile events through a ring buffer for minimal overhead and zero data loss.

Key Features

  • Ring Buffer Streaming: 8MB ring buffer for high-throughput, zero-loss sampling
  • Hardware PMU Events: CPU cycles, instructions, cache references/misses, branch instructions/mispredictions
  • Software Events: CPU clock, task clock, context switches (virtualization-friendly)
  • Event Enumeration: Automatic discovery of available perf events on the system
  • Fail-Fast Design: Clear error messages with actionable alternatives
  • Cross-Platform Support: Full Linux implementation with stub fallbacks

Architecture

eBPF Components

Ring Buffer Map

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 8 * 1024 * 1024);  // 8MB buffer
} events SEC(".maps");

Profile Event Structure (32 bytes)

struct profile_event {
    __u64 timestamp;        // nanoseconds since boot
    __s32 pid;              // process ID
    __s32 tid;              // thread ID
    __s32 user_stack_id;    // user stack trace ID
    __s32 kernel_stack_id;  // kernel stack trace ID
    __u32 cpu;              // CPU number
    __u32 flags;            // event flags
} __attribute__((packed));
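
A Go-side mirror of this layout can be decoded straight from a ring buffer record with encoding/binary. The sketch below is illustrative only (the repository's generated eBPF bindings may already provide an equivalent type); the field order and sizes must match profiler_types.h exactly, and little-endian byte order is assumed.

import (
    "bytes"
    "encoding/binary"
)

// rawProfileEvent mirrors the packed 32-byte profile_event layout above.
type rawProfileEvent struct {
    Timestamp     uint64 // nanoseconds since boot
    PID           int32  // process ID
    TID           int32  // thread ID
    UserStackID   int32  // user stack trace ID
    KernelStackID int32  // kernel stack trace ID
    CPU           uint32 // CPU number
    Flags         uint32 // event flags
}

// parseProfileEvent decodes one raw ring buffer record into the mirror struct.
func parseProfileEvent(raw []byte) (rawProfileEvent, error) {
    var ev rawProfileEvent
    err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev)
    return ev, err
}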

Go Implementation

The userspace collector manages eBPF programs and processes events:

type ProfilerCollector struct {
    performance.BaseContinuousCollector
    
    mu         sync.RWMutex
    objs       *profilerObjects    // eBPF objects
    links      []link.Link         // perf event links
    ringReader *ringbuf.Reader     // ring buffer reader
    outputChan chan any            // output channel
    stopChan   chan struct{}       // stop signal
    wg         sync.WaitGroup      // goroutine management
    isRunning  bool
    
    samplePeriod uint64             // sampling configuration
}
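
The ring buffer consumer is a long-running goroutine that blocks on the reader and forwards decoded events to the output channel. The sketch below assumes the ringReader field comes from github.com/cilium/ebpf/ringbuf and reuses the parseProfileEvent helper sketched earlier; the actual loop lives in profiler.go and may differ in detail.

import (
    "errors"

    "github.com/cilium/ebpf/ringbuf"
)

// readLoop drains the ring buffer until the reader is closed or stop is signaled.
func (c *ProfilerCollector) readLoop() {
    defer c.wg.Done()
    for {
        record, err := c.ringReader.Read()
        if err != nil {
            if errors.Is(err, ringbuf.ErrClosed) {
                return // reader closed during shutdown
            }
            continue // transient read error; keep draining
        }
        ev, err := parseProfileEvent(record.RawSample)
        if err != nil {
            continue // malformed record; skip it
        }
        select {
        case c.outputChan <- &ev:
        case <-c.stopChan:
            return
        }
    }
}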

Perf Event System

Hardware Events (PMU)

These require Performance Monitoring Unit (PMU) access - available on bare metal, often limited or unavailable in VMs:

  • cpu-cycles: CPU cycles consumed by tasks
  • instructions: Instructions executed
  • cache-references: Cache references by tasks
  • cache-misses: Cache misses by tasks
  • branch-instructions: Branch instructions executed
  • branch-misses: Branch mispredictions

Software Events (Virtualization-Friendly)

Always available and work in virtualized environments (both groups are mapped to perf_event_open constants in the sketch after this list):

  • cpu-clock: High-resolution CPU timer
  • task-clock: Task clock time
  • page-faults: Total page faults
  • context-switches: Context switches
  • cpu-migrations: CPU migrations
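
Both groups correspond to perf_event_attr type/config pairs from perf_event_open(2). The mapping below is illustrative, using constants from golang.org/x/sys/unix; the profiler's internal event table may be defined differently.

import "golang.org/x/sys/unix"

// perfEventID pairs the perf_event_attr Type and Config for a named event.
type perfEventID struct {
    Type   uint32
    Config uint64
}

// Illustrative mapping of the event names above to perf_event_open constants.
var perfEvents = map[string]perfEventID{
    // Hardware (PMU) events
    "cpu-cycles":          {unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_CPU_CYCLES},
    "instructions":        {unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_INSTRUCTIONS},
    "cache-references":    {unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_CACHE_REFERENCES},
    "cache-misses":        {unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_CACHE_MISSES},
    "branch-instructions": {unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
    "branch-misses":       {unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_BRANCH_MISSES},
    // Software events
    "cpu-clock":        {unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_CPU_CLOCK},
    "task-clock":       {unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_TASK_CLOCK},
    "page-faults":      {unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_PAGE_FAULTS},
    "context-switches": {unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_CONTEXT_SWITCHES},
    "cpu-migrations":   {unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_CPU_MIGRATIONS},
}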

Event Enumeration

The profiler automatically discovers available events:

// Get all available events
events, err := profiler.EnumerateAvailableEvents()

// Get just event names
names, err := profiler.GetAvailableEventNames()

// Get categorized summary
summary, err := profiler.GetEventSummary()

// Find specific event
event, err := profiler.FindEventByName("cpu-cycles")
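
A common pattern built on these helpers is to prefer a hardware event and fall back to a software timer when the PMU is unavailable. A minimal sketch using only GetAvailableEventNames and the standard library (profiler is the same handle used above):

import (
    "fmt"
    "log"
)

// pickEvent returns the first preferred event name present in the available list.
func pickEvent(available []string, preferred ...string) (string, error) {
    have := make(map[string]bool, len(available))
    for _, n := range available {
        have[n] = true
    }
    for _, p := range preferred {
        if have[p] {
            return p, nil
        }
    }
    return "", fmt.Errorf("none of %v are available on this system", preferred)
}

// Prefer the hardware PMU; fall back to a software timer in VMs.
names, err := profiler.GetAvailableEventNames()
if err != nil {
    log.Fatal(err)
}
event, err := pickEvent(names, "cpu-cycles", "cpu-clock")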

Usage

Basic Usage

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/antimetal/agent/pkg/performance"
    "github.com/antimetal/agent/pkg/performance/collectors"
)

// Create collector
config := performance.CollectionConfig{
    Interval: time.Second,
}
profiler, err := collectors.NewProfilerCollector(logger, config)
if err != nil {
    log.Fatal(err)
}

// Start profiling
ctx := context.Background()
eventChan, err := profiler.Start(ctx)
if err != nil {
    log.Fatal(err)
}

// Process events
for event := range eventChan {
    profileEvent := event.(*collectors.ProfileEvent)
    fmt.Printf("PID %d, CPU %d, Timestamp %d\n", 
        profileEvent.PID, profileEvent.CPU, profileEvent.Timestamp)
}
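
Shutdown is driven by the context passed to Start. The sketch below assumes the collector stops sampling and closes its output channel once the context is cancelled; check the ContinuousCollector contract in pkg/performance for the exact semantics.

// Profile for 30 seconds, then stop by letting the context expire.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

eventChan, err := profiler.Start(ctx)
if err != nil {
    log.Fatal(err)
}

count := 0
for range eventChan { // assumes the channel is closed when ctx is cancelled
    count++
}
fmt.Printf("collected %d samples\n", count)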

Error Handling

The profiler uses fail-fast design with actionable error messages:

failed to attach hardware perf event 'cpu-cycles'. This usually means:
1. Running in a VM without PMU access
2. Missing CAP_SYS_ADMIN or CAP_PERFMON capabilities
3. Hardware PMU not available

Available events on this system: [cpu-clock, task-clock, page-faults, context-switches]

To use software events, explicitly configure the profiler with events like 'cpu-clock' or 'task-clock'.

Platform Support

Linux Implementation

Full eBPF profiler with:

  • Hardware and software perf event support
  • Ring buffer streaming
  • Event enumeration from /sys/devices/cpu/events/
  • Capability checking

Files:

  • pkg/performance/collectors/profiler.go - Main implementation
  • pkg/performance/collectors/profiler_perf_events.go - Event enumeration
  • ebpf/src/profiler.bpf.c - eBPF program
  • ebpf/include/profiler_types.h - Shared data structures

Non-Linux Stub

Provides a compatible interface that returns descriptive errors on non-Linux platforms:

  • pkg/performance/collectors/profiler_stub.go - Stub implementation
  • pkg/performance/collectors/profiler_perf_events_stub.go - Event stubs

Configuration

Default Configuration

  • Sample Frequency: 99 Hz (99 samples per second)
  • Ring Buffer Size: 8MB
  • Default Event: cpu-cycles (hardware PMU)
  • Channel Buffer: 1000 events
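
With 32-byte events these defaults leave plenty of headroom: ignoring the small per-record header the kernel adds to each ring buffer entry, the buffer can hold on the order of a quarter-million samples. A quick back-of-the-envelope check:

const (
    ringBufferBytes = 8 * 1024 * 1024 // default 8MB ring buffer
    eventBytes      = 32              // packed profile_event size
)

// Upper bound on in-flight samples before any could be dropped: 262,144.
const maxBufferedEvents = ringBufferBytes / eventBytes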

Capabilities Required

  • CAP_SYS_ADMIN or CAP_PERFMON (Linux 5.8+)
  • CAP_BPF (Linux 5.8+)
  • Kernel version 4.18+ for CO-RE support
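
Beyond capabilities, the kernel's perf_event_paranoid setting also gates access. The helper below is a simple preflight sketch (not the collector's actual capability check) that reads the same file used in the verification commands later on this page:

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// perfEventParanoid reports the kernel's perf_event_paranoid setting.
// Values >= 2 generally restrict unprivileged profiling; -1 allows everything.
func perfEventParanoid() (int, error) {
    data, err := os.ReadFile("/proc/sys/kernel/perf_event_paranoid")
    if err != nil {
        return 0, fmt.Errorf("reading perf_event_paranoid: %w", err)
    }
    return strconv.Atoi(strings.TrimSpace(string(data)))
}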

Testing

Unit Tests

go test ./pkg/performance/collectors -run TestProfiler

Integration Tests

go test -tags=integration ./pkg/performance/collectors -run TestProfilerIntegration

Hardware Tests (Bare Metal)

go test -tags=hardware ./pkg/performance/collectors -run TestProfilerHardware

Troubleshooting

Common Issues

"no such file or directory" on perf_event_open

  • Cause: Running in VM without PMU passthrough
  • Solution: Use software events like cpu-clock, task-clock

"operation not permitted"

  • Cause: Missing capabilities
  • Solution: Run with CAP_SYS_ADMIN or CAP_PERFMON

"device or resource busy"

  • Cause: PMU events already in use
  • Solution: Check for other profiling tools, use software events
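
All three failures surface as errno values from perf_event_open, so callers can classify them programmatically. A sketch using golang.org/x/sys/unix errnos; the mapping mirrors the issues above, though the agent's own error handling may differ:

import (
    "errors"

    "golang.org/x/sys/unix"
)

// classifyPerfError maps common perf_event_open failures to remediation hints.
func classifyPerfError(err error) string {
    switch {
    case errors.Is(err, unix.ENOENT):
        return "event not supported (likely a VM without PMU passthrough): try cpu-clock or task-clock"
    case errors.Is(err, unix.EACCES), errors.Is(err, unix.EPERM):
        return "missing capabilities: run with CAP_SYS_ADMIN or CAP_PERFMON"
    case errors.Is(err, unix.EBUSY):
        return "PMU counters busy: stop other profilers or switch to software events"
    default:
        return err.Error()
    }
}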

Verification Commands

# Check perf event paranoid level
cat /proc/sys/kernel/perf_event_paranoid

# List available PMU devices
ls -la /sys/bus/event_source/devices/

# Test hardware events availability
perf list hw | head -5

# Check kernel version
uname -r

VM vs Bare Metal

| Environment  | Hardware Events | Software Events | Best For                                |
|--------------|-----------------|-----------------|-----------------------------------------|
| VM/Container | Limited/None    | ✅ Available    | Development, basic profiling            |
| Bare Metal   | ✅ Full PMU     | ✅ Available    | Production profiling, detailed analysis |

Performance Considerations

Overhead

  • Ring Buffer: ~8MB memory overhead
  • Sampling Frequency: 99 Hz sampling adds roughly 1% CPU overhead
  • Event Processing: Minimal userspace processing

Data Loss Prevention

  • Large ring buffer (8MB) prevents event loss
  • Non-blocking event processing
  • Configurable output channel buffering
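
"Non-blocking event processing" means the ring buffer consumer never stalls on a slow downstream reader: if the output channel is full, the sample is counted as dropped rather than blocking the drain loop. A minimal sketch of that pattern (the dropped counter here is illustrative, not a field of the real collector):

import "sync/atomic"

// emit forwards an event without ever blocking the ring buffer consumer.
func emit(outputChan chan<- any, dropped *uint64, ev any) {
    select {
    case outputChan <- ev:
        // delivered to the consumer
    default:
        atomic.AddUint64(dropped, 1) // consumer too slow; count the drop instead of blocking
    }
}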

Optimization Tips

  • Use hardware events on bare metal for accuracy
  • Use software events in VMs for compatibility
  • Adjust sampling frequency based on workload
  • Monitor ring buffer usage for capacity planning

Integration with System Agent

The profiler integrates with the performance monitoring system via the collector registry:

// Registered automatically via init()
performance.Register(performance.MetricTypeProfiler,
    func(logger logr.Logger, config performance.CollectionConfig) (performance.ContinuousCollector, error) {
        return NewProfilerCollector(logger, config)
    },
)

Data Types

Profile events are converted to performance.ProfileStats:

type ProfileStats struct {
    CollectionTime time.Time
    Duration       time.Duration
    EventName      string
    EventType      uint32
    EventConfig    uint64
    SamplePeriod   uint64
    SampleCount    uint64
    LostSamples    uint64
    DroppedSamples uint64
    Stacks         []ProfileStack
    Processes      map[int32]ProfileProcess
}
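
A consumer can fold streamed ProfileEvents into a ProfileStats summary. The sketch below populates only the fields whose meaning is clear from the definitions on this page; per-stack and per-process aggregation are omitted because ProfileStack and ProfileProcess are not shown here.

import (
    "time"

    "github.com/antimetal/agent/pkg/performance"
    "github.com/antimetal/agent/pkg/performance/collectors"
)

// summarize rolls a batch of streamed events into a ProfileStats snapshot.
func summarize(eventName string, events []*collectors.ProfileEvent) performance.ProfileStats {
    stats := performance.ProfileStats{
        CollectionTime: time.Now(),
        EventName:      eventName,
        SampleCount:    uint64(len(events)),
    }
    if len(events) > 1 {
        // Timestamps are assumed to be nanoseconds since boot (see profile_event above),
        // so the span between first and last sample converts directly to a Duration.
        first := events[0].Timestamp
        last := events[len(events)-1].Timestamp
        stats.Duration = time.Duration(last - first)
    }
    return stats
}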

Future Enhancements

Planned Features

  • Stack Trace Collection: Capture call stacks with events
  • Process Filtering: Profile specific processes/containers
  • Custom Event Configuration: User-configurable event sets
  • Flamegraph Integration: Direct flamegraph generation
  • Multi-Event Profiling: Simultaneous multiple event collection

Performance Optimizations

  • Adaptive Sampling: Frequency adjustment based on system load
  • Compression: Ring buffer data compression
  • Batch Processing: Efficient event batching

For implementation details, see the source files listed under Platform Support above.