Profiler Collector - antimetal/system-agent GitHub Wiki
Profiler Collector
The ProfilerCollector provides eBPF-based CPU profiling and streams samples through a BPF ring buffer for real-time performance analysis. It uses CO-RE (Compile Once - Run Everywhere) for portability across kernel versions. Because it relies on the BPF ring buffer map (BPF_MAP_TYPE_RINGBUF), it requires Linux 5.8 or later.
Overview
The profiler collector captures CPU sampling data using hardware Performance Monitoring Unit (PMU) events or software events such as timers. It streams profile events through a ring buffer to keep overhead low and minimize sample loss.
Key Features
- Ring Buffer Streaming: 8MB ring buffer for high-throughput, low-loss sampling
- Hardware PMU Events: CPU cycles, instructions, cache references and misses, branch instructions and mispredictions
- Software Events: CPU clock, task clock, context switches (virtualization-friendly)
- Event Enumeration: Automatic discovery of available perf events on the system
- Fail-Fast Design: Clear error messages with actionable alternatives
- Cross-Platform Support: Full Linux implementation, with stubs on non-Linux platforms
Architecture
eBPF Components
Ring Buffer Map
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 8 * 1024 * 1024); // 8MB buffer
} events SEC(".maps");
Profile Event Structure (32 bytes)
struct profile_event {
__u64 timestamp; // nanoseconds since boot
__s32 pid; // process ID
__s32 tid; // thread ID
__s32 user_stack_id; // user stack trace ID
__s32 kernel_stack_id; // kernel stack trace ID
__u32 cpu; // CPU number
__u32 flags; // event flags
} __attribute__((packed));
Go Implementation
The userspace collector manages eBPF programs and processes events:
type ProfilerCollector struct {
performance.BaseContinuousCollector
mu sync.RWMutex
objs *profilerObjects // eBPF objects
links []link.Link // perf event links
ringReader *ringbuf.Reader // ring buffer reader
outputChan chan any // output channel
stopChan chan struct{} // stop signal
wg sync.WaitGroup // goroutine management
isRunning bool
samplePeriod uint64 // sampling configuration
}
Perf Event System
Hardware Events (PMU)
These events require access to the hardware Performance Monitoring Unit (PMU). They are available on bare metal and limited or unavailable in VMs:
- cpu-cycles: CPU cycles consumed by tasks
- instructions: Instructions executed
- cache-references: Cache references by tasks
- cache-misses: Cache misses by tasks
- branch-instructions: Branch instructions executed
- branch-misses: Branch mispredictions
Software Events (Virtualization-Friendly)
These events are implemented by the kernel rather than the PMU, so they also work in virtualized environments:
- cpu-clock: High-resolution CPU timer
- task-clock: Task clock time
- page-faults: Total page faults
- context-switches: Context switches
- cpu-migrations: CPU migrations
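Each event name above corresponds to a (type, config) pair that perf_event_open expects in struct perf_event_attr. Below is a minimal lookup table using the numeric values from <linux/perf_event.h>. knownPerfEvents and isHardwareEvent are hypothetical names, not part of the collector's API.

```go
package main

// perfEventSpec pairs a perf_event_attr type with its config value.
type perfEventSpec struct {
	Type   uint32 // PERF_TYPE_HARDWARE (0) or PERF_TYPE_SOFTWARE (1)
	Config uint64 // PERF_COUNT_HW_* or PERF_COUNT_SW_* value
}

const (
	perfTypeHardware uint32 = 0 // PERF_TYPE_HARDWARE
	perfTypeSoftware uint32 = 1 // PERF_TYPE_SOFTWARE
)

// knownPerfEvents maps the event names listed above to kernel constants.
var knownPerfEvents = map[string]perfEventSpec{
	// Hardware (PMU) events
	"cpu-cycles":          {perfTypeHardware, 0}, // PERF_COUNT_HW_CPU_CYCLES
	"instructions":        {perfTypeHardware, 1}, // PERF_COUNT_HW_INSTRUCTIONS
	"cache-references":    {perfTypeHardware, 2}, // PERF_COUNT_HW_CACHE_REFERENCES
	"cache-misses":        {perfTypeHardware, 3}, // PERF_COUNT_HW_CACHE_MISSES
	"branch-instructions": {perfTypeHardware, 4}, // PERF_COUNT_HW_BRANCH_INSTRUCTIONS
	"branch-misses":       {perfTypeHardware, 5}, // PERF_COUNT_HW_BRANCH_MISSES
	// Software events
	"cpu-clock":        {perfTypeSoftware, 0}, // PERF_COUNT_SW_CPU_CLOCK
	"task-clock":       {perfTypeSoftware, 1}, // PERF_COUNT_SW_TASK_CLOCK
	"page-faults":      {perfTypeSoftware, 2}, // PERF_COUNT_SW_PAGE_FAULTS
	"context-switches": {perfTypeSoftware, 3}, // PERF_COUNT_SW_CONTEXT_SWITCHES
	"cpu-migrations":   {perfTypeSoftware, 4}, // PERF_COUNT_SW_CPU_MIGRATIONS
}

// isHardwareEvent reports whether name is a known event that needs PMU access.
func isHardwareEvent(name string) bool {
	spec, ok := knownPerfEvents[name]
	return ok && spec.Type == perfTypeHardware
}
```

Real code would more likely use the named constants from golang.org/x/sys/unix, such as unix.PERF_TYPE_HARDWARE or unix.PERF_COUNT_SW_CPU_CLOCK, instead of literal numbers.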
Event Enumeration
The profiler automatically discovers available events:
// Get all available events
events, err := profiler.EnumerateAvailableEvents()
// Get just event names
names, err := profiler.GetAvailableEventNames()
// Get categorized summary
summary, err := profiler.GetEventSummary()
// Find specific event
event, err := profiler.FindEventByName("cpu-cycles")
Usage
Basic Usage
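package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/antimetal/agent/pkg/performance"
	"github.com/antimetal/agent/pkg/performance/collectors"
	"github.com/go-logr/logr"
)

func main() {
	// Discard logs for brevity; pass a real logr.Logger in production
	logger := logr.Discard()

	// Create collector
	config := performance.CollectionConfig{
		Interval: time.Second,
	}
	profiler, err := collectors.NewProfilerCollector(logger, config)
	if err != nil {
		log.Fatal(err)
	}

	// Start profiling; events arrive on the returned channel
	ctx := context.Background()
	eventChan, err := profiler.Start(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Process events until the channel is closed. The checked type
	// assertion skips unexpected values instead of panicking.
	for event := range eventChan {
		profileEvent, ok := event.(*collectors.ProfileEvent)
		if !ok {
			continue
		}
		fmt.Printf("PID %d, CPU %d, Timestamp %d\n",
			profileEvent.PID, profileEvent.CPU, profileEvent.Timestamp)
	}
}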
Error Handling
The profiler uses a fail-fast design with actionable error messages. For example, when a hardware event cannot be attached:
failed to attach hardware perf event 'cpu-cycles'. This usually means:
1. Running in a VM without PMU access
2. Missing CAP_SYS_ADMIN or CAP_PERFMON capabilities
3. Hardware PMU not available
Available events on this system: [cpu-clock, task-clock, page-faults, context-switches]
To use software events, explicitly configure the profiler with events like 'cpu-clock' or 'task-clock'.
Platform Support
Linux Implementation
Full eBPF profiler with:
- Hardware and software perf event support
- Ring buffer streaming
- Event enumeration from /sys/devices/cpu/events/
- Capability checking
Files:
- pkg/performance/collectors/profiler.go: Main implementation
- pkg/performance/collectors/profiler_perf_events.go: Event enumeration
- ebpf/src/profiler.bpf.c: eBPF program
- ebpf/include/profiler_types.h: Shared data structures
Non-Linux Stub
Provides a compatible interface whose methods return descriptive errors:
- pkg/performance/collectors/profiler_stub.go: Stub implementation
- pkg/performance/collectors/profiler_perf_events_stub.go: Event stubs
Configuration
Default Configuration
- Sample Frequency: 99 Hz (99 samples per second)
- Ring Buffer Size: 8MB
- Default Event: cpu-cycles (hardware PMU)
- Channel Buffer: 1000 events
Capabilities Required
- CAP_SYS_ADMIN or CAP_PERFMON (Linux 5.8+)
- CAP_BPF (Linux 5.8+)
- Kernel version 4.18+ for CO-RE support, and Linux 5.8+ for the BPF ring buffer map
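To check these requirements at startup, a process can read its effective capability set from the CapEff line of /proc/self/status. The bit numbers below come from <linux/capability.h>. Treating CAP_PERFMON plus CAP_BPF as an alternative to CAP_SYS_ADMIN is our reading of the list above. parseCapEff and canProfile are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Capability bit numbers from <linux/capability.h>.
const (
	capSysAdmin = 21
	capPerfmon  = 38 // added in Linux 5.8
	capBPF      = 39 // added in Linux 5.8
)

// parseCapEff extracts the hex effective-capability mask from /proc/self/status contents.
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("CapEff line not found")
}

// canProfile reports whether the mask holds CAP_SYS_ADMIN, or both CAP_PERFMON and CAP_BPF.
func canProfile(capEff uint64) bool {
	has := func(bit uint) bool { return capEff&(1<<bit) != 0 }
	return has(capSysAdmin) || (has(capPerfmon) && has(capBPF))
}
```

In the agent, the status string would come from os.ReadFile("/proc/self/status").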
Testing
Unit Tests
go test ./pkg/performance/collectors -run TestProfiler
Integration Tests
go test -tags=integration ./pkg/performance/collectors -run TestProfilerIntegration
Hardware Tests (Bare Metal)
go test -tags=hardware ./pkg/performance/collectors -run TestProfilerHardware
Troubleshooting
Common Issues
"no such file or directory" on perf_event_open
- Cause: Running in a VM without PMU passthrough
- Solution: Use software events such as cpu-clock or task-clock
"operation not permitted"
- Cause: Missing capabilities
- Solution: Run with CAP_SYS_ADMIN or CAP_PERFMON
"device or resource busy"
- Cause: PMU events already in use
- Solution: Check for other profiling tools, use software events
Verification Commands
# Check perf event paranoid level
cat /proc/sys/kernel/perf_event_paranoid
# List available PMU devices
ls -la /sys/bus/event_source/devices/
# Test hardware events availability
perf list hw | head -5
# Check kernel version
uname -r
VM vs Bare Metal
Environment | Hardware Events | Software Events | Best For |
---|---|---|---|
VM/Container | Limited/None | ✅ Available | Development, basic profiling |
Bare Metal | ✅ Full PMU | ✅ Available | Production profiling, detailed analysis |
Performance Considerations
Overhead
- Ring Buffer: ~8MB memory overhead
- Sampling Frequency: 99 Hz, roughly 1% CPU overhead (varies with workload and event type)
- Event Processing: Minimal userspace processing
Data Loss Prevention
- Large ring buffer (8MB) reduces the risk of event loss
- Non-blocking event processing
- Configurable output channel buffering
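Non-blocking processing over a buffered output channel can be sketched as a select with a default branch. When the consumer falls behind, the sample is dropped and counted rather than stalling the ring buffer reader. The forwarder type is hypothetical. Its chan any mirrors the outputChan field shown earlier, and the drop counter is the kind of value a DroppedSamples field could report.

```go
package main

import "sync/atomic"

// forwarder delivers samples to a buffered channel without ever blocking the producer.
type forwarder struct {
	out     chan any
	dropped atomic.Uint64 // samples discarded because the channel was full
}

func newForwarder(buffer int) *forwarder {
	return &forwarder{out: make(chan any, buffer)}
}

// offer attempts a non-blocking send and reports whether the sample was queued.
func (f *forwarder) offer(v any) bool {
	select {
	case f.out <- v:
		return true
	default:
		f.dropped.Add(1)
		return false
	}
}
```

With the default channel buffer listed under Configuration, this would be created as newForwarder(1000).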
Optimization Tips
- Use hardware events on bare metal for accuracy
- Use software events in VMs for compatibility
- Adjust sampling frequency based on workload
- Monitor ring buffer usage for capacity planning
Integration with System Agent
The profiler integrates with the performance monitoring system:
// Registered automatically via init()
performance.Register(performance.MetricTypeProfiler,
func(logger logr.Logger, config performance.CollectionConfig) (performance.ContinuousCollector, error) {
return NewProfilerCollector(logger, config)
},
)
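Registration through init() follows a common Go factory-registry pattern. A self-contained, hypothetical version is sketched below. Register, New, Collector, and Factory are illustrative stand-ins. The real performance.Register, shown above, takes a metric type and a factory that receives a logger and config.

```go
package main

import (
	"fmt"
	"sync"
)

// Collector and Factory are simplified stand-ins for the real performance types.
type Collector interface{ Name() string }

type Factory func() (Collector, error)

var (
	registryMu sync.RWMutex
	registry   = map[string]Factory{}
)

// Register records a factory; registering the same kind twice is a programming error.
func Register(kind string, f Factory) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[kind]; dup {
		panic(fmt.Sprintf("collector %q registered twice", kind))
	}
	registry[kind] = f
}

// New builds a collector of the given kind using its registered factory.
func New(kind string) (Collector, error) {
	registryMu.RLock()
	f, ok := registry[kind]
	registryMu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no collector registered for %q", kind)
	}
	return f()
}

type profilerStub struct{}

func (profilerStub) Name() string { return "profiler" }

// init runs on package import, so importing the package is enough to register the profiler.
func init() {
	Register("profiler", func() (Collector, error) { return profilerStub{}, nil })
}
```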
Data Types
Profile events are converted to performance.ProfileStats:
type ProfileStats struct {
CollectionTime time.Time
Duration time.Duration
EventName string
EventType uint32
EventConfig uint64
SamplePeriod uint64
SampleCount uint64
LostSamples uint64
DroppedSamples uint64
Stacks []ProfileStack
Processes map[int32]ProfileProcess
}
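To illustrate how raw samples might be folded into per-process data such as the Processes map, the sketch below counts samples per PID and lists the busiest process first. The sample and pidCount types are hypothetical simplifications. They are not ProfileStats or ProfileProcess.

```go
package main

import "sort"

// sample is a trimmed, hypothetical stand-in for one decoded profile event.
type sample struct {
	PID int32
	CPU uint32
}

// pidCount is one row of a per-process summary.
type pidCount struct {
	PID     int32
	Samples uint64
}

// summarizeByPID returns the total sample count and per-PID counts,
// sorted by descending count with ties broken by ascending PID.
func summarizeByPID(samples []sample) (total uint64, rows []pidCount) {
	counts := make(map[int32]uint64)
	for _, s := range samples {
		counts[s.PID]++
		total++
	}
	for pid, n := range counts {
		rows = append(rows, pidCount{PID: pid, Samples: n})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Samples != rows[j].Samples {
			return rows[i].Samples > rows[j].Samples
		}
		return rows[i].PID < rows[j].PID
	})
	return total, rows
}
```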
Future Enhancements
Planned Features
- Stack Trace Collection: Capture call stacks with events
- Process Filtering: Profile specific processes/containers
- Custom Event Configuration: User-configurable event sets
- Flamegraph Integration: Direct flamegraph generation
- Multi-Event Profiling: Simultaneous multiple event collection
Performance Optimizations
- Adaptive Sampling: Frequency adjustment based on system load
- Compression: Ring buffer data compression
- Batch Processing: Efficient event batching
For implementation details, see the source code at:
- Main implementation: pkg/performance/collectors/profiler.go
- Event enumeration: pkg/performance/collectors/profiler_perf_events.go
- eBPF program: ebpf/src/profiler.bpf.c