Performance Collectors
⚠️ Work in Progress: This documentation is currently being developed and may be incomplete or subject to change.
Overview
Performance collectors are the core components of the Antimetal System Agent responsible for gathering system metrics and hardware information. This page provides an overview of all available collectors, their architecture, and how to work with them.
Collector Architecture
Collector Types
The system supports two main collector types:
- Point Collectors - Gather metrics at a single point in time
- Continuous Collectors - Run continuously and calculate rates/deltas
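To make the "rates/deltas" idea concrete, here is a minimal sketch of how a continuous collector might turn two cumulative counter samples into a utilization figure. The `cpuSample` type and `usagePercent` function are hypothetical names for illustration and are not taken from the agent's code.

```go
package main

// cpuSample is a hypothetical snapshot of cumulative CPU counters, in the
// spirit of the totals exposed by /proc/stat.
type cpuSample struct {
	Busy  uint64 // cumulative non-idle time
	Total uint64 // cumulative time across all states
}

// usagePercent returns the CPU utilization between two samples.
// ok is false when no time elapsed or a counter went backwards
// (for example after a counter reset), because no meaningful rate exists then.
func usagePercent(prev, cur cpuSample) (pct float64, ok bool) {
	if cur.Total <= prev.Total || cur.Busy < prev.Busy {
		return 0, false
	}
	busy := float64(cur.Busy - prev.Busy)
	total := float64(cur.Total - prev.Total)
	return 100 * busy / total, true
}
```

A point collector would report only the raw counters; the previous sample that a continuous collector keeps is what makes the delta possible.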
Collector Interfaces
// Basic collector interface
type Collector interface {
	Name() string
	Collect(ctx context.Context) (any, error)
}

// Point collector for one-shot collection
type PointCollector interface {
	Collector
	Capabilities() CollectorCapabilities
}

// Continuous collector with lifecycle management
type ContinuousCollector interface {
	Collector
	Start(ctx context.Context) error
	Stop() error
}
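As a rough illustration of how calling code might tell these interfaces apart, the sketch below redeclares them locally (with a trimmed `CollectorCapabilities`) and uses a type switch. The `describe` helper and `uptimeCollector` type are hypothetical and exist only for this example.

```go
package main

import (
	"context"
	"fmt"
)

// Local copies of the interfaces above so the sketch compiles on its own.
type Collector interface {
	Name() string
	Collect(ctx context.Context) (any, error)
}

type CollectorCapabilities struct {
	SupportsOneShot    bool
	SupportsContinuous bool
}

type PointCollector interface {
	Collector
	Capabilities() CollectorCapabilities
}

type ContinuousCollector interface {
	Collector
	Start(ctx context.Context) error
	Stop() error
}

// describe is a hypothetical helper that reports which interface a value
// satisfies. A type implementing both matches the first case listed.
func describe(c Collector) string {
	switch c.(type) {
	case ContinuousCollector:
		return fmt.Sprintf("%s: continuous", c.Name())
	case PointCollector:
		return fmt.Sprintf("%s: point", c.Name())
	default:
		return fmt.Sprintf("%s: basic", c.Name())
	}
}

// uptimeCollector is a toy point collector used only for illustration.
type uptimeCollector struct{}

func (uptimeCollector) Name() string { return "uptime" }

func (uptimeCollector) Collect(context.Context) (any, error) { return "n/a", nil }

func (uptimeCollector) Capabilities() CollectorCapabilities {
	return CollectorCapabilities{SupportsOneShot: true}
}

// Compile-time check that the toy type satisfies PointCollector.
var _ PointCollector = uptimeCollector{}
```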
Available Collectors
System Metrics Collectors
Collector | Type | Source | Collection Mode | Description |
---|---|---|---|---|
CPU Collector | cpu | /proc/stat | Continuous | CPU usage and time distribution |
Memory Collector | memory | /proc/meminfo | Continuous | Memory usage, buffers, cache |
Load Collector | load | /proc/loadavg | Continuous | System load averages |
Network Collector | network | /proc/net/dev | Continuous | Network interface statistics |
Disk Collector | disk | /proc/diskstats | Continuous | Disk I/O statistics |
TCP Collector | tcp | /proc/net/tcp | Point | TCP connection states |
Process Collector | process | /proc/[pid]/* | Continuous | Per-process metrics |
Kernel Collector | kernel | /proc/sys/kernel | Point | Kernel parameters |
Hardware Information Collectors
Collector | Type | Source | Collection Mode | Description |
---|---|---|---|---|
CPU Info Collector | cpu_info | /proc/cpuinfo | One-shot | CPU hardware details |
Memory Info Collector | memory_info | /proc/meminfo | One-shot | Memory hardware configuration |
Disk Info Collector | disk_info | /sys/block | One-shot | Disk hardware details |
Network Info Collector | network_info | /sys/class/net | One-shot | Network hardware info |
NUMA Collector | numa | /sys/devices/system/node | Mixed | NUMA topology and stats |
eBPF Collectors
Collector | Type | Source | Collection Mode | Description |
---|---|---|---|---|
Execsnoop Collector | execsnoop | eBPF | Continuous | Process execution tracking |
Collector Capabilities
Each collector declares its capabilities:
type CollectorCapabilities struct {
	SupportsOneShot    bool   // Can do point-in-time collection
	SupportsContinuous bool   // Can run continuously
	RequiresRoot       bool   // Needs root privileges
	RequiresEBPF       bool   // Needs eBPF support
	MinKernelVersion   string // Minimum Linux kernel version
}
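One way declared capabilities could be checked against a host before a collector is enabled is sketched below. The `hostInfo` type and the `parseVersion`, `kernelAtLeast`, and `canRun` helpers are assumptions made for illustration; the agent's own checks may differ.

```go
package main

import (
	"strconv"
	"strings"
)

// CollectorCapabilities mirrors the struct shown above.
type CollectorCapabilities struct {
	SupportsOneShot    bool
	SupportsContinuous bool
	RequiresRoot       bool
	RequiresEBPF       bool
	MinKernelVersion   string
}

// hostInfo is a hypothetical description of the machine the agent runs on.
type hostInfo struct {
	IsRoot        bool
	HasEBPF       bool
	KernelVersion string // e.g. "5.15.0-91-generic"
}

// parseVersion turns "5.15.0-91-generic" into [5 15 0]. Anything after the
// first '-' or '+' is ignored, and missing or non-numeric parts stay zero.
func parseVersion(v string) [3]int {
	var out [3]int
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	for i, part := range strings.SplitN(v, ".", 3) {
		n, err := strconv.Atoi(part)
		if err != nil {
			break
		}
		out[i] = n
	}
	return out
}

// kernelAtLeast compares versions numerically, so "3.2" is older than "3.10".
func kernelAtLeast(actual, min string) bool {
	if min == "" {
		return true
	}
	a, m := parseVersion(actual), parseVersion(min)
	for i := range a {
		if a[i] != m[i] {
			return a[i] > m[i]
		}
	}
	return true
}

// canRun reports whether the host satisfies every declared requirement.
func canRun(caps CollectorCapabilities, h hostInfo) bool {
	if caps.RequiresRoot && !h.IsRoot {
		return false
	}
	if caps.RequiresEBPF && !h.HasEBPF {
		return false
	}
	return kernelAtLeast(h.KernelVersion, caps.MinKernelVersion)
}
```

Comparing versions component by component avoids the string-comparison trap where "3.2" would sort after "3.10".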
Collector Registration
Collectors are registered at startup:
// Register a point collector, wrapped in a continuous adapter
performance.Register(
	performance.MetricTypeCPU,
	performance.PartialNewContinuousPointCollector(
		collectors.NewCPUCollector,
	),
)

// Register a collector that is continuous by design
performance.Register(
	performance.MetricTypeExecsnoop,
	collectors.NewExecsnoopCollector,
)
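For readers unfamiliar with the registry pattern, the following is a generic sketch of a factory registry keyed by metric type. It is not the agent's implementation: `Registry`, `Factory`, `NewRegistry`, and `namedCollector` are hypothetical, and the real `performance.Register` signature may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// MetricType identifies a collector, e.g. "cpu".
type MetricType string

// Collector is trimmed to the one method this sketch needs.
type Collector interface {
	Name() string
}

// Factory builds a collector on demand.
type Factory func() (Collector, error)

// Registry maps metric types to factories and is safe for concurrent use.
type Registry struct {
	mu        sync.RWMutex
	factories map[MetricType]Factory
}

func NewRegistry() *Registry {
	return &Registry{factories: make(map[MetricType]Factory)}
}

// Register stores a factory and rejects duplicate registrations.
func (r *Registry) Register(t MetricType, f Factory) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.factories[t]; exists {
		return fmt.Errorf("collector %q already registered", t)
	}
	r.factories[t] = f
	return nil
}

// New instantiates the collector registered for t.
func (r *Registry) New(t MetricType) (Collector, error) {
	r.mu.RLock()
	f, ok := r.factories[t]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no collector registered for %q", t)
	}
	return f()
}

// namedCollector is a toy collector used only for illustration.
type namedCollector string

func (n namedCollector) Name() string { return string(n) }
```

Registering factories rather than instances lets the manager create a collector only when it is enabled in configuration.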
Data Collection Flow
graph LR
    A[Manager] --> B[Collector] --> C[Store]
    A --> D[Schedule Collection]
    B --> E[Read Data from Source]
    C --> F[Aggregate & Buffer]
Configuration
Global Configuration
performance:
  enabled: true
  interval: 10s
  collectors:
    - cpu
    - memory
    - disk
    - network
Per-Collector Configuration
collectors:
  cpu:
    enabled: true
    interval: 10s
    per_core: true
  memory:
    enabled: true
    interval: 10s
    include_swap: true
  process:
    enabled: true
    interval: 30s
    top_count: 20
Writing Custom Collectors
Basic Structure
package collectors

import (
	"context"

	"github.com/antimetal/system-agent/pkg/performance"
)

// Logger is a placeholder for the logging type the agent passes to collectors.
// CollectionConfig and CollectorCapabilities are assumed here to be exported
// by the performance package.

// MyMetrics is the illustrative payload returned by Collect.
type MyMetrics struct {
	Value int
}

type MyCollector struct {
	logger Logger
	config performance.CollectionConfig
}

func NewMyCollector(logger Logger, config performance.CollectionConfig) (*MyCollector, error) {
	return &MyCollector{
		logger: logger,
		config: config,
	}, nil
}

func (c *MyCollector) Name() string {
	return "my_collector"
}

func (c *MyCollector) Capabilities() performance.CollectorCapabilities {
	return performance.CollectorCapabilities{
		SupportsOneShot:    true,
		SupportsContinuous: false,
		RequiresRoot:       false,
		MinKernelVersion:   "3.10",
	}
}

func (c *MyCollector) Collect(ctx context.Context) (any, error) {
	// Implementation
	data := &MyMetrics{
		Value: 42,
	}
	return data, nil
}
Best Practices
- Error Handling
  - Return meaningful errors
  - Log warnings for non-fatal issues
  - Gracefully handle missing data sources
- Performance
  - Minimize allocations
  - Reuse buffers
  - Cache file handles when appropriate
- Context Handling
  - Check context cancellation
  - Respect timeouts
  - Clean up resources
- Testing
  - Mock file systems
  - Test error conditions
  - Benchmark performance
Collector Lifecycle
Point Collectors
- Initialization - Create collector instance
- Collection - Call Collect() method
- Cleanup - Automatic after collection
Continuous Collectors
- Initialization - Create collector instance
- Start - Begin background collection
- Running - Periodic data collection
- Stop - Graceful shutdown
- Cleanup - Release resources
Monitoring Collectors
Metrics
Each collector exposes metrics:
# Collection success/failure
antimetal_collector_collections_total{collector="cpu",status="success"} 1234
antimetal_collector_collections_total{collector="cpu",status="error"} 5
# Collection duration
antimetal_collector_duration_seconds{collector="cpu",quantile="0.5"} 0.001
antimetal_collector_duration_seconds{collector="cpu",quantile="0.99"} 0.005
# Last collection timestamp
antimetal_collector_last_collection_timestamp{collector="cpu"} 1642598400
Health Checks
// Check collector health
health := manager.GetCollectorHealth("cpu")
if !health.Healthy {
	log.Errorf("CPU collector unhealthy: %v", health.LastError)
}
Troubleshooting
Common Issues
- Permission Denied
  - Check if collector requires root
  - Verify file permissions
  - Check SELinux/AppArmor policies
- File Not Found
  - Verify kernel version compatibility
  - Check if running in container
  - Ensure proc/sys mounted correctly
- High CPU Usage
  - Increase collection interval
  - Reduce number of metrics
  - Check for inefficient parsing
Debug Mode
Enable debug logging for collectors:
logging:
  level: debug
  collectors:
    - cpu
    - memory
Performance Considerations
Collection Overhead
Collector | CPU Impact | Memory Impact | I/O Impact |
---|---|---|---|
CPU | Negligible | Minimal | One file read |
Memory | Negligible | Minimal | One file read |
Process | Low-Medium | Proportional to processes | Multiple file reads |
Network | Low | Minimal | One file read |
Disk | Low | Minimal | One file read |
Optimization Tips
- Batch Operations - Collect multiple metrics in one pass
- Caching - Cache static information
- Filtering - Only collect needed metrics
- Intervals - Adjust based on requirements
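As an example of the caching tip, static information such as a CPU model string only needs to be read once. The sketch below uses `sync.Once` for that; `cachedValue` and its `load` field are hypothetical names, and the loader stands in for whatever expensive read a collector performs.

```go
package main

import "sync"

// cachedValue loads a static string once and serves it from memory afterwards.
type cachedValue struct {
	once sync.Once
	load func() (string, error) // the expensive read, e.g. parsing a /proc file
	val  string
	err  error
}

// Get runs the loader on first use only. The result, including any error,
// is retained for every later call.
func (c *cachedValue) Get() (string, error) {
	c.once.Do(func() {
		c.val, c.err = c.load()
	})
	return c.val, c.err
}
```

Note that this caches a failed load as well; a collector whose source may appear later would need a retry strategy instead.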
Future Collectors
Planned collectors include:
- GPU Metrics - NVIDIA/AMD GPU utilization
- Container Metrics - Docker/containerd statistics
- Application Metrics - JVM, Python, Node.js metrics
- Storage Metrics - Advanced filesystem statistics
- Security Metrics - SELinux, AppArmor events
See Also
- Architecture Overview - System design
- Custom Collectors - Building collectors
- Configuration Guide - Configuration options
- Performance Monitoring - Using collected data