Configuration Guide - antimetal/system-agent GitHub Wiki
Configuration Guide
The System Agent is configured through command-line flags, environment variables, and a YAML configuration file. This guide covers the available settings and common configuration scenarios.
Configuration Methods
Configuration follows this precedence order (highest to lowest):
- Command-line flags
- Environment variables
- Configuration file
- Default values
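The precedence order above can be pictured as a layered lookup in which the first source that defines a setting wins. The following is a minimal Python sketch of that idea, not the agent's actual implementation; the `resolve` helper and all sample values are hypothetical.

```python
from collections import ChainMap

# Hypothetical values from each source, listed highest precedence first.
flags = {"log-level": "debug"}
env = {"log-level": "warn", "intake-batch-size": "200"}
config_file = {"intake-batch-size": "50", "cluster-name": "prod-cluster"}
defaults = {"log-level": "info", "intake-batch-size": "100", "cluster-name": ""}

# ChainMap returns the value from the first mapping that contains the key.
effective = ChainMap(flags, env, config_file, defaults)


def resolve(key):
    """Return the effective value for a setting."""
    return effective[key]


print(resolve("log-level"))          # the flag wins over env and default
print(resolve("intake-batch-size"))  # env wins over file and default
print(resolve("cluster-name"))       # the file wins over the default
```

A setting defined only in the defaults falls through every higher layer, which is why an unset flag does not mask a value from the configuration file.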
Command-Line Flags
Core Flags
# Intake Service Configuration
--intake-endpoint string            Intake service endpoint (default: "intake.antimetal.com:443")
--intake-api-key string             API key for authentication
--intake-batch-size int             Max items per batch (default: 100)
--intake-batch-interval duration    Batch interval (default: 10s)
--intake-buffer-size int            Event buffer size (default: 1000)

# Kubernetes Configuration
--kubeconfig string                 Path to kubeconfig file
--cluster-name string               Override cluster name
--cloud-provider string             Force cloud provider (eks, gke, aks, kind)
--namespace string                  Namespace to watch (empty = all)

# Performance Monitoring
--performance-enabled               Enable performance collectors (default: true)
--performance-interval duration     Collection interval (default: 60s)
--collectors strings                Collectors to enable (default: all)

# Operational
--leader-election                   Enable leader election (default: true)
--leader-election-namespace string  Namespace for leader election (default: "antimetal-system")
--metrics-bind-address string       Metrics endpoint (default: ":8080")
--health-probe-bind-address string  Health endpoint (default: ":8081")
--log-level string                  Log level (debug, info, warn, error) (default: "info")
Storage Configuration
# Resource Store
--store-path string                 BadgerDB storage path (empty = in-memory)
--store-cache-size int              LRU cache size (default: 10000)
--store-gc-interval duration        Garbage collection interval (default: 5m)

# Performance Metrics Store
--metrics-retention duration        Metrics retention period (default: 24h)
--metrics-storage-path string       Metrics storage location
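The `duration` values used by these flags (`10s`, `5m`, `24h`) look like Go-style duration strings. As a rough illustration of how such values translate to seconds, here is a simplified Python sketch; it is an assumption that the agent accepts exactly this syntax, and `parse_duration` is a hypothetical helper that handles only the `ms`, `s`, `m`, and `h` units.

```python
import re

# Seconds per unit for the subset of units handled here.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text):
    """Convert a duration such as '10s', '5m' or '1h30m' to seconds."""
    pos = 0
    total = 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


print(parse_duration("5m"))   # garbage collection interval default
print(parse_duration("24h"))  # metrics retention default
```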
Advanced Flags
# Resource Filtering
--include-namespaces strings        Namespaces to include (default: all)
--exclude-namespaces strings        Namespaces to exclude
--include-resources strings         Resource types to include
--exclude-resources strings         Resource types to exclude

# Rate Limiting
--k8s-qps float                     Kubernetes API QPS (default: 50)
--k8s-burst int                     Kubernetes API burst (default: 100)
--reconcile-workers int             Concurrent reconcilers per type (default: 10)

# TLS Configuration
--tls-cert-file string              TLS certificate file
--tls-key-file string               TLS private key file
--tls-ca-file string                TLS CA certificate file
--tls-insecure-skip-verify          Skip TLS verification (dev only)
Environment Variables
All flags can be set via environment variables with the prefix ANTIMETAL_:
# Core settings
export ANTIMETAL_INTAKE_ENDPOINT="intake.antimetal.com:443"
export ANTIMETAL_INTAKE_API_KEY="your-api-key"
export ANTIMETAL_CLUSTER_NAME="prod-cluster"
export ANTIMETAL_LOG_LEVEL="debug"
# Performance monitoring
export ANTIMETAL_PERFORMANCE_ENABLED="true"
export ANTIMETAL_PERFORMANCE_INTERVAL="30s"
export ANTIMETAL_COLLECTORS="cpu,memory,network,disk"
# Container paths (important for containerized deployments)
export HOST_PROC="/host/proc"
export HOST_SYS="/host/sys"
export HOST_DEV="/host/dev"
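Judging from the examples above (`--log-level` and `ANTIMETAL_LOG_LEVEL`, `--intake-endpoint` and `ANTIMETAL_INTAKE_ENDPOINT`), the variable name appears to be the flag name upper-cased with dashes turned into underscores. The sketch below encodes that assumed rule; `flag_to_env` is a hypothetical helper, not part of the agent.

```python
def flag_to_env(flag, prefix="ANTIMETAL_"):
    """Derive an environment variable name from a flag name.

    Assumed rule: strip leading dashes, upper-case, replace '-' with '_'.
    """
    return prefix + flag.lstrip("-").upper().replace("-", "_")


print(flag_to_env("--intake-endpoint"))
print(flag_to_env("--performance-interval"))
```

Note that the container path variables (`HOST_PROC`, `HOST_SYS`, `HOST_DEV`) are shown without the prefix in the examples, so this rule would not apply to them.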
Configuration File
YAML configuration file (specified with --config):
# config.yaml
intake:
  endpoint: "intake.antimetal.com:443"
  apiKey: "${ANTIMETAL_API_KEY}"  # Environment variable substitution
  batchSize: 100
  batchInterval: "10s"
  bufferSize: 1000
  # TLS configuration
  tls:
    certFile: "/etc/antimetal/tls/tls.crt"
    keyFile: "/etc/antimetal/tls/tls.key"
    caFile: "/etc/antimetal/tls/ca.crt"
    insecureSkipVerify: false

kubernetes:
  # Leave empty for in-cluster config
  kubeconfig: ""
  # Cloud provider settings
  cloudProvider: "auto"  # auto, eks, gke, aks, kind
  clusterName: ""        # Auto-detected if empty
  # Resource filtering
  namespaces:
    include: []  # All namespaces if empty
    exclude:
      - "kube-system"
      - "kube-public"
  resources:
    include: []  # All resources if empty
    exclude:
      - "events"
      - "endpoints"
  # API rate limiting
  qps: 50
  burst: 100
  # Reconciler settings
  reconcilers:
    workers: 10
    maxRetries: 3

performance:
  enabled: true
  interval: "60s"
  # Specific collectors
  collectors:
    - cpu
    - memory
    - network
    - disk
    - filesystem
    - tcp
    - load
  # Collector-specific settings
  settings:
    network:
      interfaces:
        exclude:
          - "lo"
          - "docker0"
    filesystem:
      types:
        include:
          - "ext4"
          - "xfs"
          - "btrfs"

storage:
  # Resource store
  resource:
    path: ""  # In-memory if empty
    cacheSize: 10000
    gcInterval: "5m"
  # Metrics store
  metrics:
    retention: "24h"
    path: "/var/lib/antimetal/metrics"

operational:
  # Leader election
  leaderElection:
    enabled: true
    namespace: "antimetal-system"
    leaseDuration: "15s"
    renewDeadline: "10s"
    retryPeriod: "2s"
  # Observability
  metrics:
    bindAddress: ":8080"
    path: "/metrics"
  health:
    bindAddress: ":8081"
    livenessPath: "/healthz"
    readinessPath: "/readyz"
  # Logging
  logging:
    level: "info"   # debug, info, warn, error
    format: "json"  # json, text
    output: "stdout"
    # Verbose logging for specific components
    verbosity:
      controller: 1
      intake: 2
      performance: 1
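The `apiKey: "${ANTIMETAL_API_KEY}"` entry relies on environment variable substitution. How the agent performs this is not documented here; the sketch below shows one plausible behavior in Python using the standard `string.Template` class. The `substitute_env` helper is hypothetical, and failing on an unset variable is an assumption rather than confirmed agent behavior.

```python
import os
from string import Template


def substitute_env(text, env=None):
    """Replace ${NAME} placeholders with values from the environment.

    Raises KeyError if a referenced variable is not set, so that a missing
    secret fails loudly instead of silently becoming an empty string.
    """
    return Template(text).substitute(os.environ if env is None else env)


line = 'apiKey: "${ANTIMETAL_API_KEY}"'
print(substitute_env(line, {"ANTIMETAL_API_KEY": "example-key"}))
```

Keeping the key out of the file and injecting it at load time lets the same `config.yaml` be committed to version control without exposing credentials.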
Deployment Configurations
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: antimetal-agent
  template:
    metadata:
      labels:
        app: antimetal-agent
    spec:
      serviceAccountName: antimetal-agent
      containers:
        - name: agent
          image: antimetal/system-agent:latest
          args:
            - --intake-api-key=$(ANTIMETAL_API_KEY)
            - --log-level=info
            - --leader-election=true
          env:
            - name: ANTIMETAL_API_KEY
              valueFrom:
                secretKeyRef:
                  name: antimetal-credentials
                  key: api-key
            - name: HOST_PROC
              value: "/host/proc"
            - name: HOST_SYS
              value: "/host/sys"
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: config
              mountPath: /etc/antimetal
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: config
          configMap:
            name: antimetal-agent-config
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: antimetal-agent-config
  namespace: antimetal-system
data:
  config.yaml: |
    intake:
      endpoint: "intake.antimetal.com:443"
      batchSize: 100
      batchInterval: "10s"
    kubernetes:
      namespaces:
        exclude:
          - "kube-system"
          - "kube-public"
    performance:
      enabled: true
      interval: "60s"
      collectors:
        - cpu
        - memory
        - network
        - disk
Secret
apiVersion: v1
kind: Secret
metadata:
  name: antimetal-credentials
  namespace: antimetal-system
type: Opaque
stringData:
  api-key: "your-api-key-here"
Common Scenarios
High-Security Environment
# Strict TLS verification
intake:
  tls:
    caFile: "/etc/ssl/certs/company-ca.crt"
    insecureSkipVerify: false
# Limit data collection
kubernetes:
  namespaces:
    include:
      - "production"
      - "staging"
  resources:
    exclude:
      - "secrets"
      - "configmaps"
# Enable only the CPU and memory collectors
performance:
  collectors:
    - cpu
    - memory
    # network and disk are omitted, so they stay disabled
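The include and exclude lists used for namespaces and resources can be read as a simple filter. The guide states that an empty include list means "all"; how include and exclude interact when both are set is not stated, so the Python sketch below assumes that an exclude entry always wins. `is_watched` is a hypothetical helper.

```python
def is_watched(name, include, exclude):
    """Decide whether a namespace (or resource type) passes the filter.

    Assumptions: an empty include list means "everything", and an exclude
    entry always takes priority over an include entry.
    """
    if name in exclude:
        return False
    return not include or name in include


include = ["production", "staging"]
exclude = ["kube-system"]
for namespace in ["production", "dev", "kube-system"]:
    print(namespace, is_watched(namespace, include, exclude))
```

With the high-security settings above, only `production` and `staging` would pass, while `secrets` and `configmaps` would be dropped by the same logic applied to resource types.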
Resource-Constrained Environment
# Reduce memory usage
storage:
  resource:
    cacheSize: 1000  # Smaller cache
  metrics:
    retention: "6h"  # Shorter retention
# Slower collection
performance:
  interval: "5m"  # Less frequent
# Smaller batches
intake:
  batchSize: 50
  bufferSize: 500
# Fewer workers
kubernetes:
  reconcilers:
    workers: 5
Development Environment
# Local development settings
intake:
  endpoint: "localhost:50051"
  tls:
    insecureSkipVerify: true
kubernetes:
  cloudProvider: "kind"
operational:
  logging:
    level: "debug"
    format: "text"
# Enable all collectors for testing
performance:
  interval: "10s"
  collectors: ["all"]
Multi-Cluster Setup
# Cluster-specific configuration
kubernetes:
clusterName: "prod-us-west-2"
# Tag resources with cluster info
labels:
cluster: "prod-us-west-2"
region: "us-west-2"
environment: "production"
# Different endpoints per cluster
intake:
endpoint: "intake-prod.antimetal.com:443"
Performance Tuning
API Rate Limiting
kubernetes:
  # Adjust based on cluster size
  qps: 100    # Large clusters
  burst: 200
  reconcilers:
    workers: 20  # More parallel processing
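QPS and burst are the two usual parameters of a token-bucket rate limiter: the bucket refills at `qps` tokens per second and holds at most `burst` tokens. Whether the agent applies these settings in exactly this way is an assumption; the Python sketch below only illustrates how the two values interact, using the large-cluster numbers from above. The `TokenBucket` class is hypothetical.

```python
class TokenBucket:
    """Token bucket: refills at `qps` tokens per second, holds at most `burst`."""

    def __init__(self, qps, burst, now=0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now):
        """Consume a token and return True if a request may proceed at `now`."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.qps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=100, burst=200)
# 300 requests arriving at the same instant: only the burst is admitted.
admitted = sum(bucket.allow(now=0.0) for _ in range(300))
print(admitted)
# One second later, `qps` tokens have accumulated again.
admitted_later = sum(bucket.allow(now=1.0) for _ in range(300))
print(admitted_later)
```

In this model, raising `burst` helps with short spikes such as an initial resource listing, while raising `qps` increases the sustained request rate against the API server.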
Memory Optimization
storage:
  resource:
    path: "/var/lib/antimetal/store"  # Persistent storage
    cacheSize: 5000                   # Balance memory/performance
intake:
  batchSize: 200        # Larger batches
  batchInterval: "30s"  # Less frequent sends
  bufferSize: 2000      # Larger buffer
Network Optimization
intake:
  # Compression for slow links
  compression: true
  # Retry configuration
  retry:
    maxAttempts: 5
    initialBackoff: "1s"
    maxBackoff: "30s"
    multiplier: 2
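The retry settings describe an exponential backoff. To make the resulting wait times concrete, the Python sketch below computes the delay before each retry. It assumes the first attempt is immediate and that no jitter is applied, neither of which is confirmed by this guide; `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(max_attempts, initial, max_backoff, multiplier):
    """Return the wait in seconds before each retry.

    Assumption: the first attempt is immediate, so max_attempts attempts
    imply max_attempts - 1 waits, each capped at max_backoff.
    """
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_backoff))
        delay *= multiplier
    return delays


schedule = backoff_schedule(max_attempts=5, initial=1.0, max_backoff=30.0, multiplier=2)
print(schedule)       # waits before retries 1 through 4
print(sum(schedule))  # total time spent waiting before giving up
```

With the values above, the 30s cap is never reached; it only comes into play once `maxAttempts` is raised to 7 or more.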
Monitoring Configuration
Prometheus Scrape Config
scrape_configs:
  - job_name: 'antimetal-agent'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - antimetal-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: antimetal-agent
      # Requires the agent container to declare a port named "metrics"
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: metrics
Troubleshooting Configuration
Debug Logging
# Keep the global level at info and raise verbosity for specific components
antimetal-agent \
  --log-level=info \
  --log-verbosity=controller:2,intake:3,performance:1
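The `--log-verbosity` value is a comma-separated list of `component:level` pairs. As an illustration of that format only, the Python sketch below parses such a value into a mapping; `parse_verbosity` is a hypothetical helper and the error handling is an assumption, not the agent's documented behavior.

```python
def parse_verbosity(value):
    """Parse 'component:level,component:level' into a dict of ints."""
    levels = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        component, sep, level = item.partition(":")
        if not sep or not component:
            raise ValueError(f"expected component:level, got {item!r}")
        levels[component] = int(level)
    return levels


print(parse_verbosity("controller:2,intake:3,performance:1"))
```

The resulting mapping has the same shape as the `verbosity` block under `operational.logging` in the configuration file.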
Dry Run Mode
operational:
  dryRun: true  # Don't send data to intake
intake:
  endpoint: "logger://stdout"  # Log instead of sending
Configuration Validation
# Validate configuration file
antimetal-agent validate --config config.yaml
# Show effective configuration
antimetal-agent config --show-effective
Best Practices
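- Use Secrets: Store API keys in Kubernetes Secrets and reference them with secretKeyRef instead of placing them in ConfigMaps or container args
- Version Control: Keep configuration files in Git, excluding any secret values
- Environment-Specific: Maintain a separate configuration for each environment (development, staging, production)
- Monitor Resources: Set CPU and memory requests and limits, and adjust them based on observed usage
- Regular Updates: Review the configuration when upgrading the agent so that new or changed options are picked up
- Documentation: Record the reason for every setting that differs from its default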
Next Steps
- Kubernetes Deployment - Deploy to cluster
- Security Considerations - Security best practices
- Troubleshooting - Common issues
For the complete list of configuration options, run antimetal-agent --help.