Configuration Guide - antimetal/system-agent GitHub Wiki

Configuration Guide

The System Agent is configured through command-line flags, environment variables, and a configuration file. This guide covers the main settings and common configuration scenarios.

Configuration Methods

Configuration follows this precedence order (highest to lowest):

  1. Command-line flags
  2. Environment variables
  3. Configuration file
  4. Default values
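
The precedence order above can be pictured as a lookup that stops at the first source defining a setting. The following is a minimal sketch, not the agent's actual implementation; the `resolve` function and the dictionary keys are hypothetical names chosen for illustration.

```python
from typing import Mapping, Optional


def resolve(key: str,
            flags: Mapping[str, str],
            env: Mapping[str, str],
            file_cfg: Mapping[str, str],
            defaults: Mapping[str, str]) -> Optional[str]:
    """Return the value from the highest-precedence source that defines key."""
    # Sources are checked from highest to lowest precedence.
    for source in (flags, env, file_cfg, defaults):
        if key in source:
            return source[key]
    return None


# Hypothetical inputs: the log level is set in the file and in the
# environment, while the batch size is only present in the defaults.
defaults = {"log-level": "info", "intake-batch-size": "100"}
file_cfg = {"log-level": "warn"}
env = {"log-level": "debug"}
flags = {}

print(resolve("log-level", flags, env, file_cfg, defaults))          # debug
print(resolve("intake-batch-size", flags, env, file_cfg, defaults))  # 100
```

Here the environment variable wins over the file because no flag was passed; adding a `log-level` entry to `flags` would override both.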

Command-Line Flags

Core Flags

# Intake Service Configuration
--intake-endpoint string        Intake service endpoint (default: "intake.antimetal.com:443")
--intake-api-key string         API key for authentication
--intake-batch-size int         Max items per batch (default: 100)
--intake-batch-interval duration Batch interval (default: 10s)
--intake-buffer-size int        Event buffer size (default: 1000)

# Kubernetes Configuration
--kubeconfig string             Path to kubeconfig file
--cluster-name string           Override cluster name
--cloud-provider string         Force cloud provider (eks, gke, aks, kind)
--namespace string              Namespace to watch (empty = all)

# Performance Monitoring
--performance-enabled           Enable performance collectors (default: true)
--performance-interval duration Collection interval (default: 60s)
--collectors strings            Collectors to enable (default: all)

# Operational
--leader-election                   Enable leader election (default: true)
--leader-election-namespace string  Namespace for leader election (default: "antimetal-system")
--metrics-bind-address string       Metrics endpoint (default: ":8080")
--health-probe-bind-address string  Health endpoint (default: ":8081")
--log-level string                  Log level (debug, info, warn, error) (default: "info")

Storage Configuration

# Resource Store
--store-path string            BadgerDB storage path (empty = in-memory)
--store-cache-size int         LRU cache size (default: 10000)
--store-gc-interval duration   Garbage collection interval (default: 5m)

# Performance Metrics Store
--metrics-retention duration   Metrics retention period (default: 24h)
--metrics-storage-path string  Metrics storage location
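
Duration values such as 10s, 5m, and 24h look like Go-style duration strings. The sketch below shows how such strings map to seconds; it is a simplified illustration that handles only the ms, s, m, and h units, and `parse_duration` is a hypothetical helper, not part of the agent.

```python
import re

# Seconds per supported unit (a subset of what Go duration strings allow).
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a duration string such as '5m' or '1h30m' to seconds."""
    pos = 0
    total = 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


print(parse_duration("10s"))  # 10.0
print(parse_duration("5m"))   # 300.0
print(parse_duration("24h"))  # 86400.0
```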

Advanced Flags

# Resource Filtering
--include-namespaces strings   Namespaces to include (default: all)
--exclude-namespaces strings   Namespaces to exclude
--include-resources strings    Resource types to include
--exclude-resources strings    Resource types to exclude

# Rate Limiting
--k8s-qps float               Kubernetes API QPS (default: 50)
--k8s-burst int               Kubernetes API burst (default: 100)
--reconcile-workers int       Concurrent reconcilers per type (default: 10)

# TLS Configuration
--tls-cert-file string        TLS certificate file
--tls-key-file string         TLS private key file
--tls-ca-file string          TLS CA certificate file
--tls-insecure-skip-verify    Skip TLS verification (dev only)

Environment Variables

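All flags can be set via environment variables. Take the flag name, replace hyphens with underscores, uppercase it, and add the ANTIMETAL_ prefix, so --intake-endpoint becomes ANTIMETAL_INTAKE_ENDPOINT: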

# Core settings
export ANTIMETAL_INTAKE_ENDPOINT="intake.antimetal.com:443"
export ANTIMETAL_INTAKE_API_KEY="your-api-key"
export ANTIMETAL_CLUSTER_NAME="prod-cluster"
export ANTIMETAL_LOG_LEVEL="debug"

# Performance monitoring
export ANTIMETAL_PERFORMANCE_ENABLED="true"
export ANTIMETAL_PERFORMANCE_INTERVAL="30s"
export ANTIMETAL_COLLECTORS="cpu,memory,network,disk"

# Host filesystem paths (not prefixed with ANTIMETAL_; important for containerized deployments)
export HOST_PROC="/host/proc"
export HOST_SYS="/host/sys"
export HOST_DEV="/host/dev"
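
The ANTIMETAL_ variables above follow a regular pattern relative to the flag names. A minimal sketch of that mapping, inferred from the examples in this guide rather than taken from the agent's source, with `flag_to_env` as a hypothetical helper:

```python
def flag_to_env(flag: str, prefix: str = "ANTIMETAL_") -> str:
    """Derive the environment variable name for a command-line flag.

    Assumes the pattern seen in this guide: strip the leading dashes,
    replace hyphens with underscores, uppercase, and add the prefix.
    """
    return prefix + flag.lstrip("-").replace("-", "_").upper()


print(flag_to_env("--intake-endpoint"))      # ANTIMETAL_INTAKE_ENDPOINT
print(flag_to_env("--performance-interval")) # ANTIMETAL_PERFORMANCE_INTERVAL
```

The HOST_PROC, HOST_SYS, and HOST_DEV variables do not follow this pattern and are set under their own names.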

Configuration File

The agent can also read a YAML configuration file, specified with --config:

# config.yaml
intake:
  endpoint: "intake.antimetal.com:443"
  apiKey: "${ANTIMETAL_API_KEY}"  # Environment variable substitution
  batchSize: 100
  batchInterval: "10s"
  bufferSize: 1000
  
  # TLS configuration
  tls:
    certFile: "/etc/antimetal/tls/tls.crt"
    keyFile: "/etc/antimetal/tls/tls.key"
    caFile: "/etc/antimetal/tls/ca.crt"
    insecureSkipVerify: false

kubernetes:
  # Leave empty for in-cluster config
  kubeconfig: ""
  
  # Cloud provider settings
  cloudProvider: "auto"  # auto, eks, gke, aks, kind
  clusterName: ""        # Auto-detected if empty
  
  # Resource filtering
  namespaces:
    include: []  # All namespaces if empty
    exclude:
      - "kube-system"
      - "kube-public"
  
  resources:
    include: []  # All resources if empty
    exclude:
      - "events"
      - "endpoints"
  
  # API rate limiting
  qps: 50
  burst: 100
  
  # Reconciler settings
  reconcilers:
    workers: 10
    maxRetries: 3

performance:
  enabled: true
  interval: "60s"
  
  # Specific collectors
  collectors:
    - cpu
    - memory
    - network
    - disk
    - filesystem
    - tcp
    - load
  
  # Collector-specific settings
  settings:
    network:
      interfaces:
        exclude:
          - "lo"
          - "docker0"
    
    filesystem:
      types:
        include:
          - "ext4"
          - "xfs"
          - "btrfs"

storage:
  # Resource store
  resource:
    path: ""  # In-memory if empty
    cacheSize: 10000
    gcInterval: "5m"
  
  # Metrics store
  metrics:
    retention: "24h"
    path: "/var/lib/antimetal/metrics"

operational:
  # Leader election
  leaderElection:
    enabled: true
    namespace: "antimetal-system"
    leaseDuration: "15s"
    renewDeadline: "10s"
    retryPeriod: "2s"
  
  # Observability
  metrics:
    bindAddress: ":8080"
    path: "/metrics"
  
  health:
    bindAddress: ":8081"
    livenessPath: "/healthz"
    readinessPath: "/readyz"
  
  # Logging
  logging:
    level: "info"  # debug, info, warn, error
    format: "json" # json, text
    output: "stdout"
    
    # Verbose logging for specific components
    verbosity:
      controller: 1
      intake: 2
      performance: 1

Deployment Configurations

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: antimetal-agent
  template:
    metadata:
      labels:
        app: antimetal-agent
    spec:
      serviceAccountName: antimetal-agent
      containers:
      - name: agent
        image: antimetal/system-agent:latest
        args:
          - --config=/etc/antimetal/config.yaml
          - --intake-api-key=$(ANTIMETAL_API_KEY)
          - --log-level=info
          - --leader-election=true
        env:
        - name: ANTIMETAL_API_KEY
          valueFrom:
            secretKeyRef:
              name: antimetal-credentials
              key: api-key
        - name: HOST_PROC
          value: "/host/proc"
        - name: HOST_SYS
          value: "/host/sys"
        ports:
        - name: metrics
          containerPort: 8080
        - name: health
          containerPort: 8081
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: config
          mountPath: /etc/antimetal
          readOnly: true
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: config
        configMap:
          name: antimetal-agent-config

ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: antimetal-agent-config
  namespace: antimetal-system
data:
  config.yaml: |
    intake:
      endpoint: "intake.antimetal.com:443"
      batchSize: 100
      batchInterval: "10s"
    
    kubernetes:
      namespaces:
        exclude:
          - "kube-system"
          - "kube-public"
    
    performance:
      enabled: true
      interval: "60s"
      collectors:
        - cpu
        - memory
        - network
        - disk

Secret

apiVersion: v1
kind: Secret
metadata:
  name: antimetal-credentials
  namespace: antimetal-system
type: Opaque
stringData:
  api-key: "your-api-key-here"
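
The Secret above uses stringData, which accepts plain-text values. If the same key is written under the data field instead, Kubernetes expects the value to be base64-encoded. A small sketch of producing that encoding, with `encode_secret_value` as a hypothetical helper name:

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a string for use under a Secret's `data` field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse of encode_secret_value, useful for checking a stored value."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("your-api-key-here")
print(encoded)
print(decode_secret_value(encoded) == "your-api-key-here")  # True
```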

Common Scenarios

High-Security Environment

# Strict TLS verification
intake:
  tls:
    caFile: "/etc/ssl/certs/company-ca.crt"
    insecureSkipVerify: false

# Limit data collection
kubernetes:
  namespaces:
    include:
      - "production"
      - "staging"
  resources:
    exclude:
      - "secrets"
      - "configmaps"

# Enable only the CPU and memory collectors
performance:
  collectors:
    - cpu
    - memory
    # network and disk are omitted, so they are not collected
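
The include and exclude lists used for namespaces and resources in this scenario can be read as a simple filter. The sketch below assumes that an empty include list means "everything" (as the comments in the configuration file state) and that exclude takes priority when a name appears in both lists; the second point is an assumption, and `is_watched` is a hypothetical helper.

```python
from typing import Collection


def is_watched(name: str,
               include: Collection[str],
               exclude: Collection[str]) -> bool:
    """Decide whether a namespace or resource type passes the filters."""
    if name in exclude:
        # Assumption: exclude wins over include.
        return False
    # An empty include list matches everything.
    return not include or name in include


include = ["production", "staging"]
print(is_watched("production", include, []))       # True
print(is_watched("development", include, []))      # False
print(is_watched("secrets", [], ["secrets"]))      # False
print(is_watched("deployments", [], ["secrets"]))  # True
```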

Resource-Constrained Environment

# Reduce memory usage
storage:
  resource:
    cacheSize: 1000  # Smaller cache
  metrics:
    retention: "6h"  # Shorter retention

# Slower collection
performance:
  interval: "5m"  # Less frequent

# Smaller batches
intake:
  batchSize: 50
  bufferSize: 500

# Fewer workers
kubernetes:
  reconcilers:
    workers: 5

Development Environment

# Local development settings
intake:
  endpoint: "localhost:50051"
  tls:
    insecureSkipVerify: true

kubernetes:
  cloudProvider: "kind"

operational:
  logging:
    level: "debug"
    format: "text"

# Enable all collectors for testing
performance:
  interval: "10s"
  collectors: ["all"]

Multi-Cluster Setup

# Cluster-specific configuration
kubernetes:
  clusterName: "prod-us-west-2"
  
  # Tag resources with cluster info
  labels:
    cluster: "prod-us-west-2"
    region: "us-west-2"
    environment: "production"

# Different endpoints per cluster
intake:
  endpoint: "intake-prod.antimetal.com:443"

Performance Tuning

API Rate Limiting

kubernetes:
  # Adjust based on cluster size
  qps: 100        # Large clusters
  burst: 200
  
  reconcilers:
    workers: 20   # More parallel processing
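
QPS and burst settings like these are commonly enforced with a token bucket, which is how the Kubernetes Go client's default rate limiter behaves. Whether the agent applies them in exactly this way is an assumption. The sketch below uses an explicit clock value so the behaviour is easy to follow; `TokenBucket` is a hypothetical class written for illustration.

```python
class TokenBucket:
    """Token bucket: refills at `qps` tokens per second, holds at most `burst`."""

    def __init__(self, qps: float, burst: int) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=100, burst=200)

# 250 requests arrive at once: only the burst capacity is admitted.
print(sum(bucket.allow(now=0.0) for _ in range(250)))  # 200

# One second later the bucket has refilled by qps tokens.
print(sum(bucket.allow(now=1.0) for _ in range(250)))  # 100
```

In this model, burst bounds how many requests can be sent back to back, while qps bounds the sustained rate once the burst is spent.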

Memory Optimization

storage:
  resource:
    path: "/var/lib/antimetal/store"  # Persistent storage
    cacheSize: 5000                    # Balance memory/performance
    
intake:
  batchSize: 200      # Larger batches
  batchInterval: "30s" # Less frequent sends
  bufferSize: 2000    # Larger buffer

Network Optimization

intake:
  # Compression for slow links
  compression: true
  
  # Retry configuration
  retry:
    maxAttempts: 5
    initialBackoff: "1s"
    maxBackoff: "30s"
    multiplier: 2
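
The retry settings above describe an exponential backoff. The sketch below computes the waits between attempts under two assumptions that this guide does not confirm: maxAttempts counts the first try, and no random jitter is added. `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(max_attempts: int,
                   initial: float,
                   max_backoff: float,
                   multiplier: float) -> List[float]:
    """Return the wait (in seconds) before each retry.

    There is one wait fewer than attempts, since nothing follows
    the final attempt.
    """
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_backoff))  # cap each wait at max_backoff
        delay *= multiplier
    return delays


# Values from the configuration above: 5 attempts, 1s initial, 30s cap, x2.
print(backoff_delays(5, 1.0, 30.0, 2))  # [1.0, 2.0, 4.0, 8.0]

# With more attempts the cap starts to apply.
print(backoff_delays(8, 1.0, 30.0, 2))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```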

Monitoring Configuration

Prometheus Scrape Config

scrape_configs:
- job_name: 'antimetal-agent'
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - antimetal-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    action: keep
    regex: antimetal-agent
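  # Keep only the container port named "metrics" (the agent container must declare a port with this name)
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: metrics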

Troubleshooting Configuration

Debug Logging

# Keep the global level at info and raise verbosity for specific components
antimetal-agent \
  --log-level=info \
  --log-verbosity=controller:2,intake:3,performance:1
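
The --log-verbosity value is a comma-separated list of component:level pairs. A minimal sketch of how such a value could be parsed into a mapping, with `parse_verbosity` as a hypothetical helper and no claim about how the agent itself parses the flag:

```python
from typing import Dict


def parse_verbosity(value: str) -> Dict[str, int]:
    """Parse 'controller:2,intake:3' into {'controller': 2, 'intake': 3}."""
    levels: Dict[str, int] = {}
    for item in value.split(","):
        component, _, level = item.partition(":")
        component = component.strip()
        level = level.strip()
        if not component or not level:
            raise ValueError(f"expected component:level, got {item!r}")
        levels[component] = int(level)  # raises ValueError for non-numeric levels
    return levels


print(parse_verbosity("controller:2,intake:3,performance:1"))
# {'controller': 2, 'intake': 3, 'performance': 1}
```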

Dry Run Mode

operational:
  dryRun: true  # Don't send data to intake
  
intake:
  endpoint: "logger://stdout"  # Log instead of sending

Configuration Validation

# Validate configuration file
antimetal-agent validate --config config.yaml

# Show effective configuration
antimetal-agent config --show-effective

Best Practices

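  1. Use Secrets: Store API keys in Kubernetes Secrets rather than in ConfigMaps or plain-text arguments
  2. Version Control: Keep configuration files in Git
  3. Environment-Specific: Maintain a separate configuration for each environment
  4. Monitor Resources: Set CPU and memory requests and limits, and adjust them based on observed usage
  5. Regular Updates: Review the configuration when upgrading the agent
  6. Documentation: Document any settings that differ from the defaults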

Next Steps

For the complete list of configuration options, run antimetal-agent --help.