Getting Started - antimetal/system-agent GitHub Wiki

Getting Started

⚠️ Note: This guide is being updated. The Helm deployment method is not yet available. Please see Building from Source for current deployment options.

This guide will help you quickly deploy and integrate with the Antimetal System Agent.

Prerequisites

Required Tools

  • Kubernetes cluster (1.19+) - K3s, Kind, EKS, GKE, or AKS

  • kubectl configured to access your cluster

  • kustomize (v4.0+) - Required for deployment. Install with:

    curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
    sudo mv kustomize /usr/local/bin/
    
  • Go 1.21+ (1.22+ recommended) - For building from source

  • Docker or Podman - For building container images

  • make - For running build commands

  • Antimetal API credentials - Contact [email protected]

System Requirements

  • Disk Space: 30GB+ recommended (Docker images, build artifacts)
  • Memory: 4GB+ for comfortable development
  • Architecture: AMD64 or ARM64 supported

Optional Tools (for full functionality)

  • gcc - For CGO builds (the agent is built with CGO disabled by default)
  • bpftool - For generating eBPF collector code (the agent can be built without eBPF support)

Quick Start (Helm - Coming Soon)

Note: Helm charts are under development. See Building from Source below for current deployment method.

1. Deploy the Agent (Future)

# Add the Antimetal Helm repository
helm repo add antimetal https://charts.antimetal.com
helm repo update

# Install the agent
helm install antimetal-agent antimetal/system-agent \
  --namespace antimetal-system \
  --create-namespace \
  --set intake.apiKey="YOUR_API_KEY" \
  --set intake.endpoint="intake.antimetal.com:443"

Building from Source

1. Clone and Build

# Clone the repository
git clone https://github.com/antimetal/system-agent.git
cd system-agent

# Generate required code (eBPF types, protobuf, etc.)
# Note: This requires development tools installed on the host
make generate

# Build the static binary (required for distroless images)
# CGO_ENABLED=0 creates a static binary that works in minimal containers
# Build time: ~1-2 minutes, Output size: ~110MB
CGO_ENABLED=0 go build -o agent ./cmd/main.go

# Verify the binary is static (optional)
file agent  # Should show "statically linked"

# Prepare for Docker build
mkdir -p linux/$(go env GOARCH)
cp agent linux/$(go env GOARCH)/agent

# Build the container image
# Build time: ~30 seconds, Image size: ~108MB
docker build -t your-registry/antimetal-agent:latest .

# Note: If you get permission denied on Docker socket, either:
# - Use sudo: sudo docker build ...
# - Add your user to docker group and start a new shell:
#   sudo usermod -aG docker $USER
#   exec bash -l  # or logout and login again

# For local K3s/Kind clusters, import the image
# K3s:
docker save your-registry/antimetal-agent:latest | sudo k3s ctr images import -
# Kind:
kind load docker-image your-registry/antimetal-agent:latest

# For remote clusters, push to registry
docker push your-registry/antimetal-agent:latest

2. Configure and Deploy

Important: Cluster Configuration

The agent requires a cluster-info ConfigMap in the kube-public namespace with a proper kubeconfig structure. kubeadm-based clusters create this ConfigMap automatically, but K3s and Kind do not, so on those clusters you must create it yourself.

# Set your API key
export ANTIMETAL_API_KEY="YOUR_API_KEY"

# Create namespace
kubectl create namespace antimetal-system

# Create the cluster-info ConfigMap with proper kubeconfig structure
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: $(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
        server: $(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}')
      name: my-cluster
    contexts:
    - context:
        cluster: my-cluster
      name: my-cluster
    current-context: my-cluster
EOF

Note: The agent looks for this ConfigMap in kube-public namespace specifically, and expects a kubeconfig field containing cluster connection information.
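
The heredoc above relies on command substitution; if either kubectl config view call fails, kubectl apply still runs with an empty value. A more cautious variant (a sketch with placeholder values, not part of the repo) renders the manifest to a file first so you can inspect it before applying:

```shell
# Render the cluster-info manifest to a file for inspection before applying.
# CA_DATA and SERVER are placeholders; in practice fill them from:
#   kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}'
#   kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}'
CLUSTER_NAME="my-cluster"
CA_DATA="PLACEHOLDER_CA_DATA"
SERVER="https://127.0.0.1:6443"

cat > cluster-info.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: ${CA_DATA}
        server: ${SERVER}
      name: ${CLUSTER_NAME}
    contexts:
    - context:
        cluster: ${CLUSTER_NAME}
      name: ${CLUSTER_NAME}
    current-context: ${CLUSTER_NAME}
EOF

# Sanity check: the kubeconfig field must be present before applying
grep -q 'kubeconfig: |' cluster-info.yaml && echo "manifest rendered"
```

Once the values look right, apply it with kubectl apply -f cluster-info.yaml.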

Deploy the Agent

# Deploy using Kustomize
kustomize build config/default | kubectl apply -f -

# Wait for the deployment to become available
kubectl wait --for=condition=available deployment/agent -n antimetal-system --timeout=30s || true

Configure for Your Environment

Here's a complete configuration script that handles all necessary patches:

#!/bin/bash
# Complete configuration for Antimetal Agent

# Your settings
IMAGE="your-registry/antimetal-agent:latest"
API_KEY="${ANTIMETAL_API_KEY:-your-api-key-here}"
CLUSTER_NAME="my-cluster"

# Update image
kubectl set image deployment/agent agent=$IMAGE -n antimetal-system

# Apply all patches in one go
kubectl patch deployment agent -n antimetal-system --type=json -p="[
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/imagePullPolicy\", \"value\": \"IfNotPresent\"},
  {\"op\": \"add\", \"path\": \"/spec/template/spec/containers/0/args/-\", \"value\": \"--intake-api-key=$API_KEY\"},
  {\"op\": \"add\", \"path\": \"/spec/template/spec/containers/0/args/-\", \"value\": \"--intake-address=intake.antimetal.com:443\"}
]"

# For local development, you can point the agent at a local intake endpoint instead
# kubectl patch deployment agent -n antimetal-system --type=json -p='[
#   {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--intake-address=localhost:50051"},
#   {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--intake-secure=false"}
# ]'
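
Escaping JSON inside a shell-quoted argument, as in the patch above, is easy to get wrong. An alternative sketch (not from the repo) builds the payload with printf, so the API key expands without manual escaping:

```shell
# Build the JSON patch in a variable; %s is replaced by the API key.
# The flag name and patch path match the script above.
API_KEY="your-api-key-here"
PATCH=$(printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--intake-api-key=%s"}]' "$API_KEY")
echo "$PATCH"
```

Then apply it with kubectl patch deployment agent -n antimetal-system --type=json -p="$PATCH".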

3. Verify Installation

Step 1: Check Pod Status

# Check the agent is running
kubectl get pods -n antimetal-system

# Expected output:
# NAME                     READY   STATUS    RESTARTS   AGE
# agent-xxxxxxxxx-xxxxx    1/1     Running   0          2m

Step 2: Verify Agent Logs

# View agent logs
kubectl logs -n antimetal-system -l app.kubernetes.io/name=agent --tail=20

# Healthy agent logs should show:
# {"level":"info","ts":"2025-08-01T02:00:00Z","logger":"setup","msg":"starting manager"}
# {"level":"info","ts":"2025-08-01T02:00:00Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
# I0801 02:00:00.123456       1 leaderelection.go:271] successfully acquired lease antimetal-system/4927b366.antimetal.com
# {"level":"info","ts":"2025-08-01T02:00:01Z","logger":"controller","msg":"Starting EventSource","source":"kind source: *v1.Pod"}
# {"level":"info","ts":"2025-08-01T02:00:01Z","logger":"controller","msg":"Starting Controller"}
# {"level":"info","ts":"2025-08-01T02:00:01Z","logger":"intake-worker","msg":"starting intake worker"}

Step 3: Validate Metrics Endpoint

# Port-forward to metrics service
kubectl port-forward -n antimetal-system svc/metrics-service 8443:8443 &

# Check metrics (-k skips TLS verification for the self-signed certificate)
curl -k https://localhost:8443/metrics | grep antimetal

# Kill port-forward when done
kill %1

Step 4: Verify Data Collection

# Check if the agent is collecting resources
kubectl logs -n antimetal-system -l app.kubernetes.io/name=agent | grep -E "resources collected|intake"

# With valid API key, you should see:
# {"level":"info","msg":"resources collected","count":150,"types":["Pod","Node","Service"]}
# {"level":"info","logger":"intake-worker","msg":"batch sent successfully","size":100}

# Without valid API key, you'll see connection errors (expected):
# {"level":"error","logger":"intake-worker","msg":"failed to send batch","error":"connection refused"}

4. Monitor Data Flow

The agent immediately begins collecting and streaming data:

  • Kubernetes resources (pods, nodes, deployments, etc.)
  • Performance metrics (CPU, memory, network, disk)
  • Cloud provider metadata (region, instance types)

Integration Scenarios

Scenario 1: Basic Monitoring

Just deploy the agent - it automatically discovers and monitors all Kubernetes resources.

Scenario 2: Custom Metrics Collection

Add custom performance collectors:

// 1. Implement the Collector interface
type CustomCollector struct {
    performance.BaseCollector
}

func (c *CustomCollector) Collect(ctx context.Context) (any, error) {
    // Your collection logic
    return CustomMetrics{...}, nil
}

// 2. Register your collector
func init() {
    performance.Register(MetricTypeCustom, NewCustomCollector)
}

Scenario 3: Cloud Provider Integration

Add support for a new cloud provider:

// 1. Implement the Provider interface
type MyCloudProvider struct{}

func (p *MyCloudProvider) Name() string {
    return "mycloud"
}

func (p *MyCloudProvider) ClusterName(ctx context.Context) (string, error) {
    // Discover cluster name
    return "prod-cluster", nil
}

func (p *MyCloudProvider) Region(ctx context.Context) (string, error) {
    // Discover region
    return "us-west-2", nil
}

// 2. Register in the provider factory

Scenario 4: Data Processing Integration

Subscribe to resource events:

// Subscribe to specific resource types
store := agent.ResourceStore()
events := store.Subscribe(&TypeDescriptor{
    Group:   "", // core API group is empty for Pods
    Version: "v1",
    Kind:    "Pod",
})

// Process events
for event := range events {
    switch event.Type {
    case EventTypeAdd:
        // New pod created
    case EventTypeUpdate:
        // Pod updated
    case EventTypeDelete:
        // Pod deleted
    }
}

Configuration Options

Essential Configuration

# values.yaml
intake:
  apiKey: "YOUR_API_KEY"
  endpoint: "intake.antimetal.com:443"
  batchSize: 100
  batchInterval: "10s"

agent:
  logLevel: "info"
  leaderElection: true

performance:
  enabled: true
  interval: "60s"
  collectors:
    - cpu
    - memory
    - network
    - disk

cloudProvider:
  autoDetect: true

Advanced Configuration

See Configuration Guide for all options.

Local Development

Using KIND

# Create a local cluster
make cluster

# Build and load the agent
make build-and-load-image

# Deploy to KIND
make deploy

# View logs
kubectl logs -n antimetal-system -l app=antimetal-agent -f

Using Docker Compose

version: '3.8'
services:
  agent:
    image: antimetal/system-agent:latest
    environment:
      - INTAKE_API_KEY=test-key
      - INTAKE_ENDPOINT=localhost:50051
      - LOG_LEVEL=debug
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro

Monitoring the Agent

Metrics

The agent exposes Prometheus metrics on port 8080:

# Port-forward to access metrics
kubectl port-forward -n antimetal-system svc/antimetal-agent-metrics 8080:8080

# View metrics
curl http://localhost:8080/metrics

Key metrics:

  • antimetal_resources_total - Total resources by type
  • antimetal_intake_batches_sent - Batches sent to intake
  • antimetal_intake_stream_health - Stream connection health
  • antimetal_collector_duration_seconds - Collector execution time
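
If you run your own Prometheus, a minimal scrape job for these metrics might look like the fragment below (a sketch assuming the port-forward shown above; the job name is illustrative):

```yaml
scrape_configs:
  - job_name: antimetal-agent          # illustrative name
    static_configs:
      - targets: ["localhost:8080"]    # reachable via the kubectl port-forward above
```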

Health Checks

# Liveness probe
curl http://localhost:8080/healthz

# Readiness probe  
curl http://localhost:8080/readyz

Troubleshooting Quick Tips

Agent Not Starting

# Check events
kubectl describe pod -n antimetal-system -l app=antimetal-agent

# Common issues:
# - Invalid API key
# - Network connectivity to intake service
# - Insufficient RBAC permissions

No Data Flowing

# Check intake worker logs
kubectl logs -n antimetal-system -l app=antimetal-agent | grep intake

# Verify connectivity
kubectl exec -n antimetal-system -it deployment/antimetal-agent -- \
  wget -O- https://intake.antimetal.com/health

High Memory Usage

# Check resource usage
kubectl top pod -n antimetal-system

# Adjust batch settings in values.yaml:
intake:
  batchSize: 50  # Reduce batch size
  batchInterval: "5s"  # Send more frequently

Troubleshooting

Quick Checklist

If the agent isn't working, check these common issues first:

  1. ✓ Namespace exists: kubectl get ns antimetal-system
  2. ✓ ConfigMap exists: kubectl get cm cluster-info -n kube-public
  3. ✓ Image is available: kubectl describe pod -n antimetal-system | grep Image
  4. ✓ API key is set: kubectl get deployment agent -n antimetal-system -o yaml | grep intake-api-key
  5. ✓ Pod is running: kubectl get pods -n antimetal-system

Common Issues

1. Build Errors

eBPF compilation errors

undefined: ExecsnoopEvent

Solution: Run make generate before building to generate eBPF types.

Binary not found in container

exec /agent: no such file or directory

Solution: Ensure you're building a static binary with CGO_ENABLED=0.

2. Deployment Issues

ImagePullBackOff in K3s/Kind

Warning  Failed     22s   kubelet  Error: ImagePullBackOff

Solution: Import the image into your local cluster:

# K3s
docker save your-image:tag | sudo k3s ctr images import -

# Kind
kind load docker-image your-image:tag

Pod CrashLoopBackOff

Error: configmaps "cluster-info" not found
# or
Error: kubeconfig not found in cluster-info ConfigMap

Solution: The agent requires a properly formatted cluster-info ConfigMap in the kube-public namespace:

# Create the ConfigMap with correct structure
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: $(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
        server: $(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}')
      name: my-cluster
    contexts:
    - context:
        cluster: my-cluster
      name: my-cluster
    current-context: my-cluster
EOF

# Alternative if you're using EKS: Pass cluster name directly
kubectl patch deployment agent -n antimetal-system --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubernetes-provider=eks"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubernetes-provider-eks-cluster-name=my-cluster"}
]'

Why this happens: K3s and Kind don't automatically create the cluster-info ConfigMap that kubeadm-based clusters have. The agent uses this to determine the cluster name.

3. Missing Dependencies

bpftool not found

Error: bpftool is required to generate vmlinux.h

Solution: Install kernel development tools:

# Ubuntu/Debian
sudo apt-get install linux-tools-common linux-tools-generic

# Or skip eBPF features
make build NO_EBPF=1

Build Requirements

For full functionality including eBPF collectors:

  • Go 1.21+ (1.22+ recommended)
  • Docker or Podman
  • make, gcc, git
  • Linux kernel headers
  • bpftool (for eBPF)

For basic functionality:

  • Go 1.21+
  • Docker
  • make

Verifying Your Setup

  1. Check the agent is running:
kubectl get pods -n antimetal-system
kubectl logs -n antimetal-system -l app=antimetal-agent
  2. Check the metrics endpoint:
kubectl port-forward -n antimetal-system svc/metrics-service 8443:8443
curl -k https://localhost:8443/metrics
  3. Verify cloud provider detection:
kubectl logs -n antimetal-system -l app=antimetal-agent | grep -i "provider"

Getting Access

To obtain Antimetal API credentials:

  1. Contact [email protected]
  2. Provide your organization details and use case
  3. You'll receive your API key and endpoint configuration

Note: The agent requires a valid API key to connect to the intake service. Without it, the agent may fail to start or will be unable to send data.

Complete Example

Here's a complete script that builds and deploys the agent from scratch:

#!/bin/bash
set -e

# Configuration
CLUSTER_NAME="test-cluster"
REGION="us-west-2"
API_KEY="your-api-key-here"

echo "🚀 Building Antimetal System Agent..."

# Clone and build
git clone https://github.com/antimetal/system-agent.git
cd system-agent

# Build binary
echo "📦 Building static binary..."
CGO_ENABLED=0 go build -o agent ./cmd/main.go
mkdir -p linux/$(go env GOARCH)
cp agent linux/$(go env GOARCH)/agent

# Build Docker image
echo "🐳 Building Docker image..."
docker build -t antimetal-agent:local .

# Setup Kubernetes
echo "☸️  Setting up Kubernetes resources..."
kubectl create namespace antimetal-system || true

# Create cluster-info ConfigMap in kube-public with proper kubeconfig
echo "📋 Creating cluster-info ConfigMap..."
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: $(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
        server: $(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}')
      name: $CLUSTER_NAME
    contexts:
    - context:
        cluster: $CLUSTER_NAME
      name: $CLUSTER_NAME
    current-context: $CLUSTER_NAME
EOF

# Deploy
echo "🚀 Deploying agent..."
kustomize build config/default | kubectl apply -f -

# Configure
echo "⚙️  Configuring deployment..."
kubectl wait --for=condition=available deployment/agent -n antimetal-system --timeout=60s || true

# Double-quoted payload with escaped quotes, matching the earlier
# configuration script, so $API_KEY expands reliably
kubectl patch deployment agent -n antimetal-system --type=json -p="[
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/image\", \"value\": \"antimetal-agent:local\"},
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/imagePullPolicy\", \"value\": \"IfNotPresent\"},
  {\"op\": \"add\", \"path\": \"/spec/template/spec/containers/0/args/-\", \"value\": \"--intake-api-key=$API_KEY\"},
  {\"op\": \"add\", \"path\": \"/spec/template/spec/containers/0/args/-\", \"value\": \"--intake-address=intake.antimetal.com:443\"}
]"

# For K3s - import image
if command -v k3s &> /dev/null; then
echo "📥 Importing image to K3s..."
  docker save antimetal-agent:local | sudo k3s ctr images import -
fi

# Verify
echo "✅ Verifying deployment..."
sleep 10
kubectl get pods -n antimetal-system
kubectl logs -n antimetal-system -l app.kubernetes.io/name=agent --tail=10

echo "🎉 Done! Check pod status with: kubectl get pods -n antimetal-system"

Troubleshooting

Having issues? See Getting Started Issues for common deployment problems and solutions.


Ready to dive deeper? Check out the Architecture Overview →