Kubernetes Deployment - antimetal/system-agent GitHub Wiki

Kubernetes Deployment

This guide covers deploying the Antimetal System Agent to Kubernetes clusters with three methods: Helm, Kustomize, and raw manifests.

Prerequisites

  • Kubernetes cluster (1.19+)
  • kubectl configured with cluster access
  • Cluster admin permissions (for RBAC setup)
  • Antimetal API key from console.antimetal.com

Deployment Methods

Method 1: Helm (Recommended)

The easiest way to deploy the agent:

# Add Antimetal Helm repository
helm repo add antimetal https://charts.antimetal.com
helm repo update

# Install the agent
helm install antimetal-agent antimetal/system-agent \
  --namespace antimetal-system \
  --create-namespace \
  --set intake.apiKey="YOUR_API_KEY"

# Verify installation
kubectl get pods -n antimetal-system
helm status antimetal-agent -n antimetal-system

Helm Values

# values.yaml
replicaCount: 1

image:
  repository: antimetal/system-agent
  tag: latest  # pin a release tag in production; IfNotPresent keeps any :latest image already cached on a node
  pullPolicy: IfNotPresent

intake:
  endpoint: "intake.antimetal.com:443"
  apiKey: ""  # Required
  batchSize: 100
  batchInterval: "10s"

performance:
  enabled: true
  interval: "60s"
  collectors:
    - cpu
    - memory
    - network
    - disk

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

nodeSelector: {}
tolerations: []
affinity: {}

serviceAccount:
  create: true
  name: antimetal-agent

rbac:
  create: true

leaderElection:
  enabled: true
  namespace: antimetal-system

monitoring:
  serviceMonitor:
    enabled: false  # Enable for Prometheus Operator

Custom Installation

# With custom values file
helm install antimetal-agent antimetal/system-agent \
  -n antimetal-system \
  --create-namespace \
  -f values.yaml

# Upgrade existing installation
helm upgrade antimetal-agent antimetal/system-agent \
  -n antimetal-system \
  -f values.yaml

# Roll back to the previous revision if needed (pass a revision number to target an older one)
helm rollback antimetal-agent -n antimetal-system

Method 2: Kustomize

For GitOps workflows:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: antimetal-system

resources:
  - https://github.com/antimetal/system-agent/config/default?ref=v1.0.0

# Create secret for API key
secretGenerator:
  - name: antimetal-credentials
    literals:
      - api-key=YOUR_API_KEY

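# Patch deployment with secret. patchesStrategicMerge is deprecated in
# Kustomize v5; on newer versions use:
#   patches:
#     - path: deployment-patch.yaml
patchesStrategicMerge:
  - deployment-patch.yaml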

# Custom configuration
configMapGenerator:
  - name: antimetal-config
    files:
      - config.yaml

# deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antimetal-agent
spec:
  template:
    spec:
      containers:
      - name: agent
        env:
        - name: ANTIMETAL_INTAKE_API_KEY
          valueFrom:
            secretKeyRef:
              name: antimetal-credentials
              key: api-key
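The configMapGenerator above reads a local config.yaml that this page does not show. A minimal sketch, assuming the agent accepts the same keys as the config.yaml embedded in the raw-manifest ConfigMap (Method 3); check the schema for the release you deploy:

```yaml
# config.yaml (read by configMapGenerator)
# Assumption: same schema as the config.yaml in the raw-manifest ConfigMap
intake:
  endpoint: "intake.antimetal.com:443"
  batchSize: 100
  batchInterval: "10s"
performance:
  enabled: true
  interval: "60s"
```

The API key deliberately stays out of this file: the secretGenerator and deployment-patch.yaml inject it through ANTIMETAL_INTAKE_API_KEY, so config.yaml can be committed to Git.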

Deploy with:

kubectl apply -k .

Method 3: Raw Manifests

For maximum control:

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: antimetal-system

---
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: antimetal-agent
  namespace: antimetal-system

---
# clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: antimetal-agent
rules:
# Core resources
- apiGroups: [""]
  resources:
    - nodes
    - pods
    - services
    - persistentvolumes
    - persistentvolumeclaims
    - namespaces
    - endpoints
  verbs: ["get", "list", "watch"]

# Apps resources
- apiGroups: ["apps"]
  resources:
    - deployments
    - daemonsets
    - statefulsets
    - replicasets
  verbs: ["get", "list", "watch"]

# Batch resources
- apiGroups: ["batch"]
  resources:
    - jobs
    - cronjobs
  verbs: ["get", "list", "watch"]

# Leader election
- apiGroups: ["coordination.k8s.io"]
  resources:
    - leases
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

# Events (for leader election)
- apiGroups: [""]
  resources:
    - events
  verbs: ["create", "patch"]

---
# clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: antimetal-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: antimetal-agent
subjects:
- kind: ServiceAccount
  name: antimetal-agent
  namespace: antimetal-system

---
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: antimetal-credentials
  namespace: antimetal-system
type: Opaque
stringData:
  api-key: "YOUR_API_KEY"

---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: antimetal-config
  namespace: antimetal-system
data:
  config.yaml: |
    intake:
      endpoint: "intake.antimetal.com:443"
      batchSize: 100
      batchInterval: "10s"
    performance:
      enabled: true
      interval: "60s"

---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antimetal-agent
  namespace: antimetal-system
  labels:
    app: antimetal-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: antimetal-agent
  template:
    metadata:
      labels:
        app: antimetal-agent
    spec:
      serviceAccountName: antimetal-agent
      containers:
      - name: agent
        image: antimetal/system-agent:latest  # pin a release tag in production
        imagePullPolicy: IfNotPresent
        args:
          - --config=/etc/antimetal/config.yaml
          - --leader-election=true
          - --leader-election-namespace=antimetal-system
        env:
        - name: ANTIMETAL_INTAKE_API_KEY
          valueFrom:
            secretKeyRef:
              name: antimetal-credentials
              key: api-key
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: HOST_PROC
          value: "/host/proc"
        - name: HOST_SYS
          value: "/host/sys"
        ports:
        - name: metrics
          containerPort: 8080
          protocol: TCP
        - name: health
          containerPort: 8081
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: health
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /readyz
            port: health
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        volumeMounts:
        - name: config
          mountPath: /etc/antimetal
          readOnly: true
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 65532
          capabilities:
            drop:
            - ALL
      volumes:
      - name: config
        configMap:
          name: antimetal-config
      - name: proc
        hostPath:
          path: /proc
          type: Directory
      - name: sys
        hostPath:
          path: /sys
          type: Directory

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: antimetal-agent-metrics
  namespace: antimetal-system
  labels:
    app: antimetal-agent
spec:
  ports:
  - name: metrics
    port: 8080
    targetPort: metrics
  selector:
    app: antimetal-agent

Deploy with:

# Apply the namespace first so the namespaced resources can be created in it
kubectl apply -f namespace.yaml
kubectl apply -f .
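The ClusterRole above grants lease write access in every namespace, although leader election only uses antimetal-system (per --leader-election-namespace). A least-privilege sketch, assuming leader election stays in that namespace: move the leases and events rules into a namespaced Role and remove them from the ClusterRole. The Role and RoleBinding names are hypothetical.

```yaml
# role-leader-election.yaml (sketch; object names are hypothetical)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: antimetal-agent-leader-election
  namespace: antimetal-system
rules:
# Leases for leader election, limited to this namespace
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Leader-election events recorded in this namespace
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: antimetal-agent-leader-election
  namespace: antimetal-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: antimetal-agent-leader-election
subjects:
- kind: ServiceAccount
  name: antimetal-agent
  namespace: antimetal-system
```

If the agent also records events in other namespaces, keep the events rule in the ClusterRole and move only the leases rule.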

Cloud-Specific Deployments

EKS (AWS)

Additional configuration for EKS:

# eks-values.yaml
cloudProvider: eks

# IAM for Service Accounts (IRSA)
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/antimetal-agent

# Prefer (but do not require) m5.large / m5.xlarge nodes
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - m5.large
          - m5.xlarge

GKE (Google Cloud)

# gke-values.yaml
cloudProvider: gke

# Workload Identity
serviceAccount:
  create: true
  annotations:
    iam.gke.io/gcp-service-account: GSA_NAME@PROJECT_ID.iam.gserviceaccount.com  # placeholder

# GKE Autopilot compatible resources
resources:
  requests:
    cpu: 250m
    memory: 512Mi
    ephemeral-storage: 1Gi
  limits:
    cpu: 500m
    memory: 512Mi
    ephemeral-storage: 1Gi

AKS (Azure)

# aks-values.yaml
cloudProvider: aks

# Azure AD Pod Identity (deprecated upstream in favor of Azure Workload Identity)
podLabels:
  aadpodidbinding: antimetal-agent

# AKS-specific tolerations
tolerations:
- key: CriticalAddonsOnly
  operator: Exists
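Because AAD Pod Identity is deprecated, new clusters usually use Azure Workload Identity (Microsoft Entra Workload ID) instead. A sketch of equivalent values, assuming the chart passes podLabels and serviceAccount.annotations through unchanged. The client ID is a placeholder, and the cluster needs the OIDC issuer and workload identity features enabled, plus a federated credential for system:serviceaccount:antimetal-system:antimetal-agent.

```yaml
# aks-workload-identity-values.yaml (sketch)
cloudProvider: aks

serviceAccount:
  create: true
  annotations:
    # Placeholder: client ID of the managed identity federated with this service account
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"

# Opts the pod into token injection by the workload identity webhook
podLabels:
  azure.workload.identity/use: "true"

# AKS-specific tolerations
tolerations:
- key: CriticalAddonsOnly
  operator: Exists
```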

High Availability Deployment

For production environments:

# ha-values.yaml
replicaCount: 3  # Only one will be active (leader election)

# Pod disruption budget
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Anti-affinity to spread across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - antimetal-agent
        topologyKey: kubernetes.io/hostname

# Resource limits for stability
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

# Priority class
priorityClassName: system-cluster-critical
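The podDisruptionBudget values above apply to the Helm chart. For the raw-manifest method, a roughly equivalent object is a policy/v1 PodDisruptionBudget (policy/v1 requires Kubernetes 1.21+; older clusters use policy/v1beta1) selecting the app: antimetal-agent label from deployment.yaml. This sketch assumes the Deployment runs more than one replica. With a single replica, minAvailable: 1 blocks voluntary evictions such as node drains.

```yaml
# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  # Keep at least one agent pod running during voluntary disruptions
  minAvailable: 1
  selector:
    matchLabels:
      app: antimetal-agent
```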

Security Hardening

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  podSelector:
    matchLabels:
      app: antimetal-agent
  policyTypes:
  - Ingress
  - Egress
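  ingress:
  # Allow Prometheus scraping from the "monitoring" namespace
  # (kubernetes.io/metadata.name is set automatically on Kubernetes 1.21+;
  #  on older clusters, label the namespaces yourself)
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Allow DNS (UDP, plus TCP for large responses)
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow Kubernetes API. This only matches clusters where kube-apiserver runs
  # as a labeled, selectable pod; managed control planes need an ipBlock rule
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          component: kube-apiserver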
  # Allow intake service
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
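On managed control planes (EKS, GKE, AKS) the API server is not a pod, so the kube-apiserver rule above matches nothing. NetworkPolicies are additive, so a second policy can allow API server egress by IP. A sketch with placeholder values: take the address and port from `kubectl get endpoints kubernetes -n default` (use the endpoint address, not the ClusterIP of the kubernetes Service). Port 443 is already allowed by the intake rule, so this matters mainly when the API server listens on another port such as 6443. The policy name is hypothetical.

```yaml
# netpol-apiserver.yaml (sketch; CIDR and port are placeholders)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: antimetal-agent-apiserver
  namespace: antimetal-system
spec:
  podSelector:
    matchLabels:
      app: antimetal-agent
  policyTypes:
  - Egress
  egress:
  # Allow Kubernetes API by address
  - to:
    - ipBlock:
        cidr: 203.0.113.10/32   # placeholder: API server endpoint address
    ports:
    - protocol: TCP
      port: 6443                # placeholder: API server endpoint port
```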

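Pod Security Policy (Kubernetes 1.24 and earlier)

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. The policy below only takes effect if the agent's service account is granted the RBAC verb use on it.
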
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: antimetal-agent
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'secret'
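  - 'hostPath'
  # Restrict hostPath mounts to read-only /proc and /sys
  allowedHostPaths:
  - pathPrefix: /proc
    readOnly: true
  - pathPrefix: /sys
    readOnly: true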
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  # supplementalGroups is a required field of the PSP spec
  supplementalGroups:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
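On Kubernetes 1.25+, the built-in Pod Security Admission controller replaces PodSecurityPolicy through namespace labels. The agent mounts hostPath volumes (/proc, /sys), which both the baseline and restricted profiles reject. This sketch therefore enforces privileged on the namespace while still emitting warnings and audit annotations against restricted. It relaxes enforcement for the whole namespace, so keep unrelated workloads out of antimetal-system.

```yaml
# namespace.yaml with Pod Security Admission labels (Kubernetes 1.23+, GA in 1.25)
apiVersion: v1
kind: Namespace
metadata:
  name: antimetal-system
  labels:
    # hostPath volumes are only permitted at the privileged level
    pod-security.kubernetes.io/enforce: privileged
    # Report (but do not block) pods that fall short of restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```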

Monitoring Integration

Prometheus ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  selector:
    matchLabels:
      app: antimetal-agent
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    scheme: http

Verification

Check Deployment Status

# Check pod status
kubectl get pods -n antimetal-system

# Check logs
kubectl logs -n antimetal-system -l app=antimetal-agent

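# Check leader election (HOLDER shows the pod currently acting as leader)
kubectl get lease -n antimetal-system \
  -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity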

# Verify RBAC
kubectl auth can-i --list --as system:serviceaccount:antimetal-system:antimetal-agent

Health Checks

# Port-forward both ports in the background for local access
kubectl port-forward -n antimetal-system deployment/antimetal-agent 8080:8080 8081:8081 &
sleep 2  # give the port-forward a moment to start

# Check health
curl http://localhost:8081/healthz
curl http://localhost:8081/readyz

# Check metrics
curl http://localhost:8080/metrics

# Stop the background port-forward
kill $!

Troubleshooting Deployment

Common Issues

  1. CrashLoopBackOff

    # Check logs
    kubectl logs -n antimetal-system -l app=antimetal-agent --previous
    
    # Common causes:
    # - Invalid API key
    # - Network connectivity issues
    # - Insufficient RBAC permissions
    
  2. ImagePullBackOff

    # Check events
    kubectl describe pod -n antimetal-system -l app=antimetal-agent
    
    # Solutions:
    # - Verify image name and tag
    # - Check image pull secrets if using private registry
    
  3. Pending State

    # Check events
    kubectl describe pod -n antimetal-system -l app=antimetal-agent
    
    # Common causes:
    # - Insufficient resources
    # - Node selector/affinity not matching
    # - Taints not tolerated
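For the private-registry case under ImagePullBackOff, the pod needs an image pull secret. A sketch as a strategic-merge patch for the raw-manifest or Kustomize Deployment, assuming a docker-registry Secret named regcred (a hypothetical name) already exists in antimetal-system, for example created with `kubectl create secret docker-registry`:

```yaml
# image-pull-secret-patch.yaml (sketch; "regcred" is a hypothetical Secret name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  template:
    spec:
      # Credentials the kubelet uses to pull the agent image
      imagePullSecrets:
      - name: regcred
```

With Kustomize, add this file to the patch list next to deployment-patch.yaml.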
    

Debug Mode

Deploy with debug logging:

# debug-values.yaml
logLevel: debug

# Extra verbosity
extraArgs:
  - --log-verbosity=controller:2,intake:3

# Disable leader election for debugging; keep a single replica so
# only one agent instance is active
leaderElection:
  enabled: false
replicaCount: 1

Upgrading

Helm Upgrade

# Check current version
helm list -n antimetal-system

# Check available versions
helm search repo antimetal/system-agent --versions

# Upgrade to specific version
helm upgrade antimetal-agent antimetal/system-agent \
  -n antimetal-system \
  --version 1.2.0 \
  -f values.yaml

# Verify upgrade
kubectl rollout status -n antimetal-system deployment/antimetal-agent

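Zero-Downtime Upgrade

RollingUpdate is already the default Deployment strategy. Setting maxUnavailable to 0 with maxSurge 1 makes each rollout start a new pod and wait for it to become available before terminating an old one.

# Never drop below the desired replica count during a rollout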
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
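For the raw-manifest method, the same settings live under spec.strategy in deployment.yaml. A sketch of the relevant fields to merge into that file (selector and template omitted). minReadySeconds is an optional addition, not part of the original setup, and 10 is an arbitrary example value.

```yaml
# deployment.yaml (fragment: merge into the existing Deployment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antimetal-agent
  namespace: antimetal-system
spec:
  replicas: 1
  # Optional: a new pod must stay Ready this long before it counts as available
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```

With leader election enabled, the surged pod starts as a follower and becomes leader once the old pod releases the lease or the lease expires.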

Uninstalling

Helm

helm uninstall antimetal-agent -n antimetal-system
# helm uninstall leaves the namespace created by --create-namespace; remove it separately
kubectl delete namespace antimetal-system

Manual

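# Run from the manifest directory; this also deletes the namespace and the
# cluster-scoped ClusterRole and ClusterRoleBinding
kubectl delete -f .
kubectl delete namespace antimetal-system --ignore-not-found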

For support, contact Antimetal by email or visit GitHub Issues.