Testing - antimetal/system-agent GitHub Wiki

Cgroup Testing Guide

This guide covers testing cgroup collectors in development and production environments.

Quick Test

The project includes automated test scripts that validate:

  • Container discovery across different runtimes
  • Metric collection from cgroup files
  • Compatibility with both cgroup v1 and v2

Manual Testing Steps

1. Verify Cgroup Version

Steps to verify:

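  • Check mount points (mount | grep cgroup) to identify the cgroup version: v2 hierarchies are mounted with filesystem type cgroup2, v1 hierarchies with type cgroup
  • For v2: look for a cgroup.controllers file at the root of the hierarchy (usually /sys/fs/cgroup)
  • For v1: look for separate per-controller directories (cpu, memory, etc.)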

2. Test Container Discovery

Validation process:

  • List running containers using your container runtime
  • Verify corresponding cgroup directories exist for each container
  • Check both systemd scope units (e.g., docker-<id>.scope under system.slice) and runtime-managed directories (e.g., docker/<id> when the cgroupfs driver is used)

3. Validate Metrics Collection

Test setup:

  • Start a test container with specific resource limits (e.g., 256MB memory, 0.5 CPUs)
  • Verify the expected metric files (e.g., cpu.stat and memory.current on v2) exist in the container's cgroup directory
  • Confirm CPU and memory statistics are being populated

Key metrics to verify:

  • CPU usage and throttling statistics
  • Memory current usage and limits
  • Container identification in cgroup paths

4. KIND Cluster Testing

Local testing workflow:

  • Create a local Kubernetes cluster using KIND
  • Deploy the agent to the cluster
  • Monitor agent logs for successful container discovery
  • Verify metrics collection across multiple containers

Test Scenarios

Basic Functionality

  1. Container detection
  2. Metrics collection
  3. Runtime compatibility

Edge Cases

  1. Missing permissions
  2. Incomplete cgroup mounts
  3. Container restarts
  4. High container churn

Performance Testing

  1. Many containers (100+)
  2. Rapid container creation/deletion
  3. Memory pressure scenarios

Troubleshooting

No Containers Detected

  • Check cgroup mount: mount | grep cgroup
  • Verify paths: ls -la /sys/fs/cgroup/
  • Check permissions: ls -la /sys/fs/cgroup/*/

Missing Metrics

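  • Verify the controllers are available: cat /sys/fs/cgroup/cgroup.controllers (v2); controllers enabled for child cgroups are listed in cgroup.subtree_control
  • Check that the expected metric files exist (e.g., cpu.stat, memory.current)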
  • Review agent logs for errors
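On cgroup v2, a controller's interface files (for example `memory.current`) appear in a child cgroup only when that controller is enabled in the parent's `cgroup.subtree_control`. Both `cgroup.controllers` and `cgroup.subtree_control` are space-separated lists, so a check for required controllers is a set comparison. The Go sketch below uses a hypothetical helper, `missingControllers`. Requiring `cpu` and `memory` is an assumption based on the metrics this guide verifies.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// missingControllers returns the required controllers that are absent
// from a space-separated list such as cgroup.subtree_control.
func missingControllers(list string, required ...string) []string {
	present := make(map[string]bool)
	for _, c := range strings.Fields(list) {
		present[c] = true
	}
	var missing []string
	for _, c := range required {
		if !present[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/cgroup.subtree_control")
	if err != nil {
		fmt.Println("cannot read cgroup.subtree_control (is this cgroup v2?):", err)
		return
	}
	if missing := missingControllers(string(data), "cpu", "memory"); len(missing) > 0 {
		fmt.Println("controllers not enabled for child cgroups:", missing)
		return
	}
	fmt.Println("cpu and memory controllers are enabled for child cgroups")
}
```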

Permission Errors

  • Ensure the agent has read access to /sys
  • Check the securityContext in the deployment
  • Verify the volume mounts are correct

Validation Checklist

  • Cgroup version correctly detected
  • All running containers discovered
  • CPU metrics collected
  • Memory metrics collected
  • No permission errors in logs
  • Metrics update at expected intervals
  • Graceful handling of missing files