FAQ - antimetal/system-agent GitHub Wiki

FAQ (Frequently Asked Questions)

⚠️ Work in Progress: This documentation is currently being developed and may be incomplete or subject to change.

General Questions

What is the Antimetal System Agent?

The Antimetal System Agent is a lightweight, secure Kubernetes controller that collects infrastructure metrics and resource information from your clusters. It streams this data to the Antimetal platform for analysis, optimization, and cost management.

What data does the agent collect?

The agent collects:

  • Kubernetes resources: Pods, nodes, services, deployments, etc.
  • Performance metrics: CPU, memory, disk, network usage
  • Hardware information: System specifications and topology
  • Cloud metadata: Instance types, regions, zones

The agent does NOT collect:

  • Application logs or data
  • Environment variables or secrets
  • Network traffic content
  • Personal or sensitive information

Is the agent open source?

Yes, the System Agent is open source and available on GitHub. You can review the code, contribute improvements, and build it yourself.

What platforms are supported?

  • Kubernetes: 1.19+ (EKS, GKE, AKS, self-managed)
  • Operating Systems: Linux (kernel 2.6+)
  • Architectures: amd64, arm64
  • Container Runtimes: Docker, containerd, CRI-O

Installation & Setup

How do I install the agent?

The recommended way to install the agent is with our Helm chart:

helm repo add antimetal https://charts.antimetal.com
helm install antimetal-agent antimetal/system-agent \
  --set cluster.name=my-cluster \
  --set antimetal.apiKey=YOUR_API_KEY

See Getting Started for detailed instructions.

What permissions does the agent need?

The agent requires read-only access to Kubernetes resources:

  • Nodes, Pods, Services, Endpoints
  • Deployments, StatefulSets, DaemonSets
  • Namespaces

See Kubernetes Deployment for the complete RBAC configuration.
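To confirm these read-only grants from your workstation, you can ask the API server directly with kubectl auth can-i. The sketch below only prints the checks, one per verb and resource from the list above. It assumes get, list, and watch are the read verbs involved, since the agent uses the watch API for resources. The namespace and service-account name are parameters because the chart's actual names are not documented on this page.

```shell
# Print one "kubectl auth can-i" command per (verb, resource) pair for the
# agent's service account. Resources mirror the FAQ's RBAC list; the verbs
# get/list/watch are an assumption about what "read-only" access means here.
print_rbac_checks() {
  ns="$1"   # namespace of the agent's service account (hypothetical)
  sa="$2"   # service-account name (hypothetical)
  for resource in nodes pods services endpoints \
      deployments statefulsets daemonsets namespaces; do
    for verb in get list watch; do
      echo "kubectl auth can-i $verb $resource --all-namespaces" \
        "--as=system:serviceaccount:$ns:$sa"
    done
  done
}
```

Piping the output to a shell, as in print_rbac_checks <namespace> <service-account> | sh, runs every check; each line should answer "yes" if the RBAC configuration is complete.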

Can I run the agent outside of Kubernetes?

While primarily designed for Kubernetes, the agent can run standalone for system monitoring. However, Kubernetes resource collection will not be available.

How do I configure the agent for my environment?

Configuration can be provided through:

  1. Helm values
  2. ConfigMap
  3. Environment variables
  4. Command-line flags

See Configuration Guide for details.

Security & Privacy

Is my data secure?

Yes, security is a top priority:

  • All data is encrypted in transit (TLS 1.2+)
  • The agent runs with minimal privileges
  • No sensitive data is collected
  • SOC 2 Type II certified platform

See Security Considerations for more details.

Can I control what data is collected?

Yes, you can:

  • Filter by namespace
  • Exclude specific resources
  • Disable certain collectors
  • Redact sensitive labels/annotations
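How include and exclude filters combine matters when resources go missing. The agent's configuration keys and matching rules are not documented on this page, so the function below is a hypothetical model: an empty include list means every namespace, and an exclusion always wins over an inclusion.

```shell
# Hypothetical namespace-filter semantics (assumed, not taken from the agent):
#   $1 = namespace to test
#   $2 = space-separated include list (empty means "all namespaces")
#   $3 = space-separated exclude list (an exclude always wins)
# Returns 0 if the namespace would be collected, 1 otherwise.
namespace_is_collected() {
  ns="$1"; include="$2"; exclude="$3"
  for x in $exclude; do
    [ "$x" = "$ns" ] && return 1
  done
  [ -z "$include" ] && return 0
  for i in $include; do
    [ "$i" = "$ns" ] && return 0
  done
  return 1
}
```

Under these assumed rules, namespace_is_collected kube-system "" "kube-system" returns a non-zero status, meaning that namespace is skipped even though the include list is empty.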

Where is my data stored?

Data is processed and stored in Antimetal's secure cloud infrastructure with:

  • Regional data residency options
  • Encryption at rest
  • Regular security audits
  • GDPR compliance

Do you collect any PII?

No, the agent does not collect personally identifiable information (PII). It focuses on infrastructure metrics and Kubernetes resource metadata.

Performance & Operations

What is the performance impact?

The agent is designed to be lightweight:

  • CPU: typically < 0.1 cores
  • Memory: typically < 100 MB
  • Network: typically < 1 MB/min

Actual usage depends on cluster size and collection frequency.
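For budgeting egress, the typical network ceiling of 1 MB/min can be turned into daily and 30-day totals. This assumes one agent, decimal units, and traffic running continuously at that ceiling:

$$
1\,\tfrac{\text{MB}}{\text{min}} \times 1440\,\tfrac{\text{min}}{\text{day}} = 1440\,\tfrac{\text{MB}}{\text{day}} \approx 1.44\,\tfrac{\text{GB}}{\text{day}},
\qquad
1.44\,\tfrac{\text{GB}}{\text{day}} \times 30\,\text{days} = 43.2\,\text{GB}
$$

Larger clusters or shorter collection intervals can push real usage above this typical figure.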

How often does the agent collect data?

Default collection intervals:

  • Kubernetes resources: Real-time (watch API)
  • Performance metrics: Every 10-30 seconds
  • Hardware info: Once at startup

Intervals are configurable.
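To put the metric intervals in perspective: a day has 86,400 seconds, so a single metric sampled every 10 to 30 seconds produces the following number of samples per day, before any batching or compression:

$$
\frac{86\,400\ \text{s}}{30\ \text{s}} = 2\,880
\quad\text{to}\quad
\frac{86\,400\ \text{s}}{10\ \text{s}} = 8\,640\ \text{samples per metric per day}
$$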

Can the agent handle large clusters?

Yes, the agent is tested with clusters containing:

  • 1000+ nodes
  • 50,000+ pods
  • 100,000+ resources

It uses efficient batching and compression for large environments.
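Batching replaces many small sends with a few large ones. This page does not state the batch size the agent actually uses. Assuming a hypothetical size of 500 objects per batch, the number of batches for a given object count is a ceiling division:

```shell
# Number of fixed-size batches needed for $1 objects at $2 objects per batch,
# i.e. ceil(total / size) using integer arithmetic. The batch size passed in
# is a hypothetical example value, not the agent's documented default.
batch_count() {
  total="$1"
  size="$2"
  echo $(( (total + size - 1) / size ))
}
```

For example, batch_count 100000 500 prints 200. One more resource would need a 201st, partially filled batch.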

What happens if the agent loses connectivity?

The agent will:

  1. Buffer data locally (up to configured limits)
  2. Retry with exponential backoff
  3. Resume sending when connection is restored
  4. Drop the oldest buffered data if the buffer fills before the connection is restored
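The recovery steps above combine two standard techniques: exponential backoff for retries and a bounded buffer that discards its oldest entries when full. The sketch below models both in POSIX shell. The base delay, cap, and buffer limit are hypothetical values, and this page does not say whether the agent also adds random jitter to its delays.

```shell
# Exponential backoff: the delay before retry number $1 (0-based) is
# base ($2) * 2^attempt seconds, capped at max ($3). Values are hypothetical.
backoff_delay() {
  attempt="$1"; base="$2"; max="$3"
  delay="$base"
  while [ "$attempt" -gt 0 ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * 2))
    attempt=$((attempt - 1))
  done
  [ "$delay" -gt "$max" ] && delay="$max"
  echo "$delay"
}

# Bounded buffer: append item $2 to the newline-separated buffer $1 and keep
# only the newest $3 entries, so the oldest data is dropped once it is full.
buffer_push() {
  { [ -n "$1" ] && printf '%s\n' "$1"; printf '%s\n' "$2"; } | tail -n "$3"
}
```

For example, backoff_delay 3 1 60 prints 8: the fourth retry waits 8 seconds, and later retries stop growing at the 60-second cap.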

Troubleshooting

The agent is not sending data

Check these common issues:

  1. API key is correct
  2. Network connectivity to Antimetal API
  3. Agent has proper RBAC permissions
  4. No firewall blocking outbound HTTPS

See Troubleshooting for detailed diagnostics.
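For the connectivity items in that list, the HTTP status code that curl reports is a quick first signal. The mapping below is a general-purpose heuristic, not documented Antimetal API behaviour. The endpoint you probe (API_URL in the comment) is a placeholder, because this page does not list it.

```shell
# Map the status code printed by
#   curl -s -o /dev/null -w '%{http_code}' "$API_URL"
# to a likely cause. curl prints 000 when no HTTP response arrived at all.
# API_URL is a placeholder; what each code means for Antimetal's API is assumed.
diagnose_http_code() {
  case "$1" in
    000)     echo "no response: check DNS, firewall rules, proxy, or TLS" ;;
    401|403) echo "rejected: check that the API key is correct" ;;
    2??|3??) echo "reachable: network path looks fine" ;;
    *)       echo "unexpected status $1: see Troubleshooting" ;;
  esac
}
```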

High CPU or memory usage

This can be caused by:

  • Very large clusters (adjust batch size)
  • Aggressive collection intervals
  • Too many top processes being tracked

To reduce load, lengthen the collection intervals, tune the batch size, or track fewer top processes.

Missing metrics for some resources

Verify that:

  • Resources are in monitored namespaces
  • No exclusion filters are blocking them
  • Agent has permission to read those resources
  • Resources have required labels (if filtering)

Errors in agent logs

Common errors and solutions:

  • permission denied: Check RBAC configuration
  • connection refused: Verify API endpoint
  • certificate error: Check TLS configuration
  • context deadline exceeded: Network timeout; check connectivity and latency to the API endpoint
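The list above can be applied mechanically to a log stream. This minimal sketch matches a log line against the four error strings listed and prints the suggested next step. Real log lines may word these errors differently, so a non-match is not proof that all is well.

```shell
# Print a suggested fix for a single agent log line ($1), based on the
# error strings in the FAQ. Returns 1 when no known error string matches.
log_hint() {
  case "$1" in
    *"permission denied"*)         echo "check RBAC configuration" ;;
    *"connection refused"*)        echo "verify the API endpoint" ;;
    *"certificate"*)               echo "check TLS configuration" ;;
    *"context deadline exceeded"*) echo "network timeout: check connectivity and latency" ;;
    *)                             return 1 ;;
  esac
}
```

To scan a running agent, feed its logs line by line, for example kubectl logs <agent-pod> | while IFS= read -r line; do log_hint "$line"; done, where <agent-pod> stands for your agent's pod name.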

Advanced Topics

Can I extend the agent with custom collectors?

Yes, the agent has a pluggable architecture. See Custom Collectors for the development guide.

How do I monitor the agent itself?

The agent exposes Prometheus metrics on port 8080 at the /metrics path:

  • Collection success/failure rates
  • Processing performance
  • Queue sizes
  • Error counts
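Prometheus endpoints serve a plain-text exposition format by default, so these metrics can be inspected without running a Prometheus server. The helper below extracts the sample values for a single metric name from that text. This page does not list the agent's actual metric names, so any name you pass in (and the names used in testing, such as agent_queue_size) is hypothetical. The parser also assumes label values contain no closing brace.

```shell
# Print the value of each sample of metric $1 read from Prometheus text
# exposition on stdin, one per line. Assumes label values contain no "}".
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                            # skip HELP/TYPE comment lines
    {
      n = $0
      sub(/[{ ].*/, "", n)                   # metric name ends at "{" or space
      if (n != name) next
      rest = $0
      if (!sub(/^[^}]*[}]/, "", rest))       # strip the label set, if any...
        sub(/^[^ ]+/, "", rest)              # ...otherwise strip the bare name
      split(rest, f, " ")
      print f[1]                             # the value; a timestamp is ignored
    }'
}
```

With a port-forward such as kubectl port-forward pod/<agent-pod> 8080:8080 running, curl -s http://localhost:8080/metrics | metric_values <metric-name> prints that metric's values, where both placeholders stand for your own pod and metric names.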

Can I run multiple agents in one cluster?

This is not recommended as it would result in duplicate data. Use a single agent deployment per cluster.

How do I upgrade the agent?

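Using Helm, refresh the chart repository first so the newest chart version is available, then upgrade the release:

helm repo update
helm upgrade antimetal-agent antimetal/system-agent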

The agent supports rolling updates with zero downtime.

Integration

Does the agent work with my monitoring stack?

The agent complements existing monitoring:

  • Runs alongside Prometheus, Datadog, etc.
  • Different focus (infrastructure optimization vs. application monitoring)
  • No conflicts with other agents

Can I forward data to my own systems?

The agent is designed to work with the Antimetal platform. For custom integrations, consider:

  • Using the agent's Prometheus metrics
  • Building a custom collector
  • Using Antimetal's API to retrieve data

Does it integrate with CI/CD pipelines?

Yes, through:

  • Antimetal API for cost estimates
  • GitHub Actions for deployment validation
  • Webhook notifications for cost anomalies

Support

How do I get help?

  • Documentation: This wiki
  • GitHub Issues: Bug reports and features
  • Discord: Community support
  • Email: [email protected]

How do I report bugs?

  1. Check existing GitHub issues
  2. Create a new issue with:
    • Agent version
    • Kubernetes version
    • Error messages
    • Steps to reproduce

How can I contribute?

We welcome contributions! See Contributing for:

  • Development setup
  • Coding standards
  • Pull request process
  • Testing requirements

What's on the roadmap?

Current priorities:

  • Windows node support
  • Additional cloud provider integrations
  • Enhanced eBPF collectors
  • Multi-cluster management

Check our GitHub project board for the latest updates.