FAQ - antimetal/system-agent GitHub Wiki

FAQ (Frequently Asked Questions)

⚠️ Work in Progress: This documentation is currently being developed and may be incomplete or subject to change.

General Questions

What is the Antimetal System Agent?

The Antimetal System Agent is a lightweight, secure Kubernetes controller that collects infrastructure metrics and resource information from your clusters. It streams this data to the Antimetal platform for analysis, optimization, and cost management.

What data does the agent collect?

The agent collects:

Kubernetes resources: Pods, nodes, services, deployments, etc.
Performance metrics: CPU, memory, disk, network usage
Hardware information: System specifications and topology
Cloud metadata: Instance types, regions, zones

The agent does NOT collect:

Application logs or data
Environment variables or secrets
Network traffic content
Personal or sensitive information

Is the agent open source?

Yes, the System Agent is open source and available on GitHub. You can review the code, contribute improvements, and build it yourself.

What platforms are supported?

Kubernetes: 1.19+ (EKS, GKE, AKS, self-managed)
Operating Systems: Linux (kernel 2.6+)
Architectures: amd64, arm64
Container Runtimes: Docker, containerd, CRI-O

Installation & Setup

How do I install the agent?

The recommended way to install the agent is with the Helm chart:

helm repo add antimetal https://charts.antimetal.com
helm install antimetal-agent antimetal/system-agent \
  --set cluster.name=my-cluster \
  --set antimetal.apiKey=YOUR_API_KEY

See Getting Started for detailed instructions.

What permissions does the agent need?

The agent requires read-only access to Kubernetes resources:

Nodes, Pods, Services, Endpoints
Deployments, StatefulSets, DaemonSets
Namespaces

See Kubernetes Deployment for the complete RBAC configuration.
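
In Kubernetes RBAC terms, read-only access means the get, list, and watch verbs. The Python sketch below builds a ClusterRole manifest covering the resources listed above and prints it as JSON for inspection. The role name `antimetal-agent-readonly` and the helper `build_read_only_cluster_role` are hypothetical, and the rules the agent's chart actually installs may differ, so treat the Kubernetes Deployment page as authoritative.

```python
import json

READ_ONLY_VERBS = ["get", "list", "watch"]


def build_read_only_cluster_role(name: str) -> dict:
    """Return a ClusterRole manifest granting read-only access.

    The resource list mirrors this FAQ; it is an illustration, not the
    agent's shipped RBAC configuration.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # Resources in the core API group (empty string).
                "apiGroups": [""],
                "resources": ["nodes", "pods", "services", "endpoints", "namespaces"],
                "verbs": list(READ_ONLY_VERBS),
            },
            {
                # Workload controllers live in the "apps" API group.
                "apiGroups": ["apps"],
                "resources": ["deployments", "statefulsets", "daemonsets"],
                "verbs": list(READ_ONLY_VERBS),
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical role name, used only for this example.
    print(json.dumps(build_read_only_cluster_role("antimetal-agent-readonly"), indent=2))
```

No rule includes create, update, patch, or delete, which is what makes the role read-only.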

Can I run the agent outside of Kubernetes?

While primarily designed for Kubernetes, the agent can run standalone for system monitoring. However, Kubernetes resource collection will not be available.

How do I configure the agent for my environment?

Configuration can be provided through:

Helm values
ConfigMap
Environment variables
Command-line flags

See Configuration Guide for details.

Security & Privacy

Is my data secure?

Yes, security is a top priority:

All data is encrypted in transit (TLS 1.2+)
The agent runs with minimal privileges
No sensitive data is collected
SOC 2 Type II certified platform

See Security Considerations for more details.

Can I control what data is collected?

Yes, you can:

Filter by namespace
Exclude specific resources
Disable certain collectors
Redact sensitive labels/annotations

Where is my data stored?

Data is processed and stored in Antimetal's secure cloud infrastructure with:

Regional data residency options
Encryption at rest
Regular security audits
GDPR compliance

Do you collect any PII?

No, the agent does not collect personally identifiable information (PII). It focuses on infrastructure metrics and Kubernetes resource metadata.

Performance & Operations

What is the performance impact?

The agent is designed to be lightweight:

CPU: typically under 0.1 cores
Memory: typically under 100 MB
Network: typically under 1 MB/min

Actual usage depends on cluster size and collection frequency.

How often does the agent collect data?

Default collection intervals:

Kubernetes resources: Real-time (watch API)
Performance metrics: Every 10-30 seconds
Hardware info: Once at startup

Intervals are configurable.

Can the agent handle large clusters?

Yes, the agent is tested with clusters containing:

1,000+ nodes
50,000+ pods
100,000+ resources

It uses efficient batching and compression for large environments.
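
To make the idea of batching and compression concrete, here is a minimal Python sketch. It is an illustration only: this FAQ does not describe the agent's actual wire format, so JSON plus gzip is an assumption, and the function names `batched`, `encode_batch`, and `decode_batch` are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def batched(items: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def encode_batch(batch: List[dict]) -> bytes:
    """Serialize a batch as compact JSON and gzip it (assumed format)."""
    raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decode_batch(payload: bytes) -> List[dict]:
    """Inverse of encode_batch, handy for checking round trips."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


if __name__ == "__main__":
    resources = [{"kind": "Pod", "name": f"pod-{i}"} for i in range(2500)]
    payloads = [encode_batch(b) for b in batched(resources, 1000)]
    print(len(payloads), "payloads,", sum(len(p) for p in payloads), "bytes total")
```

Sending a few large compressed payloads instead of one request per resource reduces per-request overhead. Repetitive resource metadata also tends to compress well.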

What happens if the agent loses connectivity?

The agent will:

Buffer data locally (up to configured limits)
Retry with exponential backoff
Resume sending when connection is restored
Drop old data if buffer fills
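
The behavior above can be sketched in a few lines of Python. This is not the agent's implementation: the class `BoundedBuffer`, the function `backoff_delays`, and all default values are hypothetical. The sketch also assumes that "old data" means the oldest buffered items are evicted first.

```python
import random
from collections import deque
from typing import Callable, Deque, Iterator


class BoundedBuffer:
    """Fixed-capacity FIFO buffer that evicts the oldest item when full."""

    def __init__(self, max_items: int) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._items: Deque[bytes] = deque(maxlen=max_items)
        self.dropped = 0  # number of items evicted because the buffer was full

    def push(self, item: bytes) -> None:
        if len(self._items) == self._items.maxlen:
            # deque(maxlen=...) discards the leftmost (oldest) item on append.
            self.dropped += 1
        self._items.append(item)

    def drain(self, send: Callable[[bytes], bool]) -> int:
        """Send items oldest-first; stop at the first failed send."""
        sent = 0
        while self._items:
            if not send(self._items[0]):
                break
            self._items.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._items)


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, jitter: float = 0.0) -> Iterator[float]:
    """Yield retry delays that grow geometrically up to cap.

    jitter is a fraction (0.1 means plus or minus 10%) that spreads retries out.
    """
    delay = base
    while True:
        yield min(delay, cap) * (1 + random.uniform(-jitter, jitter))
        delay = min(delay * factor, cap)
```

A sender loop would call `drain` after each successful reconnect and sleep for the next value from `backoff_delays` after each failure, resetting the generator once a send succeeds.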

Troubleshooting

The agent is not sending data

Verify the following:

The API key is correct
The cluster has network connectivity to the Antimetal API
The agent has the required RBAC permissions
No firewall is blocking outbound HTTPS

See Troubleshooting for detailed diagnostics.

High CPU or memory usage

This can be caused by:

Very large clusters (adjust batch size)
Aggressive collection intervals
Tracking too many top processes

Adjust configuration to reduce load.

Missing metrics for some resources

Verify that:

Resources are in monitored namespaces
No exclusion filters are blocking them
Agent has permission to read those resources
Resources have required labels (if filtering)

Errors in agent logs

Common errors and solutions:

permission denied: Check RBAC configuration
connection refused: Verify API endpoint
certificate error: Check TLS configuration
context deadline exceeded: A request timed out; check network connectivity to the API endpoint
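
When scanning a long log, a small script can flag lines that match the errors above. The Python sketch below is a convenience example, not a tool shipped with the agent. The matching substrings are assumptions about how these errors appear in log lines, and `triage` is a hypothetical helper.

```python
from typing import List, Tuple

# (substring to look for, suggested first check). The substrings are
# assumptions; adjust them to match the messages in your own logs.
ERROR_HINTS: List[Tuple[str, str]] = [
    ("permission denied", "Check RBAC configuration"),
    ("connection refused", "Verify API endpoint"),
    ("certificate", "Check TLS configuration"),
    ("context deadline exceeded", "Network timeout"),
]


def triage(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (log line, hint) pairs for lines matching a known error."""
    findings: List[Tuple[str, str]] = []
    for line in log_lines:
        lowered = line.lower()
        for needle, hint in ERROR_HINTS:
            if needle in lowered:
                findings.append((line, hint))
                break  # report only the first matching hint per line
    return findings


if __name__ == "__main__":
    sample = [
        "starting collectors",
        "x509: certificate signed by unknown authority",
        "dial tcp: connect: connection refused",
    ]
    for line, hint in triage(sample):
        print(f"{hint}: {line}")
```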

Advanced Topics

Can I extend the agent with custom collectors?

Yes, the agent has a pluggable architecture. See Custom Collectors for the development guide.

How do I monitor the agent itself?

The agent exposes Prometheus metrics at :8080/metrics, including:

Collection success/failure rates
Processing performance
Queue sizes
Error counts
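
For a quick health check without a full Prometheus setup, the text exposition format is simple enough to parse by hand. The Python sketch below sums sample values per metric name. The metric names in `SAMPLE` are made up for illustration, because this FAQ does not list the agent's real metric names. The URL passed to `fetch_metrics` assumes the port has been made reachable locally, for example with a port-forward.

```python
import urllib.request
from collections import defaultdict
from typing import Dict


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch the raw exposition text, e.g. from http://localhost:8080/metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sum_metrics(exposition_text: str) -> Dict[str, float]:
    """Sum sample values per metric name, across all label sets.

    Simplified parser: it skips comments and assumes label values do not
    contain a closing brace.
    """
    totals: Dict[str, float] = defaultdict(float)
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, HELP, or TYPE comment
        if "{" in line:
            name, _, rest = line.partition("{")
            _, _, rest = rest.rpartition("}")
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        # The first field is the value; an optional timestamp may follow.
        totals[name.strip()] += float(fields[0])
    return dict(totals)


# Hypothetical metric names, for illustration only.
SAMPLE = """\
# HELP example_collector_errors_total Errors per collector.
# TYPE example_collector_errors_total counter
example_collector_errors_total{collector="cpu"} 2
example_collector_errors_total{collector="memory"} 1
example_queue_size 17
"""

if __name__ == "__main__":
    print(sum_metrics(SAMPLE))
```

For ongoing monitoring, point an existing Prometheus server at the endpoint rather than polling it with a script.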

Can I run multiple agents in one cluster?

This is not recommended as it would result in duplicate data. Use a single agent deployment per cluster.

How do I upgrade the agent?

Using Helm:

helm repo update
helm upgrade antimetal-agent antimetal/system-agent

The agent supports rolling updates with zero downtime.

Integration

Does the agent work with my monitoring stack?

The agent complements existing monitoring:

Runs alongside Prometheus, Datadog, etc.
Different focus (infrastructure optimization vs. application monitoring)
No conflicts with other agents

Can I forward data to my own systems?

The agent is designed to work with the Antimetal platform. For custom integrations, consider:

Using the agent's Prometheus metrics
Building a custom collector
Using Antimetal's API to retrieve data

Does it integrate with CI/CD pipelines?

Yes, through:

Antimetal API for cost estimates
GitHub Actions for deployment validation
Webhook notifications for cost anomalies

Support

How do I get help?

Documentation: This wiki
GitHub Issues: Bug reports and feature requests
Discord: Community support
Email: [email protected]

How do I report bugs?

Check existing GitHub issues to see whether the bug has already been reported
If it has not, create a new issue that includes:
- Agent version
- Kubernetes version
- Error messages
- Steps to reproduce

How can I contribute?

We welcome contributions! See Contributing for:

Development setup
Coding standards
Pull request process
Testing requirements

What's on the roadmap?

Current priorities:

Windows node support
Additional cloud provider integrations
Enhanced eBPF collectors
Multi-cluster management

Check our GitHub project board for the latest updates.