FAQ - antimetal/system-agent GitHub Wiki
FAQ (Frequently Asked Questions)
⚠️ Work in Progress: This documentation is currently being developed and may be incomplete or subject to change.
General Questions
What is the Antimetal System Agent?
The Antimetal System Agent is a lightweight, secure Kubernetes controller that collects infrastructure metrics and resource information from your clusters. It streams this data to the Antimetal platform for analysis, optimization, and cost management.
What data does the agent collect?
The agent collects:
- Kubernetes resources: Pods, nodes, services, deployments, etc.
- Performance metrics: CPU, memory, disk, network usage
- Hardware information: System specifications and topology
- Cloud metadata: Instance types, regions, zones
The agent does NOT collect:
- Application logs or data
- Environment variables or secrets
- Network traffic content
- Personal or sensitive information
Is the agent open source?
Yes, the System Agent is open source and available on GitHub. You can review the code, contribute improvements, and build it yourself.
What platforms are supported?
- Kubernetes: 1.19+ (EKS, GKE, AKS, self-managed)
- Operating Systems: Linux (kernel 2.6+)
- Architectures: amd64, arm64
- Container Runtimes: Docker, containerd, CRI-O
Installation & Setup
How do I install the agent?
The recommended installation method is using our Helm chart:
helm repo add antimetal https://charts.antimetal.com
helm install antimetal-agent antimetal/system-agent \
  --set cluster.name=my-cluster \
  --set antimetal.apiKey=YOUR_API_KEY
See Getting Started for detailed instructions.
What permissions does the agent need?
The agent requires read-only access to Kubernetes resources:
- Nodes, Pods, Services, Endpoints
- Deployments, StatefulSets, DaemonSets
- Namespaces
See Kubernetes Deployment for the complete RBAC configuration.
Can I run the agent outside of Kubernetes?
While primarily designed for Kubernetes, the agent can run standalone for system monitoring. However, Kubernetes resource collection will not be available.
How do I configure the agent for my environment?
Configuration can be provided through:
- Helm values
- ConfigMap
- Environment variables
- Command-line flags
See Configuration Guide for details.
Security & Privacy
Is my data secure?
Yes, security is a top priority:
- All data is encrypted in transit (TLS 1.2+)
- The agent runs with minimal privileges
- No sensitive data is collected
- SOC 2 Type II certified platform
See Security Considerations for more details.
Can I control what data is collected?
Yes, you can:
- Filter by namespace
- Exclude specific resources
- Disable certain collectors
- Redact sensitive labels/annotations
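To make the four controls above concrete, here is a minimal sketch of how such filtering could behave. The `CollectionFilter` class, its field names, and the `"REDACTED"` placeholder are hypothetical and chosen for illustration only; they are not the agent's actual configuration schema, which is described in the Configuration Guide.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class CollectionFilter:
    """Hypothetical filter settings (illustrative, not the agent's real schema)."""
    namespaces: set = field(default_factory=set)          # empty set means "all namespaces"
    excluded_kinds: set = field(default_factory=set)      # resource kinds to skip entirely
    redacted_label_patterns: list = field(default_factory=list)  # glob patterns for label keys

    def apply(self, resource):
        """Return a filtered copy of the resource, or None if it should be skipped."""
        if self.namespaces and resource.get("namespace") not in self.namespaces:
            return None
        if resource.get("kind") in self.excluded_kinds:
            return None
        labels = {}
        for key, value in resource.get("labels", {}).items():
            # Replace the value of any label whose key matches a redaction pattern.
            redact = any(fnmatchcase(key, p) for p in self.redacted_label_patterns)
            labels[key] = "REDACTED" if redact else value
        return {**resource, "labels": labels}
```

For example, a filter built with `namespaces={"prod"}` and `redacted_label_patterns=["owner*"]` would skip a pod in `dev` and would replace the value of an `owner-email` label on a pod in `prod`.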
Where is my data stored?
Data is processed and stored in Antimetal's secure cloud infrastructure with:
- Regional data residency options
- Encryption at rest
- Regular security audits
- GDPR compliance
Do you collect any PII?
No, the agent does not collect personally identifiable information (PII). It focuses on infrastructure metrics and Kubernetes resource metadata.
Performance & Operations
What is the performance impact?
The agent is designed to be lightweight:
- CPU: < 0.1 cores typical
- Memory: < 100MB typical
- Network: < 1MB/min typical
Actual usage depends on cluster size and collection frequency.
How often does the agent collect data?
Default collection intervals:
- Kubernetes resources: Real-time (watch API)
- Performance metrics: Every 10-30 seconds
- Hardware info: Once at startup
Intervals are configurable.
Can the agent handle large clusters?
Yes, the agent is tested with clusters containing:
- 1000+ nodes
- 50,000+ pods
- 100,000+ resources
It uses efficient batching and compression for large environments.
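As a rough illustration of batching and compression in general, the sketch below splits a list of records into fixed-size batches and gzip-compresses each one. JSON plus gzip is an assumption made for the example; this FAQ does not describe the agent's actual wire format, and the function names are hypothetical.

```python
import gzip
import json


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def encode_batch(batch):
    """Serialize a batch as JSON and gzip-compress it (assumed format)."""
    return gzip.compress(json.dumps(batch).encode("utf-8"))


def decode_batch(payload):
    """Inverse of encode_batch, useful for checking a round trip."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Larger batches mean fewer requests and usually a better compression ratio, at the cost of more memory held per batch.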
What happens if the agent loses connectivity?
The agent will:
- Buffer data locally (up to configured limits)
- Retry with exponential backoff
- Resume sending when connection is restored
- Drop the oldest data if the buffer fills
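The behavior described above can be sketched as a bounded buffer combined with a capped exponential backoff schedule. This is a minimal illustration, not the agent's implementation: the class and function names, the default delays, and the choice to drop the oldest entry first are all assumptions.

```python
from collections import deque


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield retry delays in seconds: base, base*factor, ... never above cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


class BoundedBuffer:
    """Holds up to max_items entries; adding to a full buffer discards the oldest."""

    def __init__(self, max_items):
        self._items = deque(maxlen=max_items)
        self.dropped = 0  # number of entries discarded because the buffer was full

    def add(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(item)  # deque(maxlen=...) evicts the oldest entry itself

    def flush(self, send):
        """Send items oldest-first; stop at the first failure and keep the rest."""
        sent = 0
        while self._items:
            if not send(self._items[0]):
                break
            self._items.popleft()
            sent += 1
        return sent
```

A sender loop would call `flush` after each successful reconnect and sleep for the next value from `backoff_delays()` after each failure. Production implementations typically also add random jitter to the delay so that many agents do not retry in lockstep.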
Troubleshooting
The agent is not sending data
Check these common issues:
- API key is correct
- Network connectivity to Antimetal API
- Agent has proper RBAC permissions
- No firewall blocking outbound HTTPS
See Troubleshooting for detailed diagnostics.
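For the connectivity and firewall items, a quick first check is whether a TCP connection to the API host on port 443 can be opened from the agent's network. The sketch below does only that; it does not validate TLS or the API key. The function name is hypothetical, and the host must be taken from your own configuration because this FAQ does not state the API endpoint.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `False` result from inside the cluster combined with `True` from a workstation usually points at an egress policy or firewall rule rather than at the agent.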
High CPU or memory usage
This can be caused by:
- Very large clusters (adjust batch size)
- Aggressive collection intervals
- Too many top processes being tracked
Adjust configuration to reduce load.
Missing metrics for some resources
Verify that:
- Resources are in monitored namespaces
- No exclusion filters are blocking them
- Agent has permission to read those resources
- Resources have required labels (if filtering)
Errors in agent logs
Common errors and solutions:
- permission denied: Check RBAC configuration
- connection refused: Verify API endpoint
- certificate error: Check TLS configuration
- context deadline exceeded: Network timeout
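When scanning a long log, the error-to-solution pairs above can be applied mechanically. The following is a small sketch under assumptions: the helper name is hypothetical, matching is by case-insensitive substring, and the certificate entry matches on the word "certificate" alone because the exact message text depends on the underlying error.

```python
# Substring to look for -> suggested first check, taken from the list above.
KNOWN_ERRORS = {
    "permission denied": "Check RBAC configuration",
    "connection refused": "Verify API endpoint",
    "certificate": "Check TLS configuration",
    "context deadline exceeded": "Network timeout",
}


def suggest_fixes(log_lines):
    """Return (line, suggestion) pairs for lines containing a known error substring."""
    matches = []
    for line in log_lines:
        lowered = line.lower()
        for needle, suggestion in KNOWN_ERRORS.items():
            if needle in lowered:
                matches.append((line, suggestion))
                break  # report at most one suggestion per line
    return matches
```

Feeding it the output of the agent's logs, split into lines, gives a short list of starting points for the Troubleshooting guide.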
Advanced Topics
Can I extend the agent with custom collectors?
Yes, the agent has a pluggable architecture. See Custom Collectors for the development guide.
How do I monitor the agent itself?
The agent exposes Prometheus metrics on :8080/metrics, including:
- Collection success/failure rates
- Processing performance
- Queue sizes
- Error counts
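Outside a full Prometheus setup, the endpoint can be inspected with a few lines of code. The sketch below fetches a URL and parses the Prometheus text exposition format into simple tuples. It is deliberately minimal (labels are kept as raw text, timestamps are ignored), the function names are hypothetical, and the metric names used in any example input are made up, since this FAQ does not list the agent's actual metric names.

```python
import re
import urllib.request

# metric name, optional {labels}, then the sample value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def fetch_metrics(url, timeout=5.0):
    """Download the metrics page, e.g. http://<agent-host>:8080/metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels_text, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue
        try:
            value = float(match.group(3))
        except ValueError:
            continue  # ignore lines whose value is not a number
        samples.append((match.group(1), match.group(2) or "", value))
    return samples
```

For alerting or dashboards, scraping the endpoint with an existing Prometheus server is the more robust route; this parser is only meant for quick manual checks.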
Can I run multiple agents in one cluster?
This is not recommended as it would result in duplicate data. Use a single agent deployment per cluster.
How do I upgrade the agent?
Using Helm:
helm upgrade antimetal-agent antimetal/system-agent
The agent supports rolling updates with zero downtime.
Integration
Does the agent work with my monitoring stack?
The agent complements existing monitoring:
- Runs alongside Prometheus, Datadog, etc.
- Different focus (infrastructure optimization vs. application monitoring)
- No conflicts with other agents
Can I forward data to my own systems?
The agent is designed to work with the Antimetal platform. For custom integrations, consider:
- Using the agent's Prometheus metrics
- Building a custom collector
- Using Antimetal's API to retrieve data
Does it integrate with CI/CD pipelines?
Yes, through:
- Antimetal API for cost estimates
- GitHub Actions for deployment validation
- Webhook notifications for cost anomalies
Support
How do I get help?
- Documentation: This wiki
- GitHub Issues: Bug reports and feature requests
- Discord: Community support
- Email: [email protected]
How do I report bugs?
- Check existing GitHub issues
- Create a new issue with:
  - Agent version
  - Kubernetes version
  - Error messages
  - Steps to reproduce
How can I contribute?
We welcome contributions! See Contributing for:
- Development setup
- Coding standards
- Pull request process
- Testing requirements
What's on the roadmap?
Current priorities:
- Windows node support
- Additional cloud provider integrations
- Enhanced eBPF collectors
- Multi-cluster management
Check our GitHub project board for the latest updates.