# Integration Guide

> ⚠️ **Work in Progress**: This documentation is currently being developed and may be incomplete or subject to change.

## Overview
This guide explains how to integrate the Antimetal System Agent with your infrastructure, monitoring systems, and workflows. It covers API integration, webhook configuration, and automation scenarios.
## Integration Architecture

```mermaid
graph TD
    A[System Agent] --> B[Antimetal API]
    B --> C[Your Systems]
    A --> D[Metrics Export]
    B --> D
    C --> D
```
## API Integration

### Accessing Collected Data

The Antimetal API provides programmatic access to data collected by the System Agent.

#### Authentication
```bash
# Using an API key
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.antimetal.com/v1/clusters/my-cluster/metrics
```
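The same authenticated request can be built in Python with only the standard library (the endpoint and header come from the curl example above; this sketch just constructs the request object, so it needs no network access):

```python
import urllib.request

API_KEY = "YOUR_API_KEY"
url = "https://api.antimetal.com/v1/clusters/my-cluster/metrics"

# Attach the bearer token as an Authorization header
req = urllib.request.Request(
    url,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# Fetch with urllib.request.urlopen(req) when ready to make the call
print(req.get_header("Authorization"))  # Bearer YOUR_API_KEY
```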
### API Endpoints

#### Get Cluster Metrics

```
GET /v1/clusters/{cluster_name}/metrics
```

Response:
```json
{
  "cluster_name": "production-eks",
  "timestamp": "2024-01-15T10:30:00Z",
  "metrics": {
    "cpu": {
      "usage_cores": 45.2,
      "capacity_cores": 100,
      "utilization_percent": 45.2
    },
    "memory": {
      "usage_bytes": 68719476736,
      "capacity_bytes": 137438953472,
      "utilization_percent": 50.0
    }
  }
}
```
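As a quick sanity check, the `utilization_percent` fields are simply usage divided by capacity. A minimal sketch that recomputes the memory figure from the sample payload above:

```python
import json

# Payload mirroring the memory section of the sample response
payload = json.loads("""
{
  "metrics": {
    "memory": {
      "usage_bytes": 68719476736,
      "capacity_bytes": 137438953472
    }
  }
}
""")

mem = payload["metrics"]["memory"]
utilization = 100.0 * mem["usage_bytes"] / mem["capacity_bytes"]
print(f"Memory utilization: {utilization:.1f}%")  # Memory utilization: 50.0%
```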
#### Get Resource Inventory

```
GET /v1/clusters/{cluster_name}/resources
```

#### Get Cost Analysis

```
GET /v1/clusters/{cluster_name}/costs
```
### Client Libraries

#### Python Client

```python
from antimetal import Client

client = Client(api_key="YOUR_API_KEY")

# Get cluster metrics
metrics = client.get_metrics("production-eks")
print(f"CPU Usage: {metrics.cpu.usage_cores} cores")

# Get resource recommendations
recommendations = client.get_recommendations("production-eks")
for rec in recommendations:
    print(f"{rec.resource}: {rec.action}")
```
#### Go Client

```go
import "github.com/antimetal/go-client"

client := antimetal.NewClient("YOUR_API_KEY")

// Get cluster metrics
metrics, err := client.GetMetrics(ctx, "production-eks")
if err != nil {
    log.Fatal(err)
}
fmt.Printf("CPU Usage: %.2f cores\n", metrics.CPU.UsageCores)
```
#### Node.js Client

```javascript
const { AntimetalClient } = require('@antimetal/client');

const client = new AntimetalClient({ apiKey: 'YOUR_API_KEY' });

// Get cluster metrics
const metrics = await client.getMetrics('production-eks');
console.log(`CPU Usage: ${metrics.cpu.usageCores} cores`);
```
## Webhook Integration
Configure webhooks to receive real-time notifications about your infrastructure.
Webhook Configuration
# In Antimetal platform
webhooks:
- name: cost-alerts
url: https://your-domain.com/webhooks/antimetal
secret: YOUR_WEBHOOK_SECRET
events:
- cost.anomaly.detected
- resource.optimization.available
- cluster.health.degraded
### Webhook Handler Example

```python
import hashlib
import hmac

from flask import Flask, request

app = Flask(__name__)
WEBHOOK_SECRET = "YOUR_WEBHOOK_SECRET"

@app.route('/webhooks/antimetal', methods=['POST'])
def handle_webhook():
    # Verify the signature (default to "" so a missing header fails cleanly)
    signature = request.headers.get('X-Antimetal-Signature', '')
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        request.data,
        hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return "Unauthorized", 401

    # Process the event
    event = request.json
    if event['type'] == 'cost.anomaly.detected':
        send_alert_to_slack(  # your notification helper
            f"Cost anomaly detected in {event['cluster']}: "
            f"${event['data']['amount']} over budget"
        )
    return "OK", 200
```
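To exercise the handler locally, you can compute the same signature the platform would send. A hedged sketch (the `X-Antimetal-Signature` header name is taken from the example above; confirm the exact signing scheme against the platform documentation):

```python
import hashlib
import hmac

def sign_payload(secret: str, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

body = b'{"type": "cost.anomaly.detected", "cluster": "production-eks"}'
signature = sign_payload("YOUR_WEBHOOK_SECRET", body)
# Send this value in the X-Antimetal-Signature header when testing
print(signature)
```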
### Webhook Events

| Event Type | Description | Payload |
|---|---|---|
| `cost.anomaly.detected` | Unusual cost spike | Cost details, threshold |
| `resource.optimization.available` | Optimization found | Resource, savings estimate |
| `cluster.health.degraded` | Cluster issues | Health status, affected resources |
| `quota.limit.approaching` | Near quota limits | Resource type, current usage |
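A common pattern for handling several event types is a dispatch table keyed on the event type rather than a chain of `if` statements. A minimal sketch (the handler functions are illustrative, not part of the API):

```python
def on_cost_anomaly(event):
    return f"cost alert for {event['cluster']}"

def on_optimization(event):
    return f"optimization available in {event['cluster']}"

# Map webhook event types to handlers
HANDLERS = {
    "cost.anomaly.detected": on_cost_anomaly,
    "resource.optimization.available": on_optimization,
}

def dispatch(event):
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None  # unknown event types are ignored, not errors
    return handler(event)

print(dispatch({"type": "cost.anomaly.detected", "cluster": "production-eks"}))
```

Returning `None` for unknown types keeps the handler forward-compatible when new event types are added to the platform.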
## Monitoring Integration

### Prometheus Metrics

The System Agent exposes Prometheus metrics for monitoring its own health:

```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'antimetal-agent'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: antimetal-agent
```
#### Available Metrics

```
# Agent health
antimetal_agent_up{cluster="production"} 1
antimetal_agent_collection_errors_total{collector="cpu"} 0

# Collection metrics
antimetal_agent_collections_total{collector="cpu",status="success"} 1234
antimetal_agent_collection_duration_seconds{collector="cpu"} 0.023

# Data transmission
antimetal_agent_bytes_sent_total{cluster="production"} 12345678
antimetal_agent_send_errors_total{cluster="production"} 0
```
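The exposition format above is plain text, so it is easy to spot-check in a script. A simplified sketch that parses metric lines into `(name, labels, value)` tuples, ignoring comments and the escaping rules of the full format:

```python
import re

# name, optional {labels}, value — enough for simple gauge/counter lines
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)$')

def parse_metrics(text):
    """Parse simple Prometheus exposition lines into a list of samples."""
    samples = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and HELP/TYPE comments
        m = LINE_RE.match(line)
        if m:
            name, labels, value = m.groups()
            samples.append((name, labels or '', float(value)))
    return samples

text = '''
# Agent health
antimetal_agent_up{cluster="production"} 1
antimetal_agent_collection_errors_total{collector="cpu"} 0
'''
for name, labels, value in parse_metrics(text):
    print(name, value)
```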
### Grafana Dashboards

Import our pre-built dashboards:

```json
{
  "dashboard": {
    "title": "Antimetal System Agent",
    "panels": [
      {
        "title": "Collection Success Rate",
        "targets": [{
          "expr": "rate(antimetal_agent_collections_total[5m])"
        }]
      }
    ]
  }
}
```
### Alerting Rules

```yaml
groups:
  - name: antimetal_agent
    rules:
      - alert: AntimetalAgentDown
        expr: up{job="antimetal-agent"} == 0
        for: 5m
        annotations:
          summary: "Antimetal agent is down"
      - alert: AntimetalCollectionFailures
        expr: rate(antimetal_agent_collection_errors_total[5m]) > 0.1
        annotations:
          summary: "High collection error rate"
```
## CI/CD Integration

### GitHub Actions

```yaml
name: Cost Analysis
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  cost-impact:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Analyze Cost Impact
        id: cost  # referenced below as steps.cost
        uses: antimetal/cost-analysis-action@v1
        with:
          api-key: ${{ secrets.ANTIMETAL_API_KEY }}
          cluster: production-eks
      - name: Comment PR
        uses: actions/github-script@v6
        with:
          script: |
            const impact = ${{ steps.cost.outputs.impact }};
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `💰 Estimated cost impact: $${impact}/month`
            });
```
### GitLab CI

```yaml
cost_analysis:
  stage: test
  image: antimetal/cli:latest
  script:
    - antimetal analyze --cluster=$CLUSTER_NAME
  only:
    - merge_requests
```
### Jenkins Pipeline

```groovy
pipeline {
    agent any
    stages {
        stage('Cost Analysis') {
            steps {
                sh '''
                    antimetal analyze \
                        --cluster=${CLUSTER_NAME} \
                        --format=json > cost-report.json
                '''
                publishHTML([
                    reportDir: '.',
                    reportFiles: 'cost-report.json',
                    reportName: 'Cost Analysis'
                ])
            }
        }
    }
}
```
## Automation Examples

### Auto-scaling Integration

```python
# Scale based on Antimetal recommendations
import kubernetes
from antimetal import Client

antimetal = Client(api_key="YOUR_API_KEY")
k8s = kubernetes.client.AppsV1Api()

def auto_scale():
    recommendations = antimetal.get_recommendations("production")
    for rec in recommendations:
        if rec.type == "scale" and rec.confidence > 0.8:
            k8s.patch_namespaced_deployment_scale(
                name=rec.resource_name,
                namespace=rec.namespace,
                body={"spec": {"replicas": rec.target_replicas}}
            )
            print(f"Scaled {rec.resource_name} to {rec.target_replicas}")
```
Cost Alerting
#!/bin/bash
# Daily cost alert script
COST=$(curl -s -H "Authorization: Bearer $API_KEY" \
https://api.antimetal.com/v1/clusters/production/costs/daily | \
jq -r '.total')
if (( $(echo "$COST > 1000" | bc -l) )); then
slack-notify "⚠️ Daily cost exceeded $1000: $$COST"
fi
### Resource Cleanup

```python
# Clean up unused resources based on Antimetal data
def cleanup_unused_resources():
    unused = antimetal.get_unused_resources("production")
    for resource in unused:
        if resource.type == "PersistentVolume" and resource.unused_days > 30:
            if confirm_deletion(resource):
                k8s.delete_persistent_volume(resource.name)
                log_deletion(resource)
```
## Platform Integrations

### Slack Integration

```python
# Slack bot for cost queries
@slack_command("/antimetal cost")
def handle_cost_command(command):
    cluster = command.text or "production"
    metrics = antimetal.get_cost_summary(cluster)
    return {
        "text": f"💰 {cluster} costs",
        "attachments": [{
            "fields": [
                {"title": "Daily", "value": f"${metrics.daily}"},
                {"title": "Monthly", "value": f"${metrics.monthly}"},
                {"title": "Trend", "value": metrics.trend}
            ]
        }]
    }
```
### PagerDuty Integration

```yaml
# PagerDuty integration key
integrations:
  pagerduty:
    routing_key: YOUR_ROUTING_KEY
    triggers:
      - event: cost.spike
        severity: warning
        threshold: 150  # 150% of normal
```
### Datadog Integration

```python
# Send Antimetal metrics to Datadog
from datadog import statsd

def sync_metrics():
    metrics = antimetal.get_metrics("production")
    statsd.gauge('antimetal.cpu.usage', metrics.cpu.usage_cores)
    statsd.gauge('antimetal.memory.usage', metrics.memory.usage_bytes)
    statsd.gauge('antimetal.cost.hourly', metrics.cost.hourly)
```
## Best Practices

### API Usage

- **Rate Limiting**: Respect rate limits (1000 requests/hour)
- **Caching**: Cache responses appropriately
- **Error Handling**: Implement retries with exponential backoff
- **Pagination**: Use pagination for large datasets
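The error-handling bullet above can be sketched as a small retry wrapper with exponential backoff and jitter (the retry limits, delays, and the `flaky` example function are illustrative, not part of the API):

```python
import random
import time

def with_backoff(fn, retries=5, base_delay=0.5, max_delay=30.0):
    """Call fn(), retrying on exception with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter

# Example: a flaky call that succeeds on the third attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_backoff(flaky))  # ok
```

Jitter spreads out retries from many clients so they do not hammer the API in lockstep after an outage.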
### Security

- **API Keys**: Store securely, rotate regularly
- **Webhooks**: Validate signatures
- **Network**: Use HTTPS only
- **Access Control**: Limit API key permissions
### Monitoring

- **Health Checks**: Monitor agent and API health
- **Metrics**: Track API usage and errors
- **Alerts**: Set up alerts for critical conditions
- **Logging**: Log all API interactions
## Troubleshooting

### Common Issues

#### API Connection Errors

```bash
# Test API connectivity
curl -H "Authorization: Bearer $API_KEY" \
  https://api.antimetal.com/v1/health
```
#### Webhook Delivery Failures

- Check webhook URL accessibility
- Verify signature validation
- Review webhook delivery logs in the platform
#### Data Lag

- Check agent health
- Verify network connectivity
- Review agent logs
## See Also

- API Reference - Complete API documentation
- gRPC API - Agent communication protocol
- Configuration Guide - Integration configuration
- Security Considerations - Security best practices