Monitoring - theartusz/config GitHub Wiki
- alertmanagerconfig is specific for each namespace
- the config files from each namespace are then merged together by the config handler
- to access the alertmanager config file run:
    kubectl get alertmanagerconfig -n <namespace> -o yaml
- alertmanager matches alerts with routes based on labels:
  alertmanager:
    - receiver: zeebe-slack-tmp-grafana-testalerts-default
      group_by:
        - job
      match:
        namespace: zeebe
        severity: critical
  alert:
    labels:
      job: zeebe-cluster
      severity: critical
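The label matching described above can be sketched in Python. This is a simplified model, not Alertmanager's implementation: it covers only equality `match` entries and ignores regex matchers and the routing tree. The function name `route_matches` is hypothetical. It also assumes the alert carries a `namespace: zeebe` label in addition to the two labels listed in the example, since the route matches on `namespace`.

```python
from typing import Mapping


def route_matches(match: Mapping[str, str], alert_labels: Mapping[str, str]) -> bool:
    """Return True when every label in `match` equals the alert's value for it.

    Simplified model (assumption): equality matchers only. A label missing
    from the alert is treated as an empty string.
    """
    return all(alert_labels.get(name, "") == value for name, value in match.items())


# Values taken from the route and alert shown above.
route_match = {"namespace": "zeebe", "severity": "critical"}
alert_as_listed = {"job": "zeebe-cluster", "severity": "critical"}

# Assumption: the real alert also has a namespace label.
alert_with_namespace = {**alert_as_listed, "namespace": "zeebe"}

print(route_matches(route_match, alert_as_listed))       # False: no namespace label
print(route_matches(route_match, alert_with_namespace))  # True: all match labels agree
```

Extra labels on the alert (here `job`) do not prevent a match; only the labels named in `match` are compared.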
- example of prometheus cpu usage:
    sum(rate(container_cpu_usage_seconds_total{container!="",pod="prometheus-prometheus-operator-kube-p-prometheus-0"}[5m])) by (pod)
- example of prometheus memory usage: ``
- helm chart values for limits in prometheus:
    prometheus:
      prometheusSpec:
        resources:
          limits:
            memory: 3000Mi
            cpu: 500m
          requests:
            memory: 1500Mi
            cpu: 300m
aks-monitoring-with-prometheus: on managed cloud services (like AKS, GKE and EKS) some targets (kube-scheduler, kube-controller-manager...) are hidden by the provider. To disable the alerts for these targets, edit the default values when installing prometheus. Example:
# Forcing Kubelet metrics scraping on http
kubelet:
  enabled: true
  serviceMonitor:
    https: false
# Disabling scraping of Master Nodes Components
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
burn rate = (SLO budget consumption x SLO period) / alerting window size
14.4 = (0.02 x 30 days x 24 hours) / 1 hour
- to consume 2% of the SLO budget in 1 hour, a burn rate of 14.4 is needed
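The burn rate formula above can be checked with a few lines of Python. This is a minimal sketch: the function name `burn_rate` and its parameter names are hypothetical, and all durations are assumed to be in hours. The first row reproduces the 14.4 example. The other two rows are additional illustrative inputs that are not part of the notes above.

```python
def burn_rate(budget_fraction: float, slo_period_hours: float, window_hours: float) -> float:
    """burn rate = (SLO budget consumption x SLO period) / alerting window size."""
    return budget_fraction * slo_period_hours / window_hours


SLO_PERIOD_HOURS = 30 * 24  # 30-day SLO period, as in the example above

# (budget fraction consumed, alerting window in hours)
examples = [
    (0.02, 1),   # the example above: 2% of the budget in 1 hour
    (0.05, 6),   # illustrative: 5% of the budget in 6 hours
    (0.10, 72),  # illustrative: 10% of the budget in 3 days
]

for fraction, window in examples:
    rate = burn_rate(fraction, SLO_PERIOD_HOURS, window)
    print(f"{fraction:.0%} of budget in {window}h -> burn rate {rate:.1f}")
```

A burn rate of 1 means the budget is used up exactly at the end of the SLO period. Higher values mean it runs out proportionally sooner.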