Monitoring - theartusz/config GitHub Wiki

Alertmanager

  • an AlertmanagerConfig resource is namespace-scoped, so each namespace defines its own

  • the configs from all namespaces are then merged into a single Alertmanager configuration by the config handler

  • to view the AlertmanagerConfig resources of a namespace as YAML, run:

kubectl get alertmanagerconfig -n=<namespace> -o=yaml
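  • Alertmanager matches alerts to routes based on labels: an alert goes to a route's receiver when it carries every label/value pair listed under the route's `match`. Example route:

    alertmanager:
      - receiver: zeebe-slack-tmp-grafana-testalerts-default
        group_by:
          - job
        match:
          namespace: zeebe
          severity: critical

    Example alert (to match this route it must also carry a `namespace: zeebe` label, which the snippet does not show):

    alert:
      labels:
        job: zeebe-cluster
        severity: critical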
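A minimal Python sketch of the label matching described above, assuming a simplified model: equality-only `match`, routes checked in order with the first match winning, and no nested routes, `continue`, or regex matchers. The `default-receiver` fallback name is hypothetical, and the sample alert includes the `namespace: zeebe` label it needs in order to match.

```python
"""Simplified model of Alertmanager `match` routing (equality matchers only)."""

from typing import Dict, List, Tuple

Labels = Dict[str, str]

# The route from the example above, as plain Python data.
ROUTES: List[dict] = [
    {
        "receiver": "zeebe-slack-tmp-grafana-testalerts-default",
        "group_by": ["job"],
        "match": {"namespace": "zeebe", "severity": "critical"},
    },
]


def route_matches(route: dict, labels: Labels) -> bool:
    """True when every label under `match` is present on the alert with the same value."""
    return all(labels.get(name) == value for name, value in route.get("match", {}).items())


def find_receiver(routes: List[dict], labels: Labels, default: str = "default-receiver") -> str:
    """Receiver of the first matching route; `default-receiver` is a hypothetical fallback."""
    for route in routes:
        if route_matches(route, labels):
            return route["receiver"]
    return default


def group_key(route: dict, labels: Labels) -> Tuple[str, ...]:
    """Alerts sharing the same values of the `group_by` labels are batched together."""
    return tuple(labels.get(name, "") for name in route.get("group_by", []))


if __name__ == "__main__":
    alert = {"job": "zeebe-cluster", "severity": "critical", "namespace": "zeebe"}
    print(find_receiver(ROUTES, alert))
    print(group_key(ROUTES[0], alert))
```

Dropping the `namespace` label or changing `severity` to anything other than `critical` makes the route stop matching, so the alert falls through to the fallback receiver.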

Prometheus

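  • example query for the CPU usage of the Prometheus pod, in cores averaged over 5 minutes (`container!=""` drops the pod-level aggregate series so usage is not double-counted):
sum(rate(container_cpu_usage_seconds_total{container!="", pod="prometheus-prometheus-operator-kube-p-prometheus-0"}[5m])) by (pod)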
  • example of prometheus memory usage: ``
  • Helm chart values for the Prometheus resource requests and limits:
    prometheus:
      prometheusSpec:
        resources:
          limits:
            memory: 3000Mi
            cpu: 500m
          requests:
            memory: 1500Mi
            cpu: 300m
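To read the CPU query above: `container_cpu_usage_seconds_total` is a counter of CPU seconds, so its per-second increase is the number of cores in use. The Python sketch below is a rough, simplified stand-in for PromQL `rate()`: it handles counter resets but, unlike Prometheus, does not extrapolate to the edges of the range window. The sample values are made up for illustration.

```python
"""Simplified stand-in for PromQL rate() on a counter such as container_cpu_usage_seconds_total."""

from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> Optional[float]:
    """Per-second increase of a counter over the given samples (oldest first).

    A drop in value is treated as a counter reset, so the new value counts as the increase.
    Unlike Prometheus, no extrapolation to the edges of the range window is done.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Made-up scrapes every 60 s: the pod accumulates 30 CPU-seconds per minute.
    cpu_seconds = [(0, 100.0), (60, 130.0), (120, 160.0), (180, 190.0), (240, 220.0)]
    print(simple_rate(cpu_seconds))  # cores in use
```

With these made-up samples the result is 0.5 cores, which is exactly the `500m` CPU limit set in the Helm values.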
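The `Mi` and `m` suffixes in the Helm values are Kubernetes resource quantities: `Mi` is a mebibyte (2^20 bytes) and `m` is a thousandth of a CPU core. A small Python sketch that converts them and checks that requests do not exceed limits; it only handles `Ki`/`Mi`/`Gi`/`Ti` and `m`, not the full quantity grammar, and the `RESOURCES` dict simply mirrors the values above.

```python
"""Convert the Kubernetes quantities from the Helm values and sanity-check them."""

# Binary suffixes only; decimal suffixes (k, M, G) and exponents are not handled.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

# Mirrors prometheus.prometheusSpec.resources from the values above.
RESOURCES = {
    "limits": {"memory": "3000Mi", "cpu": "500m"},
    "requests": {"memory": "1500Mi", "cpu": "300m"},
}


def memory_bytes(quantity: str) -> int:
    """'3000Mi' -> 3000 * 2**20 bytes; a bare integer is taken as bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # raises ValueError for unsupported formats


def cpu_cores(quantity: str) -> float:
    """'500m' -> 0.5 cores; '2' -> 2.0 cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def check_resources(resources: dict) -> list:
    """Return a list of problems; Kubernetes rejects requests larger than limits."""
    problems = []
    limits, requests = resources["limits"], resources["requests"]
    if memory_bytes(requests["memory"]) > memory_bytes(limits["memory"]):
        problems.append("memory request exceeds limit")
    if cpu_cores(requests["cpu"]) > cpu_cores(limits["cpu"]):
        problems.append("cpu request exceeds limit")
    return problems


if __name__ == "__main__":
    print(memory_bytes(RESOURCES["limits"]["memory"]) / 2**30)  # memory limit in GiB
    print(cpu_cores(RESOURCES["limits"]["cpu"]))  # CPU limit in cores
    print(check_resources(RESOURCES))
```

So the `3000Mi` limit is roughly 2.9 GiB, and the values above pass the check because each request is below its limit.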

disable monitoring of kube-scheduler and other control-plane components

On managed Kubernetes services (like AKS, GKE and EKS), some control-plane targets (kube-scheduler, kube-controller-manager...) are hidden by the provider, so Prometheus cannot scrape them and the alerts for these targets fire. To disable these alerts, override the default values when installing Prometheus (see aks-monitoring-with-prometheus). Example:

# Force kubelet metrics to be scraped over HTTP instead of HTTPS
kubelet:
  enabled: true
  serviceMonitor:
    https: false
# Disable scraping of control-plane (master node) components
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false

calculating burn rate

burn rate = (fraction of SLO budget consumed × SLO period) / alerting window size

14.4 = (0.02 × 30 days × 24 hours) / 1 hour

  • consuming 2% of the SLO budget within 1 hour corresponds to a burn rate of 14.4; a burn rate of 1 would use up exactly the whole budget over the 30-day period
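The burn-rate formula as a small Python sketch, with the SLO period expressed in hours. The helpers `budget_consumed` and `hours_to_exhaust` are illustrative rearrangements of the same formula, not part of the original notes.

```python
"""Burn-rate arithmetic: burn rate = (budget consumed x SLO period) / alerting window."""

HOURS_PER_DAY = 24


def burn_rate(budget_fraction: float, slo_period_hours: float, window_hours: float) -> float:
    """Burn rate needed to consume `budget_fraction` of the budget within the window."""
    return budget_fraction * slo_period_hours / window_hours


def budget_consumed(rate: float, slo_period_hours: float, window_hours: float) -> float:
    """Inverse: fraction of the budget a given burn rate uses up within the window."""
    return rate * window_hours / slo_period_hours


def hours_to_exhaust(rate: float, slo_period_hours: float) -> float:
    """How long a constant burn rate takes to consume the whole budget."""
    return slo_period_hours / rate


if __name__ == "__main__":
    period = 30 * HOURS_PER_DAY  # 30-day SLO period = 720 hours
    print(burn_rate(0.02, period, 1))  # burn rate for 2% of the budget in 1 hour
    print(hours_to_exhaust(14.4, period))  # hours until the whole budget is gone
```

For the example above, a sustained burn rate of 14.4 would exhaust the whole 30-day budget in 720 / 14.4 = 50 hours.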