3) Prometheus Operator for Beginner - jangjaelee/tutorials-prometheus GitHub Wiki


2022.06. 이장재 (Jangjae Lee) 📧 [email protected] 📂 https://github.com/jangjaelee 📒 http://www.awx.kr


 

What is Prometheus Operator?

Prometheus Operator is an implementation of the Kubernetes Operator pattern for managing a Prometheus-based Kubernetes monitoring stack. A Kubernetes Operator consists of Kubernetes custom resources and controller code that manage how a particular service runs on Kubernetes and abstract away its implementation details.

The main purpose of Prometheus Operator is to simplify and automate the configuration and management of a Prometheus monitoring stack running on a Kubernetes cluster. Through Kubernetes CustomResourceDefinitions (CRDs), Prometheus, Alertmanager, and other components can be customized easily.

For example, the ServiceMonitor custom resource lets you describe, in a YAML manifest, how a group of Kubernetes Services should be monitored. The Operator controller communicates with the Kubernetes API server to watch Service endpoints and automatically generates the Prometheus scrape configuration required for the configured Services.
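As a minimal sketch of this mechanism, the manifests below pair a Service with a ServiceMonitor that selects it by label. The name example-app, the app label, and the port name web are hypothetical and chosen only for illustration. A Prometheus resource also has to match the ServiceMonitor through its own serviceMonitorSelector before the generated scrape configuration takes effect.

```yaml
# Hypothetical application Service; its label and its named port are what
# the ServiceMonitor below matches on.
apiVersion: v1
kind: Service
metadata:
  name: example-app
  namespace: default
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
    - name: web          # the ServiceMonitor refers to the port by this name
      port: 8080
      targetPort: 8080
---
# Selects every Service labeled app=example-app in the same namespace
# and scrapes the endpoints behind its "web" port.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web
      path: /metrics
      interval: 30s
```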

The main features of Prometheus Operator are as follows.

  • Kubernetes Custom Resources: Use Kubernetes custom resources to deploy and manage Prometheus, Alertmanager, and related components.
  • Simplified Deployment Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.
  • Prometheus Target Configuration: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus specific configuration language.
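To make the "Simplified Deployment Configuration" point concrete, here is a sketch of a Prometheus resource that sets the version, replica count, retention, and persistence in one place. The version and retention values are the ones used later on this page; the 10Gi volume size is an arbitrary assumption, and the sketch assumes the cluster has a default StorageClass.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  version: v2.36.0        # Prometheus version to run
  replicas: 2             # number of Prometheus pods
  retention: 10d          # how long metrics data is kept
  storage:                # persistence: one PersistentVolumeClaim per replica
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi # assumed size, adjust to your workload
```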

 

Overview

This page covers how to install Prometheus Operator, which Prometheus Operator custom resources make up the Prometheus monitoring stack, and YAML examples of those custom resources.

 

Operator Capability Level

Prometheus Operator is at capability Level IV, the Deep Insights level.

For a detailed description of each of the Operator Capability Levels, see [link].

operator-capability-level-01.png

operator-capability-level-02.png

 

Architecture

Prometheus-Operator_architecture-01.png

[Source] https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/Documentation/user-guides/images/architecture.png

 

Prometheus-Operator_architecture-04.svg

[Source] https://www.nicktriller.com/blog/managing-prometheus-on-kubernetes-with-prometheus-operator/

 

Prometheus-Operator_architecture-06.png

[Source] https://www.youtube.com/watch?v=Uph_Say4D3M

 

Prometheus-Operator_architecture-03.png

[Source] https://prometheus-operator.dev/docs/operator/troubleshooting/custom-metrics-elements.png

 

Prometheus-Operator_architecture-02.jpeg

[Source] https://zhuanlan.zhihu.com/p/76835425

 

Prometheus-Operator_architecture-05.png

[Source] https://www.youtube.com/watch?v=Uph_Say4D3M

 

CustomResourceDefinitions(CRDs)

The core function of Prometheus Operator is to watch the Kubernetes API server for changes to specific objects and ensure that those objects match their declared state. It does this through the following CRDs.

  • Prometheus : defines a Prometheus deployment, which runs as a StatefulSet

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: example
      namespace: monitoring
    spec:
      replicas: 2
      serviceAccountName: prometheus-k8s
      serviceMonitorSelector: {}
      ruleSelector: {}
      podMonitorSelector: {}
      probeSelector: {}
      alerting:
        alertmanagers:
          - namespace: monitoring
            name: alertmanager-main
            port: web
  • Alertmanager : defines an Alertmanager deployment

    apiVersion: monitoring.coreos.com/v1
    kind: Alertmanager
    metadata:
      name: alertmanager-main
      namespace: monitoring
    spec:
      replicas: 3
      alertmanagerConfigSelector: {}
  • ThanosRuler : declaratively defines a ThanosRuler deployment and its configuration - Thanos Rule [link]

    apiVersion: monitoring.coreos.com/v1
    kind: ThanosRuler
    metadata:
      name: thanos-ruler-demo
      labels:
        example: thanos-ruler
      namespace: monitoring
    spec:
      image: quay.io/thanos/thanos
      ruleSelector:
        matchLabels:
          role: my-thanos-rules
      queryEndpoints:
        - dnssrv+_http._tcp.my-thanos-querier.monitoring.svc.cluster.local
  • ServiceMonitor : declaratively specifies how a group of Kubernetes Services should be monitored

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          operated-prometheus: 'true'
      endpoints:
        - port: web
          interval: 30s
  • PodMonitor : declaratively specifies how a group of Pods should be monitored

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: example
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: prometheus
      podMetricsEndpoints:
        - port: web
          interval: 30s
  • Probe : declaratively specifies how a group of Ingresses or static targets should be monitored

    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: servers
      namespace: monitoring
    spec:
      jobName: servers
      interval: 10s
      prober:
        url: localhost
        path: /metrics
      metricRelabelings:
      - sourceLabels: [__address__]
        targetLabel: target
      targets:
        staticConfig:
          static:
          - 192.168.0.1:9182
          - 192.168.0.2:9182
          relabelingConfigs:
          - sourceLabels: [__param_target]
            targetLabel: instance
          - sourceLabels: [__param_target]
            targetLabel: __address__
  • PrometheusRule : defines a set of Prometheus alerting and recording rules

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: prometheus-example-rules
      namespace: monitoring
    spec:
      groups:
        - name: ./example.rules
          rules:
            - alert: ExampleAlert
              expr: vector(1)
  • AlertmanagerConfig : declaratively specifies subsections of the Alertmanager configuration

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: example
      namespace: monitoring
    spec:
      receivers:
        - name: example
      route:
        receiver: example

For a more detailed description of the CRDs, see [link].

For a more detailed description of the API types, see [link].

 

Step 1. Install Prometheus Operator

Prometheus Operator can be installed in three ways.

  • Prometheus Operator : installed from the bundle YAML manifest
  • kube-prometheus : installed from YAML manifests; bundles the following components as a package (Prometheus Operator, Prometheus[HA], Alertmanager[HA], node-exporter, Prometheus Adapter for K8s Metrics APIs, kube-state-metrics, Grafana)
  • kube-prometheus-stack : kube-prometheus packaged as a Helm chart (requires Helm v3+)

This guide covers two of these methods, the Prometheus Operator bundle YAML manifest and the kube-prometheus-stack Helm chart. Choose whichever is more convenient for you.

Step 1.1. using a bundle YAML manifest

Download the bundle YAML manifest.

The manifest contains the object specifications for the CRDs, ClusterRoleBinding, ClusterRole, Deployment, ServiceAccount, and Service needed to install Prometheus Operator.

$ curl -OL https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml

Change the namespace value before applying the manifest.

Before: default → After: prometheus-operator

$ sed -i 's/namespace\: default/namespace\: prometheus\-operator/g' bundle.yaml
$ grep -i 'namespace: prometheus-operator' bundle.yaml

Create the namespace in which Prometheus Operator will be deployed, then deploy prometheus-operator.

$ kubectl create namespace prometheus-operator
namespace/prometheus-operator created

$ kubectl create -n prometheus-operator -f bundle.yaml
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
serviceaccount/prometheus-operator created
service/prometheus-operator created

Note: when deploying prometheus-operator with the kubectl CLI, use create rather than apply, or pass the "--server-side" option when using apply.

With a plain (client-side) apply, you will run into the following error message:

The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

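This happens because every time a resource is created or updated with apply, a JSON copy of the manifest is stored in the *kubectl.kubernetes.io/last-applied-configuration* field under *metadata.annotations*. For a CRD as large as this one, that copy pushes the annotations past the size limit quoted in the error.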
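For illustration only, the fragment below sketches the shape of that annotation on a CRD. It is heavily abbreviated: the JSON here carries an empty spec, whereas the real annotation written by a client-side apply embeds the complete object, including the full OpenAPI schema, as a single JSON string.

```yaml
# Abbreviated illustration, not a manifest to apply. The "spec" inside the
# JSON is emptied for brevity; in practice it contains the entire CRD schema,
# which is what makes the annotation exceed the size limit.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: prometheuses.monitoring.coreos.com
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiextensions.k8s.io/v1","kind":"CustomResourceDefinition","metadata":{"name":"prometheuses.monitoring.coreos.com"},"spec":{}}
```

Neither create nor a server-side apply writes this annotation, which is why both avoid the error.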

[Reference I] Description of kubectl apply under "How to create objects" on the official Kubernetes website

[Reference II] Kubectl Install CRD Failed - Annotations Too Long

[Reference III] Server-Side Apply

Let's check which CRDs were created from the YAML manifest and which resources were deployed.

  • CRDs

    $ kubectl get crds | grep coreos
    alertmanagerconfigs.monitoring.coreos.com             2022-06-08T10:04:27Z
    alertmanagers.monitoring.coreos.com                   2022-06-08T10:04:27Z
    podmonitors.monitoring.coreos.com                     2022-06-08T10:04:28Z
    probes.monitoring.coreos.com                          2022-06-08T10:04:28Z
    prometheuses.monitoring.coreos.com                    2022-06-08T10:04:28Z
    prometheusrules.monitoring.coreos.com                 2022-06-08T10:04:28Z
    servicemonitors.monitoring.coreos.com                 2022-06-08T10:04:28Z
    thanosrulers.monitoring.coreos.com                    2022-06-08T10:04:28Z
  • Objects

    $ kubectl get all -n prometheus-operator
    NAME                                       READY   STATUS    RESTARTS   AGE
    pod/prometheus-operator-567cd8b6f6-nhnr5   1/1     Running   0          99s
    
    NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
    service/prometheus-operator   ClusterIP   None         <none>        8080/TCP   99s
    
    NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/prometheus-operator   1/1     1            1           99s
    
    NAME                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/prometheus-operator-567cd8b6f6   1         1         1       99s

Step 1.2. using Helm Chart of kube-prometheus-stack

Next, let's install using kube-prometheus-stack, one of the Helm chart packages provided by Prometheus-Community. First, add the Helm chart repo.

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update

Use helm show crds to check which CRDs the chart installs.

$ helm show crds prometheus-community/kube-prometheus-stack > kube-prometheus-stack-crds.yaml

$ grep -A7 '^kind: CustomResourceDefinition' kube-prometheus-stack-crds.yaml | grep '^  name:' | awk '{print $2}' | sort -u
alertmanagerconfigs.monitoring.coreos.com
alertmanagers.monitoring.coreos.com
podmonitors.monitoring.coreos.com
probes.monitoring.coreos.com
prometheuses.monitoring.coreos.com
prometheusrules.monitoring.coreos.com
servicemonitors.monitoring.coreos.com
thanosrulers.monitoring.coreos.com

Because Prometheus Community's kube-prometheus-stack packages Prometheus-Operator's kube-prometheus as a Helm chart, it includes the components that kube-prometheus provides: Prometheus[HA], Alertmanager[HA], node-exporter, Prometheus Adapter for K8s Metrics APIs, kube-state-metrics, and Grafana.

Because this guide installs only Prometheus-Operator, deployment of the other packaged components is disabled.

$ helm install prometheus-operator prometheus-community/kube-prometheus-stack -n prometheus-operator --create-namespace \
--set grafana.enabled=false \
--set alertmanager.enabled=false \
--set prometheus.enabled=false \
--set nodeExporter.enabled=false \
--set kubeStateMetrics.enabled=false \
--set prometheusOperator.enabled=true
NAME: prometheus-operator
LAST DEPLOYED: Thu Jun  9 16:31:20 2022
NAMESPACE: prometheus-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace prometheus-operator get pods -l "release=prometheus-operator"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
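The same settings can also be kept in a values file instead of a series of --set flags. The following is a sketch that mirrors the flags above one-to-one; the file name values.yaml is an assumption.

```yaml
# values.yaml (hypothetical file name): mirrors the --set flags above.
grafana:
  enabled: false
alertmanager:
  enabled: false
prometheus:
  enabled: false
nodeExporter:
  enabled: false
kubeStateMetrics:
  enabled: false
prometheusOperator:
  enabled: true
```

It would then be passed with helm's -f (--values) option, for example: helm install prometheus-operator prometheus-community/kube-prometheus-stack -n prometheus-operator --create-namespace -f values.yaml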

Let's check which CRDs were created by the Helm chart and which resources were deployed.

  • CRDs

    $ kubectl get crds | grep coreos
    alertmanagerconfigs.monitoring.coreos.com             2022-06-09T07:31:18Z
    alertmanagers.monitoring.coreos.com                   2022-06-09T07:31:18Z
    podmonitors.monitoring.coreos.com                     2022-06-09T07:31:18Z
    probes.monitoring.coreos.com                          2022-06-09T07:31:18Z
    prometheuses.monitoring.coreos.com                    2022-06-09T07:31:18Z
    prometheusrules.monitoring.coreos.com                 2022-06-09T07:31:18Z
    servicemonitors.monitoring.coreos.com                 2022-06-09T07:31:18Z
    thanosrulers.monitoring.coreos.com                    2022-06-09T07:31:18Z
  • Objects

    $ kubectl get all -n prometheus-operator
    NAME                                                       READY   STATUS    RESTARTS   AGE
    pod/prometheus-operator-kube-p-operator-7d9dbbc6db-hgl2x   1/1     Running   0          8m53s
    
    NAME                                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
    service/prometheus-operator-kube-p-operator   ClusterIP   10.104.57.243   <none>        443/TCP   8m53s
    
    NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/prometheus-operator-kube-p-operator   1/1     1            1           8m53s
    
    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/prometheus-operator-kube-p-operator-7d9dbbc6db   1         1         1       8m53s

Step 1.3. Kubernetes API resources of Prometheus Operator

Use the kubectl api-resources command to check the Kubernetes API resource information for Prometheus Operator and to see the short name of each object.

$ kubectl api-resources | egrep '^NAME|coreos'
NAME                              SHORTNAMES         APIVERSION                             NAMESPACED   KIND
alertmanagerconfigs               amcfg              monitoring.coreos.com/v1alpha1         true         AlertmanagerConfig
alertmanagers                     am                 monitoring.coreos.com/v1               true         Alertmanager
podmonitors                       pmon               monitoring.coreos.com/v1               true         PodMonitor
probes                            prb                monitoring.coreos.com/v1               true         Probe
prometheuses                      prom               monitoring.coreos.com/v1               true         Prometheus
prometheusrules                   promrule           monitoring.coreos.com/v1               true         PrometheusRule
servicemonitors                   smon               monitoring.coreos.com/v1               true         ServiceMonitor
thanosrulers                      ruler              monitoring.coreos.com/v1               true         ThanosRuler

 

Step 2. Role-based access control (RBAC)

RBAC (role-based access control) for Prometheus Operator has two parts:

First, the RBAC rules for Prometheus Operator itself.

Second, the RBAC rules for the Prometheus Pods that Prometheus Operator creates, because Prometheus needs access to the Kubernetes API to discover targets and Alertmanagers.

Step 2.1. Prometheus Operator RBAC

For Prometheus Operator to work in an environment with RBAC authorization, a ClusterRole must exist that grants the Operator access to all of the Kubernetes API resources it needs.

The ClusterRole, ClusterRoleBinding, and ServiceAccount used for Prometheus Operator RBAC are created by default when the Operator is installed, as shown below.

$ kubectl get clusterrole,clusterrolebinding | egrep -i '^NAME|prometheus'
NAME                                                           CREATED AT
clusterrole.rbac.authorization.k8s.io/prometheus-operator      2022-06-09T08:33:30Z

NAME                                                                   ROLE                               AGE
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator       ClusterRole/prometheus-operator    73m

$ kubectl get sa/prometheus-operator -n prometheus-operator
NAME                                 SECRETS   AGE
serviceaccount/prometheus-operator   1         73m

Because Prometheus Operator works extensively with CustomResourceDefinitions (CRDs), it requires all permissions on the following objects.

  • alertmanagers
  • podmonitors
  • probes
  • prometheuses
  • prometheusrules
  • servicemonitors
  • thanosrulers

Let's use kubectl describe on the ClusterRole to check that all of the required permissions have been granted.

$ kubectl describe clusterrole/prometheus-operator
Name:         prometheus-operator
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/name=prometheus-operator
              app.kubernetes.io/version=0.57.0
Annotations:  <none>
PolicyRule:
  Resources                                       Non-Resource URLs  Resource Names  Verbs
  ---------                                       -----------------  --------------  -----
  configmaps                                      []                 []              [*]
  secrets                                         []                 []              [*]
  statefulsets.apps                               []                 []              [*]
  alertmanagerconfigs.monitoring.coreos.com       []                 []              [*]
  alertmanagers.monitoring.coreos.com/finalizers  []                 []              [*]
  alertmanagers.monitoring.coreos.com             []                 []              [*]
  podmonitors.monitoring.coreos.com               []                 []              [*]
  probes.monitoring.coreos.com                    []                 []              [*]
  prometheuses.monitoring.coreos.com/finalizers   []                 []              [*]
  prometheuses.monitoring.coreos.com/status       []                 []              [*]
  prometheuses.monitoring.coreos.com              []                 []              [*]
  prometheusrules.monitoring.coreos.com           []                 []              [*]
  servicemonitors.monitoring.coreos.com           []                 []              [*]
  thanosrulers.monitoring.coreos.com/finalizers   []                 []              [*]
  thanosrulers.monitoring.coreos.com              []                 []              [*]
  endpoints                                       []                 []              [get create update delete]
  services/finalizers                             []                 []              [get create update delete]
  services                                        []                 []              [get create update delete]
  namespaces                                      []                 []              [get list watch]
  ingresses.networking.k8s.io                     []                 []              [get list watch]
  pods                                            []                 []              [list delete]
  nodes                                           []                 []              [list watch]

Describing the ClusterRoleBinding tied to this ClusterRole shows that the prometheus-operator ClusterRole is bound to the prometheus-operator ServiceAccount.

$ kubectl describe clusterrolebinding/prometheus-operator
Name:         prometheus-operator
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/name=prometheus-operator
              app.kubernetes.io/version=0.57.0
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  prometheus-operator
Subjects:
  Kind            Name                 Namespace
  ----            ----                 ---------
  ServiceAccount  prometheus-operator  prometheus-operator

So why does Prometheus Operator need so many RBAC permissions? The reasons are as follows.

  • Alertmanager and Prometheus clusters are created using StatefulSets. Every change to an Alertmanager or Prometheus object therefore results in a change to a StatefulSet, so the Operator needs all permissions on StatefulSets as well as on the prometheuses.monitoring.coreos.com resources.
  • The Operator generates the configuration that Prometheus runs with, so it needs permissions on ConfigMaps.
  • When migrating Prometheus or Alertmanager from one version to another, the Operator needs the list permission on Pods to bring up the new version and the delete permission to remove the old version.
  • Prometheus Operator reconciles the governing Services for the StatefulSets, which are named prometheus-operated and alertmanager-operated. Performing this reconciliation requires the get, create, update, and delete permissions on Services.
  • Because the kubelet is currently not self-hosted, Prometheus Operator has a feature that synchronizes the kubelets' IPs into an Endpoints object. This requires list and watch access to nodes (kubelets) and create and update access to endpoints. [link]

Step 2.2. Prometheus RBAC

The Prometheus server accesses the Kubernetes API to discover the targets it scrapes and to discover Alertmanagers. To access those resources it needs a ServiceAccount, and a ClusterRole must be created and bound to that account.

Because Prometheus only reads objects from the Kubernetes API and never modifies them, it needs only the get, list, and watch verbs. Prometheus can also scrape metrics from the Kubernetes API itself, which additionally requires access to the /metrics endpoint.

Before deploying Prometheus, create a ClusterRole to assign the permissions.

$ cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
EOF

Create the ClusterRoleBinding.

$ cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
EOF

Verify that the ClusterRoleBinding references the ClusterRole correctly.

$ kubectl describe clusterrolebinding prometheus
Name:         prometheus
Labels:       release=prometheus-operator
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  prometheus
Subjects:
  Kind            Name        Namespace
  ----            ----        ---------
  ServiceAccount  prometheus  monitoring

Create a namespace named monitoring.

$ kubectl create namespace monitoring
namespace/monitoring created

Create a ServiceAccount named prometheus in the monitoring namespace created above.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
  namespace: monitoring
EOF

 

Step 3. Deploy and Configure

Step 3.1. Prometheus

Define a Prometheus CR to deploy the Prometheus server. For a detailed description of each spec field, refer to the Prometheus Operator API documentation [link].

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-operated
      namespace: monitoring
      pathPrefix: /
      port: http-web
  enableAdminAPI: false
  evaluationInterval: 30s
  externalUrl: http://prometheus-service.monitoring:9090
  image: quay.io/prometheus/prometheus:v2.36.0
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus-operator
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: prometheus-operator
  replicas: 2
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: prometheus-operator
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
  shards: 1
  version: v2.36.0
EOF
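The selectors in this manifest decide which other custom resources this Prometheus instance loads. As a sketch of how the ruleSelector is meant to be used, the PrometheusRule below carries the release: prometheus-operator label and would therefore be picked up. The rule name, alert name, threshold duration, and severity label are hypothetical.

```yaml
# Hypothetical rule; the "release: prometheus-operator" label is what the
# ruleSelector in the Prometheus manifest above matches.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    release: prometheus-operator
spec:
  groups:
    - name: example.rules
      rules:
        - alert: TargetDown
          expr: up == 0          # fires for any scrape target that is down
          for: 5m                # only after it has been down for 5 minutes
          labels:
            severity: warning
          annotations:
            summary: "Target {{ $labels.instance }} is down"
```

A rule without the matching label is ignored by this Prometheus instance; the same applies to ServiceMonitor, PodMonitor, and Probe objects through their respective selectors.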

Prometheus Operator detects that the Prometheus CR has been created and creates the corresponding StatefulSet and Service.

$ kubectl get all,prometheus -n monitoring
NAME                          READY   STATUS    RESTARTS   AGE
pod/prometheus-prometheus-0   2/2     Running   0          70s
pod/prometheus-prometheus-1   1/2     Running   0          10s

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/prometheus-operated   ClusterIP   None         <none>        9090/TCP   70s

NAME                                     READY   AGE
statefulset.apps/prometheus-prometheus   1/2     70s

NAME                                          VERSION   REPLICAS   AGE
prometheus.monitoring.coreos.com/prometheus   v2.36.0   2          71s

The prometheus-operated Service resource is created along with it, and you can see that it is a headless Service.

$ kubectl describe svc/prometheus-operated -n monitoring
Name:              prometheus-operated
Namespace:         monitoring
Labels:            operated-prometheus=true
Annotations:       <none>
Selector:          app.kubernetes.io/name=prometheus
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              http-web  9090/TCP
TargetPort:        http-web/TCP
Endpoints:         172.16.10.109:9090,172.16.84.122:9090
Session Affinity:  None
Events:            <none>
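What makes this Service headless is the IP: None line. Expressed as a manifest, it would look roughly like the sketch below, which is reconstructed only from the describe output above. The Operator manages the real object, so this is for reading rather than for applying by hand.

```yaml
# Sketch reconstructed from the describe output; fields not shown there
# are left at their defaults.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-operated
  namespace: monitoring
  labels:
    operated-prometheus: "true"
spec:
  clusterIP: None            # no virtual IP: DNS resolves straight to the pod IPs
  selector:
    app.kubernetes.io/name: prometheus
  ports:
    - name: http-web
      port: 9090
      targetPort: http-web
```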

Next, let's access the Prometheus UI. The prometheus-operated Service created by default is a headless Service, so use port forwarding to reach the Prometheus server inside the cluster.

$ kubectl port-forward svc/prometheus-operated 9090 -n monitoring
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

Alternatively, create an additional Service resource to act as an endpoint reachable from outside the cluster.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-service
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus
  sessionAffinity: None
  type: LoadBalancer
EOF
$ kubectl get svc/prometheus-service -n monitoring -o wide
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE   SELECTOR
prometheus-service   LoadBalancer   10.101.188.215   192.168.1.182   9090:32663/TCP   13s   app.kubernetes.io/name=prometheus,prometheus=prometheus

If you used port forwarding, open http://127.0.0.1:9090 in a web browser. If you created the additional Service resource, connect to the endpoint address that is created according to its expose type.

The author uses MetalLB, so the endpoint created by the LoadBalancer type was used here.

Prometheus-Operator-01.png
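On a cluster without a load balancer implementation such as MetalLB, a NodePort Service is one possible alternative. The following is a sketch under that assumption; the Service name and the nodePort value are hypothetical, and the selector is copied from the LoadBalancer example above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-nodeport      # hypothetical name
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - name: http-web
      port: 9090
      protocol: TCP
      targetPort: 9090
      nodePort: 30090            # assumed value; must fall in the cluster's NodePort range (30000-32767 by default)
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus
```

The UI would then be reachable on port 30090 of any node's IP address. The release label is left off this Service deliberately, so a ServiceMonitor that selects on release: prometheus-operator does not pick it up as a second scrape target.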

Step 3.2. ServiceMonitor

ServiceMonitor is one of Prometheus Operator's custom resources (CRs). It defines the applications whose metrics should be scraped through Service objects in Kubernetes. The controller acts on the ServiceMonitors we define and automatically manages the required ConfigMap configuration of the Prometheus server.

With the ServiceMonitor CR, the targets that Prometheus monitors are managed automatically, with no need to edit them by hand.

Create a ServiceMonitor CR so that metrics can be scraped from Prometheus's own Service.

$ cat <<EOF | kubectl apply -n monitoring -f -  
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-servicemonitor
  namespace: monitoring
spec:
  endpoints:
  - path: /metrics
    port: http-web
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      release: prometheus-operator
EOF

$ kubectl get smon -n monitoring -o wide
NAME                          AGE
prometheus-servicemonitor     30s

Let's check that the scrape settings for the targets selected by the ServiceMonitor were applied correctly, and what configuration was generated.

For a detailed description of the <scrape_config> settings, see [link].

$ kubectl exec -it prometheus-prometheus-0 -n monitoring -- cat /etc/prometheus/config_out/prometheus.env.yaml
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring/prometheus
    prometheus_replica: prometheus-prometheus-0
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-rulefiles-0/*.yaml
scrape_configs:
- job_name: serviceMonitor/monitoring/prometheus-servicemonitor/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  metrics_path: /metrics
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_release
    - __meta_kubernetes_service_labelpresent_release
    regex: (prometheus-operator);true
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http-web
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - target_label: endpoint
    replacement: http-web
  - source_labels:
    - __address__
    target_label: __tmp_hash
    modulus: 1
    action: hashmod
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  metric_relabel_configs: []
alerting:
  alert_relabel_configs:
  - action: labeldrop
    regex: prometheus_replica
  alertmanagers:
  - path_prefix: /
    scheme: http
    kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    api_version: v2
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_service_name
      regex: alertmanager
    - action: keep
      source_labels:
      - __meta_kubernetes_endpoint_port_name
      regex: http-web

You can see that the scrape_configs section of the Prometheus configuration was generated automatically by the Prometheus Operator when the ServiceMonitor CR was created.

Just from its contents, this is a fairly complex configuration that is not easy for a developer or administrator to write by hand. With the Prometheus Operator, though, the scrape_config could be set up simply.
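To make the generated `keep` rules above easier to read, here is a minimal sketch of how such a rule decides. It assumes the usual Prometheus behaviour: the source label values are joined with `;`, a missing label counts as an empty string, and the regex must match the whole joined string. `keep_rule` is a hypothetical helper written for this illustration; it is not part of Prometheus or kubectl, and `grep -E` only approximates Prometheus's regex engine.

```shell
# Simulate a Prometheus "keep" relabel rule.
# Usage: keep_rule REGEX VALUE [VALUE...]
# Joins the values with ";" and prints "keep" if the whole string matches REGEX.
keep_rule() {
  regex=$1
  shift
  joined=$(IFS=';'; printf '%s' "$*")
  if printf '%s\n' "$joined" | grep -Eq "^(${regex})\$"; then
    echo keep
  else
    echo drop
  fi
}

# Service labelled release=prometheus-operator (label value + "labelpresent" flag).
keep_rule '(prometheus-operator);true' prometheus-operator true   # prints: keep
# Service without the release label: both values are empty strings.
keep_rule '(prometheus-operator);true' '' ''                      # prints: drop
# An endpoint port named "metrics" does not pass the http-web rule.
keep_rule 'http-web' metrics                                      # prints: drop
```

Read this way, the two `keep` rules in the job correspond to the ServiceMonitor's `selector.matchLabels` and `endpoints[].port` fields.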

In the Prometheus dashboard (UI) you can see that the target has been added and that its metrics are being collected.

Prometheus-Operator-02.png

Prometheus-Operator-03.png

Step 3.3. PodMonitor

Applications exposed through a Service resource can use the ServiceMonitor CR; Pods that are not can be scraped directly with the PodMonitor CR.

It also works for Pods (sidecar containers) that sit behind a Service but are not directly connected to it. Istio sidecars are one example.

Let's scrape the kube-state-metrics Pod directly. Before creating the PodMonitor CR, deploy kube-state-metrics.

$ helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system

$ kubectl get all -n kube-system | egrep 'NAME|kube-state-metrics' | grep -v 'SELECTOR'
NAME                                           READY   STATUS    RESTARTS          AGE
pod/kube-state-metrics-77f54c6d8b-k56ht        1/1     Running   0                 13m

NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)      AGE
service/kube-state-metrics                   ClusterIP   10.110.249.203   <none>        8080/TCP     13m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kube-state-metrics        1/1     1            1           13m
NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/kube-state-metrics-77f54c6d8b        1         1         1       13m

Now create the PodMonitor CR. The label selector uses app.kubernetes.io/component: metrics and app.kubernetes.io/instance: kube-state-metrics; you can find these labels by running kubectl describe on the kube-state-metrics Pod.

$ cat << EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kube-state-metric-podmonitor
  labels:
    release: prometheus-operator
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/instance: kube-state-metrics
  podMetricsEndpoints:
  - targetPort: 8080
EOF

$ kubectl get pmon -n monitoring
NAME                            AGE
kube-state-metric-podmonitor    40s

The PodMonitor has been created, but nothing changes on the Prometheus server. That is because, when the Prometheus server was deployed above with the Prometheus CR, the podMonitorSelector.matchLabels field was set to release: prometheus-operator, so that label is looked up first. However, the kube-state-metrics Pod carries no such label, so the Pod cannot be found. The simple fix is to set the podMonitorSelector field to {} so that no label selector condition applies. Use kubectl edit to change the field value.

$ kubectl get prom/prometheus -n monitoring -o yaml | egrep '  podMonitorNamespaceSelector|  podMonitorSelector'
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
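As a non-interactive alternative to kubectl edit, the same change can be sketched as a JSON Patch. The file name `podmonitor-selector-patch.json` is a hypothetical choice for this example. A JSON Patch is used here rather than a merge patch because merging an empty object into an existing selector would leave its matchLabels in place.

```shell
# Write a JSON Patch (RFC 6902) that sets spec.podMonitorSelector to {}.
# "add" creates the field, or overwrites it if it already exists.
# podmonitor-selector-patch.json is a hypothetical file name for this sketch.
cat > podmonitor-selector-patch.json <<'EOF'
[
  { "op": "add", "path": "/spec/podMonitorSelector", "value": {} }
]
EOF

# Show the patch that would be sent to the API server.
cat podmonitor-selector-patch.json
```

With cluster access, the patch could then be applied with `kubectl patch prom prometheus -n monitoring --type=json --patch-file podmonitor-selector-patch.json`, assuming a kubectl version that supports `--patch-file`.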

Looking at the Prometheus server Pod, the configuration for the target added by the PodMonitor has been updated dynamically by the config-reloader sidecar container of the Prometheus server.

$ kubectl exec -it prometheus-prometheus-0 -n monitoring -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A47 -i "podMonitor/monitoring/kube-state-metric-podmonitor"
- job_name: podMonitor/monitoring/kube-state-metric-podmonitor/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - kube-system
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_component
    - __meta_kubernetes_pod_labelpresent_app_kubernetes_io_component
    regex: (metrics);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_instance
    - __meta_kubernetes_pod_labelpresent_app_kubernetes_io_instance
    regex: (kube-state-metrics);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_port_number
    regex: "8080"
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - target_label: job
    replacement: monitoring/kube-state-metric-podmonitor
  - target_label: endpoint
    replacement: "8080"
  - source_labels:
    - __address__
    target_label: __tmp_hash
    modulus: 1
    action: hashmod
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  metric_relabel_configs: []

When the creation of a ServiceMonitor or PodMonitor adds content to the configuration, prometheus-config-reloader, the sidecar container of the Prometheus server, reloads the Prometheus server's configuration so that it takes effect.

Prometheus-Operator_architecture-05.png

$ kubectl describe pod/prometheus-prometheus-0 -n monitoring
.
.
.
config-reloader:
    Container ID:  containerd://a5c6c9003f13d9bd8eaa6336f6f01d05a6ae14f53a9c4eb093dd27df98e6c805
    Image:         quay.io/prometheus-operator/prometheus-config-reloader:v0.57.0
    Image ID:      quay.io/prometheus-operator/prometheus-config-reloader@sha256:8c45787645d17c51acb44aa0386af3aa5d8bfd7bddd8d57dd041878b9494c5ff
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --listen-address=:8080
      --reload-url=http://localhost:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
      --watched-dir=/etc/prometheus/rules/prometheus-prometheus-rulefiles-0
    State:       Running
      Started:   Sat, 11 Jun 2022 01:02:51 +0900
.
.
.
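The `--config-file` and `--config-envsubst-file` arguments above suggest what the sidecar does before it calls the reload URL: it reads a gzip-compressed configuration and writes an expanded copy. The following is only a local simulation of that idea, not the reloader's own code. The `$(POD_NAME)` placeholder form and the sample YAML are assumptions made for this sketch.

```shell
# Build a gzip-compressed stand-in for /etc/prometheus/config/prometheus.yaml.gz.
# The $(POD_NAME) placeholder form is an assumption made for this sketch.
printf '%s\n' \
  'global:' \
  '  external_labels:' \
  '    prometheus_replica: $(POD_NAME)' | gzip -c > prometheus.yaml.gz

# Decompress and substitute the placeholder, imitating what ends up in
# the file named by --config-envsubst-file.
POD_NAME=prometheus-prometheus-0
gunzip -c prometheus.yaml.gz \
  | sed 's/[$](POD_NAME)/'"$POD_NAME"'/g' > prometheus.env.yaml

cat prometheus.env.yaml
```

The expanded file matches the shape of the prometheus.env.yaml shown earlier, where prometheus_replica carries the Pod name.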

You can check the PodMonitor's target in the Prometheus dashboard (UI).

Prometheus-Operator-04.png

Step 3.4. Alertmanager

Define an Alertmanager CR to deploy the Alertmanager server.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  alertmanagerConfigSelector: {}
  alertmanagerConfigNamespaceSelector: {}
  externalUrl: http://alertmanager-service.monitoring:9093
  image: quay.io/prometheus/alertmanager:v0.24.0
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  portName: "http-web"
  replicas: 1
  routePrefix: /
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  version: v0.24.0
EOF

Also create a Service resource for accessing the Alertmanager server's UI. If you reach the UI by port-forwarding the alertmanager-operated Service that is created by default, you can skip this step.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    release: prometheus-operator
  name: alertmanager-service
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    app.kubernetes.io/name: alertmanager
    alertmanager: alertmanager
  sessionAffinity: None
  type: LoadBalancer
EOF

Now let's check the resources that were created. As shown below, the Pod, Service, and StatefulSet resources for the Alertmanager server have been created.

$ kubectl get all,am -n monitoring | egrep '^NAME|alertmanager'
NAME                              READY   STATUS    RESTARTS      AGE
pod/alertmanager-alertmanager-0   2/2     Running   0             8m

NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
service/alertmanager-operated   ClusterIP      None             <none>          9093/TCP,9094/TCP,9094/UDP   8m
service/alertmanager-service    LoadBalancer   10.97.21.91      192.168.1.183   9093:32638/TCP               8m

NAME                                         READY   AGE
statefulset.apps/alertmanager-alertmanager   1/1     8m

NAME                                              VERSION   REPLICAS   AGE
alertmanager.monitoring.coreos.com/alertmanager   v0.24.0   1          8m

Now let's open the dashboard (UI). Use the endpoint of the Service created earlier, or use port forwarding and connect to http://127.0.0.1:9093.

$ kubectl port-forward svc/alertmanager-operated 9093 -n monitoring
Forwarding from 127.0.0.1:9093 -> 9093
Forwarding from [::1]:9093 -> 9093

Prometheus-Operator-05.png

The Alertmanager server has been deployed with the Alertmanager CR, but no alerts will appear yet.

As the official site [link] shows, Alertmanager's main job is only to send notifications for alerts; the alerts themselves are fired by the Prometheus server. To send alerts, alert rules therefore have to be registered with the Prometheus server, and this can be done by defining a PrometheusRule CR.

PrometheusRule is covered in the next step.

Step 3.5. PrometheusRule

The PrometheusRule CR supports defining one or more RuleGroups, and each group can declaratively define rules of either of the two types Prometheus supports (recording and alerting).
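This walkthrough only registers an alerting rule, so for comparison here is a minimal sketch of the other type, a recording rule. The file name, resource name, group name, and recorded series name are all hypothetical; the `release: prometheus-operator` label simply mirrors the labels used elsewhere in this tutorial.

```shell
# Write a PrometheusRule manifest that contains one recording rule.
# File name, resource name, group name and series name are hypothetical.
cat > recording-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-example-recording-rules
spec:
  groups:
  - name: recording_filesystem
    rules:
    # Precompute the available filesystem bytes summed per instance.
    - record: instance:node_filesystem_avail_bytes:sum
      expr: sum by (instance) (node_filesystem_avail_bytes)
EOF

cat recording-rule.yaml
```

A recording rule uses `record` where an alerting rule uses `alert`. With cluster access the file could be applied with `kubectl apply -n monitoring -f recording-rule.yaml`.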

Let's register one alert rule. Using the prometheus-node-exporter metrics, we will define a rule that fires an alert for nodes where more than about 12 GB of filesystem space is available to non-root users.

This requires prometheus-node-exporter to be deployed. If it is not yet deployed in your Kubernetes cluster, deploy it as follows.

$ helm install node-exporter prometheus-community/prometheus-node-exporter -n kube-system

Also create a ServiceMonitor CR to collect the node-exporter metrics.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-node-exporter-servicemonitor
  namespace: monitoring
spec:
  endpoints:
  - path: /metrics
    port: metrics
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: node-exporter
EOF

Now create an alert rule with the PrometheusRule CR.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-example-rules
spec:
  groups:
  - name: alerting_filesystem
    rules: 
    - alert: node_filesystem_avail_bytes
      expr: node_filesystem_avail_bytes > 12361482240
      for: 10s
      labels:
        severity: "critical"
EOF
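The threshold in the rule above is a raw byte count. As a quick check of what 12361482240 bytes corresponds to, the following sketch converts it to decimal gigabytes and binary gibibytes with awk. The variable name `threshold` is only for this example.

```shell
# Byte threshold copied from the alert expression.
threshold=12361482240

# Decimal gigabytes (1 GB = 10^9 bytes).
awk -v b="$threshold" 'BEGIN { printf "%.2f GB\n", b / 1000000000 }'           # prints: 12.36 GB

# Binary gibibytes (1 GiB = 1024^3 bytes).
awk -v b="$threshold" 'BEGIN { printf "%.2f GiB\n", b / (1024 * 1024 * 1024) }' # prints: 11.51 GiB
```

Because the expression uses `>`, the alert fires for filesystems with more than this much space available, which is why it triggers readily in a test cluster.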

Looking at the Rules status in the Prometheus server UI, the alert rule has been registered under the group name alerting_filesystem,

Prometheus-Operator-07.png

and the Alerts tab shows that alerts on the node_filesystem_avail_bytes metric are firing.

Prometheus-Operator-06.png

The AlertmanagerConfig CR lets you define, as a custom resource, how Alertmanager routes alerts to receivers so that notifications are sent to OpsGenie, PagerDuty, Slack, webhook, Email, VictorOps, Pushover, SNS, and Telegram, and it also lets you configure inhibit rules.

This section explains how to send alert notifications to Slack. To send alerts to a Slack channel, register Incoming webhooks for Slack on the channel.

Register the Incoming webhooks app in the Slack workspace where the alert messages will be sent.

Slack_notifications-01.png

Select the channel that will receive the alerts and register the Incoming Webhook.

Slack_notifications-02.png

The Webhook URL contains the Slack channel that receives the notifications and an access token, so keep it safe.

Slack_notifications-03.png

Now let's create the AlertmanagerConfig CR.

$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    release: prometheus-operator
  name: slack-alertmanagerconfig
spec:
  receivers:
  - name: slack-nofications
    slackConfigs:
    - apiURL:
        key: token
        name: slack-webhook-url
        optional: false
      channel: '#test'
  route:
    groupBy:
    - severity
    groupInterval: 5m
    groupWait: 30s
    receiver: slack-nofications
    repeatInterval: 5m
EOF

The Slack channel URL and credentials that the Prometheus Operator needs to reach the Slack channel are stored in a Kubernetes Secret and referenced from the CR. This Secret must be in the same namespace as the AlertmanagerConfig CR object.

Before creating the Secret that holds the Slack connection details, base64-encode the Incoming Webhook URL mentioned above, then create the Secret.

$ echo -e "https://hooks.slack.com/services/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" | base64
aHR0cHM6Ly9ob29rcy5zbGFjay5jb20vc2VydmljZXMvVFA1TjNHSzQzL0IwM0szTlZTUE1ML3JHVWRscktJQm1SWVliREVvYVJLR3FCZgo=
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: slack-webhook-url
data:
  token: aHR0cHM6Ly9ob29rcy5zbGFjay5jb20vc2VydmljZXMvVFA1TjNHSzQzL0IwM0szTlZTUE1ML3JHVWRscktJQm1SWVliREVvYVJLR3FCZgo=
EOF
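The encoded value above ends in `Zgo=`, which decodes to `f` followed by a newline, because `echo` appends one before the text reaches `base64`. A minimal sketch of encoding the URL without that trailing newline follows. The URL is a placeholder and the variable names are hypothetical.

```shell
# Placeholder value; substitute the Incoming Webhook URL issued by Slack.
webhook_url='https://hooks.slack.com/services/EXAMPLE/EXAMPLE/EXAMPLE'

# printf '%s' emits no trailing newline; tr removes the line wrapping that
# some base64 implementations add to long output.
encoded=$(printf '%s' "$webhook_url" | base64 | tr -d '\n')
echo "$encoded"

# Check that the decoded byte count equals the URL length,
# i.e. that no extra newline was encoded.
decoded_bytes=$(printf '%s' "$encoded" | base64 -d | wc -c)
[ "$decoded_bytes" -eq "${#webhook_url}" ] && echo "no trailing newline"
```

Alternatively, `kubectl create secret generic slack-webhook-url -n monitoring --from-literal=token="$webhook_url"` lets kubectl do the encoding, so no manual base64 step is needed.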

Opening the Alertmanager dashboard UI shows that nine alerts have been triggered.

Slack_notifications-04.png

You can also confirm that the alerts were delivered to the Slack channel as notifications.

Slack_notifications-05.png

In addition, when the Alertmanager CR is created, the Alertmanager configuration is provisioned as a Kubernetes Secret in the same namespace.

$ kubectl describe secret/alertmanager-alertmanager-generated -n monitoring
Name:         alertmanager-alertmanager-generated
Namespace:    monitoring
Labels:       managed-by=prometheus-operator
Annotations:  <none>

Type:  Opaque

Data
====
alertmanager.yaml:  495 bytes

The contents of the Secret show that the configuration defined with the AlertmanagerConfig CR has been applied.

$ kubectl get secret/alertmanager-alertmanager-generated -n monitoring -o jsonpath="{.data.alertmanager\.yaml}" | base64 -d
route:
  receiver: "null"
  routes:
  - receiver: monitoring/slack-alertmanagerconfig/slack-nofications
    group_by:
    - severity
    matchers:
    - namespace="monitoring"
    continue: true
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 5m
receivers:
- name: "null"
- name: monitoring/slack-alertmanagerconfig/slack-nofications
  slack_configs:
  - api_url: https://hooks.slack.com/services/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
    channel: '#test'
templates: []

 

Linting of CRD configuration files

Validation of CRD configuration files can be automated.

For detailed usage, see [link].

po-lint validates the syntax using the types in prometheus-operator's api/monitoring/v1.

 

Appendix

Collections in YAML Example

  • ServiceMonitor for Alertmanager

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        release: prometheus-operator
      name: alertmanager-servicemonitor
    spec:
      endpoints:
      - path: /metrics
        port: http-web
      namespaceSelector:
        matchNames:
        - monitoring
      selector:
        matchLabels:
          release: prometheus-operator
  • ServiceMonitor for prometheus-node-exportor

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        release: prometheus-operator
      name: prometheus-node-exporter-servicemonitor
    spec:
      endpoints:
      - path: /metrics
        port: metrics
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          app: prometheus-node-exporter
          release: node-exporter
  • PorMonitor for kube-state-metric

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      labels:
        release: prometheus-operator
      name: kube-state-metric-podmonitor
    spec:
      namespaceSelector:
        matchNames:
        - kube-system
      podMetricsEndpoints:
      - targetPort: 8080
      selector:
        matchLabels:
          app.kubernetes.io/component: metrics
          app.kubernetes.io/instance: kube-state-metrics

 

Official Website

Website

GitHub

Operator Hub

GO Packages

 

Reference

 

END
