3) Prometheus Operator for Beginner - jangjaelee/tutorials-prometheus GitHub Wiki
2022.06. 이장재 (jangjaelee) | [email protected] | https://github.com/jangjaelee | http://www.awx.kr
Prometheus Operator is an implementation of the Kubernetes Operator pattern for managing a Prometheus-based Kubernetes monitoring stack. A Kubernetes Operator consists of Kubernetes custom resources and controller code, which together abstract away the management and implementation details of running a specific service on Kubernetes.
The main goal of Prometheus Operator is to simplify and automate the configuration and management of a Prometheus monitoring stack running on a Kubernetes cluster. Through Kubernetes CustomResourceDefinitions (CRDs), Prometheus, Alertmanager, and related components can be customized easily.
For example, the ServiceMonitor custom resource lets you describe in a YAML manifest how a group of Kubernetes Services should be monitored. The Operator controller talks to the K8s API server, watches the Service endpoints, and automatically generates the Prometheus scrape configuration needed for the configured Services.
The main features of Prometheus Operator are as follows.
- Kubernetes Custom Resources: Use Kubernetes custom resources to deploy and manage Prometheus, Alertmanager, and related components.
- Simplified Deployment Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.
- Prometheus Target Configuration: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus specific configuration language.
This page covers how to install Prometheus Operator, which custom resources of Prometheus Operator make up the Prometheus monitoring stack, and YAML examples of those custom resources.
Prometheus Operator is rated at capability Level IV, Deep Insights.
For a detailed description of each of the Operator Capability Levels, see [link].
[Source] https://www.nicktriller.com/blog/managing-prometheus-on-kubernetes-with-prometheus-operator/
[Source] https://www.youtube.com/watch?v=Uph_Say4D3M
[Source] https://prometheus-operator.dev/docs/operator/troubleshooting/custom-metrics-elements.png
[Source] https://zhuanlan.zhihu.com/p/76835425
[Source] https://www.youtube.com/watch?v=Uph_Say4D3M
The core job of Prometheus Operator is to watch the Kubernetes API server for changes to specific objects and to make sure the running state matches the declared state. It does this through the following CRDs.
- Prometheus : defines a Prometheus deployment, run as a StatefulSet
  apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    name: example
    namespace: monitoring
  spec:
    replicas: 2
    serviceAccountName: prometheus-k8s
    serviceMonitorSelector: {}
    ruleSelector: {}
    podMonitorSelector: {}
    probeSelector: {}
    alerting:
      alertmanagers:
      - namespace: monitoring
        name: alertmanager-main
        port: web
- Alertmanager : defines an Alertmanager deployment
  apiVersion: monitoring.coreos.com/v1
  kind: Alertmanager
  metadata:
    name: alertmanager-main
    namespace: monitoring
  spec:
    replicas: 3
    alertmanagerConfigSelector: {}
- ThanosRuler : declaratively defines the configuration of a ThanosRuler deployment - Thanos Rule [link]
  apiVersion: monitoring.coreos.com/v1
  kind: ThanosRuler
  metadata:
    name: thanos-ruler-demo
    labels:
      example: thanos-ruler
    namespace: monitoring
  spec:
    image: quay.io/thanos/thanos
    ruleSelector:
      matchLabels:
        role: my-thanos-rules
    queryEndpoints:
    - dnssrv+_http._tcp.my-thanos-querier.monitoring.svc.cluster.local
- ServiceMonitor : declaratively specifies how a group of Kubernetes Services should be monitored
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: example
    namespace: monitoring
  spec:
    selector:
      matchLabels:
        operated-prometheus: 'true'
    endpoints:
    - port: web
      interval: 30s
- PodMonitor : declaratively specifies how a group of Pods should be monitored
  apiVersion: monitoring.coreos.com/v1
  kind: PodMonitor
  metadata:
    name: example
    namespace: monitoring
  spec:
    selector:
      matchLabels:
        app: prometheus
    podMetricsEndpoints:
    - port: web
      interval: 30s
- Probe : declaratively specifies how a group of Ingresses or static targets should be monitored
  apiVersion: monitoring.coreos.com/v1
  kind: Probe
  metadata:
    name: servers
    namespace: monitoring
  spec:
    jobName: servers
    interval: 10s
    prober:
      url: localhost
      path: /metrics
    metricRelabelings:
    - sourceLabels: [__address__]
      targetLabel: target
    targets:
      staticConfig:
        static:
        - 192.168.0.1:9182
        - 192.168.0.2:9182
        relabelingConfigs:
        - sourceLabels: [__param_target]
          targetLabel: instance
        - sourceLabels: [__param_target]
          targetLabel: __address__
- PrometheusRule : defines a set of Prometheus alerting and recording rules
  apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    name: prometheus-example-rules
    namespace: monitoring
  spec:
    groups:
    - name: ./example.rules
      rules:
      - alert: ExampleAlert
        expr: vector(1)
- AlertmanagerConfig : declaratively specifies a subsection of the Alertmanager configuration
  apiVersion: monitoring.coreos.com/v1alpha1
  kind: AlertmanagerConfig
  metadata:
    name: example
    namespace: monitoring
  spec:
    receivers:
    - name: example
    route:
      receiver: example
For a more detailed description of the CRDs, see [link].
For a more detailed description of the API types, see [link].
Prometheus Operator can be installed in three ways.
- Prometheus Operator : installed from the bundle YAML manifest
- kube-prometheus : installed from YAML manifests, with the following components included as a package (Prometheus Operator, Prometheus[HA], Alertmanager[HA], node-exporter, Prometheus Adapter for K8s Metrics APIs, kube-state-metrics, Grafana)
- kube-prometheus-stack : kube-prometheus packaged as Helm Charts (requires Helm v3+)
This page covers two of them, the bundle YAML manifest of Prometheus Operator and the kube-prometheus-stack Helm Charts. Pick whichever is more convenient for you.
Download the bundle YAML manifest.
The manifest contains the object specifications needed to install Prometheus Operator: CRD, ClusterRoleBinding, ClusterRole, Deployment, ServiceAccount, and Service.
$ curl -OL https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/master/bundle.yaml
Before applying it, change the namespace value.
Before) default → After) prometheus-operator
$ sed -i 's/namespace\: default/namespace\: prometheus\-operator/g' bundle.yaml
$ grep -i 'namespace: prometheus-operator' bundle.yaml
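To see what the substitution does before touching the real bundle.yaml, it can be tried on a throwaway file first. The following is a minimal sketch, assuming a hypothetical two-object sample (sample-bundle.yaml is not part of the tutorial). It writes the result to a new file instead of using `-i`, because in-place editing behaves differently between GNU and BSD sed.

```shell
# Create a small stand-in for bundle.yaml (hypothetical content, for illustration only).
cat > sample-bundle.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-operator
  namespace: default
EOF

# Same substitution as above, written to a new file so the input stays untouched.
sed 's/namespace: default/namespace: prometheus-operator/g' sample-bundle.yaml > sample-bundle.patched.yaml

# Count how many namespace fields now point at prometheus-operator.
grep -c 'namespace: prometheus-operator' sample-bundle.patched.yaml
```

If the count matches the number of `namespace: default` lines in the input, every occurrence was rewritten.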
Create the namespace that Prometheus Operator will be deployed into, then deploy prometheus-operator.
$ kubectl create namespace prometheus-operator
namespace/prometheus-operator created
$ kubectl create -n prometheus-operator -f bundle.yaml
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
serviceaccount/prometheus-operator created
service/prometheus-operator created
Note) When deploying prometheus-operator, use kubectl create instead of kubectl apply, or pass the --server-side option when you do use apply.
With a plain apply you will run into the following error message:
The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
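This happens because every time a resource is created or updated with apply, a JSON copy of the manifest is stored in the *kubectl.kubernetes.io/last-applied-configuration* field under *metadata.annotations*, and for a CRD this large the annotation exceeds the size limit shown in the error.
[Reference I] The description of kubectl apply under "How to create objects" in the official Kubernetes documentation
[Reference II] Kubectl Install CRD Failed - Annotations Too Long
[Reference III] Server-Side Apply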
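One way to get a feel for which objects are at risk is to measure each YAML document in a multi-document manifest against the 262144-byte limit from the error message. The sketch below is only a rough proxy: the annotation stores a JSON copy rather than the YAML text, so the real size differs somewhat. The helper name `yaml_doc_sizes` and the sample file are assumptions made for illustration.

```shell
# yaml_doc_sizes FILE [LIMIT]
# Prints the approximate size of each YAML document in FILE and flags the
# ones larger than LIMIT (default: 262144, the limit from the error message).
yaml_doc_sizes() {
  awk -v limit="${2:-262144}" '
    function report() {
      printf "doc %d: %d bytes%s\n", doc, bytes, (bytes > limit ? " (over limit)" : "")
    }
    BEGIN { doc = 1; bytes = 0 }
    /^---/ {
      if (bytes > 0) { report(); doc++ }
      bytes = 0
      next
    }
    { bytes += length($0) + 1 }   # +1 accounts for the newline
    END { if (bytes > 0) report() }
  ' "$1"
}

# Tiny demonstration with an artificially low limit of 6 bytes.
printf 'a: 1\n---\nb: 22222\n' > sample-multi.yaml
yaml_doc_sizes sample-multi.yaml 6
```

Run against the real bundle.yaml without a second argument, any document flagged as over the limit is a candidate for `kubectl create` or `kubectl apply --server-side`.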
Let's check which CRDs were created from the YAML manifest and which resources were deployed.
- CRDs
$ kubectl get crds | grep coreos
alertmanagerconfigs.monitoring.coreos.com 2022-06-08T10:04:27Z
alertmanagers.monitoring.coreos.com 2022-06-08T10:04:27Z
podmonitors.monitoring.coreos.com 2022-06-08T10:04:28Z
probes.monitoring.coreos.com 2022-06-08T10:04:28Z
prometheuses.monitoring.coreos.com 2022-06-08T10:04:28Z
prometheusrules.monitoring.coreos.com 2022-06-08T10:04:28Z
servicemonitors.monitoring.coreos.com 2022-06-08T10:04:28Z
thanosrulers.monitoring.coreos.com 2022-06-08T10:04:28Z
- Objects
$ kubectl get all -n prometheus-operator
NAME READY STATUS RESTARTS AGE
pod/prometheus-operator-567cd8b6f6-nhnr5 1/1 Running 0 99s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-operator ClusterIP None <none> 8080/TCP 99s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-operator 1/1 1 1 99s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-operator-567cd8b6f6 1 1 1 99s
Next, let's look at installing with kube-prometheus-stack, one of the Helm Chart packages provided by Prometheus-Community. First, register the helm chart repo.
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
Use helm show crds to check which CRDs the chart installs.
$ helm show crds prometheus-community/kube-prometheus-stack > kube-prometheus-stack-crds.yaml
$ grep -A7 '^kind: CustomResourceDefinition' kube-prometheus-stack-crds.yaml | grep '^  name:' | awk '{print $2}' | sort -u
alertmanagerconfigs.monitoring.coreos.com
alertmanagers.monitoring.coreos.com
podmonitors.monitoring.coreos.com
probes.monitoring.coreos.com
prometheuses.monitoring.coreos.com
prometheusrules.monitoring.coreos.com
servicemonitors.monitoring.coreos.com
thanosrulers.monitoring.coreos.com
Because kube-prometheus-stack from Prometheus Community packages Prometheus-Operator and kube-prometheus as a Helm Chart, it includes the components kube-prometheus provides: Prometheus[HA], Alertmanager[HA], node-exporter, Prometheus Adapter for K8s Metrics APIs, kube-state-metrics, and Grafana.
Here we only want to install Prometheus-Operator, so deployment of the other packaged components is disabled.
$ helm install prometheus-operator prometheus-community/kube-prometheus-stack -n prometheus-operator --create-namespace \
--set grafana.enabled=false \
--set alertmanager.enabled=false \
--set prometheus.enabled=false \
--set nodeExporter.enabled=false \
--set kubeStateMetrics.enabled=false \
--set prometheusOperator.enabled=true
NAME: prometheus-operator
LAST DEPLOYED: Thu Jun 9 16:31:20 2022
NAMESPACE: prometheus-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace prometheus-operator get pods -l "release=prometheus-operator"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
Let's check which CRDs were created by the Helm chart and which resources were deployed.
- CRDs
$ kubectl get crds | grep coreos
alertmanagerconfigs.monitoring.coreos.com 2022-06-09T07:31:18Z
alertmanagers.monitoring.coreos.com 2022-06-09T07:31:18Z
podmonitors.monitoring.coreos.com 2022-06-09T07:31:18Z
probes.monitoring.coreos.com 2022-06-09T07:31:18Z
prometheuses.monitoring.coreos.com 2022-06-09T07:31:18Z
prometheusrules.monitoring.coreos.com 2022-06-09T07:31:18Z
servicemonitors.monitoring.coreos.com 2022-06-09T07:31:18Z
thanosrulers.monitoring.coreos.com 2022-06-09T07:31:18Z
- Objects
$ kubectl get all -n prometheus-operator
NAME READY STATUS RESTARTS AGE
pod/prometheus-operator-kube-p-operator-7d9dbbc6db-hgl2x 1/1 Running 0 8m53s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-operator-kube-p-operator ClusterIP 10.104.57.243 <none> 443/TCP 8m53s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-operator-kube-p-operator 1/1 1 1 8m53s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-operator-kube-p-operator-7d9dbbc6db 1 1 1 8m53s
Use the kubectl api-resources command to look at the Kubernetes API resource information for Prometheus Operator and the shortname of each object.
$ kubectl api-resources | egrep '^NAME|coreos'
NAME SHORTNAMES APIVERSION NAMESPACED KIND
alertmanagerconfigs amcfg monitoring.coreos.com/v1alpha1 true AlertmanagerConfig
alertmanagers am monitoring.coreos.com/v1 true Alertmanager
podmonitors pmon monitoring.coreos.com/v1 true PodMonitor
probes prb monitoring.coreos.com/v1 true Probe
prometheuses prom monitoring.coreos.com/v1 true Prometheus
prometheusrules promrule monitoring.coreos.com/v1 true PrometheusRule
servicemonitors smon monitoring.coreos.com/v1 true ServiceMonitor
thanosrulers ruler monitoring.coreos.com/v1 true ThanosRuler
RBAC (role-based access control) for Prometheus Operator comes in two parts:
First, the RBAC rules for Prometheus Operator itself.
Second, the RBAC rules for the Prometheus Pods created by Prometheus Operator, because Prometheus needs access to the Kubernetes API to discover targets and Alertmanagers.
For Prometheus Operator to work in an RBAC-enabled environment, a ClusterRole must exist that grants the Operator access to every Kubernetes API resource it needs.
The ClusterRole, ClusterRoleBinding, and ServiceAccount used for Prometheus Operator's RBAC are created by default while installing the Operator, as shown below.
$ kubectl get clusterrole,clusterrolebinding | egrep -i '^NAME|prometheus'
NAME CREATED AT
clusterrole.rbac.authorization.k8s.io/prometheus-operator 2022-06-09T08:33:30Z
NAME ROLE AGE
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator ClusterRole/prometheus-operator 73m
$ kubectl get sa/prometheus-operator -n prometheus-operator
NAME SECRETS AGE
serviceaccount/prometheus-operator 1 73m
Because Prometheus Operator works extensively with CustomResourceDefinitions (CRDs), it requires all permissions on the following objects.
- alertmanagers
- podmonitors
- probes
- prometheuses
- prometheusrules
- servicemonitors
- thanosrulers
So let's check with kubectl describe whether the ClusterRole grants all the required permissions.
$ kubectl describe clusterrole/prometheus-operator
Name: prometheus-operator
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/name=prometheus-operator
app.kubernetes.io/version=0.57.0
Annotations: <none>
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
configmaps [] [] [*]
secrets [] [] [*]
statefulsets.apps [] [] [*]
alertmanagerconfigs.monitoring.coreos.com [] [] [*]
alertmanagers.monitoring.coreos.com/finalizers [] [] [*]
alertmanagers.monitoring.coreos.com [] [] [*]
podmonitors.monitoring.coreos.com [] [] [*]
probes.monitoring.coreos.com [] [] [*]
prometheuses.monitoring.coreos.com/finalizers [] [] [*]
prometheuses.monitoring.coreos.com/status [] [] [*]
prometheuses.monitoring.coreos.com [] [] [*]
prometheusrules.monitoring.coreos.com [] [] [*]
servicemonitors.monitoring.coreos.com [] [] [*]
thanosrulers.monitoring.coreos.com/finalizers [] [] [*]
thanosrulers.monitoring.coreos.com [] [] [*]
endpoints [] [] [get create update delete]
services/finalizers [] [] [get create update delete]
services [] [] [get create update delete]
namespaces [] [] [get list watch]
ingresses.networking.k8s.io [] [] [get list watch]
pods [] [] [list delete]
nodes [] [] [list watch]
Describing the ClusterRoleBinding shows that the prometheus-operator ClusterRole is bound to the prometheus-operator ServiceAccount.
$ kubectl describe clusterrolebinding/prometheus-operator
Name: prometheus-operator
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/name=prometheus-operator
app.kubernetes.io/version=0.57.0
Annotations: <none>
Role:
Kind: ClusterRole
Name: prometheus-operator
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount prometheus-operator prometheus-operator
So why does Prometheus Operator need so many RBAC permissions? The reasons are as follows.
- Alertmanager and Prometheus clusters are created using StatefulSets. Every change to an Alertmanager or Prometheus object therefore results in a change to the StatefulSets, which means all actions must be permitted on StatefulSets as well as on the monitoring.coreos.com resources.
- The Operator takes care of generating the configuration Prometheus runs with, so it needs permissions on ConfigMaps (and Secrets, as the ClusterRole above shows).
- When migrating Prometheus or Alertmanager from one version to another, the Operator needs list permission on Pods to find the ones running the old version and delete permission to remove them.
- Prometheus Operator reconciles the Services named prometheus-operated and alertmanager-operated, which are used as the governing Services for the StatefulSets. Performing this reconciliation requires get, create, update, and delete permissions on Services.
- Because the kubelet is currently not self-hosted, Prometheus Operator has a feature that synchronizes the kubelet IPs into an Endpoints object. This requires list and watch on nodes (kubelet) and create and update on endpoints. [link]
The Prometheus server accesses the Kubernetes API to discover scrape targets and Alertmanagers. To reach those resources it needs a ServiceAccount, and a ClusterRole must be created and bound to that account.
Prometheus only reads objects from the Kubernetes API and never modifies them, so it needs just the get, list, and watch verbs. Prometheus can also be used to scrape metrics from the Kubernetes API server itself, and for that it additionally needs access to the /metrics endpoint.
Create a ClusterRole that assigns the permissions the Prometheus deployment needs.
$ cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
EOF
Create the ClusterRoleBinding.
$ cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
EOF
Check that the ClusterRole is correctly bound by the ClusterRoleBinding.
$ kubectl describe clusterrolebinding prometheus
Name: prometheus
Labels: release=prometheus-operator
Annotations: <none>
Role:
Kind: ClusterRole
Name: prometheus
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount prometheus monitoring
Create a namespace named monitoring.
$ kubectl create namespace monitoring
namespace/monitoring created
Create a ServiceAccount named prometheus in the monitoring namespace created above.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
  namespace: monitoring
EOF
Step 3.1. Prometheus
We will define a Prometheus CR to deploy the Prometheus server. For a detailed description of each spec field, see the Prometheus Operator API [link] documentation.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    release: prometheus-operator
  name: prometheus
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-operated
      namespace: monitoring
      pathPrefix: /
      port: http-web
  enableAdminAPI: false
  evaluationInterval: 30s
  externalUrl: http://prometheus-service.monitoring:9090
  image: quay.io/prometheus/prometheus:v2.36.0
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus-operator
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: prometheus-operator
  replicas: 2
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: prometheus-operator
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
  shards: 1
  version: v2.36.0
EOF
Prometheus Operator detects that the Prometheus CR has been created and creates a StatefulSet and a Service.
$ kubectl get all,prometheus -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-prometheus-0 2/2 Running 0 70s
pod/prometheus-prometheus-1 1/2 Running 0 10s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-operated ClusterIP None <none> 9090/TCP 70s
NAME READY AGE
statefulset.apps/prometheus-prometheus 1/2 70s
NAME VERSION REPLICAS AGE
prometheus.monitoring.coreos.com/prometheus v2.36.0 2 71s
The prometheus-operated Service resource is created along with it, and you can see that it is a Headless Service.
$ kubectl describe svc/prometheus-operated -n monitoring
Name: prometheus-operated
Namespace: monitoring
Labels: operated-prometheus=true
Annotations: <none>
Selector: app.kubernetes.io/name=prometheus
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: None
IPs: None
Port: http-web 9090/TCP
TargetPort: http-web/TCP
Endpoints: 172.16.10.109:9090,172.16.84.122:9090
Session Affinity: None
Events: <none>
Next, let's open the Prometheus UI. The prometheus-operated Service created by default is a Headless Service, so you can use port forwarding to reach the Prometheus server inside the cluster.
$ kubectl port-forward svc/prometheus-operated 9090 -n monitoring
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
Alternatively, you can create an additional Service resource as an endpoint that is reachable from outside the cluster.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-service
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus
  sessionAffinity: None
  type: LoadBalancer
EOF
$ kubectl get svc/prometheus-service -n monitoring -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
prometheus-service LoadBalancer 10.101.188.215 192.168.1.182 9090:32663/TCP 13s app.kubernetes.io/name=prometheus,prometheus=prometheus
If you used port forwarding, open http://127.0.0.1:9090 in a web browser. If you created the additional Service resource, connect to the endpoint address that was created for its expose type.
The author uses MetalLB, so the endpoint created for the LoadBalancer type was used here.
Step 3.2. ServiceMonitor
ServiceMonitor is one of the Prometheus Operator CRs (custom resources). It defines the applications whose metrics should be scraped through Service objects in Kubernetes; the controller acts on the ServiceMonitors we define and automatically manages the Prometheus server configuration they require.
With the ServiceMonitor CR, the targets monitored by Prometheus are managed automatically, without editing the Prometheus configuration by hand.
Let's create a ServiceMonitor CR so that metrics can be scraped from the Prometheus Service.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-servicemonitor
  namespace: monitoring
spec:
  endpoints:
  - path: /metrics
    port: http-web
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      release: prometheus-operator
EOF
$ kubectl get smon -n monitoring -o wide
NAME AGE
prometheus-servicemonitor 30s
Let's check that the scrape settings for the ServiceMonitor target were applied, and what configuration was generated.
For a detailed description of the <scrape_config> settings, see [link].
$ kubectl exec -it prometheus-prometheus-0 -n monitoring -- cat /etc/prometheus/config_out/prometheus.env.yaml
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring/prometheus
    prometheus_replica: prometheus-prometheus-0
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-rulefiles-0/*.yaml
scrape_configs:
- job_name: serviceMonitor/monitoring/prometheus-servicemonitor/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  metrics_path: /metrics
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_release
    - __meta_kubernetes_service_labelpresent_release
    regex: (prometheus-operator);true
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http-web
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - target_label: endpoint
    replacement: http-web
  - source_labels:
    - __address__
    target_label: __tmp_hash
    modulus: 1
    action: hashmod
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  metric_relabel_configs: []
alerting:
  alert_relabel_configs:
  - action: labeldrop
    regex: prometheus_replica
  alertmanagers:
  - path_prefix: /
    scheme: http
    kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    api_version: v2
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_service_name
      regex: alertmanager
    - action: keep
      source_labels:
      - __meta_kubernetes_endpoint_port_name
      regex: http-web
You can see that the scrape_configs section of the Prometheus configuration was generated automatically by Prometheus Operator when the ServiceMonitor CR was created.
Even at a glance this is a fairly complex configuration, and not one a developer or administrator would find easy to write by hand. With Prometheus Operator, the scrape_config was produced from a few lines of YAML.
In the Prometheus Dashboard (UI) you can see that the target has been added and that its metrics are being collected.
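The `action: keep` rules in the generated configuration can be read as follows: Prometheus joins the values of `source_labels` with `;` (the default separator) and keeps the target only if `regex`, anchored at both ends, matches the joined string. The sketch below is a rough shell imitation of that check. It assumes a hypothetical helper named `relabel_keep` and uses `grep -E` in place of the regular expression engine Prometheus itself uses.

```shell
# relabel_keep REGEX VALUE...
# Joins the VALUEs with ';' and succeeds only if REGEX matches the whole string,
# mimicking how a relabel rule with `action: keep` decides whether to keep a target.
relabel_keep() {
  regex=$1
  shift
  joined=$(IFS=';'; printf '%s' "$*")
  printf '%s\n' "$joined" | grep -Eq "^(${regex})\$"
}

# Service labelled release=prometheus-operator: label value plus the "labelpresent" flag.
relabel_keep '(prometheus-operator);true' 'prometheus-operator' 'true' && echo keep

# Service without the release label: both meta labels are empty, so the target is dropped.
relabel_keep '(prometheus-operator);true' '' '' || echo drop
```

This is why the ServiceMonitor's `matchLabels` turns into a pair of meta labels in the rule: the second one distinguishes "label present with this value" from "label absent".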
Step 3.3. PodMonitor
Applications exposed through a Service resource can use the ServiceMonitor CR. Pods without a Service can be scraped directly with the PodMonitor CR.
PodMonitor also works for Pods (sidecar containers) that sit behind a Service but are not directly connected to it, for example Istio sidecars.
Let's scrape the kube-state-metrics Pod directly. Before creating the PodMonitor CR, deploy kube-state-metrics.
$ helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
$ kubectl get all -n kube-system | egrep 'NAME|kube-state-metrics' | grep -v 'SELECTOR'
NAME READY STATUS RESTARTS AGE
pod/kube-state-metrics-77f54c6d8b-k56ht 1/1 Running 0 13m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-state-metrics ClusterIP 10.110.249.203 <none> 8080/TCP 13m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-state-metrics 1/1 1 1 13m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kube-state-metrics-77f54c6d8b 1 1 1 13m
Now create the PodMonitor CR. The label selector uses app.kubernetes.io/component: metrics and app.kubernetes.io/instance: kube-state-metrics; you can find these labels by running kubectl describe on the kube-state-metrics Pod.
$ cat << EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kube-state-metric-podmonitor
  labels:
    release: prometheus-operator
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/instance: kube-state-metrics
  podMetricsEndpoints:
  - targetPort: 8080
EOF
$ kubectl get pmon -n monitoring
NAME AGE
kube-state-metric-podmonitor 40s
The PodMonitor has been created, but the Prometheus server may show no change. Which PodMonitor objects a Prometheus server loads is controlled by the podMonitorSelector field of the Prometheus CR, and earlier we set podMonitorSelector.matchLabels to release: prometheus-operator. That selector is matched against the labels of PodMonitor objects, not against Pod labels (Pods are selected by the PodMonitor's own spec.selector), so any PodMonitor without that label is ignored. The simplest way to rule this out is to change podMonitorSelector to {} so that no label condition applies. Use kubectl edit to change the field.
$ kubectl get prom/prometheus -n monitoring -o yaml | egrep ' podMonitorNamespaceSelector| podMonitorSelector'
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
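Two selectors are involved here and they are easy to mix up: `podMonitorSelector` on the Prometheus CR is matched against the labels of PodMonitor objects, while `spec.selector` on the PodMonitor is matched against Pod labels. In both cases `matchLabels` means every listed key/value pair must be present, and an empty selector `{}` matches everything. The sketch below imitates that rule in shell, assuming a hypothetical helper `match_labels` that takes comma-separated `key=value` lists. The Pod labels in the last call are made up for illustration.

```shell
# match_labels SELECTOR LABELS
# Succeeds if every key=value pair in SELECTOR also appears in LABELS.
# An empty SELECTOR (the equivalent of `{}`) matches any label set.
match_labels() {
  selector=$1
  labels=",$2,"
  old_ifs=$IFS
  IFS=','
  for pair in $selector; do
    case $labels in
      *",$pair,"*) ;;                 # pair found, keep checking
      *) IFS=$old_ifs; return 1 ;;    # pair missing, no match
    esac
  done
  IFS=$old_ifs
  return 0
}

# PodMonitor labels checked against the Prometheus podMonitorSelector.
match_labels 'release=prometheus-operator' 'release=prometheus-operator' && echo "selected"

# An empty selector matches a PodMonitor that has no labels at all.
match_labels '' '' && echo "selected"

# Pod labels (hypothetical) checked against the PodMonitor spec.selector.
match_labels 'app.kubernetes.io/component=metrics,app.kubernetes.io/instance=kube-state-metrics' \
  'app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/instance=kube-state-metrics' \
  || echo "not selected"
```

When a target does not show up, checking both levels separately (PodMonitor labels first, then Pod labels) usually narrows down which selector is at fault.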
Looking inside the Prometheus server Pod, the configuration for the target added by the PodMonitor has been applied dynamically by the config-reloader sidecar container of the Prometheus server.
$ kubectl exec -it prometheus-prometheus-0 -n monitoring -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A47 -i "podMonitor/monitoring/kube-state-metric-podmonitor"
- job_name: podMonitor/monitoring/kube-state-metric-podmonitor/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - kube-system
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_component
    - __meta_kubernetes_pod_labelpresent_app_kubernetes_io_component
    regex: (metrics);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_instance
    - __meta_kubernetes_pod_labelpresent_app_kubernetes_io_instance
    regex: (kube-state-metrics);true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_port_number
    regex: "8080"
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - target_label: job
    replacement: monitoring/kube-state-metric-podmonitor
  - target_label: endpoint
    replacement: "8080"
  - source_labels:
    - __address__
    target_label: __tmp_hash
    modulus: 1
    action: hashmod
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  metric_relabel_configs: []
When creating a ServiceMonitor or PodMonitor adds content to the configuration, prometheus-config-reloader, a sidecar container of the Prometheus server, reloads the Prometheus server so that the new settings take effect.
$ kubectl describe pod/prometheus-prometheus-0 -n monitoring
.
.
.
config-reloader:
Container ID: containerd://a5c6c9003f13d9bd8eaa6336f6f01d05a6ae14f53a9c4eb093dd27df98e6c805
Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.57.0
Image ID: quay.io/prometheus-operator/prometheus-config-reloader@sha256:8c45787645d17c51acb44aa0386af3aa5d8bfd7bddd8d57dd041878b9494c5ff
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=:8080
--reload-url=http://localhost:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-prometheus-rulefiles-0
State: Running
Started: Sat, 11 Jun 2022 01:02:51 +0900
.
.
.
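Conceptually, the sidecar watches the generated configuration and asks Prometheus to reload whenever it changes, using the `--reload-url` shown above. The following is a deliberately simplified shell sketch of that idea. The real reloader is a separate program with more logic; `reload_if_changed`, the state file, and the checksum comparison are assumptions made for illustration.

```shell
# reload_if_changed CONFIG_FILE STATE_FILE RELOAD_COMMAND...
# Runs RELOAD_COMMAND only when the checksum of CONFIG_FILE differs from the
# checksum remembered in STATE_FILE, then remembers the new checksum.
reload_if_changed() {
  config=$1
  state=$2
  shift 2
  new_sum=$(cksum < "$config")
  old_sum=$(cat "$state" 2>/dev/null || true)
  if [ "$new_sum" != "$old_sum" ]; then
    "$@" && printf '%s\n' "$new_sum" > "$state"
  fi
}

# Inside the Pod, the reload command would be something along the lines of:
#   reload_if_changed /etc/prometheus/config_out/prometheus.env.yaml /tmp/last.sum \
#     curl -fsS -X POST http://localhost:9090/-/reload
```

Calling the function in a loop with a short sleep gives a crude polling watcher; an unchanged file never triggers a reload.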
In the Prometheus Dashboard (UI) you can confirm the PodMonitor target.
Step 3.4. Alertmanager
Define an Alertmanager CR to deploy the Alertmanager server.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  alertmanagerConfigSelector: {}
  alertmanagerConfigNamespaceSelector: {}
  externalUrl: http://alertmanager-service.monitoring:9093
  image: quay.io/prometheus/alertmanager:v0.24.0
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  portName: "http-web"
  replicas: 1
  routePrefix: /
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  version: v0.24.0
EOF
์ถ๊ฐ๋ก Alertmanager ์๋ฒ์ UI์ ์ ์ ํ๊ธฐ ์ํ service
๋ฆฌ์์ค๋ ์์ฑ ํฉ๋๋ค. ๋ง์ฝ, ๊ธฐ๋ณธ ์์ฑ๋๋ alertmanager-operated service
๋ฅผ ํฌํธ ํฌ์๋ฉ ํ์ฌ UI ์ ์์ ํ๋ค๋ฉด ์ด ๋จ๊ณ๋ ์๋ตํ์
๋ ๋ฉ๋๋ค.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    release: prometheus-operator
  name: alertmanager-service
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    app.kubernetes.io/name: alertmanager
    alertmanager: alertmanager
  sessionAffinity: None
  type: LoadBalancer
EOF
Now let's check the resources that were created. As shown below, the Pod, Service, and StatefulSet resources for the Alertmanager server exist.
$ kubectl get all,am -n monitoring | egrep '^NAME|alertmanager'
NAME                              READY   STATUS    RESTARTS   AGE
pod/alertmanager-alertmanager-0   2/2     Running   0          8m
NAME                            TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                      AGE
service/alertmanager-operated   ClusterIP      None          <none>          9093/TCP,9094/TCP,9094/UDP   8m
service/alertmanager-service    LoadBalancer   10.97.21.91   192.168.1.183   9093:32638/TCP               8m
NAME                                         READY   AGE
statefulset.apps/alertmanager-alertmanager   1/1     8m
NAME                                              VERSION   REPLICAS   AGE
alertmanager.monitoring.coreos.com/alertmanager   v0.24.0   1          8m
Now let's open the dashboard (UI). Either use the endpoint of the Service created above, or use port forwarding and open http://127.0.0.1:9093.
$ kubectl port-forward svc/alertmanager-operated 9093 -n monitoring
Forwarding from 127.0.0.1:9093 -> 9093
Forwarding from [::1]:9093 -> 9093
The Alertmanager server is now deployed through the Alertmanager CR, but no alerts will fire yet.
As the official site [link] shows, Alertmanager's main job is only to deliver notifications for alerts. The alerts themselves are generated by the Prometheus server. To send alerts, alert rules must therefore be registered with the Prometheus server, which can be done by defining a PrometheusRule CR.
PrometheusRule is covered in the next step.
Step 3.5. PrometheusRule
A PrometheusRule CR supports one or more RuleGroup definitions, and each group declaratively defines rules of the two types that Prometheus supports (recording and alerting).
Let's register one alert rule. Using a prometheus-node-exporter metric, we define a rule that fires an alert for every node that has 12 GB or more of filesystem space available to non-root users.
This requires prometheus-node-exporter to be deployed. If it is not yet deployed in your Kubernetes cluster, deploy it as follows.
$ helm install node-exporter prometheus-community/prometheus-node-exporter -n kube-system
Also create a ServiceMonitor CR to collect the node-exporter metrics.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-node-exporter-servicemonitor
  namespace: monitoring
spec:
  endpoints:
  - path: /metrics
    port: metrics
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: node-exporter
EOF
Now create the alert rule using a PrometheusRule CR.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-example-rules
spec:
  groups:
  - name: alerting_filesystem
    rules:
    - alert: node_filesystem_avail_bytes
      expr: node_filesystem_avail_bytes > 12361482240
      for: 10s
      labels:
        severity: "critical"
EOF
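The threshold in the rule expression is given in bytes, which makes it hard to relate to the "12 GB" in the description. As a quick check, the sketch below converts it to decimal gigabytes and binary gibibytes. The variable names are only illustrative.

```shell
# Convert the rule's byte threshold to decimal GB and binary GiB.
threshold_bytes=12361482240
threshold=$(awk -v b="$threshold_bytes" \
  'BEGIN { printf "%.2f GB, %.2f GiB", b / 1e9, b / (1024 * 1024 * 1024) }')
echo "$threshold"
# prints: 12.36 GB, 11.51 GiB
```

The value is therefore "about 12 GB" in decimal units. Adjust it to whatever limit makes sense for your nodes.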
In the Rules view of the Prometheus server UI, the alert rule is registered under the group name alerting_filesystem, and the Alerts tab shows alerts firing for the node_filesystem_avail_bytes metric.
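This step only registers an alerting rule, while a RuleGroup can also hold recording rules. As a rough sketch of what one might look like, the manifest below precomputes a per-instance sum of the same node-exporter metric. The file name, resource name, group name, and recorded series name are all hypothetical. The release label mirrors the other manifests on this page.

```shell
# Write a PrometheusRule that holds one recording rule (all names are hypothetical).
cat > recording-rule-example.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-example-recording-rules
spec:
  groups:
  - name: recording_filesystem
    rules:
    # A recording rule uses "record" where an alerting rule uses "alert".
    - record: instance:node_filesystem_avail_bytes:sum
      expr: sum by (instance) (node_filesystem_avail_bytes)
EOF

# To load it into the cluster (not run here):
# kubectl apply -n monitoring -f recording-rule-example.yaml
```

Once loaded, the recorded series can be queried by its name like any other metric.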
Step 3.6. AlertmanagerConfig
The AlertmanagerConfig CR configures Alertmanager through a custom resource. It routes alerts raised by alert rules to a receiver so that notifications are sent to OpsGenie, PagerDuty, Slack, webhook, Email, VictorOps, Pushover, SNS, or Telegram, and it also defines inhibit rules.
This section explains how to send alert notifications to Slack. To send alerts to a Slack channel, register Incoming Webhooks for Slack on that channel.
In the Slack workspace that will receive the alert messages, add the Incoming Webhooks app.
Select the channel that alerts should be sent to and register the Incoming Webhook.
The Webhook URL contains both the target Slack channel and an access token, so store it safely.
Now let's write the AlertmanagerConfig CR.
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    release: prometheus-operator
  name: slack-alertmanagerconfig
spec:
  receivers:
  - name: slack-nofications
    slackConfigs:
    - apiURL:
        key: token
        name: slack-webhook-url
        optional: false
      channel: '#test'
  route:
    groupBy:
    - severity
    groupInterval: 5m
    groupWait: 30s
    receiver: slack-nofications
    repeatInterval: 5m
EOF
The Slack channel URL and access credentials that the Prometheus Operator needs in order to reach the Slack channel are stored in a Kubernetes Secret and referenced from the CR. This Secret must be in the same namespace as the AlertmanagerConfig CR object.
To create the Secret that holds the Slack connection details, base64-encode the Incoming Webhook URL mentioned above (with echo -n, so that no trailing newline ends up in the encoded value) and then create the Secret.
$ echo -n "https://hooks.slack.com/services/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" | base64
aHR0cHM6Ly9ob29rcy5zbGFjay5jb20vc2VydmljZXMvVFA1TjNHSzQzL0IwM0szTlZTUE1ML3JHVWRscktJQm1SWVliREVvYVJLR3FCZg==
$ cat <<EOF | kubectl apply -n monitoring -f -
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: slack-webhook-url
data:
  token: aHR0cHM6Ly9ob29rcy5zbGFjay5jb20vc2VydmljZXMvVFA1TjNHSzQzL0IwM0szTlZTUE1ML3JHVWRscktJQm1SWVliREVvYVJLR3FCZg==
EOF
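When a URL is piped through echo into base64, it is easy to encode a trailing newline along with it. The sketch below compares the decoded byte count for printf '%s' and plain echo. The webhook URL is a made-up placeholder and the variable names are illustrative.

```shell
# Placeholder URL (hypothetical); substitute the real Incoming Webhook URL.
webhook_url='https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'

# printf '%s' writes the string without a trailing newline.
encoded=$(printf '%s' "$webhook_url" | base64 | tr -d '\n')
clean_len=$(printf '%s' "$encoded" | base64 -d | wc -c | tr -d ' ')

# Plain echo appends a newline, which is then encoded as well.
newline_len=$(echo "$webhook_url" | base64 | base64 -d | wc -c | tr -d ' ')

echo "url=${#webhook_url} printf=${clean_len} echo=${newline_len}"

# kubectl can also do the encoding itself (not run here):
# kubectl create secret generic slack-webhook-url -n monitoring \
#   --from-literal=token="$webhook_url"
```

The printf variant decodes to exactly the length of the URL, while the echo variant is one byte longer.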
The Alertmanager dashboard UI now shows that 9 alerts have been triggered.
You can also confirm that the alert notifications arrived in the Slack channel.
In addition, when an Alertmanager CR is created, the Alertmanager configuration is provisioned as a Kubernetes Secret in the same namespace.
$ kubectl describe secret/alertmanager-alertmanager-generated -n monitoring
Name: alertmanager-alertmanager-generated
Namespace: monitoring
Labels: managed-by=prometheus-operator
Annotations: <none>
Type: Opaque
Data
====
alertmanager.yaml: 495 bytes
The content of the Secret shows that the configuration defined in the AlertmanagerConfig CR has been applied.
$ kubectl get secret/alertmanager-alertmanager-generated -n monitoring -o jsonpath="{.data.alertmanager\.yaml}" | base64 -d
route:
  receiver: "null"
  routes:
  - receiver: monitoring/slack-alertmanagerconfig/slack-nofications
    group_by:
    - severity
    matchers:
    - namespace="monitoring"
    continue: true
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 5m
receivers:
- name: "null"
- name: monitoring/slack-alertmanagerconfig/slack-nofications
  slack_configs:
  - api_url: https://hooks.slack.com/services/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
    channel: '#test'
templates: []
Validation of CRD configuration files can be automated.
See [link] for detailed usage.
- po-lint source : main.go
po-lint validates syntax using the types in prometheus-operator's api/monitoring/v1.
- https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/apis/monitoring/v1/types.go
- https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/apis/monitoring/v1/thanos_types.go
- ServiceMonitor for Alertmanager
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
  name: alertmanager-servicemonitor
spec:
  endpoints:
  - path: /metrics
    port: http-web
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      release: prometheus-operator
- ServiceMonitor for prometheus-node-exporter
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-node-exporter-servicemonitor
spec:
  endpoints:
  - path: /metrics
    port: metrics
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: node-exporter
- PodMonitor for kube-state-metrics
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    release: prometheus-operator
  name: kube-state-metric-podmonitor
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  podMetricsEndpoints:
  - targetPort: 8080
  selector:
    matchLabels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/instance: kube-state-metrics
Website
GitHub
- https://github.com/prometheus-operator/prometheus-operator
- https://github.com/helm/charts/blob/master/stable/prometheus-operator/README.md (explains the Helm chart configuration in detail)
- kube-prometheus
- kube-prometheus stack
Operator Hub
GO Packages
- https://www.infracloud.io/blogs/prometheus-operator-helm-guide/
- https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#how-to-create-objects
- https://kubernetes.io/docs/reference/using-api/server-side-apply/
- https://medium.com/pareture/kubectl-install-crd-failed-annotations-too-long-2ebc91b40c7d
- https://grafana.com/docs/grafana-cloud/kubernetes/prometheus/prometheus_operator/
- https://alibaba-cloud.medium.com/kubernetes-cluster-monitoring-using-prometheusv-4fc77c971330
- https://operatorframework.io/operator-capabilities/
- https://www.nicktriller.com/blog/managing-prometheus-on-kubernetes-with-prometheus-operator/
- https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/
- https://gurumee92.tistory.com/category/Monitoring/Prometheus
- https://www.youtube.com/watch?v=Uph_Say4D3M
END