prometheus kubernetes - ghdrako/doc_snipets GitHub Wiki
- https://prometheus.io/docs/prometheus/1.8/configuration/configuration/
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
- https://kevinfeng.github.io/post/kubernetes-sd-in-prometheus/ - read this!
- https://rtfm.co.ua/en/kubernetes-monitoring-with-prometheus-exporters-a-service-discovery-and-its-roles/#pod_role - read this!
Prometheus and Alertmanager components (e.g., the Prometheus server) will be deployed as Kubernetes objects (pods, services, etc.) and will be created under the monitoring namespace.

```shell
$ kubectl create namespace monitoring
```
Create a cluster role and binding. Access to Kubernetes resources is regulated via role-based access control (RBAC). RBAC uses the rbac.authorization.k8s.io API to manage authorization. In the RBAC API, a cluster role contains rules that represent a set of permissions on the Kubernetes cluster. A cluster role will be used to provide access to the following:
- Non-resource endpoints (like /healthz)
- Cluster-scoped resources (like nodes)
- Namespaced resources (like pods) across all namespaces (needed to run `kubectl get pods --all-namespaces`, for example)

A cluster role binding grants the permissions defined in a cluster role to a user or set of users. It holds a list of subjects (users, groups, or service accounts) and a reference to the role being granted. Permissions granted via a cluster role binding apply cluster-wide.
```shell
$ kubectl create -f clusterRole.yaml
```
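A clusterRole.yaml along these lines grants the permissions listed above (the role name and the monitoring/default service-account subject are assumptions; adjust them to your setup):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# Cluster-scoped and namespaced resources across all namespaces
- apiGroups: [""]
  resources: [nodes, nodes/proxy, services, endpoints, pods]
  verbs: [get, list, watch]
# Non-resource endpoints such as /metrics or /healthz
- nonResourceURLs: [/metrics]
  verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default          # assumption: Prometheus runs under the default SA
  namespace: monitoring
```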
A config map will be used to decouple configuration artifacts and alerting rules from image content; it will be mounted into the Prometheus container at /etc/prometheus as the prometheus.yaml and prometheus.rules files.
```shell
$ kubectl create -f config-map.yaml
```
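A sketch of what config-map.yaml might contain; the ConfigMap name, the example alert, and the metric it uses are illustrative assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  namespace: monitoring
data:
  # Alerting rules file, mounted as /etc/prometheus/prometheus.rules
  prometheus.rules: |-
    groups:
    - name: example-rules
      rules:
      - alert: HighPodMemory          # hypothetical example alert
        expr: sum(container_memory_usage_bytes) by (pod) > 1e+09
        for: 5m
        labels:
          severity: warning
  # Main configuration, mounted as /etc/prometheus/prometheus.yaml
  prometheus.yaml: |-
    global:
      scrape_interval: 15s
    rule_files:
    - /etc/prometheus/prometheus.rules
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
```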
```shell
$ kubectl apply -f prometheus-deployment.yaml -n monitoring
```
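A minimal prometheus-deployment.yaml sketch that mounts the config map at /etc/prometheus (names, labels, and the image tag are assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yaml"
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/
      volumes:
      - name: prometheus-config-volume
        configMap:
          name: prometheus-server-conf   # assumed ConfigMap name
```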
Example prometheus.yaml scrape configuration:

```yaml
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'eks-dev-monitor'
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'redis'
    static_configs:
      - targets: ['redis:9121']
```
The kubernetes_sd_configs section describes how to retrieve the list of targets to scrape using the Kubernetes REST API. Kubernetes has a component called the API server that exposes a REST API that lets end users, different parts of your cluster, and external components communicate with one another. To discover targets, one of several roles can be chosen.
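When Prometheus itself runs inside the cluster, a scrape job can authenticate to the API server and to node endpoints using the pod's service-account credentials; a sketch using the node role (the job name is arbitrary; the file paths are the standard in-cluster service-account mounts):

```yaml
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    # CA bundle mounted into every pod by Kubernetes
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  # Service-account token, also mounted automatically
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
```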
```yaml
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
```
This config pulls metrics from https://${POD_IP}:${POD_PORT}/metrics for every pod in Kubernetes, where POD_IP and POD_PORT can be found in the pod spec. This works in theory, but in practice we don't want to scrape all the pods, and a pod sometimes exposes metrics on a specific port and path. How do we implement a switch for scraping and specify the scrape address for each pod? The answer is relabelling:
```yaml
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (.+)
```
First of all, we can use it via special pod annotations:

```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/mypath/metrics"
  prometheus.io/port: "8080"
  prometheus.io/scheme: "http"
```
Labels beginning with __ are special labels used internally by Prometheus. The scrape address is represented by ${__scheme__}://${__address__}${__metrics_path__}, which here is http://${POD_IP}:8080/mypath/metrics. Labels beginning with __meta_kubernetes_pod_annotation_ represent pod annotations, and the relabel_configs rules are applied one by one:
- Keep the pod only if prometheus.io/scrape is set to true, otherwise ignore it;
- Replace the value of __metrics_path__ with the value of prometheus.io/path;
- Change the port in __address__ to the value of prometheus.io/port;
- Replace the value of __scheme__ with the value of prometheus.io/scheme.
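The steps above, traced on a hypothetical pod (the IP and annotation values are made up for illustration):

```yaml
# Discovered target, before relabelling (illustrative values):
#   __address__: 10.0.0.12:8080
#   __meta_kubernetes_pod_annotation_prometheus_io_scrape: "true"
#   __meta_kubernetes_pod_annotation_prometheus_io_path:   "/mypath/metrics"
#   __meta_kubernetes_pod_annotation_prometheus_io_port:   "8080"
#   __meta_kubernetes_pod_annotation_prometheus_io_scheme: "http"
#
# After relabelling:
#   __scheme__:       http
#   __address__:      10.0.0.12:8080
#   __metrics_path__: /mypath/metrics
#
# Scrape URL: http://10.0.0.12:8080/mypath/metrics
```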
Explanation:

```yaml
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: true
```

Only pods that have this annotation are scraped:
```yaml
...
apiVersion: ...
kind: Deployment
...
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
```
Another example:

```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/scheme: "https"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9191"
```
This would mean that the corresponding Kubernetes object will be scraped thanks to the prometheus.io/scrape annotation value of true, and that the metrics can be reached at port 9191 at the path /metrics. It is worth noticing that the name of the annotation can be anything you want. To showcase this, we'll use the following:
```yaml
annotations:
  se7entyse7en.prometheus/scrape: "true"
  se7entyse7en.prometheus/scheme: "https"
  se7entyse7en.prometheus/path: "/metrics"
  se7entyse7en.prometheus/port: "9191"
```
As previously mentioned, each target that is scraped comes with some default labels depending on the role and on the type of target. The relabel_config provides the ability to rewrite the set of labels of a target before it gets scraped.
What does this mean? Let’s say for example that thanks to our kubernetes-service-endpoints scraping job configured with role: endpoints Prometheus discovers a Service object by using the Kubernetes API. For each target, the list of rules in relabel_config is applied to that target.
Let’s consider a service as follows:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    se7entyse7en.prometheus/scrape: "true"
    se7entyse7en.prometheus/scheme: "https"
    se7entyse7en.prometheus/path: "/metrics"
    se7entyse7en.prometheus/port: "9191"
spec:
  selector:
    app: app
  ports:
  - port: 9191
```
When the relabelling rules are applied, Prometheus has just discovered the target but hasn't yet scraped the metrics. Indeed, we'll now see that the way the metrics are going to be scraped depends on the relabelling rules.
The first rule controls whether the target has to be scraped at all:
```yaml
- source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scrape]
  action: keep
  regex: true
```
As you can see, source_labels is a list of labels. These label values are first concatenated using a separator that is configurable and is ; by default. Given that this rule has only one item, no concatenation happens. For each service there's a meta label called __meta_kubernetes_service_annotation_<annotationname> that maps to the corresponding (slugified) annotation in the Service object. In our example, the concatenated source_labels is simply equal to the string true thanks to se7entyse7en.prometheus/scrape: "true".
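When there is more than one source label, the values are joined with the separator before the regex is matched; a sketch (the namespace and service names here are assumptions):

```yaml
# __meta_kubernetes_namespace = "monitoring" and
# __meta_kubernetes_service_name = "prometheus" concatenate to
# "monitoring;prometheus", which is what the regex is matched against
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
  action: keep
  regex: monitoring;prometheus
```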
The action: keep makes Prometheus ignore all targets whose concatenated source_labels don't match the regex, which in our case is equal to true. Since the regex true matches the value true, the target is not ignored. Don't confuse true with a boolean here; you could even use a regex that matches an annotation value of "yes, please scrape me".
The opposite action, drop, ignores targets whose concatenated source_labels match the regex; for example, to skip pods of finished jobs:

```yaml
# Drop finished jobs
- action: drop
  regex: Succeeded|Failed
  source_labels:
  - __meta_kubernetes_pod_phase
```
The second rule controls the scheme used for scraping:

```yaml
- source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scheme]
  action: replace
  target_label: __scheme__
  regex: (https?)
```
Following the previous logic, the concatenated source_labels is equal to https thanks to the se7entyse7en.prometheus/scheme: "https" annotation. The action: replace replaces the label in target_label with the concatenated source_labels if the concatenated source_labels matches the regex. In our case, the regex (https?) matches the concatenated source_labels value https. The outcome is that the label __scheme__ now has the value https.
But what is the label __scheme__? It is a special label that tells Prometheus which URL should be used to scrape the target's metrics. After relabelling, the target's metrics will be scraped at __scheme__://__address____metrics_path__, where __address__ and __metrics_path__ are two other special labels similar to __scheme__. The next rules deal with these.
The third rule controls the path that exposes the metrics:

```yaml
- source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_path]
  action: replace
  target_label: __metrics_path__
  regex: (.+)
```
This rule works exactly like the previous one; the only difference is the regex. With this rule, we replace __metrics_path__ with whatever is in our custom Kubernetes annotation. In our case, it will be equal to /metrics thanks to the se7entyse7en.prometheus/path: "/metrics" annotation.
The fourth rule finally controls the value of __address__, the missing part of the final URL to scrape:

```yaml
- source_labels: [__address__, __meta_kubernetes_service_annotation_se7entyse7en_prometheus_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
```
This rule is very similar to the previous ones; the differences are that it also has a replacement key and that it has multiple source_labels. Let's start with the source_labels. As previously explained, the values are concatenated with the separator ;. By default, the label __address__ has the form <host>:<port> and is the address that Prometheus used to discover the target. I don't know exactly which port is used for that purpose, but it's not important for our goal, so let's just assume it is 1234 and that the host is something like se7entyse7en_app_service. Thanks to the se7entyse7en.prometheus/port: "9191" annotation, the concatenated source_labels is equal to se7entyse7en_app_service:1234;9191. From this string, we want to keep the host but use the port coming from the annotation. The regex and replacement configurations are exactly meant for this: the regex uses two capturing groups, one for the host and one for the port, and the replacement is set up so that the output is $1:$2, which corresponds to the captured host and port separated by :.
So now we finally have __scheme__, __address__, and __metrics_path__! We said that the target URL used for scraping the metrics is given by __scheme__://__address____metrics_path__. If we replace each part, we get: https://se7entyse7en_app_service:9191/metrics.
The remaining rules simply add some default labels to the metrics when they're stored:
```yaml
- source_labels: [__meta_kubernetes_namespace]
  action: replace
  target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
  action: replace
  target_label: kubernetes_service
- source_labels: [__meta_kubernetes_pod_name]
  action: replace
  target_label: kubernetes_pod
```
In this case, we're adding the labels kubernetes_namespace, kubernetes_service, and kubernetes_pod from the corresponding meta labels.
To recap, these are the steps to automatically discover the targets to scrape with the configured labels:
- Prometheus discovers the targets using the Kubernetes API according to the kubernetes_sd_config configuration;
- Relabelling is applied according to relabel_config;
- Targets are scraped according to the special labels __address__, __scheme__, and __metrics_path__;
- Metrics are stored with the labels resulting from relabel_config, and all labels starting with __ are stripped.
For other resources like nodes, services or endpoints, it works the same way as pods. The details of configuration can be found at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config.
Kubernetes SD in Prometheus has a collection of so-called "roles", which define how targets and their metrics are collected. Each role has its own set of labels, described in the official documentation. The ones that interest us are:
- service: finds and returns each Service and its port;
- pod: finds pods and returns their containers as targets to scrape metrics from;
- endpoints: creates targets from each Endpoint of each Service found in the cluster.
Some examples can be found below:
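A minimal sketch of jobs using different roles (the job names are arbitrary; each kubernetes_sd_configs entry takes exactly one role):

```yaml
- job_name: 'k8s-services'
  kubernetes_sd_configs:
  - role: service    # one target per service port
- job_name: 'k8s-pods'
  kubernetes_sd_configs:
  - role: pod        # one target per declared pod container port
- job_name: 'k8s-endpoints'
  kubernetes_sd_configs:
  - role: endpoints  # one target per endpoint address of every service
```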
The difference between roles is not only in how targets are discovered but also in which labels are automatically attached to those targets. For example, with role node each target has a label called __meta_kubernetes_node_name containing the name of the node object, which is not available with role pod. With role pod each target has a label called __meta_kubernetes_pod_name containing the name of the pod object, which is not available with role node.
The nice thing about the endpoints role is that Prometheus provides different labels depending on the target: if it's a pod, the labels provided are those of the pod role; if it's a service, those of the service role. In addition, there's a set of extra labels available independently of the target.
```yaml
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# for all or only some services.
- job_name: "kubernetes-services"
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Example relabel to probe only some services that have
  # "example.io/should_be_probed = true" annotation
  # - source_labels: [__meta_kubernetes_service_annotation_example_io_should_be_probed]
  #   action: keep
  #   regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.example.com:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: service
```
https://docs.seldon.io/projects/seldon-core/en/latest/analytics/analytics.html
```yaml
scrape_configs:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  # Drop finished jobs
  - action: drop
    regex: Succeeded|Failed
    source_labels:
    - __meta_kubernetes_pod_phase
- job_name: seldon-models
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
    action: keep
    regex: true
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_label_seldon_io_model
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: metrics(-.*)?
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels:
    - __meta_kubernetes_namespace
    action: replace
    target_label: kubernetes_namespace
  - source_labels:
    - __meta_kubernetes_pod_name
    action: replace
    target_label: kubernetes_pod_name
```
Labels are sets of key-value pairs that allow us to characterize and organize what’s actually being measured in a Prometheus metric.
For example, when measuring HTTP latency, we might use labels to record the HTTP method and status returned, which endpoint was called, and which server was responsible for the request.
Prometheus also provides some internal labels. These begin with two underscores and are removed after all relabelling steps are applied; that means they will not be attached to stored metrics unless we explicitly copy them into other labels during relabelling.
Some of these special labels available to us are:

| Label name | Description |
|---|---|
| `__name__` | The scraped metric's name |
| `__address__` | `host:port` of the scrape target |
| `__scheme__` | URI scheme of the scrape target |
| `__metrics_path__` | Metrics endpoint of the scrape target |
| `__param_<name>` | The value of the first URL parameter named `<name>` passed to the target |
| `__scrape_interval__` | The target's scrape interval (experimental) |
| `__scrape_timeout__` | The target's timeout (experimental) |
| `__meta_` | Special labels set by the service discovery mechanism |
| `__tmp` | Special prefix used to temporarily store label values before discarding them |
- keep/drop The keep and drop actions allow us to filter out targets and metrics based on whether our label values match the provided regex.
- labelkeep/labeldrop The labelkeep and labeldrop actions allow for filtering the label set itself.
- replace Replace is the default action for a relabeling rule if we haven’t specified one; it allows us to overwrite the value of a single label by the contents of the replacement field.
- hashmod The hashmod action provides a mechanism for horizontally scaling Prometheus.
- labelmap The labelmap action is used to map one or more label pairs to different label names.
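As a sketch of the hashmod action: each of N Prometheus servers keeps only the targets whose hashed address falls into its shard (here N=2, and this instance is assumed to own shard 0):

```yaml
relabel_configs:
- source_labels: [__address__]
  modulus: 2              # total number of Prometheus shards
  target_label: __tmp_hash
  action: hashmod         # __tmp_hash = hash(__address__) % 2
- source_labels: [__tmp_hash]
  regex: "0"              # this server keeps shard 0 only
  action: keep
```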