prometheus kubernetes - ghdrako/doc_snipets GitHub Wiki

Install on k8s

Create Namespace

Prometheus and Alertmanager components, such as the Prometheus server, will be deployed as Kubernetes objects (pods, services, and so on) in the monitoring namespace.

$ kubectl create namespace monitoring
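
The same namespace can also be declared as a manifest, which may be convenient if the rest of the setup is kept in files. A minimal sketch, assuming a hypothetical file name namespace.yaml that would be applied with kubectl apply -f namespace.yaml:

```yaml
# namespace.yaml (hypothetical file name)
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```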

ClusterRole and ClusterRoleBinding

Create a cluster role and a cluster role binding. Access to Kubernetes resources is regulated via role-based access control (RBAC), which uses the rbac.authorization.k8s.io API to manage authorization. In the RBAC API, a cluster role contains rules that represent a set of permissions on the Kubernetes cluster. A cluster role will be used to provide access to the following:

  • Non-resource endpoints (like /healthz)
  • Cluster-scoped resources (like nodes)
  • Namespaced resources (like pods) across all namespaces (needed to run kubectl get pods --all-namespaces, for example)

A cluster role binding grants the permissions defined in a cluster role to a user or set of users. It holds a list of subjects (users, groups, or service accounts) and a reference to the role being granted. Permissions can be granted within a namespace using a role binding, or cluster-wide using a cluster role binding.

$ kubectl create -f clusterRole.yaml
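
The contents of clusterRole.yaml are not shown on this page. The following is a minimal sketch of what such a file could look like. The object name prometheus, the use of the default service account in the monitoring namespace, and the exact resource list are assumptions for illustration:

```yaml
# clusterRole.yaml (sketch; names and resource list are assumptions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # Cluster-scoped resources (nodes) and namespaced resources across all namespaces
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  # Non-resource endpoint
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  # Grants the role to the service account that the Prometheus pod runs as
  - kind: ServiceAccount
    name: default
    namespace: monitoring
```

The get, list, and watch verbs are what service discovery needs in order to follow changes to the listed resources.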

Create a Config Map

A config map decouples configuration artifacts from image content. Here it holds the Prometheus configuration and the alerting rules, which will be mounted into the Prometheus container under /etc/prometheus as the prometheus.yaml and prometheus.rules files.

$ kubectl create -f config-map.yaml
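
As an illustration, config-map.yaml could look roughly like the sketch below. The ConfigMap name prometheus-server-conf and the sample alert rule are assumptions; only the two file names and the /etc/prometheus mount path come from the description above:

```yaml
# config-map.yaml (sketch; the ConfigMap name and the alert rule are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  namespace: monitoring
data:
  # Each key becomes a file under the mount path
  prometheus.rules: |-
    groups:
      - name: example
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
  prometheus.yaml: |-
    global:
      scrape_interval: 15s
    rule_files:
      - /etc/prometheus/prometheus.rules
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
```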

Create a Prometheus Deployment

$ kubectl apply -f prometheus-deployment.yaml -n monitoring
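
The deployment manifest itself is not shown on this page. A minimal sketch of prometheus-deployment.yaml, assuming a ConfigMap named prometheus-server-conf (a hypothetical name), an untagged prom/prometheus image, and non-persistent storage:

```yaml
# prometheus-deployment.yaml (sketch; names, image, and storage are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            # Configuration and rules coming from the config map
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      volumes:
        - name: prometheus-config-volume
          configMap:
            name: prometheus-server-conf
        # emptyDir means the data is lost when the pod is rescheduled
        - name: prometheus-storage-volume
          emptyDir: {}
```

The namespace is left out of the manifest because the command above passes -n monitoring.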

Monitoring k8s

static config

    global:
      scrape_interval:     15s
      external_labels:
        monitor: 'eks-dev-monitor'
    
    scrape_configs:
      
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'redis'
        static_configs:
          - targets: ['redis:9121']

kubernetes_sd_configs

The kubernetes_sd_configs configuration describes how to retrieve the list of targets to scrape using the Kubernetes REST API. Kubernetes has a component called API server that exposes a REST API that lets end-users, different parts of your cluster, and external components communicate with one another.

To discover targets, multiple roles can be chosen.


- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod

This config tells Prometheus to pull metrics from http://${POD_IP}:${POD_PORT}/metrics for every pod in the cluster, where POD_IP and POD_PORT are taken from the pod spec. This works in theory, but in practice we don’t want to scrape every pod, and some pods expose their metrics on a specific port and path. How do we switch scraping on or off and specify the scrape address for each pod? The answer is relabelling.

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (.+)

These rules are driven by special pod annotations:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/mypath/metrics"
  prometheus.io/port: "8080"
  prometheus.io/scheme: "http"

Labels that begin with __ are special labels used internally by Prometheus. The scrape address is assembled as ${__scheme__}://${__address__}${__metrics_path__}, which here gives http://${POD_IP}:8080/mypath/metrics. Labels that begin with __meta_kubernetes_pod_annotation represent pod annotations, and the relabel_configs rules are applied one by one:

  1. Continue if prometheus.io/scrape is set to true; otherwise ignore the pod.
  2. Replace the value of __metrics_path__ with the value of prometheus.io/path.
  3. Replace the port in __address__ with the value of prometheus.io/port.
  4. Replace the value of __scheme__ with the value of prometheus.io/scheme.

Explanation

- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: true

Only pods that carry this annotation are scraped:

...
apiVersion: ...
kind: Deployment
...
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"

Another example

annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "https"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9191"

This means that the corresponding Kubernetes object will be scraped, thanks to the prometheus.io/scrape annotation with a value of true, and that the metrics can be reached over https on port 9191 at the path /metrics. Note that the annotation names can be anything you want, as long as the relabelling rules refer to the matching meta labels. To showcase this, we will use the following:

annotations:
    se7entyse7en.prometheus/scrape: "true"
    se7entyse7en.prometheus/scheme: "https"
    se7entyse7en.prometheus/path: "/metrics"
    se7entyse7en.prometheus/port: "9191"

As previously mentioned, each target that is scraped comes with some default labels depending on the role and on the type of target. The relabel_configs section provides the ability to rewrite the set of labels of a target before it gets scraped.

What does this mean? Let’s say, for example, that our kubernetes-service-endpoints scraping job, configured with role: endpoints, discovers a Service object through the Kubernetes API. For each target, the list of rules in relabel_configs is applied to that target.

Let’s consider a service as follows:

apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    se7entyse7en.prometheus/scrape: "true"
    se7entyse7en.prometheus/scheme: "https"
    se7entyse7en.prometheus/path: "/metrics"
    se7entyse7en.prometheus/port: "9191"
spec:
  selector:
    app: app
  ports:
  - port: 9191

When the relabelling rules are applied, Prometheus has only discovered the target; it has not scraped the metrics yet. Indeed, we will now see that how the metrics are scraped depends on the relabelling rules.

To scrape or not to scrape?

The first rule controls whether the target has to be scraped at all:

        - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scrape]
          action: keep
          regex: true

As you can see the source_labels is a list of labels. This list of labels is first concatenated by using a separator that can be configured and that is ; by default. Given that in this rule there’s only one item, there’s no concatenation happening.

For a service there’s a meta label called __meta_kubernetes_service_annotation_<annotationname> that maps to the corresponding annotation on the service object, with the annotation name slugified (unsupported characters are converted to underscores). In our example, the concatenated source_labels is then simply equal to the string true thanks to se7entyse7en.prometheus/scrape: "true".

The action: keep makes Prometheus ignore all the targets whose concatenated source_labels don’t match the regex that in our case is equal to true. Since according to our example the regex true matches the value true, the target is not ignored. Don't confuse true with being a boolean here, you can even decide to use a regex that matches an annotation value of "yes, please scrape me".
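
To make the last point concrete, here is a small sketch of a keep rule that expects a non-boolean annotation value. The annotation value is just the hypothetical one mentioned above. Prometheus anchors the regex at both ends, so the whole concatenated value has to match:

```yaml
# Sketch: the Service would carry the (hypothetical) annotation
#   se7entyse7en.prometheus/scrape: "yes, please scrape me"
relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scrape]
    action: keep
    # Anchored match: "yes, please scrape me too" would NOT match
    regex: "yes, please scrape me"
```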

Remove


        # Drop finished jobs
        - action: drop
          regex: Succeeded|Failed
          source_labels:
          - __meta_kubernetes_pod_phase

Where are the metrics?

        - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)

According to the previous logic, the concatenated source_labels is equal to https thanks to the se7entyse7en.prometheus/scheme: "https" annotation. The action: replace sets the label named in target_label to the replacement value if the concatenated source_labels matches the regex. The replacement defaults to $1, the first capture group. In our case, the regex (https?) matches the concatenated source_labels, https, so the label __scheme__ now has the value https.

But what is the label __scheme__? It is a special label that tells Prometheus which URL scheme (http or https) to use when scraping the target's metrics. After the relabelling, the target's metrics will be scraped at __scheme__://__address____metrics_path__, where __address__ and __metrics_path__ are two other special labels similar to __scheme__. The next rules deal with these.

The third rule controls what is the path that exposes the metrics:

        - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)

This rule works exactly like the previous one, the only difference is the regex. With this rule, we replace __metrics_path__ with whatever is in our custom Kubernetes annotation. In our case, it will be then equal to /metrics thanks to the se7entyse7en.prometheus/path: "/metrics" annotation.

The fourth rule finally controls the value of __address__, the missing part of the final URL to scrape:

        - source_labels: [__address__, __meta_kubernetes_service_annotation_se7entyse7en_prometheus_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2

This rule is very similar to the previous ones; the differences are that it also has a replacement key and that it has multiple source_labels.

Let’s start with the source_labels. As previously explained, the values are concatenated with the separator ;. By default the label __address__ has the form <host>:<port> and is the address of the discovered target. The exact port does not matter for our goal, so let’s just assume that it is 1234 and that the host is something like se7entyse7en_app_service. Thanks to the se7entyse7en.prometheus/port: "9191" annotation, the concatenated source_labels is equal to se7entyse7en_app_service:1234;9191.

From this string, we want to keep the host but use the port coming from the annotation. The regex and the replacement configurations are meant exactly for this: the regex uses two capturing groups, one for the host and one for the annotation port, and skips the original port with a non-capturing group. The replacement $1:$2 then joins the captured host and port with :, which gives se7entyse7en_app_service:9191.

So now we finally have __scheme__, __address__ and __metrics_path__! We said that the target URL that will be used for scraping the metrics is given by:

__scheme__://__address____metrics_path__

If we replace each part we have:

https://se7entyse7en_app_service:9191/metrics

Extra labels

The remaining rules are simply adding some default labels to the metrics when they'll be stored:

        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_service
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod

In this case, we're adding the labels kubernetes_namespace, kubernetes_service, and kubernetes_pod from the corresponding meta labels.
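
Putting the rules discussed above together, the whole scrape job could look like the sketch below. The job name and role come from the text above; wrapping the rules in a scrape_configs entry is shown only to give them context:

```yaml
scrape_configs:
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # 1. Scrape only targets whose service is annotated with scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scrape]
        action: keep
        regex: true
      # 2. Scheme taken from the annotation (http or https)
      - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      # 3. Metrics path taken from the annotation
      - source_labels: [__meta_kubernetes_service_annotation_se7entyse7en_prometheus_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # 4. Keep the discovered host, swap in the annotated port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_se7entyse7en_prometheus_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      # 5. Extra labels that are stored with the metrics
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod
```

The order matters: the keep rule runs first, so the remaining rules are only evaluated for targets that will actually be scraped.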

To recap, these are the steps to automatically discover the targets to scrape with the configured labels:

  1. Prometheus discovers the targets using the Kubernetes API according to the kubernetes_sd_configs configuration.
  2. Relabelling is applied according to relabel_configs.
  3. Targets are scraped according to the special labels __address__, __scheme__, and __metrics_path__.
  4. Metrics are stored with the labels produced by relabel_configs, and all labels starting with __ are stripped.

For other resources like nodes, services or endpoints, it works the same way as pods. The details of configuration can be found at https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config.

Kubernetes SD in Prometheus has a collection of so-called “roles”, which define how targets are discovered. Each role has its own set of labels, which are listed in the official documentation. The ones of interest here are:

  • service: will find and return each Service and its Port
  • pod: will find pods and return its containers as targets to grab metrics from
  • endpoints: will create targets from each Endpoint for each Service found in a cluster

Some examples can be found below:

The difference between roles is not only in how targets are discovered but also in which labels are automatically attached to those targets. For example, with role node each target has a label called __meta_kubernetes_node_name that contains the name of the node object, which is not available with role pod. With role pod each target has a label called __meta_kubernetes_pod_name that contains the name of the pod object, which is not available with role node.

The nice thing about the role endpoints is that Prometheus provides different labels depending on the target: if it’s a pod, then the labels provided are those of the role pod, if it’s a service, then those of the role service. In addition, there’s also a set of extra labels that are available independently from the target.
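
The role-specific labels can be illustrated with two small jobs. This is a sketch only: the job names and the target label names node and pod are assumptions, and the TLS and token settings are the usual in-cluster service account paths, which may differ in your cluster:

```yaml
scrape_configs:
  # role: node exposes __meta_kubernetes_node_* labels
  - job_name: 'kubernetes-nodes'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Only available with role: node
      - source_labels: [__meta_kubernetes_node_name]
        target_label: node

  # role: pod exposes __meta_kubernetes_pod_* labels
  - job_name: 'kubernetes-pods-by-name'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only available with role: pod (and for pod-backed endpoints targets)
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```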

Service monitoring

  # Example scrape config for probing services via the Blackbox Exporter.
  #
  # The relabeling allows the actual service scrape endpoint to be configured
  # for all or only some services.
  - job_name: "kubernetes-services"

    metrics_path: /probe
    params:
      module: [http_2xx]

    kubernetes_sd_configs:
      - role: service

    relabel_configs:
      # Example relabel to probe only some services that have "example.io/should_be_probed = true" annotation
      #  - source_labels: [__meta_kubernetes_service_annotation_example_io_should_be_probed]
      #    action: keep
      #    regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service

Seldon

https://docs.seldon.io/projects/seldon-core/en/latest/analytics/analytics.html

    scrape_configs:
    
      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:
        - role: pod

        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
        # Drop finished jobs
        - action: drop
          regex: Succeeded|Failed
          source_labels:
          - __meta_kubernetes_pod_phase

      - job_name: seldon-models
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          action: keep
          regex: true
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_pod_label_seldon_io_model
        - source_labels: [__meta_kubernetes_pod_container_port_name]
          action: keep
          regex: metrics(-.*)?
        - source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: ${1}:${2}
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels:
          - __meta_kubernetes_namespace
          action: replace
          target_label: kubernetes_namespace
        - source_labels:
          - __meta_kubernetes_pod_name
          action: replace
          target_label: kubernetes_pod_name

Relabeling

Prometheus labels

Labels are sets of key-value pairs that allow us to characterize and organize what’s actually being measured in a Prometheus metric.

For example, when measuring HTTP latency, we might use labels to record the HTTP method and status returned, which endpoint was called, and which server was responsible for the request.

Internal labels

Prometheus also provides some internal labels for us. These begin with two underscores and are removed after all relabeling steps are applied, so they will not be available on stored metrics unless we explicitly copy them to another label.

Some of the special labels available to us are:

Label name           Description
__name__             The scraped metric’s name
__address__          host:port of the scrape target
__scheme__           URI scheme of the scrape target
__metrics_path__     Metrics endpoint of the scrape target
__param_<name>       The value of the first URL parameter called <name> passed to the target
__scrape_interval__  The target’s scrape interval (experimental)
__scrape_timeout__   The target’s timeout (experimental)
__meta_              Special prefix for labels set by the Service Discovery mechanism
__tmp                Special prefix used to temporarily store label values before discarding them

Available actions

  • keep/drop: The keep and drop actions allow us to filter out targets and metrics based on whether our label values match the provided regex.
  • labelkeep/labeldrop: The labelkeep and labeldrop actions allow for filtering the label set itself.
  • replace: Replace is the default action for a relabeling rule if we haven’t specified one; it allows us to overwrite the value of a single label with the contents of the replacement field.
  • hashmod: The hashmod action provides a mechanism for horizontally scaling Prometheus.
  • labelmap: The labelmap action is used to map one or more label pairs to different label names.
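
The actions listed above can be combined in a single job. The sketch below is illustrative only: the job name, the shard count of 2, the __tmp_shard label, and the dropped label name are assumptions:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods-shard-0'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # hashmod: hash the address into one of 2 buckets (0 or 1)
      - source_labels: [__address__]
        action: hashmod
        modulus: 2
        target_label: __tmp_shard
      # keep: this Prometheus server only scrapes bucket 0
      - source_labels: [__tmp_shard]
        action: keep
        regex: "0"
      # labelmap: copy every pod label to a target label of the same name
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # labeldrop: remove a label we do not want to store
      - action: labeldrop
        regex: pod_template_hash
```

A second server configured with regex: "1" would pick up the other half of the targets, which is how hashmod helps with horizontal scaling.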