Prometheus and Grafana

Getting things running

# clone the git repository 
git clone https://github.com/prometheus-operator/kube-prometheus/ kube-prometheus
cd kube-prometheus

# Start a 1.28 version of k8s with minikube with some specific command-line arguments
minikube delete
minikube start --kubernetes-version=v1.28.12 --memory=5g --bootstrapper=kubeadm \
    --extra-config=kubelet.authentication-token-webhook=true \
    --extra-config=kubelet.authorization-mode=Webhook \
    --extra-config=scheduler.bind-address=0.0.0.0 \
    --extra-config=controller-manager.bind-address=0.0.0.0 \
    --feature-gates=SidecarContainers=true

# Apply the manifests in a couple of steps to avoid race conditions during startup
kubectl apply --server-side -f manifests/setup
kubectl wait \
	--for condition=Established \
	--all CustomResourceDefinition \
	--namespace=monitoring
kubectl apply -f manifests/
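
Not part of the kube-prometheus instructions, just a sanity check I like before setting up the port forwarding - everything in the monitoring namespace should eventually show as Running/Ready:

kubectl get pods -n monitoring
kubectl wait --for=condition=Ready pod --all -n monitoring --timeout=300s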

Now, in three different terminals (or in the background, if you don't mind interleaved status messages), set up port forwarding. The address is the external address of my VM so I can use the web browser on my Windows laptop.

# grafana username/password is admin/admin
kubectl port-forward -n monitoring --address=192.168.88.130 svc/grafana 3000
kubectl -n monitoring port-forward --address=192.168.88.130 svc/alertmanager-main 9093
kubectl port-forward -n monitoring --address=192.168.88.130 svc/prometheus-k8s 9090

  • I used Kubernetes v1.28 because the compatibility matrix for kube-prometheus stopped there. I'll check out 1.29 and maybe 1.30 later. GKE considers 1.29 the stable release, but you can specify 1.28 too, so if there are compatibility issues, either I'll fix them (and contribute the changes back to the project) or I'll just sit at 1.28.

What next

  • There are an almost-overwhelming number of pre-defined dashboards in Grafana. None of the ones I've looked at so far is particularly useful. Part of that is probably that I have no application deployed.

  • What do I want to monitor? Probably the database, CPU and memory usage on node VMs (k8s is usually configured without swap so out of memory is very bad), 4xx and 5xx errors from the chiller_frontend and chiller_api pods, and stats on latency for outside web requests through the application, by API call. This last one I'll use later for the deployment testing and rollout health monitoring (easy to "break" by adding a sleep somewhere).

    • Let's focus on 4xx and 5xx errors first. There are two places these might come from: the Service associated with the pods, or the gunicorn WSGI server hosting the Flask applications inside them. gunicorn provides instrumentation using StatsD. StatsD is a push protocol, whereas Prometheus pulls, so you have to use some sort of adapter. Luckily, Prometheus already has such a beast, the statsd_exporter. This can be run standalone, or as a sidecar in each pod. The sidecar seems easier - each pod that uses gunicorn simply runs a second container that collects the StatsD stream and exposes a /metrics endpoint on the pod's IP for Prometheus to scrape. This isn't unusual - as a matter of fact, this is how the k8s networking plugins work - they have their own container running in each pod dealing with networking.

I kind of followed this tutorial to see how this works. That tutorial is for Docker, so I had to adapt it a bit to run on k8s. Argh. Unfortunately, sidecars only become a beta feature of k8s in 1.29. So, not doing that, I guess. I'll have to run a standalone exporter pod or something.

  • Update: SidecarContainers is an alpha feature in 1.28, enabled with a feature gate. I need to restart minikube with the feature gate enabled; I'll update the instructions above so this happens. We can verify the feature gate is on with kubectl -n kube-system describe pod kube-apiserver-minikube | grep Sidecar.
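
For reference (this is general Kubernetes behavior, not something specific to my setup): what the SidecarContainers gate enables is declaring the sidecar as an init container with restartPolicy: Always, so it starts before the app container and keeps running for the life of the pod. A rough sketch, reusing the statsd-exporter image from further down:

spec:
  initContainers:
  - name: statsd-exporter
    image: prom/statsd-exporter
    restartPolicy: Always    # this field is what makes it a "native" sidecar
  containers:
  - name: chiller-frontend
    image: ttl.sh/af2218b3-4508-4511-9635-b8ca5e622e59:latest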

Sidecar containers

This simple example shows how sidecar containers work. We create a pod with two containers. One is nginx, which serves on port 80, but we don't expose that port on the pod. The second container runs socat, which is a really cool general-purpose forwarding utility. It listens on port 9011 (which we do expose) and forwards every connection it receives to localhost:80. And it works :-) This is a modified version of an example from here.

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
  - image: alpine/socat
    name: socat
    ports:
      - containerPort: 9011
    command: ["socat"]
    args: ["-v", "TCP4-LISTEN:9011", "TCP4:localhost:80"]

Modifying Helm chart

Trying to modify the helm chart to do two things:

  • take the image information as an argument or something
  • use the sidecar

Changing the image on the command line

This isn't easy in Helm without messing around with how I structured my values.yaml file. Basically, Helm wants you to override simple config values, not things that are nested inside lists. You can do it, but it's fragile, because you don't reference collection elements by name, you reference them by index. I have to just know that my api config is the first one in my list of deployments, and reference it as Deployment[0] instead of Deployment["api"] (see the sketch after this list). Anyway, there are two ways to handle this that aren't janky and tedious:

  • Use Kustomize as a post-renderer for Helm. This is actually really slick. See this blog post for an example.
  • Use Helmfile. This is probably overkill for now. I will come back to this.
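
To illustrate the fragility: with --set, Helm addresses list elements by index, so an override would look something like the following (the key names are hypothetical - they depend on how values.yaml is actually structured):

helm install chiller ./chiller \
  --set 'Deployment[0].image.repository=ttl.sh/59c3aefe-d5e9-4086-b02d-fa62b65c0eb8' \
  --set 'Deployment[0].image.tag=latest'

If the api entry ever moves to a different position in the list, this silently targets the wrong deployment.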

Figured out how to do this. Used kustomize. I had some very strange behavior getting this to work that just "went away", and I don't know why. I'm wondering if there was a control character in my shell script I was using for post-rendering. Dunno.

In the ~/chiller/helm directory, created a couple of files:

kustomization.yaml:

images:
- name: ghcr.io/lago-morph/chiller_api
  newName: ttl.sh/59c3aefe-d5e9-4086-b02d-fa62b65c0eb8
  newTag: latest
- name: ghcr.io/lago-morph/chiller_frontend
  newName: ttl.sh/af2218b3-4508-4511-9635-b8ca5e622e59
  newTag: latest

resources:
- resources.yaml

The image names correspond to the random names I used while uploading to ttl.sh.

kust.sh:

#!/bin/bash
# Helm pipes the fully rendered manifests to stdin; save them where kustomization.yaml expects them
cat > resources.yaml
# kubectl kustomize applies the image overrides and writes the result to stdout, which Helm reads back
kubectl kustomize
rm resources.yaml

Then if I invoke helm using kust.sh as a post-renderer, it substitutes in the different image name/tag. This took way longer than it should have due to an error that I couldn't find, which then suddenly went away. Probably a hidden character somewhere.

The helm command to install while using the kustomization was:

 helm install chiller ./chiller --post-renderer ./kust.sh
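
A way to check the substitution without installing anything (helm template accepts the same --post-renderer flag):

 helm template chiller ./chiller --post-renderer ./kust.sh | grep 'image:'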

Metrics sidecar

Now I need to add the sidecar. I guess I could make that an option configured through the values.yaml file.

Will need a ConfigMap to hold the mappings between StatsD metric names and Prometheus metrics. Here is one to start with, from this article.

apiVersion: v1
kind: ConfigMap
metadata:
  name: statsd-conf
data:
  statsd.conf: |
    mappings:
      - match: chiller.*.gunicorn.request.status.*
        help: "http response code"
        name: "http_response_code"
        labels:
          component: "$1"
          status: "$2"
          job: "chiller_${1}_gunicorn_response_code"

Will need to map that into a container running in a pod. We need the following arguments:

containerPort: 9102
arg: "--statsd.mapping-config=/statsd/statsd.conf"

I can see the defaults for the statsd_exporter binary based on this file, where its parameters are defined. There is also a Dockerfile in this repo that shows how the container is structured. It is pretty simple.

This is what it all ended up looking like with prototype .yaml files. I'll stick with those until I get prometheus and grafana configured, then I'll see about modifying the Helm chart.

frontend-sc-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: chiller-frontend-sc
  name: chiller-frontend-sc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chiller-frontend-sc
  template:
    metadata:
      labels:
        app: chiller-frontend-sc
      name: chiller-frontend-sc
    spec:
      volumes:
      - name: statsd-conf-volume
        configMap:
          name: statsd-conf
      containers:
      - image: ttl.sh/af2218b3-4508-4511-9635-b8ca5e622e59:latest
        name: chiller-frontend
        env:
        - name: CHILLER_HOST
          value: "chiller-api"
      - name: statsd-exporter
        image: prom/statsd-exporter
        args:
        - "--statsd.mapping-config=/statsd/statsd.conf"
        ports:
        - containerPort: 9102
        volumeMounts:
        - name: statsd-conf-volume
          mountPath: /statsd

Then a service file

frontend-sc-service.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: chiller-frontend-sc
  name: chiller-frontend-sc
spec:
  ports:
  - port: 80
    name: http
  - port: 9102
    name: prom
  selector:
    app: chiller-frontend-sc

I tried creating a ServiceMonitor resource for the Prometheus Operator like so - but it doesn't seem to work

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    name: chiller-frontend-sc
  name: chiller-frontend-sc
  namespace: default
spec:
  endpoints:
  - interval: 15s
    port: prom
  selector:
    matchLabels:
      # this has to match labels set on the Service itself - the Service above is labeled app: chiller-frontend-sc
      name: chiller-frontend-sc

I can see that something is happening because in the Prometheus web UI I see something in status/targets and status/configuration. But it isn't detecting the endpoints correctly, and the documentation is kind of light on how to actually configure things. Argh.

This actually ended up working. There was an old version of this sitting in the monitoring namespace. When I deleted both of them, then recreated this one in default, it picked it up. Took a few minutes though.

I initially thought it wasn't getting the stats that were modified with the ConfigMap for the frontend-sc service. It turns out the metric just has a different name, and I needed to search more diligently to find it. The ones the mapping doesn't catch are all called "chiller_*"; this one is just called "http_response_code". The data I need is there.

Note that there is another CRD called a ScrapeConfig that allows you to more-or-less directly configure Prometheus without going through all the autodiscovery machinery. The autodiscovery machinery is very powerful, but the learning curve is pretty extreme. In the short term, ScrapeConfig might be a shortcut to getting the few metrics I care about showing up in Grafana.
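
A rough sketch of what that could look like (untested, written from memory of the CRD, which is still v1alpha1; the resource name is made up, and the target assumes the chiller-frontend-sc Service in the default namespace):

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: chiller-frontend-static
  namespace: monitoring
spec:
  staticConfigs:
  - targets:
    - chiller-frontend-sc.default.svc:9102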

Got a couple of dashboards working in Grafana. First, created a load test deployment: a load test pod constantly creates a user, logs in and adds 4 movies, then logs off, repeats that 9 more times, and then starts over after a random wait of 0-30 seconds. Had 6 of these running. The dashboards I have show the rate of requests served by the frontend (by HTTP response code), and the 50%, 90%, and 99% response time for each request going through the frontend.

Here are some snippets from the two graphs (most of the JSON is removed)

          "expr": "sum by(service, quantile) (chiller_frontend_gunicorn_request_duration{service=\"chiller-frontend-sc\"})",
          "legendFormat": "quantile={{quantile}}",
        }
      ],
      "title": "Response time through frontend",
      "type": "timeseries"

          "expr": "sum by(service, status) (rate(http_response_code{component=\"frontend\"}[$__rate_interval]))",
        }
      ],
      "title": "Request rate",
      "type": "timeseries"

Migrating to community helm chart

There are three related projects for installing prometheus-operator on Kubernetes:

  • prometheus-operator itself, which provides the operator and the CRDs
  • kube-prometheus, the collection of manifests I have been using above
  • kube-prometheus-stack, the community Helm chart that wraps the operator plus a kube-prometheus-style configuration

I tried using the community Helm chart in the past, and for some reason I couldn't get it to work. I understand what is going on a lot better now, so I'm going to give it another go. This is because it seems to be more straightforward to deal with Helm charts using Terraform than it is to work with a collection of manifest files that have dependencies on each other (install CRDs first, then install the rest). It also seems a lot cleaner, as the solutions for dealing with manifest files I've come up with all seem like dirty hacks.

First, add the prometheus-community repo and update it

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install chart all gonzo-like

helm install prom-stack prometheus-community/kube-prometheus-stack

And it had some issues that I will deal with:

  • installed in the default namespace instead of 'monitoring', including the CRDs (fix with --set namespaceOverride=monitoring --set grafana.namespaceOverride=monitoring)
  • I thought Grafana didn't install - it just took a while to show up (so this is fine)
  • The Grafana login is not admin/admin (fix with --set grafana.adminPassword=admin)

Uninstall and remove CRDs

helm uninstall prom-stack
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheusagents.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd scrapeconfigs.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
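
The same cleanup as a one-liner, selecting every CRD in the monitoring.coreos.com API group instead of listing them by hand:

kubectl get crd -o name | grep monitoring.coreos.com | xargs kubectl delete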

Try again

 helm install prom-stack prometheus-community/kube-prometheus-stack --set grafana.adminPassword=admin --create-namespace --namespace monitoring

Debugging

It seems random whether we are able to pick up the metrics in Grafana. I have a feeling it is a sequencing issue.

Check that the statsd exporter is working

Find the IP addresses of the pods, then start a throwaway busybox pod

kubectl get pods -l 'app in (chiller-frontend,chiller-api)' -o custom-columns=POD_IP:.status.podIP
kubectl run -it test --image=busybox -- sh

Then in there, do a wget of the IPs you have from above:

 wget -q -O - http://10.244.0.15:9102/metrics | grep promhttp
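
The promhttp grep only proves the exporter itself is up. Once some traffic has gone through gunicorn, the mapped metric from the ConfigMap should show up as well:

 wget -q -O - http://10.244.0.15:9102/metrics | grep http_response_code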

Check whether the Prometheus operator is looking at your ServiceMonitor objects

There is a custom resource that configures the Prometheus instance managed by the operator. It is of kind Prometheus (prometheuses.monitoring.coreos.com). I was able to look at mine with:

kubectl get prometheuses.monitoring.coreos.com -n monitoring prom-stack-kube-prometheus-prometheus -o yaml | less

Depending on the version of the kube-prometheus stack that is installed (via manifests or helm chart) the specifics may be different.

In this case, that showed me that it only monitors ServiceMonitor objects carrying a particular label:

  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prom-stack

There are similar rules for scrapes, rules, probes, and podMonitors.

I fixed this with:

kubectl label servicemonitors.monitoring.coreos.com -n default --all release=prom-stack

Note that this targets all ServiceMonitor objects in the default namespace. The ones installed with the Prometheus Helm chart are in the monitoring namespace.
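
To confirm the label took, and to see which namespace each ServiceMonitor lives in:

kubectl get servicemonitors.monitoring.coreos.com -A --show-labels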