23. Prometheus ‐ Metrics - bohdanabadi/doroha-simulator GitHub Wiki

Metrics

In order to collect and display metrics about our modules we have opted to use Prometheus, an open-source project that scrapes data from our modules, stores it, and makes it available to be processed and fetched later. The setup is pretty simple; let's delve in a bit.

Setup

The setup comprises two files: a Dockerfile and a YAML configuration file. Let's take a look at both.

FROM prom/prometheus:v2.50.1
ADD prometheus.yml /etc/prometheus/

The Dockerfile is pretty simple and straightforward: it extends the official Prometheus image and copies our configuration into place.

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'api'
    scrape_interval: 45s
    static_configs:
      - targets: ['127.0.0.1:8081']
  - job_name: 'simulation'
    scrape_interval: 45s
    static_configs:
      - targets: ['127.0.0.1:8080']

The above configuration was referenced from this link. It elaborates in much more detail, but the snippet above is basically our configuration.

This Prometheus configuration snippet outlines how metrics are collected from different targets within a system. Let's break down its key components:

Global Configuration

  • scrape_interval: 15s: This sets the default interval at which Prometheus scrapes (collects metrics from) targets. In this case, it's set to every 15 seconds. This means Prometheus will request data from each configured target every 15 seconds unless specified otherwise in individual scrape_configs.

  • external_labels: Labels specified here are attached to all time series and alerts when they're sent to external systems like federation partners, remote storage, or Alertmanager. In this configuration, every metric or alert will include a label monitor: 'codelab-monitor', which could be useful for distinguishing metrics from different Prometheus instances or environments.

Scrape Configurations

Under scrape_configs, specific targets from which Prometheus will scrape metrics are defined. Each entry under this section configures a distinct set of targets and specifies how they should be scraped.

  1. API Job:

    • job_name: 'api': A label job=api will be added to all time series data scraped from this job, helping to identify its source.
    • scrape_interval: 45s: Overrides the global scrape interval for this job, setting it to every 45 seconds. This means Prometheus will request metrics from the targets under this job every 45 seconds.
    • static_configs: Defines static targets for scraping. Here, Prometheus is configured to scrape metrics from 127.0.0.1:8081. This could represent an API server running on the local machine on port 8081.
  2. Simulation Job:

    • job_name: 'simulation': A label job=simulation will be added to all time series data scraped from this job.
    • scrape_interval: 45s: Similarly, for the simulation job, the scrape interval is set to every 45 seconds, overriding the default interval.
    • static_configs: Targets 127.0.0.1:8080 for metric collection, indicating a simulation service running locally on port 8080.

In summary, this configuration tells Prometheus to collect metrics from two local services (an API and a simulation service) at a specified interval of every 45 seconds, with a default interval of 15 seconds for other potential jobs. It also applies global labels to all data for identification when interfacing with external systems.

Important Notice: Setting the targets to 127.0.0.1:port allows this containerized application to interact with services outside the container. This setup is effective when the container is initiated with the --network="host" option, a method primarily suitable for Linux environments. For users on macOS and Windows, a slightly different approach is necessary to achieve similar functionality.
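On Docker Desktop for macOS and Windows, where `--network="host"` is not supported, the special hostname `host.docker.internal` resolves to the host machine from inside a container. A sketch of how the targets above could change (assuming the same ports; adapt to your setup):

```yaml
scrape_configs:
  - job_name: 'api'
    scrape_interval: 45s
    static_configs:
      - targets: ['host.docker.internal:8081']
  - job_name: 'simulation'
    scrape_interval: 45s
    static_configs:
      - targets: ['host.docker.internal:8080']
```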

Modules Setup

Now that we have our scraping service set up, we have to do a couple of things in our Go apps.

First, we need to add the Prometheus client_golang library, a simple client with a lot of functionality. Second, we need to expose a /metrics endpoint. This is where Prometheus scrapes data from.

Metrics endpoint

A simple code snippet to expose that endpoint:

// Create a dedicated registry and register our custom metrics on it,
// so the /metrics endpoint only exposes what we explicitly registered.
reg := prometheus.NewRegistry()
m := observibility.GetMetrics()
reg.MustRegister(m.ErrorCounter)
promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{Registry: reg})
srv.engine.GET("/metrics", gin.WrapH(promHandler))

The above code simply exposes an endpoint for the Prometheus service to hit and fetch metric data.

Adding custom metrics

There are a lot of metric types; a very good guide to Prometheus for Go is this link. But let's go over a simple example: a counter metric that counts the number of errors. Let's create a struct with that metric:

type Metrics struct {
	ErrorCounter   prometheus.Counter
}

So this ErrorCounter will be of type Counter.

Now let's initialize this metric:

const AppNameSpace = "api"
const ErrorCounterMetricName = "response_error_counter"
const ErrorCounterMetricHelp = "Error response counter"

func newMetrics() *Metrics {
	m := &Metrics{
		ErrorCounter: prometheus.NewCounter(prometheus.CounterOpts{
			Namespace: AppNameSpace,
			Name:      ErrorCounterMetricName,
			Help:      ErrorCounterMetricHelp,
		}),
	}
	return m
}

We are almost there. Now we have to decide when to log an error. We can pick a handler func, and when we send back an error response we can log it, like this:

if err := c.ShouldBindJSON(&journeyToPatch); err != nil {
	observibility.GetMetrics().LogErrorCounter()
	c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
	return
}

This code says: if we are not able to deserialize the data from the request, send back an error response, but log an error first.

func (m *Metrics) LogErrorCounter() {
	m.ErrorCounter.Inc()
}

And the LogErrorCounter func simply increments the counter.

Feedback and Debugging

Lastly, we can go to localhost:9090, open the Prometheus dashboard, inspect our metric, and check the counter value.
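Note that the exposed metric name is the namespace and name joined with an underscore, so with the constants above you would query it in the expression browser like this (the `rate` example assumes you want a per-second error rate):

```
# Raw counter value
api_response_error_counter

# Per-second error rate over the last 5 minutes
rate(api_response_error_counter[5m])
```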
