23. Prometheus ‐ Metrics - bohdanabadi/doroha-simulator GitHub Wiki
To collect and display metrics from our modules we opted for Prometheus, an open-source project that scrapes data from our modules and stores it to be processed and fetched later. The setup is pretty simple, so let's delve in a bit.
The setup comprises two files: a Dockerfile and a YAML configuration file. Let's take a look at both.
```dockerfile
FROM prom/prometheus:v2.50.1
ADD prometheus.yml /etc/prometheus/
```
The Dockerfile is pretty simple and straightforward.
```yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'api'
    scrape_interval: 45s
    static_configs:
      - targets: ['127.0.0.1:8081']

  - job_name: 'simulation'
    scrape_interval: 45s
    static_configs:
      - targets: ['127.0.0.1:8080']
```
The above configuration was adapted from this link. It elaborates much more, but the snippet above is basically our configuration.
This Prometheus configuration snippet outlines how metrics are collected from different targets within a system. Let's break down its key components:
- `scrape_interval: 15s`: This sets the default interval at which Prometheus scrapes (collects metrics from) targets. In this case, it's set to every 15 seconds. This means Prometheus will request data from each configured target every 15 seconds unless specified otherwise in individual `scrape_configs`.
- `external_labels`: Labels specified here are attached to all time series and alerts when they're sent to external systems like federation partners, remote storage, or Alertmanager. In this configuration, every metric or alert will include a label `monitor: 'codelab-monitor'`, which could be useful for distinguishing metrics from different Prometheus instances or environments.

Under `scrape_configs`, specific targets from which Prometheus will scrape metrics are defined. Each entry under this section configures a distinct set of targets and specifies how they should be scraped.

- API Job:
  - `job_name: 'api'`: A label `job=api` will be added to all time series data scraped from this job, helping to identify its source.
  - `scrape_interval: 45s`: Overrides the global scrape interval for this job, setting it to every 45 seconds. This means Prometheus will request metrics from the targets under this job every 45 seconds.
  - `static_configs`: Defines static targets for scraping. Here, Prometheus is configured to scrape metrics from `127.0.0.1:8081`. This could represent an API server running on the local machine on port 8081.
- Simulation Job:
  - `job_name: 'simulation'`: A label `job=simulation` will be added to all time series data scraped from this job.
  - `scrape_interval: 45s`: Similarly, for the simulation job, the scrape interval is set to every 45 seconds, overriding the default interval.
  - `static_configs`: Targets `127.0.0.1:8080` for metric collection, indicating a simulation service running locally on port 8080.
In summary, this configuration tells Prometheus to collect metrics from two local services (an API and a simulation service) at a specified interval of every 45 seconds, with a default interval of 15 seconds for other potential jobs. It also applies global labels to all data for identification when interfacing with external systems.
Important Notice: Setting the targets to 127.0.0.1:port allows this containerized application to interact with services outside the container. This setup is effective when the container is initiated with the --network="host" option, a method primarily suitable for Linux environments. For users on macOS and Windows, a slightly different approach is necessary to achieve similar functionality.
Now that we have our scraping service set up, we have to do a couple of things in our Go apps.
First, we need to add the Prometheus client_golang library, a simple client with a lot of functionality. Second, we need to expose a /metrics endpoint; this is where Prometheus scrapes data from.
Here is a simple code snippet that exposes that endpoint:
```go
reg := prometheus.NewRegistry()
m := observibility.GetMetrics()
// Register our custom metrics so the handler has something to expose.
reg.MustRegister(m.ErrorCounter)
promHandler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{Registry: reg})
srv.engine.GET("/metrics", gin.WrapH(promHandler))
```
The above code simply exposes an endpoint for the Prometheus service to hit and fetch metric data. Note that the metrics must be registered with the registry; otherwise the handler exposes nothing.
There are a lot of metric types; a very good Prometheus guide for Go is this link. But let's go over a simple example: a counter metric that counts the number of errors. Let's create a struct with that metric.
```go
type Metrics struct {
	ErrorCounter prometheus.Counter
}
```
So this ErrorCounter will be of type Counter.
Now let's initialize this metric:
```go
const AppNameSpace = "api"
const ErrorCounterMetricName = "response_error_counter"
const ErrorCounterMetricHelp = "Error response counter"

func newMetrics() *Metrics {
	m := &Metrics{
		ErrorCounter: prometheus.NewCounter(prometheus.CounterOpts{
			Namespace: AppNameSpace,
			Name:      ErrorCounterMetricName,
			Help:      ErrorCounterMetricHelp,
		}),
	}
	return m
}
```
We are almost there. Now we have to decide when to log an error. We can pick a handler func, and when we send back an error response we log the error, like this:
```go
if err := c.ShouldBindJSON(&journeyToPatch); err != nil {
	observibility.GetMetrics().LogErrorCounter()
	c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
	return
}
```
This code says: if we are not able to deserialize the data from the request, send back an error response, but log an error first.
```go
func (m *Metrics) LogErrorCounter() {
	m.ErrorCounter.Inc()
}
```
The LogErrorCounter func simply increments the counter.
Lastly, we can go to localhost:9090, open the Prometheus dashboard, inspect our metric, and check the counter value (the full metric name is the namespace plus the name, i.e. api_response_error_counter).