Using Prometheus and Grafana for Virtual Resources Monitoring - caprivm/thesis_msc GitHub Wiki

Introduction

To get a single-node Prometheus + Grafana deployment, the minimum hardware and operating system requirements are:

| Feature | Value |
|---------|-------|
| CPU | 4 cores |
| RAM | 8 GiB |
| Disk | 100 GB |
| OS Used | Ubuntu 20.04 LTS |

Quick Introduction to Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

Features

Prometheus's main features are:

  • A multi-dimensional data model with time series data identified by metric name and key/value pairs
  • PromQL, a flexible query language to leverage this dimensionality
  • No reliance on distributed storage; single server nodes are autonomous
  • Time series collection happens via a pull model over HTTP
  • Pushing time series is supported via an intermediary gateway
  • Targets are discovered via service discovery or static configuration
  • Multiple modes of graphing and dashboarding support

For more in-depth information, please see the official documentation of Prometheus.

Quick Introduction to Grafana

Grafana is open source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics no matter where they are stored. In plain English, it provides you with tools to turn your time-series database (TSDB) data into beautiful graphs and visualizations. For more in-depth information about Grafana, please see the official documentation.

Install Docker

This section assumes that the OS used to deploy Docker is Ubuntu 20.04 LTS or later, because the repository added by the add-apt-repository command below is selected for this OS. Change that line according to your OS.

# Install Docker essentials.
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common && sudo apt install net-tools
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# sudo apt-key fingerprint 0EBFCD88 # Uncomment if you want to validate the key
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

Verify that Docker Engine is installed correctly by running the hello-world image.

sudo docker run hello-world

Next, execute sudo docker ps -a. You should see the hello-world container in the list.

Reference: https://docs.docker.com/engine/install/ubuntu/

Install Prometheus using Docker

All Prometheus services are available as Docker images on Docker Hub. First, pull the Prometheus image on your host.

sudo docker pull prom/prometheus

Wait for the image pull to finish. Before running the Docker image, you must create a prometheus.yml file in the /etc/prometheus/ directory (if the directory does not exist, run mkdir -p /etc/prometheus/). This file configures how metrics are collected from the Debian Buster instance. Remember that this instance publishes virtual resource performance metrics through the prometheus-node-exporter endpoint (on TCP port 9100). We now configure Prometheus to pull these metrics through that endpoint. The prometheus.yml file has this configuration:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'openstack'
    openstack_sd_configs:
      - role: 'instance'
        region: 'microstack'
        identity_endpoint: 'http://localhost:5000/v3/' # If localhost connections does not work, use VM IP.
        username: '<username>'  # For instance, <username> = admin
        domain_id: 'default'
        project_name: 'Default'
        password: '<password>'  # For instance, <password> = OAEHxLgCBz7Wz4usvolAAt61TrDUz6zz
    relabel_configs:
      - source_labels: [__meta_openstack_public_ip]
        target_label: __address__
        replacement: '$1:9100'
      - source_labels: [__meta_openstack_tag_prometheus]
        regex: true.*
        action: keep
      - source_labels: [__meta_openstack_tag_node_exporter]
        regex: true.*
        action: keep
      - action: labelmap
        regex: __meta_openstack_(.+)
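The relabel_configs block above can be hard to follow. The following Python sketch simulates, purely for illustration, what those rules do to one discovered instance: the address is rewritten to port 9100, the target is kept only if both tags match, and every __meta_openstack_* label is copied to a plain label. The sample meta labels are made up; this is not how Prometheus runs internally, just the logic of the four rules.

```python
import re

def relabel(labels):
    """Simulate the four relabel rules above on one discovered target (illustrative only)."""
    labels = dict(labels)
    # Rule 1: rewrite __address__ to <public_ip>:9100 (the node exporter port).
    labels["__address__"] = labels["__meta_openstack_public_ip"] + ":9100"
    # Rules 2-3 (action: keep): drop the target unless both tags match 'true.*'.
    # Prometheus anchors regexes, so fullmatch mirrors its behavior.
    for tag in ("__meta_openstack_tag_prometheus", "__meta_openstack_tag_node_exporter"):
        if not re.fullmatch(r"true.*", labels.get(tag, "")):
            return None  # target dropped
    # Rule 4 (action: labelmap): copy every __meta_openstack_* label to a plain label.
    for key in list(labels):
        m = re.fullmatch(r"__meta_openstack_(.+)", key)
        if m:
            labels[m.group(1)] = labels[key]
    return labels

# Hypothetical instance as OpenStack service discovery might present it:
discovered = {
    "__meta_openstack_public_ip": "10.80.81.165",
    "__meta_openstack_tag_prometheus": "true",
    "__meta_openstack_tag_node_exporter": "true",
}
result = relabel(discovered)
print(result["__address__"])  # 10.80.81.165:9100
```

An instance missing either tag (or with a tag not starting with "true") would be dropped from scraping entirely, which is why tagging the OpenStack instance correctly matters.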

Now you can run the Prometheus container using the next command. The CLI should print the container ID assigned to the Prometheus container.

sudo docker run -d --name=prometheus -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
# 3d099c92648d

Wait a couple of minutes for the container to start. You can run sudo docker ps -a to see the container status. Next, you can interact with the Prometheus UI at http://localhost:9090/graph. In this interface, go to Status -> Service Discovery, as shown in the next figure.

Service Discovery

Then click on show more for the OpenStack details. You should see something like this. At this point, Prometheus is collecting virtual resource metrics from the instance. For visualization, we use Grafana; see the next section.

Service Discovery OpenStack
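You can also confirm discovery from the CLI through Prometheus's HTTP API (GET /api/v1/targets on port 9090). The sketch below parses a response of that shape; the payload here is a made-up sample for illustration, so in practice feed it the real output of a call such as curl -s http://localhost:9090/api/v1/targets.

```python
import json

# Made-up sample in the shape returned by GET /api/v1/targets.
# In practice, fetch the real payload with:
#   curl -s http://localhost:9090/api/v1/targets
sample = json.loads("""
{"status": "success",
 "data": {"activeTargets": [
   {"labels": {"instance": "10.80.81.165:9100", "job": "openstack"},
    "health": "up"}]}}
""")

# Print each discovered target and its scrape health.
for target in sample["data"]["activeTargets"]:
    print(target["labels"]["instance"], "->", target["health"])
```

A health of "up" for the instance on port 9100 means Prometheus is successfully scraping the node exporter.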

Reference: https://prometheus.io/docs/prometheus/latest/installation/

Install Grafana using Docker

Grafana does not need any special configuration. You can execute docker pull and docker run without any additional details.

sudo docker pull grafana/grafana

Wait for the image pull to finish. After that, run the container.

sudo docker run -d --name=grafana -p 3000:3000 grafana/grafana
# 3e0fa3a4e318

Reference: https://hub.docker.com/r/grafana/grafana/

When the container runs, you can access the Grafana UI at http://localhost:3000. Once there, you see a login interface whose default user/password is admin/admin. You must then change the default password to something more secure. After doing so, you should see the next interface on screen.

First Screen Grafana

Next, consider importing this dashboard into Grafana. To do so, download the JSON file of the dashboard template and import it into Grafana. If the link is broken, use my repository to download the JSON file; you can find it in the docs folder. Once you have downloaded the JSON, go to the import option and click on upload JSON file.

Import Dashboard

When this procedure finishes, wait for data to load in Prometheus and refresh the dashboard. By default, this dashboard updates every minute in Grafana. An example of the dashboard can be seen in the next figure.

Dashboard Node Exporter

Note: You do not need to perform any configuration on the Grafana node exporter dashboard. All needed configuration is loaded by default from the JSON template.

Reference: https://yetiops.net/posts/prometheus-service-discovery-openstack/

Get Prometheus Data using Grafana API

A very useful capability when using Grafana is obtaining, from the CLI, a specific data point from one of the graphs the dashboard contains. Based on the example dashboard presented, obtaining data from the time series shown in one of its graphs requires using the Grafana API (endpoint). For this, consider the following procedure.

  1. Access the interface where you have the Grafana dashboard. Go to the graph or chart from which the data is to be extracted and click on its title to display a series of options. From these options, select Inspect and then Query, as shown in the figure.

Options from Chart

  2. In the pop-up interface, go to the Query tab and click on the Refresh button. After doing this, you should see a text similar to the one in the following figure on the screen.

Query Chart

  3. From the text on the screen, take the value of the url option, since it is the endpoint that gives access to the data of the graph of interest. Once extracted, the endpoint should be similar to the example presented below.
api/datasources/proxy/1/api/v1/query_range?query=100%20-%20((node_filesystem_avail_bytes%7Binstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%2Cdevice!~'rootfs'%7D%20*%20100)%20%2F%20node_filesystem_size_bytes%7Binstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%2Cdevice!~'rootfs'%7D)&start=1618520160&end=1618606560&step=240

In the extracted endpoint, the query= option shows the aggregate query that Grafana sends to Prometheus to draw the trend line in the dashboard. The start= option is the time from which data is taken, while end= is the time up to which data is taken. Both time variables are in epoch time format. The step= variable defines how often a data point is taken; in the example, every 240 seconds, that is, every 4 minutes. Note that the operation (end - start) / step must be an integer value, representing the number of data points taken. If it is not an integer, Grafana will display an error.
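To see where the percent-encoded query string comes from, and to check the (end - start) / step constraint before calling the API, the endpoint can be built in Python. This is a sketch: the PromQL query is a simplified example, and the datasource proxy path is taken from the endpoint above; adjust both to match your own dashboard.

```python
from urllib.parse import urlencode

def build_query_range_url(base, query, start, end, step):
    """Build a Grafana datasource-proxy query_range URL with percent-encoded parameters."""
    # Enforce the constraint described above: (end - start) / step must be an integer.
    if (end - start) % step != 0:
        raise ValueError("(end - start) must be a multiple of step")
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base}/api/datasources/proxy/1/api/v1/query_range?{params}"

# Example using the start/end/step values from the endpoint above:
url = build_query_range_url(
    "http://localhost:3000",
    "node_filesystem_avail_bytes{job='openstack'}",  # simplified example query
    start=1618520160, end=1618606560, step=240,
)
print(url)
```

Here (1618606560 - 1618520160) / 240 = 360, an integer, so Grafana would return 360 data points for this range.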

Now suppose you need to take the last data point available from the trend line of interest. Consider the following: data is written to the dashboard every 30 seconds, so if it is 4:00:15 pm, the last data point of interest is from 4:00:00 pm. To get the start and end variables, consider this Python script:

import time

def get_resources_values():
    # Get date in epoch Time
    date_end = int(time.time())
    date_end = time.localtime(date_end)
    if date_end.tm_sec < 30:   # Floor the seconds to the previous 30-second boundary
        date_sec = 0
    else:
        date_sec = 30
    date_end = list(date_end)
    date_end[5] = date_sec
    date_end = tuple(date_end)
    date_end = int(time.mktime(date_end))   # To epoch time
    date_start = date_end - 300   # Five minutes before
    # Return values
    return date_start, date_end

def main():
    # Invoke function
    date_start, date_end = get_resources_values()
    print("date_start:", date_start)
    print("date_end:", date_end)

if __name__ == "__main__":
    main()

Then, using the date_start and date_end values, make the following query to the Grafana endpoint using curl. Note that date_start = date_end - 300, which means we request timestamps for a period of 5 minutes (300 seconds) before the current instant; modify this value according to your needs. This assumes you are on a Linux OS with curl and jq installed. The command executed to obtain the data is the following:

curl -u admin:<password_admin_grafana> -sb -H "Accept: application/json" "http://localhost:3000/api/datasources/proxy/1/api/v1/query_range?query=sum%20by%20(mode)(irate(node_cpu_seconds_total%7Bmode%3D%27idle%27%2Cinstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%7D%5B5m%5D))%20*%20100&start=<date_start>&end=<date_end>&step=30" | jq -r '.data.result[].values[-1][1]'

Replace <password_admin_grafana> with the password of the admin user in Grafana, and substitute the date_start and date_end values computed by the script above. This command returns the last available point of the Grafana graph based on the current time. Check the following points if the command does not return a value:

  1. There must be data in the graph of interest in Grafana for at least the last 5 minutes compared to the current time.
  2. Grafana must be running on port 3000.
  3. Make sure you are using your Grafana endpoint. The presented command is just an example.
  4. The curl and jq tools must be installed, along with some version of Python (3.7.3 was tested).

The result of the command is a number; in the case of the example, the result is 97 (the idle CPU percentage). You can assign that value to a variable in Python using import os at the beginning and the following commands:

import os

# date_start and date_end come from the get_resources_values() function defined above.
date_start, date_end = get_resources_values()
# Idle CPU in %, taken from the last point of the Grafana graph
get_cpu_usage = "curl -u admin:orion -sb -H \"Accept: application/json\" \"http://10.80.81.189:3000/api/datasources/proxy/1/api/v1/query_range?query=sum%20by%20(mode)(irate(node_cpu_seconds_total%7Bmode%3D%27idle%27%2Cinstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%7D%5B5m%5D))%20*%20100&start="+str(date_start)+"&end="+str(date_end)+"&step=30\" | jq -r '.data.result[].values[-1][1]'"
# Resources status: the query returns idle CPU, so usage is the complement
cpu_usage = 100 - float(os.popen(get_cpu_usage).read())
# Show the data obtained from Grafana
print("The cpu_usage is:", cpu_usage)
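Instead of shelling out to curl and jq, the same extraction can be done directly in Python. The jq filter .data.result[].values[-1][1] takes the last [timestamp, value] pair of each series; the sketch below applies the same logic to a response in the shape Prometheus's query_range API returns. The payload here is a made-up sample; in practice you would fetch the real JSON with urllib or the curl command above.

```python
import json

# Made-up sample in the shape of a Prometheus query_range (matrix) response.
response = json.loads("""
{"status": "success",
 "data": {"resultType": "matrix",
          "result": [{"metric": {"mode": "idle"},
                      "values": [[1618606500, "96.5"], [1618606530, "97"]]}]}}
""")

# Equivalent of jq '.data.result[].values[-1][1]': last value of each series.
last_values = [series["values"][-1][1] for series in response["data"]["result"]]
idle_pct = float(last_values[0])
cpu_usage = 100 - idle_pct  # The query returns idle CPU, so usage is the complement.
print("The cpu_usage is:", cpu_usage)
```

Note that Prometheus returns sample values as strings inside [timestamp, value] pairs, which is why the float() conversion is needed before the arithmetic.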