Using Prometheus and Grafana for Virtual Resources Monitoring
To deploy a single-node Prometheus + Grafana setup, the minimum hardware and operating system requirements are:
| Feature | Value |
|---|---|
| CPU | 4 |
| RAM | 8 GiB |
| Disk | 100 GB |
| OS Used | Ubuntu 20.04 LTS |
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
Prometheus's main features are:
- A multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- No reliance on distributed storage; single server nodes are autonomous
- Time series collection happens via a pull model over HTTP
- Pushing time series is supported via an intermediary gateway
- Targets are discovered via service discovery or static configuration
- Multiple modes of graphing and dashboarding support
For more in-depth information, please see the official Prometheus documentation.
Grafana is open source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics no matter where they are stored. In plain English, it provides you with tools to turn your time-series database (TSDB) data into beautiful graphs and visualizations. For more in-depth information about Grafana, please see the official documentation.
In this section, it is important to note that the OS used to deploy Docker is Ubuntu 20.04 LTS or later. The reason is that the repository added with the `add-apt-repository` command below is specific to this OS. Change that line according to your OS.
```bash
# Install Docker essentials.
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common && sudo apt install net-tools
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# sudo apt-key fingerprint 0EBFCD88 # Uncomment if you want to validate the key
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```
Verify that Docker Engine is installed correctly by running the `hello-world` image.
```bash
sudo docker run hello-world
```
Next, execute `sudo docker ps -a`. You should see the `hello-world` container in the list.
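Optionally, you can run a couple of extra post-install checks. These are a sketch based on the standard Docker post-installation steps; the `usermod` step, which lets you run `docker` without `sudo`, only takes effect after you log out and back in.

```bash
# Confirm the client and daemon are both responding and enable the service at boot.
sudo docker version
sudo systemctl enable --now docker
# Optional: allow the current user to run docker without sudo (re-login required).
sudo usermod -aG docker $USER
```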
Reference: https://docs.docker.com/engine/install/ubuntu/
All Prometheus services are available as Docker images on Docker Hub. First, pull the image on your host.
```bash
sudo docker pull prom/prometheus
```
Wait for the image pull to finish. Before running the Docker image, you must create a `prometheus.yml` file in the `/etc/prometheus/` directory (if the directory does not exist, run `mkdir -p /etc/prometheus/`). This file is used to collect metrics from the Debian Buster instance. Remember that this instance publishes virtual resource performance through the `prometheus-node-exporter` endpoint (on TCP port `9100`). Now, we need to configure Prometheus to scrape these metrics through that endpoint. The `prometheus.yml` file has this configuration:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'openstack'
    openstack_sd_configs:
      - role: 'instance'
        region: 'microstack'
        identity_endpoint: 'http://localhost:5000/v3/' # If localhost connections do not work, use the VM IP.
        username: '<username>' # For instance, <username> = admin
        domain_id: 'default'
        project_name: 'Default'
        password: '<password>' # For instance, <password> = OAEHxLgCBz7Wz4usvolAAt61TrDUz6zz
    relabel_configs:
      - source_labels: [__meta_openstack_public_ip]
        target_label: __address__
        replacement: '$1:9100'
      - source_labels: [__meta_openstack_tag_prometheus]
        regex: true.*
        action: keep
      - source_labels: [__meta_openstack_tag_node_exporter]
        regex: true.*
        action: keep
      - action: labelmap
        regex: __meta_openstack_(.+)
```
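Before starting the container, you can optionally sanity-check the file syntax with `promtool`. This is a sketch that assumes the `prom/prometheus` image ships the `promtool` binary (recent releases do):

```bash
# Validate the configuration file using the promtool bundled in the Prometheus image.
sudo docker run --rm \
  -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  --entrypoint promtool prom/prometheus \
  check config /etc/prometheus/prometheus.yml
```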
Now you can run the Prometheus container using the following command. The CLI should print the container ID assigned to the Prometheus container.
```bash
sudo docker run -d --name=prometheus -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
# 3d099c92648d
```
Wait a couple of minutes for the container to start. You can run `sudo docker ps -a` to see the container status. Next, you can interact with the Prometheus UI at http://localhost:9090/graph. In this interface, go to `Status -> Service Discovery`, as shown in the next figure.
Then click on `show more` for the OpenStack details. You should see something like the figure below. At this point, Prometheus is collecting virtual resource metrics from the instance. For visualization, we use Grafana; see the next section.
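As an optional check from the CLI, you can query the Prometheus HTTP API for the discovered targets. This is a sketch that assumes `jq` is installed (it is also used later in this guide):

```bash
# List each discovered target with its health status.
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {instance: .labels.instance, health: .health}'
```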
Reference: https://prometheus.io/docs/prometheus/latest/installation/
Grafana does not need any special configuration. You can execute `docker pull` and `docker run` without any additional details.
```bash
sudo docker pull grafana/grafana
```
Wait for the image pull to finish. After that, run the container.
```bash
sudo docker run -d --name=grafana -p 3000:3000 grafana/grafana
# 3e0fa3a4e318
```
Reference: https://hub.docker.com/r/grafana/grafana/
When the container is running, you can access the Grafana UI at http://localhost:3000. Once there, you will see a login interface whose default user/password is `admin/admin`. Next, you must change the default password to something more secure. After doing this, you should see the following interface on screen.
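As a quick sanity check from the CLI, you can hit Grafana's health endpoint. A minimal sketch (the exact fields in the response vary by Grafana version):

```bash
# Grafana returns a small JSON document, e.g. {"database": "ok", "version": "..."}
curl -s http://localhost:3000/api/health
```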
Next, consider importing this dashboard into Grafana. To do this, download the JSON file of the dashboard template and import it into Grafana. If the link is broken, use my repository to download the JSON file; you can find it in the docs folder. Once you have downloaded the JSON, go to the import option and click on `Upload JSON file`.
When this procedure finishes, wait for data to load into Prometheus and refresh the dashboard. By default, this dashboard is refreshed every minute in Grafana. An example of the dashboard can be seen in the next figure.
Note: you do not need to perform any additional configuration on the Grafana node exporter dashboard. All needed configuration is loaded by default from the JSON template.
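If you prefer to script the import instead of using the UI, a minimal sketch using Grafana's dashboard API is shown below. The file name `node-exporter-dashboard.json` is hypothetical (use whatever name you saved the template under), and depending on how the JSON was exported you may still need the UI import dialog, which handles data source mapping:

```bash
# Wrap the dashboard JSON in the payload expected by POST /api/dashboards/db and upload it.
jq '{dashboard: ., overwrite: true}' node-exporter-dashboard.json \
  | curl -u admin:<password_admin_grafana> \
         -H "Content-Type: application/json" \
         -d @- http://localhost:3000/api/dashboards/db
```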
Reference: https://yetiops.net/posts/prometheus-service-discovery-openstack/
A very useful capability when using Grafana is obtaining, from the CLI, a specific data point from one of the graphs the dashboard contains. Based on the example dashboard presented, to obtain data from the time series shown in one of its graphs, it is necessary to use the Grafana API endpoint. For this, consider the following procedure.
- You must access the interface where you have the Grafana dashboard. Once there, go to the graph or chart from which the data is to be extracted and click on its title to display a series of options. From these options, select `Inspect` and then `Query`, as shown in the figure.
- In the pop-up interface, go to the `Query` tab and click on the `Refresh` button. After doing this, you should see text similar to the one in the following figure on the screen.
- From the text on the screen, take the value that belongs to the `url` field, since it is the endpoint that gives access to the data of the graph we are interested in. Once extracted, the endpoint should be similar to the example presented below.
```
api/datasources/proxy/1/api/v1/query_range?query=100%20-%20((node_filesystem_avail_bytes%7Binstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%2Cdevice!~'rootfs'%7D%20*%20100)%20%2F%20node_filesystem_size_bytes%7Binstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%2Cdevice!~'rootfs'%7D)&start=1618520160&end=1618606560&step=240
```
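For readability, the `query=` parameter of this example URL-decodes to the following PromQL expression (percentage of used space per filesystem, excluding `rootfs`):

```
100 - ((node_filesystem_avail_bytes{instance="10.80.81.165:9100",job="openstack",device!~'rootfs'} * 100)
      / node_filesystem_size_bytes{instance="10.80.81.165:9100",job="openstack",device!~'rootfs'})
```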
In the extracted endpoint, the `query=` option shows the aggregate query that Grafana sends to the metrics in Prometheus to draw the trend line in the dashboard. The `start=` option represents the time from which data is taken, while the `end=` option is the time up to which data is taken. Both time variables are in epoch time format. The `step=` variable defines how often a data point is taken; in the example, every 240 seconds, that is, every 4 minutes. Note that the operation `(end-start)/step` must be an integer value representing the number of data points taken. If it is not an integer, Grafana will display an error.
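As a quick check of this condition for the example endpoint above:

```bash
# (end - start) / step for the example URL: 86400 seconds of data, one point every 240 seconds.
echo $(( (1618606560 - 1618520160) / 240 ))
# 360 -> the range query returns 360 data points per series
```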
Now, for this case, suppose it is necessary to take the last data point available from the trend line of interest. Consider the following: data is written to the dashboard every 30 seconds, so if it is 4:00:15 pm, the last data point of interest is the one at 4:00:00 pm. To get the `start` and `end` variables, consider the following Python script:
```python
import time

def get_resources_values():
    # Get date in epoch time
    date_end = int(time.time())
    date_end = time.localtime(date_end)
    # Convert seconds to a valid interval in Grafana
    if date_end.tm_sec >= 0 and date_end.tm_sec <= 30:
        date_sec = 0
    else:
        date_sec = 30
    date_end = list(date_end)
    date_end[5] = date_sec
    date_end = tuple(date_end)
    date_end = int(time.mktime(date_end))  # To epoch time
    date_start = date_end - 300  # Five minutes before
    # Return values
    return date_start, date_end

def main():
    # Invoke function
    date_start, date_end = get_resources_values()
    print("date_start:", date_start)
    print("date_end:", date_end)

if __name__ == "__main__":
    main()
```
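For example, saving the script above under a hypothetical name such as `get_resources_values.py`, you can run it and use the printed values in the next step:

```bash
python3 get_resources_values.py
# date_start: <epoch seconds, five minutes before date_end>
# date_end:   <epoch seconds of the current time, aligned to a 30-second boundary>
```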
Then, using the `date_start` and `date_end` values, make the following query to the Grafana endpoint using `curl`. Note that `date_start = date_end - 300`, which means we are requesting timestamps for a period of 5 minutes (300 seconds) before the current instant. Modify the value according to your needs. It is assumed that you are on a Linux OS and have `curl` installed. The command executed to obtain this data is the following:
```bash
curl -u admin:<password_admin_grafana> -sb -H "Accept: application/json" "http://localhost:3000/api/datasources/proxy/1/api/v1/query_range?query=sum%20by%20(mode)(irate(node_cpu_seconds_total%7Bmode%3D%27idle%27%2Cinstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%7D%5B5m%5D))%20*%20100&start=<date_start>&end=<date_end>&step=30" | jq -r '.data.result[].values[-1][1]'
```
Replace `<password_admin_grafana>` with the password of the `admin` user in Grafana, and `<date_start>`/`<date_end>` with the values computed by the Python script. Remember that this command gives you the last available point of the Grafana graph based on the current time. If the command does not return a value, check the following points (a quick way to verify the first one directly against Prometheus is sketched after the list):
- There must be data in the graph of interest in Grafana for at least the last 5 minutes before the current time.
- Grafana must be running on port 3000.
- Make sure you are using your Grafana endpoint. The presented command is just an example.
- The `curl` and `jq` tools and some version of Python (3.7.3 was tested) must be installed.
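To check the first point without going through Grafana, you can query Prometheus directly. A minimal sketch, assuming Prometheus is reachable on port 9090 as configured earlier:

```bash
# Ask Prometheus whether the openstack job currently has healthy targets reporting data.
curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="openstack"}' | jq .
```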
The result of the command is a number; in the case of the example, the result is `97`. You can assign that value to a variable in Python by using `import os` at the beginning and the following commands:
```python
import os

# date_start and date_end come from the get_resources_values() function shown above.
date_start, date_end = get_resources_values()

# Query that returns the idle CPU percentage reported by Grafana (last available point).
get_cpu_usage = "curl -u admin:orion -sb -H \"Accept: application/json\" \"http://10.80.81.189:3000/api/datasources/proxy/1/api/v1/query_range?query=sum%20by%20(mode)(irate(node_cpu_seconds_total%7Bmode%3D%27idle%27%2Cinstance%3D%2210.80.81.165%3A9100%22%2Cjob%3D%22openstack%22%7D%5B5m%5D))%20*%20100&start="+str(date_start)+"&end="+str(date_end)+"&step=30\" | jq -r '.data.result[].values[-1][1]'"
# Resources status: used CPU in % is 100 minus the idle percentage returned by the query.
cpu_usage = 100 - float(os.popen(get_cpu_usage).read())
# Show the data obtained from Grafana
print("The cpu_usage is:", cpu_usage)
```