Ticket 270 ‐ Implement Prometheus and Grafana Monitoring System - SupaHotBall/OE2-Group-D GitHub Wiki
- Prometheus Setup
  - Download and install Prometheus v2.53.4
  - Configure prometheus.yml with proper scrape intervals
  - Create systemd service for Prometheus
  - Verify web interface at http://:9090
- Node Exporter Configuration
  - Install node_exporter v1.9.1 on all monitored servers
  - Set up systemd service for node_exporter
  - Add node_exporter targets to Prometheus config
- Alert Rules Implementation
  - Create two rule files (rule1.yml, rule2.yml) with:
    - Instance down detection
    - High CPU/Memory usage alerts
    - Disk space warnings
  - Validate rules using promtool
  - Reference the rule files in prometheus.yml
- Grafana Installation
  - Install and configure Grafana
  - Add Prometheus as data source
  - Import Node Exporter dashboard (ID: 1860)
- Verification
  - Confirm all targets are UP in Prometheus
  - Validate alert rules are evaluating correctly
  - Check Grafana dashboards display metrics properly
Download Prometheus onto the backup server
wget https://github.com/prometheus/prometheus/releases/download/v2.53.4/prometheus-2.53.4.linux-amd64.tar.gz
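Optionally, verify the download before unpacking it. The sketch below assumes the release page also publishes a sha256sums.txt file with one `<sha256>  <filename>` line per asset (Prometheus and node_exporter releases generally do); `verify_release` is a hypothetical helper name, not part of Prometheus.

```bash
#!/usr/bin/env bash
# verify_release SUMS_FILE TARBALL
# Hypothetical helper: checks TARBALL (in the current directory) against its
# entry in SUMS_FILE, a "<sha256>  <filename>" list such as sha256sums.txt.
verify_release() {
  local sums="$1" tarball="$2"
  # Keep only the line whose filename column matches exactly, then let
  # sha256sum recompute the hash. An empty selection also fails the check.
  awk -v f="$tarball" '$2 == f' "$sums" | sha256sum --check -
}
```

For example, download https://github.com/prometheus/prometheus/releases/download/v2.53.4/sha256sums.txt with wget into the same directory and run `verify_release sha256sums.txt prometheus-2.53.4.linux-amd64.tar.gz`; sha256sum prints `OK` for a matching file and exits non-zero otherwise.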
Unpack the tar file and move into the extracted directory
tar -xvf prometheus-2.53.4.linux-amd64.tar.gz
cd prometheus-2.53.4.linux-amd64
Start Prometheus in the foreground to check that it runs (press Ctrl+C to stop it before setting up the service)
sudo ./prometheus --config.file=prometheus.yml
Access the Prometheus UI at http://13.75.179.26:9090/
Create a systemd service for Prometheus. First, copy the Prometheus files to /usr/local/bin/prometheus
sudo cp -r . /usr/local/bin/prometheus
Then create a service file at /etc/systemd/system/prometheus.service
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus/prometheus \
--config.file=/usr/local/bin/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/data \
--web.console.templates=/usr/local/bin/prometheus/consoles
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
Create the storage directory referenced by --storage.tsdb.path
sudo mkdir -p /var/lib/prometheus/data
Create a user and group for Prometheus
sudo useradd --no-create-home --shell /bin/false prometheus
Give the prometheus user ownership of the data directory
sudo chown -R prometheus:prometheus /var/lib/prometheus
Reload systemd, start Prometheus, enable it at boot and check its status
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus
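To confirm that a unit is running now and will also come back after a reboot (which requires `systemctl enable`), a small helper can report both states at once. A sketch; `check_unit` is a hypothetical name, and the same check applies to the node_exporter and grafana-server units.

```bash
#!/usr/bin/env bash
# check_unit UNIT
# Hypothetical helper: prints the runtime and boot-time state of a systemd
# unit and succeeds only if it is both active and enabled.
check_unit() {
  local unit="$1" active enabled
  # is-active / is-enabled print a one-word state ("active", "inactive",
  # "enabled", "disabled", ...); their exit codes are ignored here.
  active=$(systemctl is-active "$unit" 2>/dev/null)
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null)
  echo "$unit: active=$active enabled=$enabled"
  [ "$active" = "active" ] && [ "$enabled" = "enabled" ]
}
```

For example, `check_unit prometheus` prints `prometheus: active=active enabled=enabled` and returns success once the service is running and enabled.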
Visit http://13.75.179.26:9090/targets to view the scrape targets. At this stage it should show one target: Prometheus scraping itself.
On the main page, enter a metric name in the expression box to view the data collected for that metric, for example:
promhttp_metric_handler_requests_total
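The same data is available from the Prometheus HTTP API, which is handy when working over SSH. A minimal sketch, assuming Prometheus listens on localhost:9090 on the backup server; `prom_query` is a hypothetical wrapper around the `/api/v1/query` instant-query endpoint.

```bash
#!/usr/bin/env bash
# prom_query EXPR [BASE_URL]
# Hypothetical wrapper: sends EXPR to the Prometheus instant-query endpoint
# and prints the raw JSON response. BASE_URL defaults to localhost:9090.
prom_query() {
  local expr="$1" base="${2:-http://localhost:9090}"
  # -G turns the --data-urlencode parameter into a URL query string,
  # so PromQL characters such as {, } and " are escaped correctly.
  curl -sG "$base/api/v1/query" --data-urlencode "query=$expr"
}
```

For example, `prom_query promhttp_metric_handler_requests_total` on the server, or `prom_query up http://13.75.179.26:9090` from another machine.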
Set up Node Exporter by first downloading it onto the backup server
wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz
Unpack the tarball
tar -xzf node_exporter-1.9.1.linux-amd64.tar.gz
cd into the unpacked directory and run ./node_exporter to check that it starts (press Ctrl+C to stop it)
cd node_exporter-1.9.1.linux-amd64
./node_exporter
Copy the node_exporter executable into /usr/local/bin/
sudo cp node_exporter /usr/local/bin/
Create a systemd service file to manage the exporter
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=nodeuser
Group=nodeuser
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
Create a nodeuser account with no home directory and no login shell
sudo useradd --no-create-home --shell /usr/sbin/nologin nodeuser
Give nodeuser ownership of the binary
sudo chown nodeuser:nodeuser /usr/local/bin/node_exporter
Reload systemd, start node_exporter and enable it at boot
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter
Test that node_exporter is serving metrics at http://13.75.179.26:9100/metrics
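From a shell on the server, the same check can be scripted. A sketch, assuming node_exporter's default port 9100; `node_metric` is a hypothetical helper that prints the samples of one metric from the `/metrics` text output.

```bash
#!/usr/bin/env bash
# node_metric NAME [BASE_URL]
# Hypothetical helper: fetches node_exporter's text exposition and prints the
# sample lines for metric NAME (skipping # HELP / # TYPE comment lines).
node_metric() {
  local name="$1" base="${2:-http://localhost:9100}"
  # A sample line starts with the metric name followed by either a label set
  # "{...}" or a space, so node_load1 does not also match node_load15.
  curl -s "$base/metrics" | grep -E "^${name}(\{| )"
}
```

For example, `node_metric node_load1` prints the current one-minute load average, and the function returns non-zero if the metric is missing.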
Configure the new target in prometheus.yml, located at /usr/local/bin/prometheus/prometheus.yml, by adding the following under the scrape_configs section:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
Restart Prometheus after saving the configuration
sudo systemctl restart prometheus
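Restarting with a broken prometheus.yml leaves Prometheus down, so it can be safer to validate the file with promtool (shipped in the same tarball and copied to /usr/local/bin/prometheus earlier) before restarting. A sketch; `check_and_restart` and the `PROMTOOL` variable are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a Prometheus config (including any rule files
# it references) and restart the service only if the check passes.
PROMTOOL="${PROMTOOL:-/usr/local/bin/prometheus/promtool}"

check_and_restart() {
  local config="${1:-/usr/local/bin/prometheus/prometheus.yml}"
  if "$PROMTOOL" check config "$config"; then
    sudo systemctl restart prometheus
  else
    echo "promtool check failed for $config; Prometheus was not restarted" >&2
    return 1
  fi
}
```

Running `check_and_restart` with no arguments checks the default config path used in this guide.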
Check that the node-exporter target appears in the targets list on the Prometheus page
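The targets page can also be checked from the command line through the `/api/v1/targets` endpoint. A rough sketch, assuming Prometheus on localhost:9090; it matches the JSON with grep rather than a parser such as jq to stay dependency-free, and `targets_all_up` is a hypothetical name.

```bash
#!/usr/bin/env bash
# targets_all_up [BASE_URL]
# Hypothetical helper: succeeds only if every scrape target reports
# "health":"up". Uses grep on the JSON, so it is a sketch, not a parser.
targets_all_up() {
  local base="${1:-http://localhost:9090}" body states
  body=$(curl -s "$base/api/v1/targets") || return 1
  # Extract every health value ("up", "down" or "unknown").
  states=$(printf '%s' "$body" | grep -o '"health": *"[a-z]*"')
  [ -n "$states" ] || return 1          # no targets at all
  # Print any health value other than "up" and fail if one exists.
  ! printf '%s\n' "$states" | grep -v '"health": *"up"'
}
```

For example, `targets_all_up && echo "all targets UP"`; the same check is useful again during final verification.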
To add rule groups, create two new files (rule1.yml and rule2.yml) in the same directory as prometheus.yml:
rule1.yml:
groups:
  - name: record-rules
    interval: 30s
    rules:
      # Despite its name, this series is the percentage of memory in use (100 minus the free percentage).
      - record: node_memory_MemFree_in_percent
        expr: 100 - (100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes)
        labels:
          team: backend
  - name: alert-rules
    interval: 30s
    rules:
      - alert: InstanceDown
        # "up" is the series Prometheus records for every scrape target (1 = reachable, 0 = down).
        expr: up == 0
        for: 1m
        labels:
          severity: warning
          team: infrastructure
        annotations:
          summary: "Instance [{{ $labels.instance }}] down"
          description: "[{{ $labels.instance }}] of [{{ $labels.job }}] has been down for more than 1 minute."
rule2.yml:
groups:
  - name: example-rules
    interval: 30s
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum(http_inprogress_requests) by (job)
        labels:
          team: backend
      - alert: HighCPULoad
        # Fraction (0 to 1) of non-idle CPU time per instance over 5 minutes, from node_exporter's node_cpu_seconds_total.
        expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.85
        for: 10m
        labels:
          severity: warning
          team: infrastructure
        annotations:
          summary: "High CPU load on {{ $labels.instance }}"
          description: "CPU usage is above 85% for more than 10 minutes."
Check that both rule files are valid using promtool (run from /usr/local/bin/prometheus)
sudo ./promtool check rules rule1.yml rule2.yml
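Add both rule files under the rule_files section of prometheus.yml (relative paths are resolved against the directory containing prometheus.yml):
rule_files:
  - "rule1.yml"
  - "rule2.yml"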
Reload the daemon and restart Prometheus
sudo systemctl daemon-reload
sudo systemctl restart prometheus
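To check that the new rules were loaded and are evaluating without errors, the `/api/v1/rules` endpoint reports a health value for each rule: ok, err, or unknown before its first evaluation, so allow one 30s interval after the restart. A grep-based sketch assuming Prometheus on localhost:9090; `rules_healthy` is a hypothetical name.

```bash
#!/usr/bin/env bash
# rules_healthy [BASE_URL]
# Hypothetical helper: succeeds only if at least one rule is loaded and every
# rule reports "health":"ok". Uses grep on the JSON, so it is only a sketch.
rules_healthy() {
  local base="${1:-http://localhost:9090}" body states
  body=$(curl -s "$base/api/v1/rules") || return 1
  states=$(printf '%s' "$body" | grep -o '"health": *"[a-z]*"')
  [ -n "$states" ] || return 1          # no rules loaded
  # Print any rule health other than "ok" and fail if one exists.
  ! printf '%s\n' "$states" | grep -v '"health": *"ok"'
}
```

For example, `sleep 30 && rules_healthy && echo "all rules ok"`.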
Install and configure Grafana by first installing Grafana dependencies
# Install dependencies
sudo apt-get install -y apt-transport-https software-properties-common
# Add Grafana GPG key
sudo wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
# Add Grafana APT repository
echo "deb https://packages.grafana.com/oss/deb stable main" \
| sudo tee -a /etc/apt/sources.list.d/grafana.list
# Update package list
sudo apt-get update
# Install Grafana
sudo apt-get install grafana
sudo systemctl start grafana-server
Start and enable Grafana
sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server
Ensure that the firewall allows incoming TCP traffic on port 3000, the default port Grafana uses
sudo ufw allow 3000/tcp
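Once the port is open, Grafana's `/api/health` endpoint gives a quick reachability check that does not need a login. A sketch assuming the default port 3000; `grafana_healthy` is a hypothetical name, and it only looks for the database being reported as ok in the response.

```bash
#!/usr/bin/env bash
# grafana_healthy [BASE_URL]
# Hypothetical helper: succeeds if Grafana's /api/health response reports
# its database as "ok". Spacing in the JSON is matched loosely.
grafana_healthy() {
  local base="${1:-http://localhost:3000}"
  curl -s "$base/api/health" | grep -Eq '"database": *"ok"'
}
```

For example, from another machine: `grafana_healthy http://13.75.179.26:3000 && echo "Grafana is reachable"`.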
Grafana is available at http://13.75.179.26:3000/login. The default username and password are both admin; I have changed the password to the same password our servers use.
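Next, go to Connections > Data sources > Add a data source and select Prometheus. Enter the Prometheus server URL (for example http://13.75.179.26:9090, or http://localhost:9090 since Grafana runs on the same server), then click Save & test to verify the connection.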
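As an alternative to the UI, Grafana can load data sources from provisioning files at startup. A sketch, assuming the Debian package's default provisioning directory /etc/grafana/provisioning/datasources and Prometheus on the same host; `prometheus_datasource_yaml` is a hypothetical helper that only prints the file contents.

```bash
#!/usr/bin/env bash
# prometheus_datasource_yaml [PROM_URL]
# Hypothetical helper: prints a Grafana data source provisioning file that
# registers Prometheus (queried server-side via "proxy" access) as the default.
prometheus_datasource_yaml() {
  local url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

For example, `prometheus_datasource_yaml | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml` followed by `sudo systemctl restart grafana-server`.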
Import the Node Exporter dashboard by clicking the + icon and choosing Import dashboard. Then enter the dashboard ID 1860 and click "Load"
Select Prometheus as the data source and click on Import
Verify that the dashboard displays metrics and that the node exporter target is UP
A "did not find expected key" error when checking the rules is usually caused by incorrect YAML formatting. Here the rule file had incorrect indentation; after fixing the indentation and re-running the check command, the rules were found.
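When promtool reports this kind of error, it can help to print each line's indentation width so that a key nested at the wrong level stands out. A sketch; `show_indent` is a hypothetical helper, and it also flags tab characters, which YAML does not allow for indentation.

```bash
#!/usr/bin/env bash
# show_indent FILE
# Hypothetical helper: prints "line:indent| text" for each line so misaligned
# keys stand out, and marks any line whose indentation contains a tab.
show_indent() {
  awk '{
    match($0, /^[ \t]*/)
    lead = substr($0, 1, RLENGTH)
    flag = (index(lead, "\t") > 0) ? "  <-- TAB" : ""
    printf "%4d:%2d| %s%s\n", NR, RLENGTH, $0, flag
  }' "$1"
}
```

For example, `show_indent rule1.yml`; sibling keys such as `expr:` and `for:` inside one rule should show the same indent width.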
N/A
https://rt.dataraster.com/Ticket/Display.html?id=270