3 ‐ Alert - CPNV-ES-MON1/Prometheus GitHub Wiki
Alertmanager
Version used: 0.27.0
Prerequisites
Download Alertmanager 0.27.0 (the latest release at the time of writing)
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
--2024-06-10 07:11:32-- https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.116.4
Connecting to github.com (github.com)|140.82.116.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/11452538/18333c17-a97b-4a1d-84f7-3562435ca553?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20240610%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240610T071132Z&X-Amz-Expires=300&X-Amz-Signature=ba523c961ede794ff88a0ac58e8ccffd641bcc08f4cbd7f5ccbd348113159622&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=11452538&response-content-disposition=attachment%3B%20filename%3Dalertmanager-0.27.0.linux-amd64.tar.gz&response-content-type=application%2Foctet-stream [following]
--2024-06-10 07:11:32-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/11452538/18333c17-a97b-4a1d-84f7-3562435ca553?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20240610%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240610T071132Z&X-Amz-Expires=300&X-Amz-Signature=ba523c961ede794ff88a0ac58e8ccffd641bcc08f4cbd7f5ccbd348113159622&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=11452538&response-content-disposition=attachment%3B%20filename%3Dalertmanager-0.27.0.linux-amd64.tar.gz&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30866868 (29M) [application/octet-stream]
Saving to: ‘alertmanager-0.27.0.linux-amd64.tar.gz’
alertmanager-0.27.0.linux-amd 100%[=================================================>] 29.44M 22.9MB/s in 1.3s
2024-06-10 07:11:34 (22.9 MB/s) - ‘alertmanager-0.27.0.linux-amd64.tar.gz’ saved [30866868/30866868]
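Before extracting, you can check the archive's integrity. Prometheus projects usually publish a sha256sums.txt file next to the release archives. The sketch below assumes that file exists for v0.27.0 and that GNU coreutils sha256sum is available, since it relies on its --ignore-missing flag. The helper name verify_archive is hypothetical.

```shell
#!/usr/bin/env sh
# Sketch (hypothetical helper): verify downloaded release archives against
# a sha256sums.txt file. Assumes GNU coreutils sha256sum, whose
# --ignore-missing flag skips entries for files that were not downloaded.

# verify_archive SUMS_FILE
# Returns 0 only if every listed file present in the current directory matches.
verify_archive() {
  sha256sum --check --ignore-missing "$1"
}
```

For example, download the (assumed) file https://github.com/prometheus/alertmanager/releases/download/v0.27.0/sha256sums.txt into the same directory and run verify_archive sha256sums.txt; a good download is reported with an "OK" next to the archive name.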
tar -xvf alertmanager-0.27.0.linux-amd64.tar.gz
alertmanager-0.27.0.linux-amd64/
alertmanager-0.27.0.linux-amd64/alertmanager
alertmanager-0.27.0.linux-amd64/alertmanager.yml
alertmanager-0.27.0.linux-amd64/NOTICE
alertmanager-0.27.0.linux-amd64/amtool
alertmanager-0.27.0.linux-amd64/LICENSE
cd alertmanager-0.27.0.linux-amd64
sudo mv alertmanager /usr/local/bin/
sudo mv amtool /usr/local/bin/
Create a dedicated user for Alertmanager, along with its configuration and data directories
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /app/alertmanager
sudo mkdir -p /var/lib/alertmanager
sudo chown -R alertmanager:alertmanager /app/alertmanager /var/lib/alertmanager
Set up the service
Create a configuration file for Alertmanager
sudo nano /app/alertmanager/alertmanager.yml
Add the following configuration, which sends notifications to Discord:
global:
  resolve_timeout: 5m
route:
  receiver: 'discord'
  # One notification group per alert and machine, so .CommonLabels in the template is populated
  group_by: ['alertname', 'machine_name']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 3m
receivers:
  - name: 'discord'
    discord_configs:
      - webhook_url: '<WEBHOOK_URL>'
        send_resolved: true
        title: '{{ template "discord.notification.title" . }}'
        message: '{{ template "discord.notification.description" . }}'
templates:
  - '/app/alertmanager/alertmanager-templates.tmpl'
To obtain the webhook URL:
- On Discord, create a new server or use an existing one
- Create a dedicated channel for receiving alerts
- Open that channel's settings (Edit Channel)
- Go to the "Integrations" section and create a new webhook
- Copy the webhook URL and use it in place of '<WEBHOOK_URL>' in alertmanager.yml
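Before wiring the webhook into Alertmanager, it can help to confirm the URL works on its own. The sketch below posts a plain message using Discord's webhook body field "content"; it assumes curl is installed, and the helper names discord_payload and send_discord_test are hypothetical. Escaping is deliberately minimal (backslashes and double quotes only), so the text must fit on one line.

```shell
#!/usr/bin/env sh
# Sketch: send a one-off test message to a Discord webhook, independently
# of Alertmanager. Helper names are hypothetical; assumes curl is installed.

# discord_payload TEXT
# Prints a minimal webhook body {"content": TEXT}. Only backslashes and
# double quotes are escaped, so TEXT must be a single line.
discord_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"content":"%s"}' "$escaped"
}

# send_discord_test WEBHOOK_URL TEXT
# -f makes curl return non-zero on an HTTP error status.
send_discord_test() {
  curl -fsS -H 'Content-Type: application/json' \
    -d "$(discord_payload "$2")" "$1"
}
```

For example: send_discord_test '<WEBHOOK_URL>' 'Alertmanager webhook test'. The message should appear in the alerts channel; an error from curl usually means the URL was copied incompletely.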
Template for notifications
Create a template to customize notification messages
sudo nano /app/alertmanager/alertmanager-templates.tmpl
{{ define "discord.notification.title" }}
Alert: {{ .CommonLabels.alertname }}
{{ end }}
{{ define "discord.notification.description" }}
Hello,
{{ if gt ( .Alerts.Firing | len) 0}}
We have detected a problem on machine **{{ .CommonLabels.machine_name }}**.
The machine is temporarily unavailable.
We are working to resolve the issue.
{{ end }}
{{ if gt ( .Alerts.Resolved | len) 0}}
The alert for **{{ .CommonLabels.alertname }}** on machine **{{ .CommonLabels.machine_name }}** has been resolved.
The machine is operational again.
{{ end }}
Best regards,
The IT team.
{{ end }}
Give the alertmanager user ownership of both files
sudo chown alertmanager:alertmanager /app/alertmanager/alertmanager.yml /app/alertmanager/alertmanager-templates.tmpl
Create a systemd service file for Alertmanager
sudo nano /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/bin/alertmanager --config.file=/app/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager
Restart=always
[Install]
WantedBy=multi-user.target
Start the service and enable it
Reload systemd, enable the Alertmanager service at boot, then start it
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
Created symlink /etc/systemd/system/multi-user.target.wants/alertmanager.service → /etc/systemd/system/alertmanager.service.
sudo systemctl start alertmanager
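Once the service is started, sudo systemctl status alertmanager and sudo journalctl -u alertmanager show whether it came up cleanly. To test the whole Discord path without waiting for Prometheus, the sketch below polls Alertmanager's /-/healthy endpoint and then pushes a synthetic alert through its v2 API (POST /api/v2/alerts). It assumes the default listen address localhost:9093 and that curl is installed; the helper names and the example label values are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch: confirm Alertmanager is up, then push a synthetic alert to exercise
# the Discord receiver and the template. Assumes the default listen address
# localhost:9093 and that curl is installed. Helper names are hypothetical.

AM_URL="${AM_URL:-http://localhost:9093}"

# wait_for_alertmanager BASE_URL RETRIES
# Polls the /-/healthy endpoint once per second; returns 0 as soon as it answers.
wait_for_alertmanager() {
  attempt=0
  while [ "$attempt" -lt "$2" ]; do
    if curl -fsS -o /dev/null "$1/-/healthy" 2>/dev/null; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}

# test_alert_payload ALERTNAME MACHINE_NAME
# Prints a JSON array in the shape accepted by POST /api/v2/alerts.
# machine_name is the label read by the notification template.
# Arguments must not contain double quotes (no JSON escaping is done).
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","machine_name":"%s","severity":"warning"}}]\n' "$1" "$2"
}

# send_test_alert ALERTNAME MACHINE_NAME
send_test_alert() {
  curl -fsS -H 'Content-Type: application/json' \
    -d "$(test_alert_payload "$1" "$2")" "$AM_URL/api/v2/alerts"
}
```

For example: wait_for_alertmanager "$AM_URL" 10 && send_test_alert TestAlert srv01. With group_wait set to 5s the Discord message should arrive within seconds. Because the synthetic alert carries no end time, Alertmanager considers it resolved after resolve_timeout (5m here), and send_resolved: true should then produce the "resolved" message as well.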
Update Prometheus server
Modify the Prometheus configuration (prometheus.yml) to add Alertmanager
sudo nano /app/prometheus2.51.2/prometheus.yml
Add Alertmanager as a target under alerting
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'
Also list the alert rules file under rule_files
rule_files:
  - "/app/prometheus2.51.2/alert.rules.yml"
Rules file
Create the alert rules file
sudo nano /app/prometheus2.51.2/alert.rules.yml
groups:
  - name: DebianMemory
    rules:
      - alert: DebianMemoryUsageOver60%
        expr: ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes) * 100 > 60
        for: 30s
        labels:
          severity: warning
          machine_name: "[{{ $labels.machinename }}]"
          description_client: "Memory usage too high"
        annotations:
          summary: "Instance [{{ $labels.instance }}] > 60% memory usage"
          description: "Memory usage is at {{ $value }}%"
  - name: WindowsMemory
    rules:
      - alert: WindowsMemoryUsageOver60%
        expr: ((windows_cs_physical_memory_bytes - windows_os_physical_memory_free_bytes) / windows_cs_physical_memory_bytes) * 100 > 60
        for: 30s
        labels:
          severity: warning
          machine_name: "[{{ $labels.machinename }}]"
          description_client: "Memory usage too high"
        annotations:
          summary: "Instance [{{ $labels.instance }}] > 60% memory usage"
          description: "Memory usage is at {{ $value }}%"
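To see how the Debian rule's expression behaves on a given host without querying Prometheus, the sketch below recomputes it from /proc/meminfo, which is where node_exporter reads MemTotal and MemAvailable (in kB there, in bytes in Prometheus; the unit cancels out in the ratio). The helper names are hypothetical, and the check is instantaneous, so it ignores the rule's for: 30s delay.

```shell
#!/usr/bin/env sh
# Sketch: reproduce the DebianMemoryUsageOver60% expression locally.
# Helper names are hypothetical. Reads MemTotal and MemAvailable from
# /proc/meminfo (or another file in the same format).

# mem_used_percent [MEMINFO_FILE]
# Prints (MemTotal - MemAvailable) / MemTotal * 100 with one decimal.
mem_used_percent() {
  awk '/^MemTotal:/ { t = $2 }
       /^MemAvailable:/ { a = $2 }
       END { if (t > 0) printf "%.1f\n", (t - a) / t * 100; else exit 1 }' \
    "${1:-/proc/meminfo}"
}

# over_threshold PERCENT THRESHOLD
# Mirrors the rule's "> 60" comparison: strictly greater than.
over_threshold() {
  awk -v p="$1" -v t="$2" 'BEGIN { exit !(p > t) }'
}
```

For example, p=$(mem_used_percent) && over_threshold "$p" 60 && echo "above 60%" reports whether this host would currently satisfy the expression.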
Restart Prometheus to load the new configuration and rules
sudo systemctl restart prometheus
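YAML mistakes in these files can be caught with promtool check config /app/prometheus2.51.2/prometheus.yml, which also checks the referenced rule files. After the restart, Prometheus's /api/v1/alertmanagers endpoint shows whether it picked up the Alertmanager target. The sketch below assumes Prometheus listens on its default port 9090 and that curl is installed; to avoid depending on jq it matches the compact JSON text, which is a rough heuristic rather than real JSON parsing. Helper names are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch: ask Prometheus whether it is sending alerts to localhost:9093.
# Assumes the default Prometheus port 9090 and curl; helper names are
# hypothetical. JSON is matched textually, not parsed.

PROM_URL="${PROM_URL:-http://localhost:9090}"

# am_is_active TARGET < RESPONSE_JSON
# Succeeds if TARGET appears inside the activeAlertmanagers list of an
# /api/v1/alertmanagers response read from stdin (droppedAlertmanagers is ignored).
am_is_active() {
  tr -d ' \n' | grep -q "\"activeAlertmanagers\":\[[^]]*$1"
}

# check_prometheus_alertmanager
check_prometheus_alertmanager() {
  curl -fsS "$PROM_URL/api/v1/alertmanagers" | am_is_active 'localhost:9093'
}
```

If check_prometheus_alertmanager fails, the Status pages of the Prometheus web UI and sudo journalctl -u prometheus are the next places to look.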