Alerting with Prometheus - toge510/homelab GitHub Wiki

Alerting with Prometheus is separated into two parts. Alerting rules in Prometheus servers send alerts to an Alertmanager. The Alertmanager then manages those alerts, including silencing, inhibition, aggregation and sending out notifications via methods such as email, on-call notification systems, and chat platforms.

flowchart LR;
  A[Prometheus] -->|Alerts| C[Alertmanager];
  C -->|Notifications| D[Email,Slack,etc.];

I'd like to share the steps for setting up alerting and notifications, with an example.


Create alerting rules in Prometheus

Create an alerting rule in rules.yml, as shown below, to specify the conditions under which you want to be alerted.

groups:
- name: example
  rules:
    - alert: DiskSpaceAlert
      expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"}) > 90
      for: 10s  # Fire only after the condition has held for 10s
      labels:
        severity: critical
      annotations:
        summary: "Instance {{ $labels.instance }}: high disk space usage"
        description: "Instance {{ $labels.instance }} of job {{ $labels.job }} : Disk space usage on root filesystem is above 90%"

Note: evaluate the metrics in the Prometheus UI to see which instances are running.

Reference the rules file in the Prometheus configuration file (prometheus.yaml).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"
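The comment above refers to the global evaluation_interval. As a minimal sketch of where that setting sits in the same file (the 15s values are illustrative assumptions, not taken from this setup):

```yaml
global:
  scrape_interval: 15s      # how often targets are scraped (illustrative value)
  evaluation_interval: 15s  # how often the rules in rule_files are evaluated (illustrative value)

rule_files:
  - "rules.yml"
```

Both intervals default to 1m when unset. Because the `for` duration is only checked when the rule is evaluated, a rule with `for: 10s` moves from pending to firing on a later evaluation, so the delay is closer to the evaluation interval than to 10 seconds.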

You can reload the Prometheus configuration by sending an HTTP POST request to the /-/reload endpoint (this requires the --web.enable-lifecycle flag to be enabled).

curl -i -XPOST localhost:9090/-/reload
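How the flag gets enabled depends on how Prometheus is started, which this page does not show. Assuming it runs from the prom/prometheus image under Docker Compose (an assumption; the service name, file paths and mounts below are hypothetical), a sketch could look like this:

```yaml
# docker-compose.yml (hypothetical; adjust paths to your own layout)
services:
  prometheus:
    image: prom/prometheus
    command:
      # Setting "command" replaces the image's default arguments,
      # so the config file and storage path are passed explicitly.
      - --config.file=/etc/prometheus/prometheus.yaml
      - --storage.tsdb.path=/prometheus
      - --web.enable-lifecycle   # enables POST /-/reload and /-/quit
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yaml
      - ./rules.yml:/etc/prometheus/rules.yml   # relative rule_files paths resolve next to the config file
    ports:
      - "9090:9090"
```

Without the flag, the reload request is rejected and Prometheus has to be restarted or signalled with SIGHUP instead.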

Open the Prometheus UI (localhost:9090) and confirm that the alert rule is loaded.


Configure Prometheus to talk to the Alertmanager

Run Alertmanager, then point Prometheus at it in the Prometheus configuration file.
The alerting.alertmanagers section lists the Alertmanager instances that the Prometheus server sends alerts to.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

To test the setup, lower the alert threshold in rules.yml from 90% to 80% so that the alert fires, then reload the Prometheus configuration so that both this change and the Alertmanager settings take effect.
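For reference, a sketch of rules.yml with the test threshold applied. Only the comparison value in expr differs from the rule defined earlier; the annotation text is left untouched here on the assumption that the threshold goes back to 90 after the test.

```yaml
groups:
- name: example
  rules:
    - alert: DiskSpaceAlert
      # Test only: threshold lowered from 90 to 80 so the alert fires
      expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"}) > 80
      for: 10s
      labels:
        severity: critical
      annotations:
        summary: "Instance {{ $labels.instance }}: high disk space usage"
        description: "Instance {{ $labels.instance }} of job {{ $labels.job }} : Disk space usage on root filesystem is above 90%"
```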

The firing alert then appears in both the Prometheus UI (localhost:9090) and Alertmanager (localhost:9093).


Setup and configure the Alertmanager

Update alertmanager.yml to define where notifications are sent.

Slack alerts

Use Incoming Webhooks, a simple way to post messages from apps into Slack. Creating an Incoming Webhook gives you a unique URL to which you send a JSON payload with the message text and some options. Set that URL as slack_api_url.

global:
  resolve_timeout: 1m
  slack_api_url: '<slack_api_url>'

route:
  receiver: 'slack-notifications'

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#monitoring'
    send_resolved: true
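The introduction mentions aggregation, which the minimal configuration above leaves at its defaults. As a sketch only, the same Slack setup with grouping made explicit and the rule's description annotation included in the message; the grouping values shown are Alertmanager's documented defaults, and the title/text templates are illustrative choices rather than part of the original setup:

```yaml
global:
  resolve_timeout: 1m
  slack_api_url: '<slack_api_url>'

route:
  receiver: 'slack-notifications'
  group_by: ['alertname']   # one notification per alert name, covering all affected instances
  group_wait: 30s           # wait before sending the first notification for a new group
  group_interval: 5m        # wait before notifying about alerts added to an existing group
  repeat_interval: 4h       # wait before re-sending a notification that is still firing

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#monitoring'
    send_resolved: true
    # Illustrative templates: alert name and status in the title,
    # one description line per alert in the body.
    title: '{{ .CommonLabels.alertname }} ({{ .Status }})'
    text: |-
      {{ range .Alerts }}{{ .Annotations.description }}
      {{ end }}
```

Grouping by alertname means several instances crossing the disk threshold at the same time produce a single Slack message instead of one per instance.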

Gmail alerts

global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
- name: 'gmail-notifications'
  email_configs:
  - to: <email address>@gmail.com
    from: <email address>@gmail.com
    smarthost: smtp.gmail.com:587
    auth_username: <email address>@gmail.com
    auth_identity: <email address>@gmail.com
    auth_password: <app password>
    send_resolved: true
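The two receivers can also live in one alertmanager.yml, with the severity label set in rules.yml deciding where an alert goes. A minimal sketch, assuming critical alerts should go to Gmail and everything else to Slack (that split is an assumption for illustration, not something this setup prescribes):

```yaml
global:
  resolve_timeout: 1m
  slack_api_url: '<slack_api_url>'

route:
  receiver: 'slack-notifications'        # default when no child route matches
  routes:
    - matchers:
        - severity="critical"            # matches the label set in rules.yml
      receiver: 'gmail-notifications'

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#monitoring'
    send_resolved: true
- name: 'gmail-notifications'
  email_configs:
  - to: <email address>@gmail.com
    from: <email address>@gmail.com
    smarthost: smtp.gmail.com:587
    auth_username: <email address>@gmail.com
    auth_identity: <email address>@gmail.com
    auth_password: <app password>
    send_resolved: true
```

An alert is handled by the first child route whose matchers all match; only alerts that match no child route fall back to the receiver on the top-level route.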
