Prometheus - pranavkumarpk01/MD-DevOps GitHub Wiki


🚀 What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit built for reliability and scalability. It records real-time metrics in its own time series database and collects them with a pull-based model: the Prometheus server periodically scrapes (fetches) metrics from targets over HTTP.


โ“ Why is Prometheus Required?

Modern applications are:

  • Distributed across multiple services and environments.
  • Dynamic, with autoscaling and microservices.
  • Expected to be resilient, which requires real-time observability into performance and errors.

✅ Prometheus Solves:

  • Monitoring CPU, memory, network, disk usage, etc.
  • Tracking application metrics (latency, error rate, request count).
  • Triggering alerts based on threshold breaches.
  • Providing a query language (PromQL) for detailed metric analysis.

🧠 Prometheus Core Concepts

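| Component | Description |
|---|---|
| Time Series | A stream of timestamped sample values identified by a metric name and an optional set of labels |
| Labels | Key-value pairs attached to metrics (e.g., instance="server1"); each distinct label combination is a separate time series |
| PromQL | Prometheus Query Language, used to filter, aggregate, and analyze time series |
| Targets | The endpoints Prometheus scrapes metrics from |
| Exporter | A tool that exposes metrics from a system in Prometheus format (e.g., node_exporter) |
| Scraping | Prometheus pulling metrics over HTTP from its targets (exporters or instrumented applications) |
| Alertmanager | A separate component that receives alerts fired by Prometheus alerting rules and handles deduplication, grouping, routing, and notification |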
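As a concrete illustration of time series and labels, the sketch below parses a single line of the text format that exporters serve. It is a simplified, hypothetical helper (parse_sample is not part of any Prometheus library): it handles only the common name{label="value",...} value shape and ignores comments, timestamps, and escaped characters in label values. What it shows is that a series is identified by its metric name plus its full label set.

```python
import re

# Simplified pattern for one exposition-format sample line, e.g.
#   http_requests_total{method="GET",code="200"} 1027
# Assumption: no escaped quotes or commas inside label values, no timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (series_key, value), where series_key = (name, sorted labels)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    # A time series is identified by its name plus its complete label set,
    # so sorting the labels gives a stable, hashable identity.
    key = (m.group("name"), tuple(sorted(labels.items())))
    return key, float(m.group("value"))


# Same metric name, different label values: two distinct time series.
a, _ = parse_sample('http_requests_total{method="GET",code="200"} 1027')
b, _ = parse_sample('http_requests_total{method="POST",code="200"} 3')
print(a != b)  # True
```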

โš™๏ธ Prometheus Architecture

+----------------+     +---------------+     +---------------------+
|   Exporters    | <-- |   Prometheus  | --> |   Alertmanager      |
| (e.g., node)   |     |    Server     |     | (Email, Slack, etc) |
+----------------+     +---------------+     +---------------------+
         ↑                        |
         |                        ↓
     App / Service         Prometheus UI / Grafana

🔌 Exporters

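Exporters expose metrics in a format Prometheus understands: they collect data from a system that is not natively instrumented (an OS, a database, a network endpoint) and serve it over HTTP, usually at /metrics, in the Prometheus text exposition format.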

Common Exporters:

| Exporter | Monitors |
|---|---|
| node_exporter | CPU, memory, disk, and network metrics of Linux/Unix hosts |
| blackbox_exporter | Probes of HTTP, HTTPS, DNS, and TCP endpoints |
| mysqld_exporter | MySQL server metrics |
| cAdvisor | Container metrics (CPU, memory, I/O) |
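To show what exposing metrics in a Prometheus-readable format means in practice, here is a minimal exporter sketch using only the Python standard library. It serves one hypothetical gauge, demo_queue_length (a made-up metric name), at /metrics in the text exposition format. Real exporters normally use an official client library such as prometheus_client rather than hand-writing the format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric value; a real exporter would read it from the system
# it monitors (a queue, a database, the OS, ...).
QUEUE_LENGTH = {"value": 7}


def render_metrics():
    """Render the current metrics in the Prometheus text exposition format."""
    return (
        "# HELP demo_queue_length Items currently waiting in the demo queue.\n"
        "# TYPE demo_queue_length gauge\n"
        f"demo_queue_length {QUEUE_LENGTH['value']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics; every other path gets a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type of the text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def serve(port=9200):
    """Block forever, serving /metrics on the given (arbitrary) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Running serve() and adding 'localhost:9200' to a job's targets in prometheus.yml would let Prometheus scrape it; port 9200 is an arbitrary choice for this sketch.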

๐Ÿ› ๏ธ Installing Prometheus (Manual Setup)

1๏ธโƒฃ Download Prometheus

wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
tar -xvf prometheus-2.51.0.linux-amd64.tar.gz
cd prometheus-2.51.0.linux-amd64

2๏ธโƒฃ Start Prometheus

./prometheus --config.file=prometheus.yml

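Prometheus then serves its web UI and HTTP API at http://localhost:9090 (9090 is the default port). The release tarball already includes a sample prometheus.yml.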
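Before opening the UI, it can help to confirm the server is actually up. Prometheus exposes the management endpoints /-/healthy and /-/ready; the sketch below polls /-/ready once using only the standard library. The function name and timeout are illustrative choices, and the base URL assumes the default localhost:9090.

```python
import urllib.error
import urllib.request


def prometheus_is_ready(base_url="http://localhost:9090", timeout=2.0):
    """Return True if Prometheus answers its /-/ready endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/-/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False
```

A script can call prometheus_is_ready() in a loop after starting ./prometheus and only continue once it returns True.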


๐Ÿ“ prometheus.yml โ€“ Configuration File

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
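
  • scrape_interval: How often targets are scraped. The example sets 15s; if omitted, Prometheus's built-in default is 1m.
  • job_name: Logical name for a group of targets; it is attached to every scraped series as the job label.
  • targets: List of host:port endpoints to scrape (here, Prometheus itself and a node_exporter assumed to be listening on port 9100).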
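Because every scraped series yields one sample per scrape, scrape_interval directly drives storage. The Prometheus storage documentation gives a rule of thumb: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it; the series count is a made-up input, 15 days is Prometheus's default local retention, and 2 bytes per sample is the conservative end of that range.

```python
def estimate_disk_bytes(active_series, scrape_interval_s,
                        retention_days=15, bytes_per_sample=2.0):
    """Back-of-the-envelope TSDB disk estimate (sketch, not a guarantee).

    disk = retention_seconds * ingested_samples_per_second * bytes_per_sample
    Actual usage depends on compression, series churn, and the write-ahead log.
    """
    # Each active series contributes one sample per scrape.
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical: 1,000 series scraped every 15s, kept for the default 15 days.
print(round(estimate_disk_bytes(1000, 15) / 1e6))  # 173 (MB)
```

Halving scrape_interval doubles the sample rate and therefore the estimate, which is why very short intervals get expensive.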

🔎 Prometheus Web UI

  • Access: http://localhost:9090
  • Key features:
    • Graph Explorer
    • Target Status
    • Alerts Page
    • PromQL Query Editor
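The same PromQL queries you type into the UI can also be sent to Prometheus's HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below builds that URL and turns an instant-vector response into a dictionary keyed by label set. The helper names are hypothetical; the JSON shape (status, data.resultType, data.result with metric and value fields) follows the documented API.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL for Prometheus's instant-query HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload):
    """Turn an instant-vector JSON response into {frozenset(labels): float}."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    # Each result has its label set and a [timestamp, "value as string"] pair.
    return {
        frozenset(r["metric"].items()): float(r["value"][1])
        for r in doc["data"]["result"]
    }


def run_instant_query(expr, base_url="http://localhost:9090"):
    """Send the query to a running Prometheus and parse the result."""
    with urllib.request.urlopen(instant_query_url(expr, base_url), timeout=5) as resp:
        return parse_vector(resp.read())
```

For example, run_instant_query("up") returns one entry per target, with 1.0 for targets whose last scrape succeeded.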

📈 Prometheus + Grafana = ❤️

Prometheus collects and stores metrics and has a basic graphing UI; Grafana adds rich, shareable dashboards on top of that data.

🧰 Grafana Setup:

  • Add Prometheus as a data source
  • Import dashboards (e.g., Node Exporter Full)
  • Create custom graphs using PromQL

๐Ÿ” PromQL Examples

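| Query | Description |
|---|---|
| up | Whether each target's last scrape succeeded (1 = up, 0 = down) |
| node_cpu_seconds_total | Cumulative seconds each CPU has spent in each mode (a counter) |
| rate(http_requests_total[1m]) | Per-second request rate, averaged over the last 1 minute |
| avg(rate(container_cpu_usage_seconds_total[5m])) by (container) | Average CPU usage (in cores) per container, over 5-minute windows |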
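rate() is easiest to understand by computing it by hand. The sketch below is a simplified approximation (simple_rate is a hypothetical helper, not a Prometheus API): it sums the increase between consecutive counter samples, treating any drop as a counter reset, and divides by the time between the first and last sample. Real PromQL rate() additionally extrapolates to the edges of the [range] window, so its results can differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_s, value) samples.

    Simplified sketch of PromQL rate(): handles counter resets but does not
    extrapolate to the edges of the range window as Prometheus does.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. a process
        # restart), so the new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# A counter scraped every 15s that grows by 15 per scrape: 1 request/second.
print(simple_rate([(0, 100), (15, 115), (30, 130), (45, 145), (60, 160)]))  # 1.0
```

The reset handling is why rate() should only be applied to counters: on a gauge, a normal decrease would be misread as a reset.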

🚨 Alerting with Prometheus

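Prometheus evaluates alerting rules at regular intervals; when a rule's expression keeps returning results for the duration set in its for field, the alert fires and Prometheus sends it to Alertmanager, which handles notifications.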

📄 Sample Alert Rule:

groups:
- name: example
  rules:
  - alert: HighCPUUsage
    expr: avg by (instance) (rate(node_cpu_seconds_total{mode="user"}[1m])) > 0.7
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
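The for: 1m line means the expression must stay true for a full minute before the alert fires; until then the alert is shown as pending. The sketch below replays a hypothetical series of rule evaluations to show the inactive, pending, and firing states. It assumes rules are evaluated every 15s (controlled by evaluation_interval, which the example configuration above does not set) and simplifies Prometheus's internal bookkeeping.

```python
def alert_states(evaluations, for_seconds):
    """Replay rule evaluations and report the alert state after each one.

    evaluations: list of (timestamp_s, condition_true) in time order.
    Simplified sketch of the 'for' clause: the alert stays 'pending' until the
    expression has been true continuously for for_seconds, then it is
    'firing'; any false evaluation resets it to 'inactive'.
    """
    states = []
    active_since = None
    for ts, is_true in evaluations:
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts  # first evaluation where the condition holds
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 15s against the rule's for: 1m (60s).
evals = [(0, True), (15, True), (30, True), (45, True), (60, True), (75, False)]
print(alert_states(evals, 60))
# ['pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

This is why a short CPU spike does not page anyone: it has to outlast the for duration first.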

📨 Alertmanager Overview

  • Handles alert delivery
  • Supports routing, grouping, inhibition
  • Supports Email, Slack, PagerDuty, Webhooks
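Grouping is what turns many similar alerts into one notification. An Alertmanager route's group_by list names the labels that define a group; the sketch below imitates only that bucketing step and ignores timing settings such as group_wait and group_interval. The InstanceDown alert and the host names are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets the way a route's group_by does (simplified).

    Alerts that share the same values for every group_by label land in one
    group and would be delivered as a single notification.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighCPUUsage", "instance": "server1", "severity": "warning"},
    {"alertname": "HighCPUUsage", "instance": "server2", "severity": "warning"},
    {"alertname": "InstanceDown", "instance": "server3", "severity": "critical"},
]
for key, members in group_alerts(alerts, ["alertname"]).items():
    print(dict(key), len(members))
# {'alertname': 'HighCPUUsage'} 2
# {'alertname': 'InstanceDown'} 1
```

Grouping by alertname alone sends one HighCPUUsage notification for both servers; adding instance to group_by would split them again.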

✅ Steps:

  1. Download & configure Alertmanager.
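  2. Connect it with Prometheus in prometheus.yml: list it under alerting.alertmanagers (Alertmanager listens on port 9093 by default) and load your rule files via rule_files.
  3. Define alert rules (in Prometheus rule files) and receivers and routes (in Alertmanager's alertmanager.yml).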
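For the webhook receiver type, Alertmanager POSTs a JSON document to the URL configured in a receiver's webhook_configs. A receiving service mainly needs the alerts list, where each entry carries its own status (firing or resolved), labels, and annotations. The sketch below only parses such a body; the summary format is an arbitrary choice, and serving it over HTTP is left to whatever web framework you use.

```python
import json


def summarize_webhook(payload):
    """Summarize an Alertmanager webhook notification body (JSON string).

    Reads the top-level 'alerts' list; each alert has its own 'status',
    'labels', and 'annotations'. Missing fields are shown as '?' or ''.
    """
    doc = json.loads(payload)
    lines = []
    for alert in doc.get("alerts", []):
        labels = alert.get("labels", {})
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(f"[{alert.get('status', '?')}] {labels.get('alertname', '?')}: {summary}")
    return lines
```

Fed a notification for the sample HighCPUUsage rule, it would return one line per alert in the group, such as "[firing] HighCPUUsage: High CPU usage on server1".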

🧪 Real Use Cases

  1. Infrastructure Monitoring:

    • Servers, VMs, and containers using node_exporter and cAdvisor.
  2. Application Monitoring:

    • Request count, latency, error rate.
  3. Kubernetes Monitoring:

    • With kube-prometheus-stack (a Helm chart bundling the Prometheus Operator, Prometheus, Alertmanager, Grafana, and exporters).
  4. CI/CD Monitoring:

    • Monitor Jenkins, deployments, build failures.
  5. Business Metrics:

    • Track user signups, API usage, revenue.

๐Ÿ” Security Considerations

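  • Prometheus originally had no built-in authentication; since v2.24 it supports basic auth and TLS through a web configuration file (--web.config.file), though a reverse proxy such as Nginx is still a common way to add auth.
  • Enable HTTPS, either natively through that web configuration file or at the reverse proxy.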
  • Limit access to Prometheus and exporters via firewalls or VPN.
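When basic auth is enforced, either by Prometheus's own web configuration file or by a reverse proxy, API clients must send credentials. A minimal standard-library sketch follows; the host name and the admin/secret credentials are placeholders, and real credentials belong in a secret store and should only travel over HTTPS.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authed_request(url, user, password):
    """Prepare (but do not send) a GET request carrying basic-auth credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


# Placeholder host and credentials, for illustration only.
req = authed_request("https://prometheus.example.com/api/v1/query?query=up", "admin", "secret")
# urllib.request.urlopen(req) would send it.
```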

📚 Bonus Tools

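| Tool | Use Case |
|---|---|
| PromLens | Visual builder for PromQL queries |
| Thanos | Long-term storage and global querying across Prometheus servers |
| VictoriaMetrics | Prometheus-compatible time series database (alternative long-term storage) |
| Pushgateway | Accepts pushed metrics from short-lived jobs (e.g., cron jobs) so Prometheus can scrape them |
| kube-prometheus | Prometheus monitoring stack for Kubernetes |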
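Pushgateway inverts the usual pull model for short-lived jobs: the job pushes its final metrics to the gateway, and Prometheus scrapes the gateway. Its HTTP API accepts text-format metrics at /metrics/job/<job> with optional extra /<label>/<value> pairs, on port 9091 by default. The sketch below only builds the request; the job name, metric name, and grouping label are hypothetical.

```python
import urllib.parse
import urllib.request


def build_push_request(job, metrics_text, gateway="http://localhost:9091", grouping=None):
    """Prepare a POST of text-format metrics to a Pushgateway grouping key.

    Assumption: job and label values contain no '/'; the Pushgateway docs
    describe an @base64 form for such values.
    """
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += f"/{name}/{urllib.parse.quote(value, safe='')}"
    if not metrics_text.endswith("\n"):
        metrics_text += "\n"  # the text format requires a trailing newline
    return urllib.request.Request(
        gateway + path,
        data=metrics_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical cron job reporting when its last successful run finished.
req = build_push_request(
    "nightly_backup",
    "backup_last_success_timestamp_seconds 1700000000",
    grouping={"instance": "db1"},
)
print(req.get_method(), req.full_url)
# POST http://localhost:9091/metrics/job/nightly_backup/instance/db1
# urllib.request.urlopen(req) would send it.
```

POST replaces only metrics with the same names in that group, while PUT would replace the whole group; POST is the gentler default for a sketch.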

🔚 Final Thoughts

  • Prometheus is well suited to cloud-native, Kubernetes-based, and microservice systems.
  • It is lightweight, scalable, and powerful with PromQL and Grafana integration.
  • Combined with Alertmanager, it enables automated alerting and observability.