Setup Prometheus

Installing Prometheus Server

Use wget to download the latest build of the Prometheus server and time-series database from the GitHub releases page.

$ wget https://github.com/prometheus/prometheus/releases/download/v2.11.1/prometheus-2.11.1.linux-amd64.tar.gz

Use tar to extract prometheus-2.11.1.linux-amd64.tar.gz.

$ tar -xvzf prometheus-2.11.1.linux-amd64.tar.gz

This completes the installation of the Prometheus server. Change into the extracted directory and verify the installation by typing in:

$ cd prometheus-2.11.1.linux-amd64
$ ./prometheus --version

You should see the following message on your screen:

Output:
prometheus, version 2.11.1 (branch: HEAD, revision: e5b22494857deca4b806f74f6e3a6ee30c251763)
  build user:       root@d94406f2bb6f
  build date:       20190710-13:51:17
  go version:       go1.12.7

Installing Node Exporter

Use wget to download the latest build of Node Exporter from the GitHub releases page.

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

You can now use the tar command to extract node_exporter-0.18.1.linux-amd64.tar.gz.

$ tar -xvzf node_exporter-0.18.1.linux-amd64.tar.gz

Running Node Exporter

Change into the extracted directory and start Node Exporter with this command:

$ cd node_exporter-0.18.1.linux-amd64
$ ./node_exporter

After Node Exporter starts, use a browser to view its web interface available at http://your_server_ip:9100/metrics. You should see a page with a lot of text:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.00023853100000000002
go_gc_duration_seconds{quantile="0.25"} 0.00023998700000000002
go_gc_duration_seconds{quantile="0.5"} 0.00028122
. . .
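You can also confirm from the terminal that the endpoint is serving data. The command below assumes Node Exporter is still running on the same machine and listening on its default port 9100:

$ curl -s http://localhost:9100/metrics | head -n 5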

Starting Prometheus Server

Before you start Prometheus, you must first create a configuration file for it called prometheus.yml in the directory where you extracted Prometheus.

$ nano prometheus.yml

Copy the following code into the file.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9100', 'slave-1:9100', 'slave-2:9100', 'slave-3:9100']
  - job_name: 'crawler'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9101', 'slave-1:9101', 'slave-2:9101', 'slave-3:9101']
  - job_name: 'collector'
    scrape_interval: "15s"
    static_configs:   
    - targets: ['master:9107', 'slave-1:9107', 'slave-2:9107', 'slave-3:9107']
  - job_name: 'hbase'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9500']
  - job_name: 'kafka'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9308']
  - job_name: 'zookeeper'
    scrape_interval: "15s"
    static_configs:
    - targets: ['slave-1:9103']
  - job_name: 'redis'
    scrape_interval: "15s"
    static_configs:
    - targets: ['slave-1:9104', 'slave-2:9104', 'slave-3:9104']
  - job_name: 'elasticsearch'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9105']
  - job_name: 'hadoop'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9106']
  - job_name: 'page-collector'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9107']
  - job_name: 'shuffler'
    scrape_interval: "15s"
    static_configs:
    - targets: ['master:9108', 'slave-1:9108', 'slave-2:9108', 'slave-3:9108']

You could name your job anything you want, but calling it "node" allows you to use the default console templates of Node Exporter.

Save the file and exit.
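Before starting the server, you can optionally validate the configuration with promtool, which ships in the same archive as the prometheus binary:

$ ./promtool check config prometheus.yml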

Start the Prometheus server in the background with this command:

$ ./prometheus --config.file=prometheus.yml > prometheus.log 2>&1 &

Note that you redirected the output of the Prometheus server to a file called prometheus.log. You can view the last few lines of the file using the tail command:

$ tail ./prometheus.log

Once the server is ready, you will see messages similar to the following in the file:

level=info ts=... caller=main.go msg="Server is ready to receive web requests."

Use a browser to visit Prometheus's homepage available at http://your_server_ip:9090.

To make sure that Prometheus is scraping data from Node Exporter, click on the Graph tab at the top of the page. On the page that opens, type the name of a metric (node_procs_running, for example) into the text field that says Expression. Then press the blue Execute button. Click Graph (next to Console) just below, and you should see a graph for that metric.
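You can query the same metric from the command line through Prometheus's HTTP API as well; the example below assumes you run it on the Prometheus host itself:

$ curl 'http://localhost:9090/api/v1/query?query=node_procs_running'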

Prometheus has console templates that let you view graphs of a few commonly used metrics. These console templates are accessible only if you set the value of job_name to node in Prometheus's configuration.

Visit http://your_server_ip:9090/consoles/node.html to access the Node Console and click on one of your servers (for example, master:9100) to view its metrics.

AlertManager

Download and extract Alertmanager, then remove the archive:

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.18.0/alertmanager-0.18.0.linux-amd64.tar.gz
$ tar -xvzf alertmanager-0.18.0.linux-amd64.tar.gz
$ rm alertmanager-0.18.0.linux-amd64.tar.gz

Edit its configuration file, alertmanager.yml:

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:4567/alert'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
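Then start Alertmanager from the extracted alertmanager-0.18.0.linux-amd64 directory, pointing it at this file; by default it listens on port 9093, which matches the Prometheus configuration below:

$ ./alertmanager --config.file=alertmanager.yml > alertmanager.log 2>&1 &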

Edit Prometheus's configuration (prometheus.yml) to point it at Alertmanager and at your rule files:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093 # add alertmanager to /etc/hosts or use localhost :)
rule_files:
  - "rules/*.yml"

Create rules/HbaseDeadRegion.yml:

groups:
- name: HBase Dead Region
  rules:
  - alert: HBaseDeadRegion
    expr: Hadoop_HBase_numDeadRegionServers > 0
    for: 1m
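You can validate the rule file with promtool before reloading Prometheus:

$ ./promtool check rules rules/HbaseDeadRegion.yml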

Finally, run killall -HUP prometheus to make Prometheus reload its configuration.
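Alternatively, if you started Prometheus with the --web.enable-lifecycle flag, you can trigger the reload over HTTP (shown here against a local instance):

$ curl -X POST http://localhost:9090/-/reload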
