Plugin Descriptions - exasol/nagios-monitoring GitHub Wiki

Plugin Overview

Plugin Script              Purpose
check_logservice.py        converts logservice entries to Nagios events
check_db_diskspace.py      checks if there is enough space left for the DB instance
check_db_performance.py    checks if a connection to the DB instance is possible (via ODBC) and provides performance data
check_backup.py            checks if a valid backup newer than a week is available
check_nodes.py             checks the number of nodes
check_services.py          checks if the cluster services are running

You can find those plugins here: https://github.com/EXASOL/nagios-monitoring/tree/master/opt/exasol/monitoring

Logservice Plugin (check_logservice.py)

The logservice plugin converts log messages with the priorities "Warning" and "Error" into Nagios events. An entry logged with priority "Warning" creates a warning event; an entry with priority "Error" creates a critical event. You first need to set up a logservice that covers all the information relevant to you. This lets you choose, for instance, whether you want to receive error messages from the "Load" service.

If this plugin finds a critical entry, it shows a short message asking you to check the log service output. It also provides the full log lines as Nagios "long text" lines (see https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/pluginapi.html, Plugin Output Spec).

The plugin accumulates events, so you get only one Nagios event when a batch of EXASOL logservice events fires. The priority of the Nagios event is the highest priority found among the logservice lines.
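The accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `summarize`, the `(priority, message)` tuple layout, and the exact wording of the output are assumptions. Only the priority mapping and the standard Nagios exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) come from the description and the Nagios plugin API.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Mapping of logservice priorities to Nagios states, as described above.
PRIORITY_TO_STATE = {"Warning": WARNING, "Error": CRITICAL}


def summarize(entries):
    """Collapse (priority, message) pairs into one Nagios state and output text.

    Hypothetical helper: the real plugin may structure this differently.
    """
    relevant = [(p, m) for p, m in entries if p in PRIORITY_TO_STATE]
    if not relevant:
        return OK, "OK - no warnings or errors in log service"
    # The resulting event takes the highest priority among all lines.
    state = max(PRIORITY_TO_STATE[p] for p, _ in relevant)
    label = "CRITICAL" if state == CRITICAL else "WARNING"
    first_line = f"{label} - {len(relevant)} log service entries, check the log service output"
    # Every line after the first is treated by Nagios as long text.
    long_text = "\n".join(f"{p}: {m}" for p, m in relevant)
    return state, first_line + "\n" + long_text
```

A caller would print the returned text and exit with the returned state, which is how Nagios plugins report their result.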

If you want to use this plugin in Nagios-like monitoring systems, we recommend the following parameters:

        service_description     Log Service
        max_check_attempts      1
        normal_check_interval   10
        retry_check_interval    5
        flap_detection_enabled  0

Keep in mind that your monitoring system may show the logservice state as OK again before you had a chance to see the change. This happens because the check returns to OK when it is retried and no further log messages have arrived. Make sure your notification system is set up correctly so that you receive this information.

Blacklist File

A blacklist file can be used to filter out events that are not interesting to you. The filter is simple: it is case sensitive, it matches text literally (wildcards and similar patterns are not supported), and each line defines exactly one filter rule.
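The filter behaviour can be illustrated with a short sketch. The helper names `load_rules` and `is_blacklisted` are hypothetical, and the sketch assumes that a rule matches when it occurs literally anywhere inside a log line; the plugin itself may instead compare whole lines.

```python
def load_rules(lines):
    """Return one literal rule per non-empty line of the blacklist file."""
    return [line.rstrip("\r\n") for line in lines if line.strip()]


def is_blacklisted(message, rules):
    """Case-sensitive literal match; '*' and '?' have no special meaning.

    Assumption: a rule matches if it is a substring of the message.
    """
    return any(rule in message for rule in rules)
```

With a file object, `load_rules(open(path))` would read the rules, since iterating a file yields its lines.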

Plugin Parameters

/opt/exasol/monitoring/check_logservice.py -h

EXAoperation XMLRPC log service monitor (version 17.08)
  Options:
    -h                      shows this help
    -V                      shows the plugin version
    -H <license server>     domain of IP of your license server
    -i <logservice id>      interger id of the used logservice
    -u <user login>         EXAoperation login user
    -p <password>           EXAoperation login password
    -b <blacklist file>     Blacklist all unwanted logservice lines

Disk Space Check Plugin (check_db_diskspace.py)

This plugin checks the remaining disk space available to the given database instance. You can find more information about how the free space is calculated at the following link: https://www.exasol.com/support/browse/SOL-366

The plugin also reports an error if the database instance is not available or not started.

Plugin Parameters

/opt/exasol/monitoring/check_db_diskspace.py  -h

EXAoperation XMLRPC database disk usage monitor (version 16.09)
  Options:
    -h                      shows this help
    -V                      shows the plugin version
    -H <license server>     domain of IP of your license server
    -d <db instance>        the name of your DB instance
    -u <user login>         EXAoperation login user
    -p <password>           EXAoperation login password
    -w <0..100>             warning treshold for disk image usage of you db instance (optional)
    -c <0..100>             critical treshold for disk usage of your db instance (optional)
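
How the optional -w and -c thresholds might translate into a Nagios state is sketched below. The function name `disk_state` is hypothetical, and the sketch assumes that a usage percentage equal to or above a threshold triggers it; the plugin's actual comparison may differ.

```python
def disk_state(usage_percent, warning=None, critical=None):
    """Map a disk usage percentage (0..100) to a Nagios exit code.

    Both thresholds are optional, mirroring the -w and -c options.
    Assumption: reaching a threshold is enough to trigger it.
    """
    if critical is not None and usage_percent >= critical:
        return 2  # CRITICAL
    if warning is not None and usage_percent >= warning:
        return 1  # WARNING
    return 0  # OK
```

The critical threshold is tested first so that a value above both thresholds reports the more severe state.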

DB Performance Metering (check_db_performance.py)

This plugin checks whether a connection to the given database instance can be established and provides the following performance data:

  • system load
  • cpu load
  • temporary DB ram usage
  • HDD I/O
  • network I/O
  • swap usage
  • number of users
  • number of running queries
  • number of transaction conflicts
  • highest duration of transaction conflicts (creates a warning if duration > 3600 seconds)
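
The values listed above are typically reported as Nagios performance data, which follows the status text after a `|` character as space-separated `label=value` pairs. The sketch below shows that shape together with the 3600-second warning rule. The metric names and the function `format_output` are hypothetical; only the threshold comes from the list above.

```python
# Threshold from the list above: warn if a transaction conflict lasts longer.
CONFLICT_WARN_SECONDS = 3600


def format_output(metrics):
    """Build a Nagios status line with performance data from a metrics dict.

    Assumption: the key 'conflict_duration' holds the highest conflict
    duration in seconds; the real plugin may use other labels.
    """
    state = 1 if metrics.get("conflict_duration", 0) > CONFLICT_WARN_SECONDS else 0
    label = "WARNING" if state else "OK"
    perfdata = " ".join(f"{name}={value}" for name, value in metrics.items())
    return state, f"{label} - database is reachable | {perfdata}"
```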

Please note that you have to set up the EXASOL ODBC driver and the pyodbc package for Python if you want to use this plugin in your own environment. The driver path is currently hard-coded and looks like this:

odbcDriver              = '/opt/exasol/EXASOL_ODBC-6.0.4/lib/linux/x86_64/libexaodbc-uo2214lv2.so'
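
For orientation, a driver path like the one above would usually end up in an ODBC connection string. The following sketch only builds such a string and is not taken from the plugin: the function name `build_connection_string` is hypothetical, and the `EXAHOST`, `UID` and `PWD` keys are the ones commonly used with the EXASOL ODBC driver, so check them against your driver version.

```python
# Hard-coded driver path as quoted above; adjust it to your installation.
ODBC_DRIVER = "/opt/exasol/EXASOL_ODBC-6.0.4/lib/linux/x86_64/libexaodbc-uo2214lv2.so"


def build_connection_string(host, user, password, driver=ODBC_DRIVER):
    """Assemble an ODBC connection string for a DB instance.

    Assumption: 'host' already contains the port, e.g. 'db.example.com:8563'.
    """
    return f"DRIVER={driver};EXAHOST={host};UID={user};PWD={password}"
```

With pyodbc installed, the result could then be passed to `pyodbc.connect(...)` to open the connection.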

Plugin Parameters

/opt/exasol/monitoring/check_db_performance.py -h

EXAoperation XMLRPC backup run check (version 16.12)
  Options:
    -h                      shows this help
    -V                      shows the plugin version
    -H <license server>     domain of IP of your license server
    -d <db instance>        the name of your DB instance
    -u <user login>         EXAoperation login user
    -p <password>           EXAoperation login password
    -l <dbuser login>       DB instance login user
    -a <dbuser passwd>      DB instance login password

Backup Plugin (check_backup.py)

The backup plugin only checks whether a fresh backup is available. The maximum age is currently hard-coded to one week, but it is easy to change. You will get a Nagios critical event if the backup is outdated (older than 7 days) or no longer restorable (e.g. a level 1 backup without its level 0 base). This plugin does not inform you about failed backups! If a backup fails, you will get a message about it from your log service.
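
The freshness rule can be sketched as follows. The names `MAX_BACKUP_AGE` and `backup_state` and the `(timestamp, usable)` tuple layout are assumptions for illustration; the 7-day limit and the critical result come from the description above.

```python
from datetime import datetime, timedelta

# One week, as described above; change this value to adjust the limit.
MAX_BACKUP_AGE = timedelta(days=7)


def backup_state(backups, now):
    """Return a Nagios exit code for a list of (timestamp, usable) tuples.

    Assumption: 'usable' is False for backups that cannot be restored,
    for example a level 1 backup whose level 0 base is missing.
    """
    fresh = [ts for ts, usable in backups if usable and now - ts <= MAX_BACKUP_AGE]
    return 0 if fresh else 2  # OK if any fresh, usable backup exists, else CRITICAL
```

Passing `now` as a parameter instead of calling `datetime.now()` inside the function keeps the check easy to test.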

Plugin Parameters

/opt/exasol/monitoring/check_backup.py -h

EXAoperation XMLRPC database disk usage monitor (version 02.06)
  Options:
    -h                      shows this help
    -V                      shows the plugin version
    -H <license server>     domain of IP of your license server
    -d <db instance>        the name of your DB instance
    -u <user login>         EXAoperation login user
    -p <password>           EXAoperation login password

Available Cluster Nodes (check_nodes.py)

A simple plugin that checks the number of nodes in your cluster. The check turns critical if one or more nodes are missing.

Available Cluster Services (check_services.py)

A simple plugin that checks whether all services are running. If something goes wrong on your EXASOL cluster, you will get a critical alert from this plugin.
