Plugin Descriptions - exasol/nagios-monitoring GitHub Wiki
Plugin Script | Purpose |
---|---|
check_logservice.py | converts logservice entries to nagios events |
check_db_diskspace.py | checks if there is enough space left for the DB instance |
check_db_performance.py | checks if connection to DB instance is possible (via ODBC) and provides performance data |
check_backup.py | checks if a valid backup newer than a week is available |
check_nodes.py | checks the number of nodes |
check_services.py | checks if the cluster services are running |
You can find those plugins here: https://github.com/EXASOL/nagios-monitoring/tree/master/opt/exasol/monitoring
The logservice plugin converts log messages with the priorities "Warning" and "Error" into nagios events. An entry logged with priority "Warning" creates a warning event; log lines with "Error" priority are converted to critical events. You first need to set up a logservice that defines all the information that is necessary and interesting for you, so you can choose, for instance, whether or not you want to get error messages from the "Load" service.
If this plugin finds a critical entry it will show a short message about checking the log service output. It will also provide the full output of the log lines as nagios "long text lines" (see https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/pluginapi.html, Plugin Output Spec).
The plugin can accumulate events, so you will get only one nagios event when a bunch of EXASOL logservice events are fired. The priority of the nagios event will be the highest priority among the logservice lines.
If you want to use this plugin in nagios-like monitoring systems, we recommend the following parameters:
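The mapping and accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `accumulate`, the `(priority, message)` input shape, and the exact output wording are assumptions. Only the exit codes (0, 1, 2) come from the Nagios plugin API.

```python
# Nagios plugin exit codes as defined by the Nagios plugin API.
OK, WARNING, CRITICAL = 0, 1, 2

# Priority-to-state mapping as described in the text above.
PRIORITY_TO_STATE = {"Warning": WARNING, "Error": CRITICAL}


def accumulate(entries):
    """Collapse (priority, message) pairs into one (state, output) result.

    Hypothetical helper: the real plugin reads the entries via XMLRPC.
    """
    relevant = [(p, m) for p, m in entries if p in PRIORITY_TO_STATE]
    if not relevant:
        return OK, "OK - no warnings or errors in log service"
    # The highest priority among all lines decides the single nagios event.
    state = max(PRIORITY_TO_STATE[p] for p, _ in relevant)
    label = "CRITICAL" if state == CRITICAL else "WARNING"
    summary = "%s - %d message(s), check the log service output" % (
        label, len(relevant))
    # Lines after the first one are nagios "long text lines".
    long_text = "\n".join("%s: %s" % (p, m) for p, m in relevant)
    return state, summary + "\n" + long_text
```

A plugin built this way would print the output and call `sys.exit(state)`, so one warning plus one error yields a single critical event.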
service_description Log Service
max_check_attempts 1
normal_check_interval 10
retry_check_interval 5
flap_detection_enabled 0
Keep in mind that your monitoring system may show that the logservice state is okay (again) without you having been able to see the change. This happens because when the check is retried and there are no further log messages, the state turns back to OK. Make sure that your notification system is set up correctly to receive this information.
A blacklist file can be used to filter out all kinds of events that may not be interesting to you. The filter is quite simple: it is case sensitive, it matches the text directly (so you cannot use wildcards or similar patterns), and you can define only one filter rule per line.
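A filter with these properties might look like the sketch below. It assumes that "directly match" means a literal substring match against the log line; the plugin may instead compare whole lines. The function names `load_blacklist` and `is_blacklisted` are hypothetical.

```python
def load_blacklist(lines):
    """Return one rule per non-empty line, without the line ending."""
    return [line.rstrip("\r\n") for line in lines if line.strip()]


def is_blacklisted(message, rules):
    """Case-sensitive literal match; '*' and similar have no special meaning.

    Assumption: a rule matches if it occurs anywhere in the message.
    """
    return any(rule in message for rule in rules)
```

With the rules `Load failed` and `test*`, the message `ETL: Load failed on n11` is filtered out, while `etl: load failed` (different case) and `testing` (the `*` is taken literally) are not.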
/opt/exasol/monitoring/check_logservice.py -h
EXAoperation XMLRPC log service monitor (version 17.08)
Options:
-h shows this help
-V shows the plugin version
-H <license server> domain of IP of your license server
-i <logservice id> interger id of the used logservice
-u <user login> EXAoperation login user
-p <password> EXAoperation login password
-b <blacklist file> Blacklist all unwanted logservice lines
This plugin checks the remaining disk space that is available for the given database instance. You can find more information about how the free space is calculated at the following link: https://www.exasol.com/support/browse/SOL-366
The plugin will also throw an error if the database instance is not available or not started.
/opt/exasol/monitoring/check_db_diskspace.py -h
EXAoperation XMLRPC database disk usage monitor (version 16.09)
Options:
-h shows this help
-V shows the plugin version
-H <license server> domain of IP of your license server
-d <db instance> the name of your DB instance
-u <user login> EXAoperation login user
-p <password> EXAoperation login password
-w <0..100> warning treshold for disk image usage of you db instance (optional)
-c <0..100> critical treshold for disk usage of your db instance (optional)
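As an illustration of how the optional `-w` and `-c` thresholds of the disk space plugin could be applied, here is a minimal sketch. It assumes both values are usage percentages and that a threshold triggers when usage is greater than or equal to it; the plugin's real comparison and its defaults are not documented here, and `evaluate_usage` is a hypothetical name.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


def evaluate_usage(used_percent, warning=None, critical=None):
    """Map a disk usage percentage to a nagios state.

    Both thresholds are optional, mirroring the optional -w / -c flags.
    """
    # Check the critical threshold first so it wins over the warning one.
    if critical is not None and used_percent >= critical:
        return CRITICAL
    if warning is not None and used_percent >= warning:
        return WARNING
    return OK
```

For example, with `-w 80 -c 90` a usage of 85 percent would yield a warning and 95 percent a critical state.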
This plugin checks if the given database instance is connectable and will provide some performance data:
- system load
- cpu load
- temporary DB ram usage
- HDD I/O
- network I/O
- swap usage
- number of users
- number of running queries
- number of transaction conflicts
- highest duration of transaction conflicts (creates a warning if duration > 3600 seconds)
Please note that you have to set up the EXASOL ODBC driver and the pyodbc package for Python if you want to use this plugin in your own environment. The driver path is currently hard-coded and looks like this:
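The sketch below shows one way such values could be reported as nagios performance data (the part of the plugin output after the `|` character) together with the 3600 second conflict rule from the list above. The metric labels and function names are hypothetical and are not the labels the plugin actually emits.

```python
OK, WARNING = 0, 1
MAX_CONFLICT_SECONDS = 3600  # limit taken from the list above


def format_perfdata(metrics):
    """Render (label, value) pairs as space-separated label=value items."""
    return " ".join("%s=%s" % (label, value) for label, value in metrics)


def check_performance(metrics, longest_conflict_seconds):
    """Return (state, output line) for the collected metrics."""
    # Only a duration strictly above the limit raises a warning.
    too_long = longest_conflict_seconds > MAX_CONFLICT_SECONDS
    state = WARNING if too_long else OK
    label = "WARNING" if too_long else "OK"
    text = "%s - longest transaction conflict %ds" % (
        label, longest_conflict_seconds)
    return state, "%s | %s" % (text, format_perfdata(metrics))
```

Labels containing spaces would need single quotes in the nagios performance data format, which is why this sketch uses labels without spaces.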
odbcDriver = '/opt/exasol/EXASOL_ODBC-6.0.4/lib/linux/x86_64/libexaodbc-uo2214lv2.so'
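To verify the ODBC setup outside of the plugin, a connection test along the following lines may help. This is a sketch under assumptions: the connection string keys `EXAHOST`, `UID` and `PWD` follow the usual EXASOL ODBC conventions and should be checked against your driver documentation, and the helper names are hypothetical. `pyodbc.connect` and `pyodbc.Error` are part of pyodbc.

```python
# Driver path as hard-coded in the plugin (see above); adjust to your setup.
ODBC_DRIVER = ("/opt/exasol/EXASOL_ODBC-6.0.4/lib/linux/x86_64/"
               "libexaodbc-uo2214lv2.so")


def build_connection_string(host, user, password, driver=ODBC_DRIVER):
    """Build an ODBC connection string (key names are an assumption)."""
    return "DRIVER=%s;EXAHOST=%s;UID=%s;PWD=%s" % (
        driver, host, user, password)


def can_connect(host, user, password, driver=ODBC_DRIVER):
    """Return True if a connection to the DB instance can be opened."""
    import pyodbc  # imported here so the string builder works without it
    try:
        connection = pyodbc.connect(
            build_connection_string(host, user, password, driver),
            timeout=10)  # login timeout in seconds
    except pyodbc.Error:
        return False
    connection.close()
    return True
```

Note that this simple string builder does not escape special characters, so a password containing `;` would need extra handling.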
/opt/exasol/monitoring/check_db_performance.py -h
EXAoperation XMLRPC backup run check (version 16.12)
Options:
-h shows this help
-V shows the plugin version
-H <license server> domain of IP of your license server
-d <db instance> the name of your DB instance
-u <user login> EXAoperation login user
-p <password> EXAoperation login password
-l <dbuser login> DB instance login user
-a <dbuser passwd> DB instance login password
The backup plugin only checks if there is a fresh backup available. Currently the number of days is hard-coded to a week, but it is easy to change. You will get a nagios critical event if the backup is outdated (older than 7 days) or your backup is not restorable anymore (e.g. a level 1 backup without its level 0 base). You will not be informed about failed backups by this plugin! If your backup fails, you will get a message from your log service instead.
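The freshness rule can be pictured with the following sketch. It assumes the backup list has already been fetched and reduced to `(timestamp, restorable)` pairs; how the plugin actually obtains and represents this data is not shown here, and `check_backups` is a hypothetical name.

```python
from datetime import datetime, timedelta

OK, CRITICAL = 0, 2  # Nagios plugin exit codes
MAX_BACKUP_AGE = timedelta(days=7)  # the hard-coded week mentioned above


def check_backups(backups, now):
    """Evaluate (timestamp, restorable) pairs against the age limit."""
    # Backups that cannot be restored (e.g. missing base) do not count.
    restorable = [ts for ts, usable in backups if usable]
    if not restorable:
        return CRITICAL, "CRITICAL - no restorable backup found"
    age = now - max(restorable)
    if age > MAX_BACKUP_AGE:
        return CRITICAL, "CRITICAL - newest backup is %d days old" % age.days
    return OK, "OK - newest backup is %d days old" % age.days
```

Changing `MAX_BACKUP_AGE` is all that would be needed to use a different period in this sketch, for example `check_backups(backups, datetime.now())` with a limit of three days.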
/opt/exasol/monitoring/check_backup.py -h
EXAoperation XMLRPC database disk usage monitor (version 02.06)
Options:
-h shows this help
-V shows the plugin version
-H <license server> domain of IP of your license server
-d <db instance> the name of your DB instance
-u <user login> EXAoperation login user
-p <password> EXAoperation login password
Simple plugin to check the number of nodes in your cluster. This check will get critical if one or more nodes are missing.
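A node check of this kind could be as small as the sketch below. How the plugin learns which nodes are expected is not described here, so the two input lists and the name `check_nodes` are assumptions.

```python
OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_nodes(expected_nodes, online_nodes):
    """Return CRITICAL if any expected node is not among the online ones."""
    missing = sorted(set(expected_nodes) - set(online_nodes))
    if missing:
        return CRITICAL, "CRITICAL - %d node(s) missing: %s" % (
            len(missing), ", ".join(missing))
    return OK, "OK - all %d nodes online" % len(set(expected_nodes))
```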
Simple plugin that checks whether all services are running. If something goes wrong on your EXASOL cluster, you will get a critical alert from this plugin.