monitoring scratch servers - raeker/ARC-Wiki-Test GitHub Wiki
The scratch boxes, scr-oss[0-3]-10g, aren't under configuration management (i.e., Ansible)
Sensu RPMs
Process Overview:
- Download the client
- Install the client
- Configure the client
  * Create client.json
  * Enable sensu-client service
  * Fix Ruby gems source
  * Install the requisite plugins
- Create the check-oss-mounts.sh file
- Get the version of the RPM we're using
- At the time of this writing, we're using version 0.26.5 of the Sensu client
- nyxb is public-facing, so one can run wget there and then scp the resulting RPM to the scratch servers:
[root@nyxb ~]# cd /tmp
[root@nyxb tmp]# wget https://sensu.global.ssl.fastly.net/yum/6/x86_64/sensu-0.26.5-2.el6.x86_64.rpm
[root@nyxb tmp]# scp sensu-0.26.5-2.el6.x86_64.rpm root@scr-oss0-10g:/root
- In the previous snippet we used scp to copy our RPM to our box
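The same copy has to happen for each scratch box. A minimal sketch, assuming all four hosts follow the scr-oss[0-3]-10g naming and accept root over scp; `copy_rpm_cmds` is a hypothetical helper that only prints the commands so they can be reviewed before anything is copied:

```shell
#!/bin/bash
# RPM file name taken from the wget step above.
RPM="sensu-0.26.5-2.el6.x86_64.rpm"

# Hypothetical helper: print one scp command per scratch host.
# Nothing is copied here; the output is meant to be reviewed first.
copy_rpm_cmds() {
    local rpm="$1" i
    for i in 0 1 2 3; do
        echo "scp ${rpm} root@scr-oss${i}-10g:/root"
    done
}

copy_rpm_cmds "$RPM"
```

Once the printed commands look right, they can be run by hand or piped to bash from /tmp on nyxb.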
- Now we can install the rpm on the box
- As a precaution (since these boxes are unique), let's use yum deplist
<pathToRpm> to see whether any dependencies might cause us issues:
- At the time of this writing, there shouldn't be any dependencies that the system can't meet
[root@scr-oss0 ~]# yum deplist sensu-0.26.5-2.el6.x86_64.rpm
Loaded plugins: fastestmirror, security, versionlock
Finding dependencies:
Loading mirror speeds from cached hostfile
package: sensu.x86_64 1:0.26.5-2
  dependency: /bin/sh
   Unsatisfied dependency
- Do a dry-run style install using yum localinstall --assumeno
<pathToRpm> to be safe:
- The --assumeno option means "assume the answer to this request is no" where the question, with yum localinstall, is "are you sure you wish to install this package?"
[root@scr-oss0 ~]# yum localinstall --assumeno sensu-0.26.5-2.el6.x86_64.rpm
Loaded plugins: fastestmirror, security, versionlock
Setting up Local Package Process
Examining sensu-0.26.5-2.el6.x86_64.rpm: 1:sensu-0.26.5-2.x86_64
Marking sensu-0.26.5-2.el6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package sensu.x86_64 1:0.26.5-2 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
Install 1 Package(s)
Total size: 93 M
Installed size: 93 M
Exiting on user Command
Your transaction was saved, rerun it with:
 yum load-transaction /tmp/yum_save_tx-2017-04-10-13-120jaoZW.yumtx
- We see that the dry-run install is as benign as we believed it to be
- So it's safe to install this package:
[root@scr-oss0 ~]# yum localinstall sensu-0.26.5-2.el6.x86_64.rpm -y
- Now that we have the client installed, we'll need to configure it to talk with the Sensu server.
- This is done by way of the client.json file which lives in /etc/sensu/conf.d:
[root@scr-oss0 ~]# cd /etc/sensu/conf.d/
[root@scr-oss0 conf.d]# touch client.json
- Now we need to put actual content in the file:
[root@scr-oss0 conf.d]# vim client.json
...
{
  "client": {
    "subscriptions": [
      "scratchServers"
    ],
    "keepalive": {
      "handlers": [
        "slack"
      ],
      "slack": {
        "channels": [
          "#sensu-critical",
          "#sensu-notify"
        ]
      },
      "thresholds": {
        "warning": 120,
        "critical": 180
      }
    }
  },
  "rabbitmq": {
    "host": "flux-admin02.arc-ts.umich.edu",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "sensu"
  },
  "transport": {
    "name": "rabbitmq",
    "host": "flux-admin02.arc-ts.umich.edu",
    "reconnect_on_error": true
  }
}
- The basics of the above file (which also should be covered in the
"General Overview" section of this documentation):
- Define the client itself and its subscriptions
- NOTE: subscription definitions (i.e. which checks are applied to subscribers) are defined on the sensu server
- Define the transport layer to use (in our case, RabbitMQ)
- Define the handlers for keepalives (because we want to be
notified ASAP when a box of this caliber goes down)
- Define the channels the handler will alert to
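Because client.json is typed by hand, a stray comma or brace is easy to miss. A minimal sketch of a syntax check to run before starting the client; `validate_json` is a hypothetical helper, and it assumes a system python3 or python is on the PATH (it relies only on the standard `json.tool` module):

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given file parses as JSON.
# Assumes python3 or python is installed; returns 3 if neither is found.
validate_json() {
    local file="$1" py
    for py in python3 python; do
        if command -v "$py" >/dev/null 2>&1; then
            # json.tool exits non-zero when the file is not valid JSON.
            "$py" -m json.tool "$file" >/dev/null 2>&1
            return $?
        fi
    done
    echo "no python interpreter found; cannot validate ${file}" >&2
    return 3
}

# Usage (on the scratch box):
#   validate_json /etc/sensu/conf.d/client.json && echo "client.json parses"
```

This only checks syntax; it says nothing about whether the keys are ones Sensu understands.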
- Start sensu-client and tail -f /var/log/sensu/sensu-client.log to
see if we're talking to the server
- It's OK if we're missing plugins; that step will come later
[root@scr-oss0 conf.d]# service sensu-client start
Starting sensu-client [ OK ]
- Now that we have our definitions in place and our client running, we need to enable sensu-client to start at boot (and check our work):
[root@scr-oss0 conf.d]# chkconfig sensu-client on
[root@scr-oss0 conf.d]# chkconfig --list|grep sensu-client
sensu-client 0:off 1:off 2:on 3:on 4:on 5:on 6:off
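The chkconfig output can also be checked by a script rather than by eye. A minimal sketch, assuming that "enabled at boot" means on in runlevels 3 and 5 (that choice is an assumption, not something the Sensu package dictates); `is_enabled_at_boot` is a hypothetical helper that reads `chkconfig --list` output on stdin:

```shell
#!/bin/bash
# Hypothetical helper: read `chkconfig --list` output on stdin and succeed
# only if the named service is on in runlevels 3 and 5 (an assumption about
# which runlevels matter on these boxes).
is_enabled_at_boot() {
    local service="$1" line
    # Keep only the line for this service; fail if the service is not listed.
    line=$(grep -E "^${service}[[:space:]]") || return 1
    [[ "$line" == *"3:on"* && "$line" == *"5:on"* ]]
}

# Usage (on the scratch box):
#   chkconfig --list | is_enabled_at_boot sensu-client && echo "starts at boot"
```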
Sensu uses its own embedded Ruby, which defaults to rubygems.org as the
source for installing gems. We need to change that: remove the current
source, then replace it with our own proxy:
[root@scr-oss0 ~]# cd /opt/sensu/embedded/bin/
[root@scr-oss0 bin]# ./gem source --list
*** CURRENT SOURCES ***

https://rubygems.org/
[root@scr-oss0 bin]# ./gem source --remove https://rubygems.org/
https://rubygems.org/ removed from sources
[root@scr-oss0 bin]# ./gem source --add https://registry.arc-ts.umich.edu/repository/rubygems/
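To confirm the swap took effect, the source list can be checked again. A minimal sketch, assuming the list output keeps the format shown above (a `*** CURRENT SOURCES ***` header, a blank line, then one URL per line); `only_proxy_source` is a hypothetical helper that reads that output on stdin:

```shell
#!/bin/bash
# Hypothetical helper: read `gem source --list` output on stdin and succeed
# only if every listed source equals the expected proxy URL.
only_proxy_source() {
    local proxy="$1" line count=0
    while IFS= read -r line; do
        case "$line" in
            ''|'***'*) continue ;;   # skip blank lines and the header
        esac
        [ "$line" = "$proxy" ] || return 1
        count=$((count + 1))
    done
    # At least one source must be present.
    [ "$count" -ge 1 ]
}

# Usage (on the scratch box):
#   /opt/sensu/embedded/bin/gem source --list |
#       only_proxy_source https://registry.arc-ts.umich.edu/repository/rubygems/
```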
- By default we need plugins/checks that the client can execute at the behest of the server
- Normally, our configuration management would do this but we have to do it by hand
- Our base batch of plugins is:
sensu-plugins-network-checks
sensu-plugins-disk-checks
sensu-plugins-filesystem-checks
sensu-plugins-cpu-checks
sensu-plugins-memory-checks
sensu-plugins-process-checks
sensu-plugins-uptime-checks
sensu-plugins-load-checks
- Install the plugins using the embedded ruby gem binary:
[root@scr-oss0 ~]# cd /opt/sensu/embedded/bin/
[root@scr-oss0 bin]# for i in sensu-plugins-network-checks sensu-plugins-disk-checks sensu-plugins-filesystem-checks sensu-plugins-cpu-checks sensu-plugins-memory-checks sensu-plugins-process-checks sensu-plugins-uptime-checks sensu-plugins-load-checks; do ./gem install $i; done
- You can, of course, do this one by one, for example:
[root@scr-oss0 ~]# cd /opt/sensu/embedded/bin/
[root@scr-oss0 bin]# ./gem install sensu-plugins-network-checks
- Check /var/log/sensu/sensu-client.log to see if checks are now being seen
- Can also check in Uchiwa
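Besides the log and Uchiwa, the installed gems can be compared against the base batch directly. A minimal sketch, assuming `gem list` prints one `name (version)` line per gem; `missing_plugins` is a hypothetical helper that reads that output on stdin and prints any base plugin it does not find:

```shell
#!/bin/bash
# Base batch of plugins from the list above.
PLUGINS="sensu-plugins-network-checks sensu-plugins-disk-checks
sensu-plugins-filesystem-checks sensu-plugins-cpu-checks
sensu-plugins-memory-checks sensu-plugins-process-checks
sensu-plugins-uptime-checks sensu-plugins-load-checks"

# Hypothetical helper: read `gem list` output on stdin and print every
# base plugin that does not appear as "name (version)".
missing_plugins() {
    local installed plugin
    installed=$(cat)
    for plugin in $PLUGINS; do
        printf '%s\n' "$installed" | grep -q "^${plugin} (" || echo "$plugin"
    done
}

# Usage (on the scratch box):
#   /opt/sensu/embedded/bin/gem list | missing_plugins
```

No output means every plugin in the base batch is installed.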
Because these machines aren't under configuration management, we will
need to create by hand some scripts that Sensu will rely on.
Every script must reside in /opt/sensu/embedded/bin on the node where it
runs and must have 755 permissions. If the permissions aren't set, Sensu
will report (in Uchiwa or in the logs) that it can't run the script.
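Placing the script and setting its mode can be done in one step with `install`. A minimal sketch, assuming the script has been saved locally as check-oss-mounts.sh (the name used in the process overview); `install_check` is a hypothetical wrapper, and its second argument exists only so the destination can be overridden:

```shell
#!/bin/bash
# Hypothetical wrapper: copy a check script into place with mode 755 and
# confirm the result is executable. The destination defaults to the
# directory Sensu expects on these boxes.
install_check() {
    local src="$1" dest_dir="${2:-/opt/sensu/embedded/bin}"
    install -m 755 "$src" "$dest_dir/" &&
        [ -x "$dest_dir/$(basename "$src")" ]
}

# Usage (on the scratch box, as root):
#   install_check check-oss-mounts.sh
```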
Script content:
#!/bin/bash
OUTPUT=$(cat /proc/mounts |grep /lustre/scratch|wc -l)
if [ "${OUTPUT}" -eq "15" ]; then
    echo "Number of instances of /lustre/scratch in /proc/mounts is at threshold"
    exit 0 # indicate ok to sensu because we believe 15 to be the fixed number of mounts
elif [ "${OUTPUT}" -lt "15" ]; then
    echo "Number of instances of /lustre/scratch in /proc/mounts is below threshold"
    exit 2 # indicate critical to sensu because anything less than fixed number could indicate issues
elif [ "${OUTPUT}" -gt "15" ]; then
    echo "Number of instances of /lustre/scratch in /proc/mounts is above threshold"
    exit 1 # indicate warning to sensu; warn because anything more than 15 could indicate something changed we need to know about
else
    echo "Unknown"
    exit 3 # indicate unknown to sensu
fi
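The script above can only be exercised on a box with the Lustre mounts present. A minimal sketch of the same decision logic as a function, so it can be tried against a sample file first; `check_mounts` and its two arguments (mounts file, expected count) are hypothetical, and it swaps the cat/grep/wc pipeline for `grep -c`, which counts matching lines in one step:

```shell
#!/bin/bash
# Hypothetical variant of the check: same thresholds and Sensu status codes
# (0 ok, 1 warning, 2 critical, 3 unknown), but the mounts file and the
# expected count are arguments so the logic can be tested off-box.
check_mounts() {
    local mounts_file="${1:-/proc/mounts}" expected="${2:-15}" count
    # grep -c prints the number of matching lines; it exits 1 when that
    # number is 0, so the exit status is deliberately ignored here.
    count=$(grep -c /lustre/scratch "$mounts_file" 2>/dev/null) || true
    case "$count" in
        ''|*[!0-9]*) echo "Unknown"; return 3 ;;   # file unreadable or odd output
    esac
    if [ "$count" -eq "$expected" ]; then
        echo "Number of instances of /lustre/scratch in ${mounts_file} is at threshold"
        return 0
    elif [ "$count" -lt "$expected" ]; then
        echo "Number of instances of /lustre/scratch in ${mounts_file} is below threshold"
        return 2
    else
        echo "Number of instances of /lustre/scratch in ${mounts_file} is above threshold"
        return 1
    fi
}

# Usage: check_mounts /proc/mounts 15; echo "status: $?"
```

A script built on this would end with `check_mounts; exit $?` so that Sensu sees the status code.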