monitoring scratch servers - raeker/ARC-Wiki-Test GitHub Wiki

Advanced Research Computing : Monitoring Scratch Servers

The scratch boxes, scr-oss[0-3]-10g, aren't under configuration management (i.e., Ansible).

Installing Sensu client

Sensu RPMs
Process Overview:

  1. Download the client
  2. Install the client
  3. Configure the client
    • Create client.json
    • Enable sensu-client service
    • Fix Ruby gems source
    • Install the requisite plugins
  4. Create the check-oss-mounts.sh file

1. Download the client

  • Get the version of the RPM we're using
    • At the time of this writing, we're using version 0.26.5 of the Sensu client
    • One can use wget on nyxb, as it's public facing, and then scp the resulting RPM to the scratch servers:

[root@nyxb ~]# cd /tmp
[root@nyxb tmp]# wget https://sensu.global.ssl.fastly.net/yum/6/x86_64/sensu-0.26.5-2.el6.x86_64.rpm
[root@nyxb tmp]# scp sensu-0.26.5-2.el6.x86_64.rpm root@scr-oss0-10g:/root
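Since every scratch server needs the RPM, the copy step can be looped over the hosts named at the top of this page. The following is a minimal sketch, assuming the same root login, /tmp download location, and /root destination as in the snippet above. `scratch_hosts` and `copy_rpm` are hypothetical helper names, and `DRY_RUN=1` only prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch: copy the Sensu RPM from nyxb to every scratch server.
# scratch_hosts and copy_rpm are illustrative names, not existing tools.
# Hostnames follow the scr-oss[0-3]-10g pattern from the top of this page.

RPM="sensu-0.26.5-2.el6.x86_64.rpm"

# Print one scratch hostname per line.
scratch_hosts() {
    local i
    for i in 0 1 2 3; do
        echo "scr-oss${i}-10g"
    done
}

# Copy the RPM to each host; with DRY_RUN=1, print the scp commands instead.
copy_rpm() {
    local host
    for host in $(scratch_hosts); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp /tmp/${RPM} root@${host}:/root"
        else
            scp "/tmp/${RPM}" "root@${host}:/root" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 copy_rpm` first shows the four scp commands that a plain `copy_rpm` would execute.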

2. Install the client

  • In the previous snippet we used scp to copy our RPM to our box
  • Now we can install the rpm on the box
  • As a precaution (since these boxes are unique), let's use yum deplist <pathToRpm> to see if we have any dependencies that might cause us issues:
    • At the time of this writing, there shouldn't be any dependencies that the system can't meet

[root@scr-oss0 ~]# yum deplist sensu-0.26.5-2.el6.x86_64.rpm
Loaded plugins: fastestmirror, security, versionlock
Finding dependencies:
Loading mirror speeds from cached hostfile
package: sensu.x86_64 1:0.26.5-2
  dependency: /bin/sh
   Unsatisfied dependency

  • Do a dry-run style install using yum localinstall --assumeno <pathToRpm> to be safe:
    • The --assumeno option means "assume the answer to this request is no" where the question, with yum localinstall, is "are you sure you wish to install this package?"

[root@scr-oss0 ~]# yum localinstall --assumeno sensu-0.26.5-2.el6.x86_64.rpm
Loaded plugins: fastestmirror, security, versionlock
Setting up Local Package Process
Examining sensu-0.26.5-2.el6.x86_64.rpm: 1:sensu-0.26.5-2.x86_64
Marking sensu-0.26.5-2.el6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package sensu.x86_64 1:0.26.5-2 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

 Package    Arch      Version       Repository                     Size
Installing:
 sensu      x86_64    1:0.26.5-2    /sensu-0.26.5-2.el6.x86_64     93 M

Transaction Summary
Install       1 Package(s)

Total size: 93 M
Installed size: 93 M
Exiting on user Command
Your transaction was saved, rerun it with:
 yum load-transaction /tmp/yum_save_tx-2017-04-10-13-120jaoZW.yumtx

  • We see that the dry-run install is as benign as we believed it to be
  • So it's safe to install this package:

[root@scr-oss0 ~]# yum localinstall sensu-0.26.5-2.el6.x86_64.rpm -y

3. Configure the client

Create the client.json

  • Now that we have the client installed, we'll need to configure it to talk with the Sensu server.
  • This is done by way of the client.json file which lives in /etc/sensu/conf.d:

[root@scr-oss0 ~]# cd /etc/sensu/conf.d/
[root@scr-oss0 conf.d]# touch client.json

  • Now we need to put actual content in the file:

[root@scr-oss0 conf.d]# vim client.json
...
{
  "client": {
    "subscriptions": [
      "scratchServers"
    ],
    "keepalive": {
      "handlers": [
        "slack"
      ],
      "slack": {
        "channels": [
          "#sensu-critical",
          "#sensu-notify"
        ]
      },
      "thresholds": {
        "warning": 120,
        "critical": 180
      }
    }
  },
  "rabbitmq": {
    "host": "flux-admin02.arc-ts.umich.edu",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "sensu"
  },
  "transport": {
    "name": "rabbitmq",
    "host": "flux-admin02.arc-ts.umich.edu",
    "reconnect_on_error": true
  }
}

  • The basics of the above file (which also should be covered in the "General Overview" section of this documentation):
    • Define the client itself and its subscriptions
      • NOTE: subscription definitions (i.e. which checks are applied to subscribers) are defined on the sensu server
    • Define the transport layer to use (in our case, RabbitMQ)
    • Define the handlers for keepalives (because we want to be notified ASAP when a box of this caliber goes down)
      • Define the channels the handler will alert to

Enable sensu-client service

  • Start sensu-client and tail -f /var/log/sensu/sensu-client.log to see if we're talking to the server
    • It's OK if we're missing plugins; that step will come later

[root@scr-oss0 conf.d]# service sensu-client start
Starting sensu-client                                      [  OK  ]

  • Now that we have our definitions in place and our client running, we need to enable sensu-client to start at boot (and check our work):

[root@scr-oss0 conf.d]# chkconfig sensu-client on
[root@scr-oss0 conf.d]# chkconfig --list|grep sensu-client
sensu-client    0:off   1:off   2:on    3:on    4:on    5:on    6:off
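When this is repeated on several hosts, the chkconfig check can be scripted rather than read by eye. The following is a minimal sketch that inspects one line of `chkconfig --list` output in the format shown above. `enabled_in_multiuser` is a hypothetical helper name, and treating runlevels 2 through 5 as the ones that matter is an assumption taken from the sample output.

```shell
#!/bin/bash
# Sketch: confirm a service is "on" in runlevels 2-5, given one line of
# `chkconfig --list` output. enabled_in_multiuser is an illustrative name.

enabled_in_multiuser() {
    local line="$1" level
    for level in 2 3 4 5; do
        # Pad with spaces so the first and last fields match as well;
        # [[:space:]] covers both the tabs and spaces chkconfig may print.
        case " ${line} " in
            *[[:space:]]"${level}:on"[[:space:]]*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

For example, `enabled_in_multiuser "$(chkconfig --list sensu-client)" && echo "sensu-client starts at boot"`.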

Fix Ruby gems source

Sensu ships its own embedded Ruby, which defaults to rubygems.org as the source for installing gems. We need to change that: remove the current source and replace it with our own proxy:
[root@scr-oss0 ~]# cd /opt/sensu/embedded/bin/
[root@scr-oss0 bin]# ./gem source --list
*** CURRENT SOURCES ***
https://rubygems.org/
[root@scr-oss0 bin]# ./gem source --remove https://rubygems.org/
https://rubygems.org/ removed from sources
[root@scr-oss0 bin]# ./gem source --add https://registry.arc-ts.umich.edu/repository/rubygems/
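If this step might be re-run on a box where the source has already been changed, the swap can be guarded so it only acts when needed. The following is a minimal sketch under that assumption. `lists_rubygems_org` and `swap_gem_source` are hypothetical helper names, and the `GEM` variable exists only so a different gem binary can be substituted when trying the function out.

```shell
#!/bin/bash
# Sketch: replace rubygems.org with the local proxy, skipping steps that
# are already done. Function names are illustrative, not part of Sensu.

GEM="${GEM:-/opt/sensu/embedded/bin/gem}"
PROXY="https://registry.arc-ts.umich.edu/repository/rubygems/"

# Succeeds when the source list read from stdin contains rubygems.org
# on a line of its own (-F fixed string, -x whole line, -q quiet).
lists_rubygems_org() {
    grep -Fxq 'https://rubygems.org/'
}

swap_gem_source() {
    # Remove the default source only if it is still configured.
    if "${GEM}" source --list | lists_rubygems_org; then
        "${GEM}" source --remove https://rubygems.org/ || return 1
    fi
    # Add the proxy only if it is not configured yet.
    if ! "${GEM}" source --list | grep -Fxq "${PROXY}"; then
        "${GEM}" source --add "${PROXY}" || return 1
    fi
}
```

Calling `swap_gem_source` a second time should then leave the source list untouched.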

Install the requisite plugins

  • By default we need plugins/checks that the client can execute at the behest of the server
  • Normally, our configuration management would do this but we have to do it by hand
  • Our base set of plugins is:

sensu-plugins-network-checks
sensu-plugins-disk-checks
sensu-plugins-filesystem-checks
sensu-plugins-cpu-checks
sensu-plugins-memory-checks
sensu-plugins-process-checks
sensu-plugins-uptime-checks
sensu-plugins-load-checks

  • Install the plugins using the embedded ruby gem binary:

[root@scr-oss0 ~]# cd /opt/sensu/embedded/bin/
[root@scr-oss0 bin]# for i in sensu-plugins-network-checks sensu-plugins-disk-checks sensu-plugins-filesystem-checks sensu-plugins-cpu-checks sensu-plugins-memory-checks sensu-plugins-process-checks sensu-plugins-uptime-checks sensu-plugins-load-checks; do ./gem install $i; done

  • You can, of course, do this one by one, for example:

[root@scr-oss0 ~]# cd /opt/sensu/embedded/bin/
[root@scr-oss0 bin]# ./gem install sensu-plugins-network-checks

  • Check /var/log/sensu/sensu-client.log to see if checks are now being seen
  • You can also check in Uchiwa
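Because the plugin install is done by hand here, it may need to be re-run after a partial failure. The following is a minimal sketch that skips plugins the embedded gem binary already reports as installed. `install_missing_plugins` is a hypothetical helper name, and the `GEM` variable exists only so another gem binary can be substituted. Note that `gem list -i` treats its argument as a name pattern, so this is a convenience check rather than a strict one.

```shell
#!/bin/bash
# Sketch: install only the base plugins that are not present yet.
# install_missing_plugins is an illustrative name, not part of Sensu.

GEM="${GEM:-/opt/sensu/embedded/bin/gem}"

# The base plugin set listed on this page.
PLUGINS=(
    sensu-plugins-network-checks
    sensu-plugins-disk-checks
    sensu-plugins-filesystem-checks
    sensu-plugins-cpu-checks
    sensu-plugins-memory-checks
    sensu-plugins-process-checks
    sensu-plugins-uptime-checks
    sensu-plugins-load-checks
)

install_missing_plugins() {
    local plugin
    for plugin in "$@"; do
        # `gem list -i NAME` exits 0 when a matching gem is installed.
        if "${GEM}" list -i "${plugin}" > /dev/null 2>&1; then
            echo "already installed: ${plugin}"
        else
            "${GEM}" install "${plugin}" || return 1
        fi
    done
}
```

The full set would be installed with `install_missing_plugins "${PLUGINS[@]}"`.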

4. Create the check-oss-mounts.sh file

Because these machines aren't under configuration management, we need to create by hand the scripts that Sensu will rely on.
Every script must reside in /opt/sensu/embedded/bin on the node where it will run, and its permissions must be 755. If the permissions aren't set, Sensu will report (in Uchiwa or in the logs) that it can't run the script.
Script content:
#!/bin/bash
OUTPUT=$(cat /proc/mounts |grep /lustre/scratch|wc -l)
if [ "${OUTPUT}" -eq "15" ]; then
    echo "Number of instances of /lustre/scratch in /proc/mounts is at threshold"
    exit 0 # indicate ok to sensu because we believe 15 to be the fixed number of mounts
elif [ "${OUTPUT}" -lt "15" ]; then
    echo "Number of instances of /lustre/scratch in /proc/mounts is below threshold"
    exit 2 # indicate critical to sensu because anything less than fixed number could indicate issues
elif [ "${OUTPUT}" -gt "15" ]; then
    echo "Number of instances of /lustre/scratch in /proc/mounts is above threshold"
    exit 1 # indicate warning to sensu; warn because anything more than 15 could indicate something changed we need to know about
else
    echo "Unknown"
    exit 3 # indicate unknown to sensu
fi
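To try the check logic away from a scratch server, the same comparison can be wrapped in a function that takes the mounts file and the expected count as arguments. The following is a minimal sketch, not the deployed script. `check_mounts` is a hypothetical name, the defaults mirror the /proc/mounts path and the count of 15 used above, and the return codes follow the same 0/1/2/3 convention as the script.

```shell
#!/bin/bash
# Sketch: parameterized variant of check-oss-mounts.sh for off-box testing.
# check_mounts is an illustrative name; defaults mirror the script above.

check_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    local expected="${2:-15}"
    local count

    if [ ! -r "${mounts_file}" ]; then
        echo "Unknown: cannot read ${mounts_file}"
        return 3 # unknown
    fi

    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so the exit status is deliberately ignored.
    count=$(grep -c /lustre/scratch "${mounts_file}" || true)

    if [ "${count}" -eq "${expected}" ]; then
        echo "Number of instances of /lustre/scratch in ${mounts_file} is at threshold"
        return 0 # ok
    elif [ "${count}" -lt "${expected}" ]; then
        echo "Number of instances of /lustre/scratch in ${mounts_file} is below threshold"
        return 2 # critical
    else
        echo "Number of instances of /lustre/scratch in ${mounts_file} is above threshold"
        return 1 # warning
    fi
}
```

Pointing it at a saved copy of /proc/mounts, for example `check_mounts ./mounts.sample 15; echo $?`, shows which status Sensu would receive without touching the live box.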
