Ticket #251: Add Additional NRPE Monitoring Checks for All Servers - SupaHotBall/OE2-Group-D GitHub Wiki

Task

Extend the NRPE setup to include additional health and performance checks across all monitored servers: db-d, apps-d, and backup-d. The goal is to ensure full visibility of each server’s state via Nagios by adding and verifying a wider range of NRPE checks.

Success Criteria:

  • All 3 servers are reporting the following via NRPE to Nagios:
      • Disk space (check_sdb1)
      • Logged-in users (check_users)
      • Load average (check_load)
      • Swap usage (check_swap)
      • Total processes (check_procs)
      • Zombie processes (check_zombie_procs)
  • Any critical or warning thresholds are correctly flagged in the Nagios UI.
  • All NRPE config and service definitions are consistently managed using Puppet.

Steps Taken

Define an NRPE command for each additional check we want to run. In our NRPE configuration file, swap was missing from the list of commands, so I defined it:

command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
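With check_swap, the percentage thresholds refer to free swap: per the plugin's help text, -w 20% warns when less than 20% of swap is free and -c 10% goes critical when less than 10% is free. The sketch below only illustrates that threshold logic. swap_state is a hypothetical helper, not part of the plugin, and the real plugin's behaviour exactly at a boundary value may differ.

```shell
# Hypothetical helper: classify a free-swap percentage against the
# warning and critical thresholds, mimicking "-w 20% -c 10%".
# Usage: swap_state FREE_PERCENT WARN_PERCENT CRIT_PERCENT
# Prints the state and returns the matching Nagios exit code.
swap_state() {
    free=$1
    warn=$2
    crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$free" -lt "$warn" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

swap_state 50 20 10 || true   # prints OK
swap_state 15 20 10 || true   # prints WARNING
swap_state 5 20 10 || true    # prints CRITICAL
```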

Add a nagios_service resource for each monitoring check in the Nagios config.pp file:

nagios_service { "check-users-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Logged-in Users",
  check_command       => "check_nrpe!check_users",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}

nagios_service { "check-load-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "System Load",
  check_command       => "check_nrpe!check_load",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}

nagios_service { "check-swap-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Swap Usage",
  check_command       => "check_nrpe!check_swap",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}

nagios_service { "check-procs-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Total Processes",
  check_command       => "check_nrpe!check_total_procs",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}

nagios_service { "check-zombies-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Zombie Processes",
  check_command       => "check_nrpe!check_zombie_procs",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}
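Each service only works if the name after check_nrpe! matches a command[...] entry in the NRPE configuration on the client (this assumes the usual check_nrpe command definition that passes its argument through with -c). A minimal sketch of a cross-check is shown below. missing_nrpe_commands is a hypothetical helper, and the sample file content is made up for illustration rather than copied from our servers.

```shell
# Hypothetical helper: print every command name (arguments 2..n) that
# has no "command[<name>]=" line in the NRPE config file (argument 1).
missing_nrpe_commands() {
    cfg_file=$1
    shift
    for name in "$@"; do
        # "\[" matches a literal "[", a bare "]" is already literal here.
        grep -q "^command\[${name}]=" "$cfg_file" || echo "$name"
    done
}

# Illustrative sample config, not the real file from our servers.
sample_cfg=$(mktemp)
cat > "$sample_cfg" <<'EOF'
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
EOF

missing_nrpe_commands "$sample_cfg" check_users check_load check_swap   # prints check_swap
rm -f "$sample_cfg"
```

On a client, the first argument would be the real NRPE config path and the remaining arguments the names used in the service definitions above (check_users, check_load, check_swap, check_total_procs, check_zombie_procs).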

Restart both the Nagios server and the nagios-nrpe-server. Nagios must be restarted because the Nagios config.pp file was edited, which changes the generated service definitions. The NRPE server must be restarted because the NRPE cfg file was edited to add the check_swap command.

sudo systemctl restart nagios4
sudo systemctl restart nagios-nrpe-server

Once that is done, ensure that the module is applied to the client server

sudo puppet agent --test
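The --test flag turns on detailed exit codes, so the exit status of the run says whether anything changed or failed. The sketch below maps those documented codes to a short description. describe_puppet_exit is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: describe the exit code of "puppet agent --test"
# (which implies --detailed-exitcodes).
describe_puppet_exit() {
    case $1 in
        0) echo "no changes" ;;
        1) echo "run failed" ;;
        2) echo "changes applied" ;;
        4) echo "some resources failed" ;;
        6) echo "changes applied, some resources failed" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# On a client this would be used as:
#   sudo puppet agent --test; describe_puppet_exit $?
describe_puppet_exit 2   # prints: changes applied
```

An exit code of 2 is therefore expected on the first run after a module change, and 0 on later runs.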

After the server has applied the module configuration, test that the swap command now works from the management server

/usr/lib/nagios/plugins/check_nrpe -H db-d -c check_swap
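The same test can be repeated for every server and every check instead of one at a time. The following is a rough sketch, assuming the three hosts from the task description and the command names used in the service definitions above (plus check_sdb1 from the success criteria). run_all_checks, status_name, and the CHECK_NRPE, HOSTS and CHECKS variables are hypothetical names introduced here.

```shell
# Path to check_nrpe; can be overridden through the environment.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}
HOSTS="db-d apps-d backup-d"
CHECKS="check_sdb1 check_users check_load check_swap check_total_procs check_zombie_procs"

# Map a Nagios plugin exit code to its state name.
status_name() {
    case $1 in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run every check against every host and print one line per result.
# Returns non-zero if any check did not come back OK.
run_all_checks() {
    failures=0
    for host in $HOSTS; do
        for check in $CHECKS; do
            rc=0
            "$CHECK_NRPE" -H "$host" -c "$check" >/dev/null 2>&1 || rc=$?
            echo "$host $check $(status_name "$rc")"
            [ "$rc" -eq 0 ] || failures=$((failures + 1))
        done
    done
    [ "$failures" -eq 0 ]
}

# On the management server, call:
#   run_all_checks
```

The plugin output is discarded here so that only the state is shown. Dropping the redirect would print the full message from each plugin as well.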

Previously, neither the check_procs nor the check_swap command was defined.

After editing config.pp and the NRPE cfg file, the commands now work, which confirms that the configuration changes were applied successfully.


Challenges

N/A


External Resources

N/A


Ticket Reference

https://rt.dataraster.com/Ticket/Display.html?id=251