Ticket #251: Add Additional NRPE Monitoring Checks for All Servers
Task
Extend the NRPE setup to include additional health and performance checks across all monitored servers: db-d, apps-d, and backup-d. The goal is to ensure full visibility of each server’s state via Nagios by adding and verifying a wider range of NRPE checks.
Success Criteria:
- All 3 servers are reporting the following via NRPE to Nagios:
  - Disk space (check_sdb1)
  - Logged-in users (check_users)
  - Load average (check_load)
  - Swap usage (check_swap)
  - Total processes (check_procs)
  - Zombie processes (check_zombie_procs)
- Any critical or warning thresholds are correctly flagged in the Nagios UI.
- All NRPE config and service definitions are consistently managed using Puppet.
Steps Taken
Define NRPE commands for the additional checks we want to run. In our NRPE configuration file, swap is missing from the list of commands, so I will define that command:
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
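The thresholds passed to check_swap apply to free swap: `-w 20%` raises a warning when less than 20% of swap is free, and `-c 10%` goes critical when less than 10% is free. A minimal sketch of that decision in plain shell follows; `swap_state` is a hypothetical helper written for illustration and is not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: classify a free-swap percentage the way
# `check_swap -w 20% -c 10%` does (thresholds apply to FREE swap).
swap_state() {
    free_pct=$1
    warn=${2:-20}   # warning threshold, percent free
    crit=${3:-10}   # critical threshold, percent free
    if [ "$free_pct" -lt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$free_pct" -lt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

swap_state 55   # well above both thresholds
swap_state 15   # below the 20% warning threshold
swap_state 5    # below the 10% critical threshold
```

Because the comparison is "less than", a host with exactly 20% of swap free still reports as healthy.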
Add a nagios_service resource for each check to the nagios config.pp file:
nagios_service { "check-users-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Logged-in Users",
  check_command       => "check_nrpe!check_users",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}
nagios_service { "check-load-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "System Load",
  check_command       => "check_nrpe!check_load",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}
nagios_service { "check-swap-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Swap Usage",
  check_command       => "check_nrpe!check_swap",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}
nagios_service { "check-procs-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Total Processes",
  check_command       => "check_nrpe!check_total_procs",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}
nagios_service { "check-zombies-nrpe":
  use                 => "generic-service",
  hostgroup_name      => "remote-disks",
  service_description => "Zombie Processes",
  check_command       => "check_nrpe!check_zombie_procs",
  target              => "/etc/nagios4/conf.d/services.cfg",
  mode                => "0444",
}
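Once Puppet has written the service definitions, it can help to confirm that every expected check actually landed in the target file. The sketch below is one possible way to do that; `missing_checks` is a hypothetical helper, and it assumes the generated file contains one `check_command` line per service in the `check_nrpe!<name>` form used above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected NRPE check that has no
# matching "check_nrpe!<name>" entry in the given services file.
missing_checks() {
    cfg=$1
    shift
    for check in "$@"; do
        # -F treats the pattern as a fixed string, so "!" is literal.
        if ! grep -qF "check_nrpe!${check}" "$cfg" 2>/dev/null; then
            echo "$check"
        fi
    done
}

# Path taken from the "target" attribute of the resources above.
CFG=/etc/nagios4/conf.d/services.cfg
if [ -r "$CFG" ]; then
    missing_checks "$CFG" check_users check_load check_swap \
        check_total_procs check_zombie_procs
fi
```

An empty result means every listed check has a service definition; any name printed still needs a nagios_service resource.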
Restart both the Nagios server and the nagios-nrpe-server service. Nagios must be restarted because the nagios config.pp file was edited, and the NRPE server must be restarted because the cfg file containing the NRPE command definitions was edited:
sudo systemctl restart nagios4
sudo systemctl restart nagios-nrpe-server
Once that is done, run the Puppet agent on the client server so that the module is applied:
sudo puppet agent --test
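The `--test` option turns on Puppet's detailed exit codes, so a non-zero status does not necessarily mean the run failed. The sketch below translates those codes into a short message; `puppet_result` is a hypothetical helper added here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: describe the exit status of `puppet agent --test`,
# which reports detailed exit codes.
puppet_result() {
    case "$1" in
        0) echo "no changes" ;;
        2) echo "changes applied" ;;
        4) echo "failures" ;;
        6) echo "changes applied, with failures" ;;
        *) echo "run failed" ;;
    esac
}

# Example: status 2 is the expected result after editing the module.
puppet_result 2
```

On a client it could be used as `sudo puppet agent --test; puppet_result $?`. A status of 2 is the normal outcome the first time the new configuration is applied.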
After the server has applied the module configuration, test from the management server that the swap command now works:
/usr/lib/nagios/plugins/check_nrpe -H db-d -c check_swap
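The same test can be repeated for every server and check named in the success criteria. The following is a rough sketch rather than the procedure used for this ticket: `state_name` is a hypothetical helper that maps the standard Nagios plugin exit codes to state names, and the check names are assumptions that must match the commands actually defined on each client (for example `check_total_procs`, as used in the service definition above).

```shell
#!/bin/sh
# Map standard Nagios plugin exit codes to state names.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

CHECK_NRPE=/usr/lib/nagios/plugins/check_nrpe
HOSTS="db-d apps-d backup-d"
# Assumed command names; adjust to whatever each client defines.
CHECKS="check_sdb1 check_users check_load check_swap check_total_procs check_zombie_procs"

# Skip the loop on machines where the check_nrpe plugin is not installed.
if [ -x "$CHECK_NRPE" ]; then
    for host in $HOSTS; do
        for check in $CHECKS; do
            "$CHECK_NRPE" -H "$host" -c "$check" >/dev/null 2>&1
            rc=$?
            echo "$host $check $(state_name "$rc")"
        done
    done
fi
```

An UNKNOWN result for a check usually points to a command that is not defined on that client or an NRPE connection problem, which makes it a quick way to spot a server that has not picked up the new configuration.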
The configuration changes have been applied successfully: previously, neither the check_procs nor the check_swap command was defined.
Now we can see that the commands work after editing the config.pp and nagios cfg files.
Challenges
N/A
External Resources
N/A
Ticket Reference
https://rt.dataraster.com/Ticket/Display.html?id=251