Ticket #264: Add Remaining NRPE-Based System Health Checks to Nagios Website (Part 2: check_swap, check_procs, check_zombies)
Task
Define the following nagios_service resources in Puppet:
- Swap Usage (check_nrpe!check_swap)
- Total Processes (check_nrpe!check_total_procs)
- Zombie Processes (check_nrpe!check_zombie_procs)
Assign these services to the remote-disks hostgroup so they apply to all client nodes.
Ensure all definitions follow the standard format (including max_check_attempts, notification_period, etc.).
Apply the Puppet changes on the management server.
Restart Nagios to activate the new service checks.
Verify that the services appear and are running for all target hosts in the Nagios web interface.
Steps Taken
Added the remaining Nagios health checks to the nagios config.pp:
nagios_service { "check-swap-nrpe":
  service_description   => "Swap Usage",
  hostgroup_name        => "remote-disks",
  target                => "/etc/nagios4/conf.d/services.cfg",
  check_command         => "check_nrpe!check_swap",
  max_check_attempts    => 3,
  retry_check_interval  => 1,
  normal_check_interval => 5,
  check_period          => "24x7",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "w,u,c",
  contact_groups        => "admins",
  mode                  => "0444",
}

nagios_service { "check-procs-nrpe":
  service_description   => "Total Processes",
  hostgroup_name        => "remote-disks",
  target                => "/etc/nagios4/conf.d/services.cfg",
  check_command         => "check_nrpe!check_total_procs",
  max_check_attempts    => 3,
  retry_check_interval  => 1,
  normal_check_interval => 5,
  check_period          => "24x7",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "w,u,c",
  contact_groups        => "admins",
  mode                  => "0444",
}

nagios_service { "check-zombies-nrpe":
  service_description   => "Zombie Processes",
  hostgroup_name        => "remote-disks",
  target                => "/etc/nagios4/conf.d/services.cfg",
  check_command         => "check_nrpe!check_zombie_procs",
  max_check_attempts    => 3,
  retry_check_interval  => 1,
  normal_check_interval => 5,
  check_period          => "24x7",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "w,u,c",
  contact_groups        => "admins",
  mode                  => "0444",
}
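The three resources above differ only in their title, description, and NRPE command, so they could also be generated from a single hash. The following is a minimal sketch, not what was applied for this ticket. It assumes Puppet 4 or later (for the each function), and the names $nrpe_checks, $resource_title, and $check are hypothetical. It would replace the three explicit resources rather than sit alongside them, since declaring the same titles twice fails catalog compilation.

```puppet
# Hypothetical lookup table: resource title => description and NRPE command.
$nrpe_checks = {
  'check-swap-nrpe'    => { 'description' => 'Swap Usage',       'command' => 'check_swap' },
  'check-procs-nrpe'   => { 'description' => 'Total Processes',  'command' => 'check_total_procs' },
  'check-zombies-nrpe' => { 'description' => 'Zombie Processes', 'command' => 'check_zombie_procs' },
}

# Declare one nagios_service per entry, sharing the standard settings.
$nrpe_checks.each |String $resource_title, Hash $check| {
  nagios_service { $resource_title:
    service_description   => $check['description'],
    check_command         => "check_nrpe!${check['command']}",
    hostgroup_name        => 'remote-disks',
    target                => '/etc/nagios4/conf.d/services.cfg',
    max_check_attempts    => 3,
    retry_check_interval  => 1,
    normal_check_interval => 5,
    check_period          => '24x7',
    notification_interval => 30,
    notification_period   => '24x7',
    notification_options  => 'w,u,c',
    contact_groups        => 'admins',
    mode                  => '0444',
  }
}
```

Keeping the shared settings in one place means a later change, such as a different notification_interval, only has to be made once.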
Applying the Puppet modules
- Run sudo puppet agent --test on the mgmt server first
- Restart the nagios4 service on the mgmt server
- Run sudo puppet agent --test again on the App, DB and backup servers
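The manual Nagios restart in the steps above could in principle be handled by Puppet itself. The sketch below is an illustration under assumptions, not part of the applied change: it assumes the Nagios service is named nagios4 (as in the restart step) and that no other class already declares Service['nagios4'], in which case the service block should be dropped and only the resource default kept.

```puppet
# Assumption: Service['nagios4'] is not already declared elsewhere.
service { 'nagios4':
  ensure => running,
  enable => true,
}

# Resource default: every nagios_service declared in this scope notifies the
# service, so a change written to services.cfg triggers a restart during the
# same agent run on the mgmt server.
Nagios_service {
  notify => Service['nagios4'],
}
```

With this in place, running sudo puppet agent --test on the mgmt server would apply the new checks and restart Nagios only when a service definition actually changed.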
Challenges
External Resources
Ticket Reference
https://rt.dataraster.com/Ticket/Display.html?id=264