Ticket #264: Add Remaining NRPE-Based System Health Checks to Nagios Website (Part 2: check_swap, check_procs, check_zombies) - SupaHotBall/OE2-Group-D GitHub Wiki

Task

Define the following nagios_service resources in Puppet:

Swap Usage (check_nrpe!check_swap)
Total Processes (check_nrpe!check_total_procs)
Zombie Processes (check_nrpe!check_zombie_procs)

Assign these services to the remote-disks hostgroup so that they apply to all client nodes. Ensure all definitions follow the standard format (including max_check_attempts, notification_period, etc.). Apply the Puppet changes on the management server.

Restart Nagios to activate the new service checks.

Verify that the services appear and are running for all target hosts in the Nagios web interface.

Steps Taken

Added the remaining NRPE health checks to the Nagios config.pp manifest:

nagios_service { "check-swap-nrpe":
  service_description     => "Swap Usage",
  hostgroup_name          => "remote-disks",
  target                  => "/etc/nagios4/conf.d/services.cfg",
  check_command           => "check_nrpe!check_swap",
  max_check_attempts      => 3,
  retry_check_interval    => 1,
  normal_check_interval   => 5,
  check_period            => "24x7",
  notification_interval   => 30,
  notification_period     => "24x7",
  notification_options    => "w,u,c",
  contact_groups          => "admins",
  mode                    => "0444",
}
 
nagios_service { "check-procs-nrpe":
  service_description     => "Total Processes",
  hostgroup_name          => "remote-disks",
  target                  => "/etc/nagios4/conf.d/services.cfg",
  check_command           => "check_nrpe!check_total_procs",
  max_check_attempts      => 3,
  retry_check_interval    => 1,
  normal_check_interval   => 5,
  check_period            => "24x7",
  notification_interval   => 30,
  notification_period     => "24x7",
  notification_options    => "w,u,c",
  contact_groups          => "admins",
  mode                    => "0444",
}
 
nagios_service { "check-zombies-nrpe":
  service_description     => "Zombie Processes",
  hostgroup_name          => "remote-disks",
  target                  => "/etc/nagios4/conf.d/services.cfg",
  check_command           => "check_nrpe!check_zombie_procs",
  max_check_attempts      => 3,
  retry_check_interval    => 1,
  normal_check_interval   => 5,
  check_period            => "24x7",
  notification_interval   => 30,
  notification_period     => "24x7",
  notification_options    => "w,u,c",
  contact_groups          => "admins",
  mode                    => "0444",
}
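The three resources differ only in their title, description and NRPE command. This means the shared settings could be kept in one place. The sketch below assumes Puppet 4 or later (for each iteration) and the same nagios_service type used above. The variable name $nrpe_health_checks is hypothetical. The sketch would replace the three blocks above rather than sit alongside them, because declaring the same titles twice makes Puppet fail with a duplicate declaration error.

```puppet
# Hypothetical refactor of the three checks above (assumes Puppet 4+).
# Each entry maps a resource title to its description and NRPE command.
$nrpe_health_checks = {
  'check-swap-nrpe'    => { 'description' => 'Swap Usage',       'command' => 'check_swap' },
  'check-procs-nrpe'   => { 'description' => 'Total Processes',  'command' => 'check_total_procs' },
  'check-zombies-nrpe' => { 'description' => 'Zombie Processes', 'command' => 'check_zombie_procs' },
}

# Declare one nagios_service per entry; everything except the title,
# description and command is identical to the hand-written versions.
$nrpe_health_checks.each |String $title, Hash $check| {
  nagios_service { $title:
    service_description   => $check['description'],
    hostgroup_name        => 'remote-disks',
    target                => '/etc/nagios4/conf.d/services.cfg',
    check_command         => "check_nrpe!${check['command']}",
    max_check_attempts    => 3,
    retry_check_interval  => 1,
    normal_check_interval => 5,
    check_period          => '24x7',
    notification_interval => 30,
    notification_period   => '24x7',
    notification_options  => 'w,u,c',
    contact_groups        => 'admins',
    mode                  => '0444',
  }
}
```

With this layout, adding a further NRPE check means adding one line to the hash. Every service then keeps the same retry, notification and contact settings, which is what the "standard format" requirement asks for.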

Applying the Puppet modules

Run sudo puppet agent --test on the mgmt server first.
Restart the nagios4 service on the mgmt server.

Run sudo puppet agent --test again on the App, DB and Backup servers.
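The manual restart on the mgmt server could instead be driven by Puppet itself. This sketch relies on three assumptions:

* The mgmt server manifest does not already declare a service resource for nagios4.
* The Debian/Ubuntu nagios4 package installs its binary at /usr/sbin/nagios4.
* The exec title nagios4-verify-config is a hypothetical name.

Any change to a nagios_service resource first triggers a configuration check with nagios4 -v. Nagios is restarted only if that check passes.

```puppet
# Hypothetical addition to the mgmt server manifest.
# Keep Nagios running and enabled at boot.
service { 'nagios4':
  ensure => running,
  enable => true,
}

# Validate the full Nagios configuration; runs only when notified.
# Binary and config paths assume the Debian/Ubuntu nagios4 package.
exec { 'nagios4-verify-config':
  command     => '/usr/sbin/nagios4 -v /etc/nagios4/nagios.cfg',
  refreshonly => true,
}

# Any changed nagios_service in this catalog refreshes the check,
# and a successful check then refreshes (restarts) the service.
Nagios_service <| |> ~> Exec['nagios4-verify-config'] ~> Service['nagios4']
```

With this in place, a single sudo puppet agent --test on the mgmt server writes services.cfg and restarts Nagios in the same run. If the configuration check fails, Puppet reports the failed exec and skips the restart, so the running instance keeps its previous configuration.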

Challenges

External Resources

Ticket Reference

https://rt.dataraster.com/Ticket/Display.html?id=264