Issues for NRPE setup - KeegMitch/Operations-Engineering-group-c GitHub Wiki

Problems we're facing

For this lab

  1. Getting puppet to generate nrpe.cfg in the DB server

image

image

/usr/lib/nagios/plugins/check_nrpe -H mgmt-c -c check_sda1

/usr/lib/nagios/plugins/check_nrpe -H db-c -c check_sda1
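The two checks above can be wrapped in a small loop that also labels each result. This is a minimal sketch, assuming the plugin path shown above and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN). The function names are hypothetical.

```shell
#!/bin/sh
# Path taken from the commands above; adjust if check_nrpe lives elsewhere.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"

# Translate a Nagios plugin exit code into its conventional state name.
nagios_status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run check_sda1 against one host and print the host, state and plugin output.
run_disk_check() {
    host="$1"
    output=$("$CHECK_NRPE" -H "$host" -c check_sda1 2>&1)
    code=$?
    echo "$host: $(nagios_status_name "$code") - $output"
    return "$code"
}

# Usage (run on the mgmt server):
#   for h in mgmt-c db-c; do run_disk_check "$h"; done
```

Printing the state name next to the raw plugin output makes it easier to tell a connection failure (usually UNKNOWN or CRITICAL) from a real disk warning.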

image

Update: it turns out the nagios-nrpe-plugin package is meant to be installed on the mgmt server (not the db server), and it belongs in the nagios module in Puppet, not the nrpe module.

image

  2. As it stands, our nagios3 service isn't running properly because of nagios-plugins config errors:

image

Fixed: I fixed error #2 from the problems section and got nagios3 running again. When I commented out the hostgroups and services for the NRPE lab (which I did to fix other errors), the definitions were still present in ppt_hostgroups.cfg and ppt_services.cfg. I edited those files and removed the remote disk service and its associated hostgroup.
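A quick way to spot leftover definitions like these is to search the generated files before restarting. This is a minimal sketch, assuming the Puppet-generated files live under /etc/nagios3/conf.d. That directory and the function name are assumptions, and the search string is whatever the removed service or hostgroup was called.

```shell
#!/bin/sh
# Assumed location of the Puppet-generated Nagios config files.
NAGIOS_CONF_DIR="${NAGIOS_CONF_DIR:-/etc/nagios3/conf.d}"

# List the ppt_*.cfg files that still mention the given name (case-insensitive).
find_stale_definitions() {
    name="$1"
    grep -lis -- "$name" "$NAGIOS_CONF_DIR"/ppt_*.cfg
}

# Usage:
#   find_stale_definitions "remote-disks"
```

Any file it prints still carries a definition that Nagios will try to load, even if the matching Puppet resource has been commented out.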

image

image

  3. Our Remote-Disks hostgroup currently shows up on the web server with just db-c:

image

The issue here is that when we apply the remote disk service, it not only fails to take effect immediately, but restarting nagios also crashes the whole thing with errors. The cause is likely something in our remotedisk service in config.pp, but at this stage we don't know what is making it crash.
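One way to narrow down a crash like this is to run the Nagios pre-flight check before every restart, since it names the file and object that fail to parse. This is a minimal sketch, assuming the binary and config paths used by the Ubuntu nagios3 package. Both paths and the function name are assumptions.

```shell
#!/bin/sh
# Assumed defaults for the Ubuntu nagios3 package; override if your paths differ.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/sbin/nagios3}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}"

# Verify the configuration and restart only when the check passes.
restart_nagios_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "config OK, restarting"
        sudo service nagios3 restart
    else
        echo "config invalid, not restarting" >&2
        return 1
    fi
}

# Usage (on the mgmt server):
#   restart_nagios_if_valid
```

If the check fails, the running service is left alone and the error output points at the offending definition.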

We can run the check_nrpe command manually and get the warning, but the result doesn't show up on our web server:

image

Mgmt-c is showing a critical warning on the dashboard. I think this is because it cannot run check_sda1 on itself.

image

image

Attempted solutions

This is regarding the connection error

iptables firewall

Double-check that connections are allowed on port 5666:

sudo iptables -L -n

image

Adding the rule on the db server:

sudo iptables -I INPUT -p tcp --dport 5666 -s 10.2.0.6 -j ACCEPT

sudo iptables-save
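Note that iptables-save on its own only prints the current rules to standard output and does not write them anywhere. The sketch below confirms the NRPE rule is present and keeps a copy. It assumes the rules file path /etc/iptables/rules.v4 used by the iptables-persistent package, which is not mentioned above, and has_nrpe_rule is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if the iptables-save output on stdin contains an INPUT rule
# that accepts TCP traffic to port 5666.
has_nrpe_rule() {
    grep -Eq -- '^-A INPUT .*--dport 5666 .*-j ACCEPT'
}

# Usage (on the db server):
#   sudo iptables-save | has_nrpe_rule && echo "NRPE rule present"
#   sudo iptables-save | sudo tee /etc/iptables/rules.v4 >/dev/null   # assumed path
```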

Check:

image

Restart the nagios-nrpe-server
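The restart step can be followed by a check that the daemon is actually listening. This is a minimal sketch, assuming the service name nagios-nrpe-server from the step above and that ss is available on the host. The helper name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the `ss -tln` output on stdin shows a listener on TCP port 5666.
nrpe_listening() {
    grep -Eq ':5666[[:space:]]'
}

# Usage (on the host running the NRPE daemon):
#   sudo service nagios-nrpe-server restart
#   ss -tln | nrpe_listening && echo "NRPE is listening on 5666"
```

If nothing is listening after the restart, the problem is with the daemon itself rather than with the firewall.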

We also tried the ufw firewall, but that didn't work either.

image

The reason it wasn't working is that the NRPE service was not installed on the mgmt server. We had to update site.pp to include nrpe under mgmt-c, and remove the resources ensuring the nagios user and group from the nrpe module's install.pp, because they conflicted with the nagios module's install.pp.
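After a change like this, the agent run on mgmt-c can be checked from the shell. This is a minimal sketch, assuming a Puppet version where `puppet agent --test` enables detailed exit codes, so exit code 2 means changes were applied rather than that the run failed. The helper name is hypothetical, and the package name nagios-nrpe-server is taken from the service name above.

```shell
#!/bin/sh
# Describe the exit code of `puppet agent --test` (detailed exit codes).
puppet_result() {
    case "$1" in
        0) echo "no changes" ;;
        2) echo "changes applied" ;;
        4) echo "some resources failed" ;;
        6) echo "changes applied, some resources failed" ;;
        *) echo "run failed" ;;
    esac
}

# Usage (on mgmt-c, after editing site.pp):
#   sudo puppet agent --test; puppet_result $?
#   dpkg -s nagios-nrpe-server | grep '^Status'
```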

image

image