Issues for Nagios Notifications - KeegMitch/Operations-Engineering-group-c GitHub Wiki
Referring to Lab 7.1
Problems we're facing
- The Nagios ppt_contactgroups and ppt_contacts files have inconsistent permissions, which breaks the server
Solution: Set permissions explicitly on those files, which are created by the Nagios Puppet module
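As a rough illustration of that fix, the sketch below sets `mode` on the contact and contact group resources, the same way it is set on the host and service resources further down. The contact name `slack`, the two `notify-*-by-slack` command names, and the exact target paths are assumptions made for the example, not values taken from the lab.

```puppet
# Sketch only: 'slack' and the notify-*-by-slack commands are hypothetical names.
nagios_contactgroup { 'slackgroup':
  target  => '/etc/nagios3/conf.d/ppt_contactgroups.cfg',  # assumed path
  alias   => 'Slack notifications',
  members => 'slack',
  mode    => '0444',  # same mode used on the host and service resources
}

nagios_contact { 'slack':
  target                        => '/etc/nagios3/conf.d/ppt_contacts.cfg',  # assumed path
  alias                         => 'Slack',
  service_notification_period   => '24x7',
  host_notification_period      => '24x7',
  service_notification_options  => 'w,u,c,r',
  host_notification_options     => 'd,u,r',
  service_notification_commands => 'notify-service-by-slack',
  host_notification_commands    => 'notify-host-by-slack',
  mode                          => '0444',
}
```

With `mode` set, the generated files should be readable by the Nagios daemon, which is presumably what the server needs in order to load them.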
- Another issue we had: the manual notification command works, but notifications triggered by Nagios do not. It turns out that in the Nagios Puppet config you have to set contact_groups to "slackgroup" on the host, and also set the root disk check service's contact group to "slackgroup"
nagios_host { "db-c":
  target                => "/etc/nagios3/conf.d/ppt_hosts.cfg",
  alias                 => "db",
  check_period          => "24x7",
  max_check_attempts    => 3,
  check_command         => "check-host-alive",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "d,u,r",
  mode                  => "0444",
  # Send host notifications to the Slack contact group
  contact_groups        => "slackgroup",
}
In the services, I changed contact_groups from "admins" to "slackgroup"
nagios_service { "root_disk_check":
  service_description   => "Root Disk Space",
  hostgroup_name        => "Remote-Disks",
  target                => "/etc/nagios3/conf.d/ppt_services.cfg",
  check_command         => "check_nrpe!check_sda1",
  max_check_attempts    => 3,
  retry_check_interval  => 1,
  normal_check_interval => 5,
  check_period          => "24x7",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "w,u,c",
  # Changed from "admins" so service alerts reach Slack
  contact_groups        => "slackgroup",
  mode                  => "0444",
}
- The check_sda1 command isn't changing back when the puppet agent is applied after testing the root disk check warning
  - Tried editing it on the db server, but it keeps changing back to 92% and 90%
  - Tried deleting the file and reapplying the changes in Puppet, but that still didn't work
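One explanation that would fit these symptoms, though it is not confirmed here, is that the check_sda1 command line is itself managed by Puppet, so any edit made directly on the db server is overwritten on the next agent run. Under that assumption, the thresholds would have to be changed in the manifest rather than on the node. A minimal sketch, assuming the puppetlabs-stdlib module (for `file_line`) is installed, that the NRPE config lives at `/etc/nagios/nrpe.cfg`, and using placeholder threshold values rather than the lab's real ones:

```puppet
# Hypothetical values: replace with the thresholds the lab actually expects.
$sda1_warn = '20%'  # warning when free space drops below this
$sda1_crit = '10%'  # critical when free space drops below this

# Assumes puppetlabs-stdlib; the config path and plugin location are
# Debian/Ubuntu defaults and may differ on these servers.
file_line { 'nrpe_check_sda1':
  path  => '/etc/nagios/nrpe.cfg',
  line  => "command[check_sda1]=/usr/lib/nagios/plugins/check_disk -w ${sda1_warn} -c ${sda1_crit} -p /dev/sda1",
  match => '^command\[check_sda1\]=',
}
```

NRPE only reads its config at startup, so the NRPE service would also need a restart before the new thresholds take effect.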
- Connection issues with the mgmt server on port 5666 (the NRPE port) and port 8140 (the Puppet port)
This is why the following check returns no information about mgmt-c's disk: /usr/lib/nagios/plugins/check_nrpe -H mgmt-c -c check_sda1
The same check works on the other 3 servers.
Potentially the way the security rules are set up on Azure is blocking traffic on 8140?
After some more research, the fix seems to be either changing an existing rule that is blocking access or creating a new rule that allows access to ports 5666 and 8140. So far we haven't found a way to create or change the rules from the mgmt-c command line without access to the Azure portal.