[ Lab 6.1 ] Nagios Hosts, Host Groups, and Services - smitja21/group-a-oe2 GitHub Wiki

[!NOTE] #57: Nagios Hosts, Host Groups, and Services

Part 1: Complete the Host Inventory

Task 1: Verify your current host state

Checking what nagios_host resources are already in config.pp

What has Puppet already written to the hosts file?

How many hosts does Nagios currently know about?

How many hosts are currently defined? List their names. Which hosts are missing from Nagios monitoring?

There are 3 hosts: localhost, mgmt-a.oe2.org.nz, and db-a.oe2.org.nz.

We are missing backup-a and app-a.

Task 2: Add all four hosts to config.pp

Look at the four nagios_host blocks above. Every one is identical except for the hostname and alias. In software engineering this violates the DRY (Don’t Repeat Yourself) principle. What problems could arise if you had 20 hosts and needed to change notification_interval from 30 to 15?

It would be hard to maintain. You would have to edit the same value in 20 places, which is a lot of donkey work, and missing even one block would leave that host with a different notification interval from the rest.
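The four nagios_host blocks are not reproduced on this page. As a hedged illustration of the duplication, two of them might look like the sketch below. The target path, the alias text, and every attribute value other than notification_interval (30) are assumptions, not copies of the lab's config.pp.

```puppet
# Hypothetical sketch of the repeated pattern (two of the four blocks).
# Assumed: target path, aliases, and all values except notification_interval.
nagios_host { 'db-a.oe2.org.nz':
  ensure                => present,
  target                => '/etc/nagios4/conf.d/ppt_hosts.cfg',
  alias                 => 'Database Server',
  address               => 'db-a.oe2.org.nz',
  check_command         => 'check-host-alive',
  max_check_attempts    => '3',
  notification_interval => '30', # changing this means editing every block
  notification_period   => '24x7',
  contact_groups        => 'admins',
}

nagios_host { 'app-a.oe2.org.nz':
  ensure                => present,
  target                => '/etc/nagios4/conf.d/ppt_hosts.cfg',
  alias                 => 'Application Server',
  address               => 'app-a.oe2.org.nz',
  check_command         => 'check-host-alive',
  max_check_attempts    => '3',
  notification_interval => '30', # ...and again here, and in the other two
  notification_period   => '24x7',
  contact_groups        => 'admins',
}
```

Only the title, alias, and address differ between blocks; with 20 hosts the rest would be copied 20 times.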

Task 3: Refactor with a Puppet defined type

Refactored config.pp

Defined resource type in monitored_host.pp

Validated the changed .pp files and applied the Puppet config

All 4 hosts appear

Validate Nagios understands the config

  1. Document the output of sudo nagios4 -v ...| grep “hosts defined”. How many hosts does Nagios now know about?

The output reports “Checked 5 hosts.”, so Nagios now knows about 5 hosts: the four lab hosts plus localhost.

  1. If you needed to change notification_interval to 15 minutes for all hosts, how many files do you edit with the defined type approach vs the original four-block approach?

With the defined type, the change is one line in one file, monitored_host.pp. With the original four-block approach, the change is still in one file (config.pp), but it has to be made four times, once in each nagios_host block.
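A minimal sketch of what monitored_host.pp could contain, assuming a hypothetical module named nagios_config and the same assumed attribute values as the previous sketch; the lab's real module name and attributes may differ. The shared settings live in the define, so notification_interval appears exactly once, and each host in config.pp only supplies its title and alias. The parameter is named host_alias to avoid clashing with Puppet's alias metaparameter.

```puppet
# --- modules/nagios_config/manifests/monitored_host.pp ---
# Hypothetical module name "nagios_config"; attribute values are assumptions.
define nagios_config::monitored_host (
  String $host_alias, # named host_alias to avoid the alias metaparameter
  String $target = '/etc/nagios4/conf.d/ppt_hosts.cfg',
) {
  nagios_host { $title:
    ensure                => present,
    target                => $target,
    alias                 => $host_alias,
    address               => $title, # the resource title is the FQDN
    check_command         => 'check-host-alive',
    max_check_attempts    => '3',
    notification_interval => '30', # the one line to edit, e.g. to '15'
    notification_period   => '24x7',
    contact_groups        => 'admins',
  }
}

# --- config.pp ---
# One declaration per host; only the title and alias vary (aliases are placeholders).
nagios_config::monitored_host {
  'db-a.oe2.org.nz':     host_alias => 'Database Server';
  'app-a.oe2.org.nz':    host_alias => 'Application Server';
  'backup-a.oe2.org.nz': host_alias => 'Backup Server';
  'mgmt-a.oe2.org.nz':   host_alias => 'Management Server';
}
```

For autoloading, the define's name has to match its path (nagios_config::monitored_host lives in nagios_config/manifests/monitored_host.pp).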

Part 2: Host Groups

Task 4: Define host groups in config.pp

Created a host group for SSH checking and removed the old SSH check.

Applying config

Verifying the host groups file was created

Validate the config

Document the contents of ppt_hostgroups.cfg. Does the Nagios-generated syntax match what you specified in Puppet? What is the difference between the Puppet DSL representation and the Nagios configuration file syntax?

sudo cat /etc/nagios4/conf.d/ppt_hostgroups.cfg

```
# HEADER: This file was autogenerated at 2026-03-30 20:24:55 +0000
# HEADER: by puppet. While it can still be managed manually, it
# HEADER: is definitely not recommended.
define hostgroup {
    alias             All SSH Servers
    hostgroup_name    my-ssh-servers
    members           db-a.oe2.org.nz,app-a.oe2.org.nz,backup-a.oe2.org.nz,mgmt-a.oe2.org.nz
}

define hostgroup {
    alias             Database Servers
    hostgroup_name    my-db-servers
    members           db-a.oe2.org.nz
}

define hostgroup {
    alias             Management Servers
    hostgroup_name    my-mgmt-servers
    members           mgmt-a.oe2.org.nz
}
```

Puppet:

```puppet
# --- Host Groups ---
# Group all hosts that run SSH (db, app, backup, and mgmt)
# members must exactly match the nagios_host resource titles
nagios_hostgroup { 'my-ssh-servers':
  target  => '/etc/nagios4/conf.d/ppt_hostgroups.cfg',
  alias   => 'All SSH Servers',
  members => 'db-a.oe2.org.nz,app-a.oe2.org.nz,backup-a.oe2.org.nz,mgmt-a.oe2.org.nz',
}
```

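Yes, the generated syntax matches what was specified in Puppet. The difference is in representation: Puppet declares a typed resource (nagios_hostgroup) whose title becomes hostgroup_name and whose attributes are written as `key => 'value',` pairs, including target, which only tells Puppet which file to write and does not appear in the output. Nagios uses a `define hostgroup { ... }` block of whitespace-separated directive/value pairs, with the directives sorted alphabetically and a Puppet-generated header at the top of the file.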
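Only the my-ssh-servers declaration is shown above. Reconstructed from the generated ppt_hostgroups.cfg, the other two groups were most likely declared the same way; this is a sketch inferred from the output, not a copy of the lab's config.pp.

```puppet
# Reconstructed from ppt_hostgroups.cfg: the title becomes hostgroup_name.
nagios_hostgroup { 'my-db-servers':
  target  => '/etc/nagios4/conf.d/ppt_hostgroups.cfg',
  alias   => 'Database Servers',
  members => 'db-a.oe2.org.nz',
}

nagios_hostgroup { 'my-mgmt-servers':
  target  => '/etc/nagios4/conf.d/ppt_hostgroups.cfg',
  alias   => 'Management Servers',
  members => 'mgmt-a.oe2.org.nz',
}
```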

Part 3: Service Checks

Task 5: Pre-test all check plugins before declaring them in Puppet

Testing SSH

HTTP test on mgmt-a

Testing the MariaDB-specific plugin

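For each SSH check, report the output. What does a successful result look like (exit code 0)? Did the MariaDB TCP check (check_tcp -p 3306) succeed?

```
group-a@mgmt-a:~$ sudo -u nagios /usr/lib/nagios/plugins/check_ssh db-a.oe2.org.nz
SSH OK - OpenSSH_9.6p1 Ubuntu-3ubuntu13.14 (protocol 2.0) | time=0.014827s;;;0.000000;10.000000
```

A successful check starts with `SSH OK`, followed by the server's SSH version banner and, after the `|`, performance data: the response time (about 0.015 s) with a minimum of 0 and a maximum of 10 seconds. By Nagios plugin convention an OK result corresponds to exit code 0.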

Task 6: Define the SSH service check

Applying the SSH check
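The manifest for this step is not shown on the page. Reconstructed from the generated ppt_services.cfg, the SSH declaration was most likely along these lines; treat it as a sketch inferred from the output rather than the lab's exact code.

```puppet
# Reconstructed from ppt_services.cfg ('ssh-check' is the PUPPET_NAME there).
# Applied to a host group, so Nagios creates one SSH service per member host.
nagios_service { 'ssh-check':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  service_description   => 'SSH',
  hostgroup_name        => 'my-ssh-servers',
  check_command         => 'check_ssh',
  check_period          => '24x7',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_period   => '24x7',
  notification_options  => 'w,u,c,r',
}
```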

Task 7: Define the MariaDB service check

Creating a user that Nagios can authenticate with

Verifying the user was created

Had to change bind-address to 0.0.0.0 so MariaDB accepts connections from the other server (the Ubuntu default of 127.0.0.1 only allows local connections).

group-a@mgmt-a:/opt/puppetlabs/puppet/modules/mariadb/files/etc/mysql/mariadb.conf.d$ sudo nano 50-server.cnf

Pulling changes from mgmt

The connection now works.

A firewall rule was applied to allow connections from the mgmt server. It will need to be updated in the future if other servers need access to db-a.
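The bind-address change was made in 50-server.cnf under the mariadb module's files/ directory and then pulled onto db-a. A minimal sketch of a resource that could deliver that file is below, assuming the module does not already declare one; the ownership, mode, the service name mariadb, and the restart relationship are assumptions.

```puppet
# Hypothetical sketch: ship the edited 50-server.cnf (bind-address = 0.0.0.0
# under [mysqld]) and restart MariaDB whenever the file changes.
# puppet:///modules/mariadb/... resolves to <modulepath>/mariadb/files/...
file { '/etc/mysql/mariadb.conf.d/50-server.cnf':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/mariadb/etc/mysql/mariadb.conf.d/50-server.cnf',
  notify => Service['mariadb'],
}

# Assumed service name on Ubuntu; omit this if the module already declares it.
service { 'mariadb':
  ensure => running,
  enable => true,
}
```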

Paste the output of the manual check_mysql command. Does it show “DB OK”? If it fails, what error does it produce?

```
group-a@mgmt-a:$ sudo -u nagios /usr/lib/nagios/plugins/check_mysql -H db-a.oe2.org.nz -u nagios -p 'NagiosMonitor1'
Uptime: 516  Threads: 1  Questions: 66  Slow queries: 0  Opens: 33  Open tables: 26  Queries per second avg: 0.127|Connections=35c;;; Open_files=58;;; Open_tables=26;;; Qcache_free_memory=16759592;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=27c;;; Qcache_queries_in_cache=0;;; Queries=66c;;; Questions=66c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=516c;;;
```

It does not print the literal text “DB OK”. Instead, the successful run prints the server's status line (uptime, threads, questions, slow queries, open tables, queries per second) followed by performance data after the `|`, with no error message.

If the check fails with “Access denied”, what does this tell you? How would you diagnose whether the issue is network, authentication, or the plugin itself?

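An “Access denied” error means the network path is working: MariaDB received the connection and rejected the login, so the problem is the user, the password, or the host the user is allowed to connect from.

To check for a network issue, we could ping the db server to confirm it is reachable, then test that port 3306 is open, for example with check_tcp -H db-a.oe2.org.nz -p 3306.

To check authentication, we could verify the user exists and try logging in manually with mariadb -h db-a.oe2.org.nz -u nagios -p.

To check whether it's a plugin issue, we could verify the MariaDB service is running correctly and that the command was entered correctly.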

Step 3: Add the Nagios credentials file

Created a credentials file for the nagios user

Step 4: Declare the MariaDB service in Puppet
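Reconstructed from the generated ppt_services.cfg, the MariaDB declaration could look like the sketch below. The `!` characters split check_command into arguments, so nagios and NagiosMonitor1 reach the check_mysql_cmdlinecred command as $ARG1$ and $ARG2$; the assumption is that this command definition (already present on this system, since the config validates) passes them to check_mysql as the user and password.

```puppet
# Reconstructed from ppt_services.cfg ('mariadb-check' is the PUPPET_NAME there).
nagios_service { 'mariadb-check':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  service_description   => 'MariaDB',
  hostgroup_name        => 'my-db-servers',
  # "!" separates arguments: $ARG1$ = nagios, $ARG2$ = NagiosMonitor1
  check_command         => 'check_mysql_cmdlinecred!nagios!NagiosMonitor1',
  check_period          => '24x7',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_period   => '24x7',
  notification_options  => 'w,u,c,r',
}
```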

Task 8: Define the HTTP/HTTPS service check on mgmt-x
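Reconstructed from the generated ppt_services.cfg, the HTTPS declaration could look like the following sketch (inferred from the output, not copied from config.pp). Everything after the `!` is passed as a single argument to the check_http command; the comments describe the standard check_http options used.

```puppet
# Reconstructed from ppt_services.cfg ('https-nagios' is the PUPPET_NAME there).
nagios_service { 'https-nagios':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  service_description   => 'HTTPS Nagios Interface',
  hostgroup_name        => 'my-mgmt-servers',
  # -S 1.2+  : connect over TLS 1.2 or newer
  # --sni    : send the hostname in the TLS handshake
  # -u       : request the Nagios web UI path
  # -e       : expect a 401 in the status line (login required), else fail
  check_command         => 'check_http! -S 1.2+ --sni -u /nagios4/ -e "HTTP/1.1 401"',
  check_period          => '24x7',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_period   => '24x7',
  notification_options  => 'w,u,c,r',
}
```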

Validating all manifests and applying them

Verify all three service definitions were written

sudo nagios4 -v /etc/nagios4/nagios.cfg 2>&1 | tail -5

  1. Paste the last 5 lines of sudo nagios4 -v .... Does it say “Things look okay”? How many services are defined?

```
group-a@mgmt-a:/$ sudo nagios4 -v /etc/nagios4/nagios.cfg 2>&1 | tail -5

Total Warnings: 4
Total Errors: 0
```

Things do look okay: there are 0 errors, and the 4 warnings are only notification-related. There are 13 services defined.

  1. Run cat /etc/nagios4/conf.d/ppt_services.cfg and paste the output. How does Nagios represent a service applied to a host group vs one applied to a specific host?

```
group-a@mgmt-a:/$ sudo cat /etc/nagios4/conf.d/ppt_services.cfg
# HEADER: This file was autogenerated at 2026-03-30 22:12:04 +0000
# HEADER: by puppet. While it can still be managed manually, it
# HEADER: is definitely not recommended.
define service {
## --PUPPET_NAME-- (called '_naginator_name' in the manifest)    ssh-check
    check_command            check_ssh
    check_interval           5
    check_period             24x7
    contact_groups           admins
    hostgroup_name           my-ssh-servers
    max_check_attempts       3
    notification_interval    30
    notification_options     w,u,c,r
    notification_period      24x7
    retry_interval           1
    service_description      SSH
}

define service {
## --PUPPET_NAME-- (called '_naginator_name' in the manifest)    mariadb-check
    check_command            check_mysql_cmdlinecred!nagios!NagiosMonitor1
    check_interval           5
    check_period             24x7
    contact_groups           admins
    hostgroup_name           my-db-servers
    max_check_attempts       3
    notification_interval    30
    notification_options     w,u,c,r
    notification_period      24x7
    retry_interval           1
    service_description      MariaDB
}

define service {
## --PUPPET_NAME-- (called '_naginator_name' in the manifest)    https-nagios
    check_command            check_http! -S 1.2+ --sni -u /nagios4/ -e "HTTP/1.1 401"
    check_interval           5
    check_period             24x7
    contact_groups           admins
    hostgroup_name           my-mgmt-servers
    max_check_attempts       3
    notification_interval    30
    notification_options     w,u,c,r
    notification_period      24x7
    retry_interval           1
    service_description      HTTPS Nagios Interface
}
```

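In the generated file, every service carries a hostgroup_name directive, because all three are applied to host groups. A service applied to a specific host would carry host_name instead. Nagios expands a host group service into one service instance per member host, while a host_name service runs only on that host.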
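For contrast, a hedged sketch of the same HTTPS check bound to a single host: the only structural change is host_name in place of hostgroup_name. The title https-nagios-mgmt-a is hypothetical, and this is an alternative to the hostgroup version, not something the lab declared.

```puppet
# Hypothetical alternative to the hostgroup-based 'https-nagios' service.
# Declare one or the other, not both, or mgmt-a would end up with two
# definitions of the 'HTTPS Nagios Interface' service.
nagios_service { 'https-nagios-mgmt-a':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  service_description   => 'HTTPS Nagios Interface',
  host_name             => 'mgmt-a.oe2.org.nz', # replaces hostgroup_name
  check_command         => 'check_http! -S 1.2+ --sni -u /nagios4/ -e "HTTP/1.1 401"',
  check_period          => '24x7',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_period   => '24x7',
  notification_options  => 'w,u,c,r',
}
```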

Part 4: Interpret the Nagios Dashboard

Task 9: Navigate the web interface and interpret host and service states

  1. List all four hosts and their current status (PENDING/UP/DOWN/UNREACHABLE). Take a screenshot of the host list.
| Name | Status |
| --- | --- |
| app-a.oe2.org.nz | UP |
| backup-a.oe2.org.nz | UP |
| db-a.oe2.org.nz | UP |
| mgmt-a.oe2.org.nz | UP |
  1. Navigate to Services → Service Detail. How many service instances does the SSH check create? (Remember: it is applied to a host group with multiple members.) Take a screenshot.

The SSH check creates 4 service instances, one for each member of my-ssh-servers (db-a, app-a, backup-a, and mgmt-a).

  1. What status does the MariaDB check show? If it shows CRITICAL or UNKNOWN, paste the output from the “Status Information” column — this tells you exactly why it failed. How does this compare to your manual check_mysql test?
  1. What does the HTTPS check show? Is the Nagios web interface correctly returning HTTP 401? What would a status of CRITICAL on this check tell an on-call engineer?

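The HTTPS check is passing. Because the check expects `HTTP/1.1 401` (the -e option), a pass means the Nagios web interface is correctly returning 401 Unauthorized; a 200 response would make the check fail. A CRITICAL on this check would tell an on-call engineer that the Nagios web interface has a problem: it may be unreachable, the TLS connection may be failing, or the page may no longer be asking for authentication.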