[ Lab 6.1 ] Nagios Hosts, Host Groups, and Services - smitja21/group-a-oe2 GitHub Wiki
Part 1: Complete the Host Inventory
Task 1: Verify your current host state
Checking what nagios_host resources are already in config.pp
What has Puppet already written to the hosts file?
How many hosts does Nagios currently know about?
- How many hosts are currently defined? List their names. Which hosts are missing from Nagios monitoring?
There are 3 hosts: localhost, mgmt-a.oe2.org.nz and db-a.oe2.org.nz.
The missing hosts are backup-a and app-a.
Task 2: Add all four hosts to config.pp
Look at the four nagios_host blocks above. Every one is identical except for the hostname and alias. In software engineering this violates the DRY (Don’t Repeat Yourself) principle. What problems could arise if you had 20 hosts and needed to change notification_interval from 30 to 15?
It would be hard to maintain. You would have to edit that value in 20 separate blocks, which is tedious repeated work, and it is easy to miss one and end up with hosts that have different notification intervals.
Task 3: Refactor with a Puppet defined type
refactored config.pp
defined resource type in monitored_host.pp
Validated the changed .pp files and applied the Puppet config
All 4 hosts appear
Validate Nagios understands the config
- Document the output of sudo nagios4 -v ...| grep “hosts defined”. How many hosts does Nagios now know about?
"Checked 5 hosts." Nagios now knows about 5 hosts: the four lab hosts plus localhost.
- If you needed to change notification_interval to 15 minutes for all hosts, how many files do you edit with the defined type approach vs the original four-block approach?
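One file. With the defined type, only monitored_host.pp changes, and only one line in it. With the original approach, config.pp needed four edits, one in each nagios_host block. Note that monitored_host is a defined type, not a class.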
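A minimal sketch of the kind of defined type this refactor produces. The real monitored_host.pp is only shown as a screenshot, so everything here apart from notification_interval (30, per Task 2) and the four hostnames is an assumption: the host_alias parameter name, the generic-host template, the ppt_hosts.cfg target and the alias strings are all hypothetical.

```puppet
# monitored_host.pp: hypothetical sketch, not the lab's actual file.
# Assumes a 'generic-host' template exists and that hosts are written to
# a hypothetical ppt_hosts.cfg next to ppt_hostgroups.cfg.
define monitored_host (
  String $host_alias,         # 'alias' is a Puppet metaparameter, so it can't be a parameter name
  String $address = $title,   # default to the FQDN used as the resource title
) {
  nagios_host { $title:
    ensure                => present,
    target                => '/etc/nagios4/conf.d/ppt_hosts.cfg',
    use                   => 'generic-host',
    alias                 => $host_alias,
    address               => $address,
    max_check_attempts    => '3',
    notification_interval => '30',   # the single line to edit for every host
    notification_period   => '24x7',
    contact_groups        => 'admins',
  }
}

# config.pp: one declaration per host (alias strings are placeholders)
monitored_host { 'db-a.oe2.org.nz':     host_alias => 'Database Server' }
monitored_host { 'app-a.oe2.org.nz':    host_alias => 'Application Server' }
monitored_host { 'backup-a.oe2.org.nz': host_alias => 'Backup Server' }
monitored_host { 'mgmt-a.oe2.org.nz':   host_alias => 'Management Server' }
```

With this layout, changing the interval to 15 means editing one line inside the define, and the four declarations stay untouched.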
Part 2: Host Groups
Task 4: Define host groups in config.pp
Created host groups (including one that groups all SSH servers) and removed the old SSH check.
Applying config
Verifying the host groups file was created
Validate the config
Document the contents of ppt_hostgroups.cfg. Does the Nagios-generated syntax match what you specified in Puppet? What is the difference between the Puppet DSL representation and the Nagios configuration file syntax?
```
$ sudo cat /etc/nagios4/conf.d/ppt_hostgroups.cfg
# HEADER: This file was autogenerated at 2026-03-30 20:24:55 +0000
# HEADER: by puppet. While it can still be managed manually, it
# HEADER: is definitely not recommended.

define hostgroup {
    alias           All SSH Servers
    hostgroup_name  my-ssh-servers
    members         db-a.oe2.org.nz,app-a.oe2.org.nz,backup-a.oe2.org.nz,mgmt-a.oe2.org.nz
}

define hostgroup {
    alias           Database Servers
    hostgroup_name  my-db-servers
    members         db-a.oe2.org.nz
}

define hostgroup {
    alias           Management Servers
    hostgroup_name  my-mgmt-servers
    members         mgmt-a.oe2.org.nz
}
```
Puppet:
```puppet
# --- Host Groups ---
# Group all hosts that run SSH (db, app, backup, and mgmt)
# members must exactly match the nagios_host resource titles
nagios_hostgroup { 'my-ssh-servers':
  target  => '/etc/nagios4/conf.d/ppt_hostgroups.cfg',
  alias   => 'All SSH Servers',
  members => 'db-a.oe2.org.nz,app-a.oe2.org.nz,backup-a.oe2.org.nz,mgmt-a.oe2.org.nz',
}
```
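Yes, the generated syntax matches what was specified. The content is the same and only the syntax differs. In Puppet, the group is a nagios_hostgroup resource: its title becomes hostgroup_name, attributes are written as key => 'value' pairs, and target is a Puppet-only setting that chooses the output file, so it does not appear in the generated config. Nagios uses a define hostgroup { ... } block with one directive and its value per line, separated by whitespace, and Puppet writes the directives in alphabetical order.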
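For comparison, the other two groups in the generated file map back to Puppet declarations like the ones below. They are reconstructed from ppt_hostgroups.cfg rather than copied from config.pp, so the real manifest may be laid out differently.

```puppet
# Reconstructed from the generated ppt_hostgroups.cfg, not copied from config.pp.
# Each title becomes hostgroup_name; target only tells Puppet which file to write.
nagios_hostgroup { 'my-db-servers':
  target  => '/etc/nagios4/conf.d/ppt_hostgroups.cfg',
  alias   => 'Database Servers',
  members => 'db-a.oe2.org.nz',
}

nagios_hostgroup { 'my-mgmt-servers':
  target  => '/etc/nagios4/conf.d/ppt_hostgroups.cfg',
  alias   => 'Management Servers',
  members => 'mgmt-a.oe2.org.nz',
}
```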
Part 3: Service Checks
Task 5: Pre-test all check plugins before declaring them in Puppet
Testing SSH
HTTP test on mgmt-a
Testing the MariaDB-specific plugin
For each SSH check, report the output. What does a successful result look like (exit code 0)? Did the MariaDB TCP check (check_tcp -p 3306) succeed?
```
group-a@mgmt-a:~$ sudo -u nagios /usr/lib/nagios/plugins/check_ssh db-a.oe2.org.nz
SSH OK - OpenSSH_9.6p1 Ubuntu-3ubuntu13.14 (protocol 2.0) | time=0.014827s;;;0.000000;10.000000
```
A successful check prints SSH OK with the server's SSH version, followed by performance data after the |, and exits with code 0.
Task 6: Define the SSH service check
Applying the SSH check
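A sketch of a nagios_service declaration that would produce the ssh-check block in ppt_services.cfg. It is reconstructed from that generated file, not copied from config.pp, so treat the layout as an assumption; the attribute values match the generated output.

```puppet
# Reconstructed from the generated ppt_services.cfg; the real manifest may differ.
nagios_service { 'ssh-check':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  hostgroup_name        => 'my-ssh-servers',   # one service instance per group member
  service_description   => 'SSH',
  check_command         => 'check_ssh',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  check_period          => '24x7',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_options  => 'w,u,c,r',
  notification_period   => '24x7',
}
```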
Task 7: Define the MariaDB service check
Creating a user that Nagios can authenticate with
Verifying the user was created
Changed bind-address to 0.0.0.0 in the MariaDB server config so that it accepts connections from mgmt-a, not only from localhost.
group-a@mgmt-a:/opt/puppetlabs/puppet/modules/mariadb/files/etc/mysql/mariadb.conf.d$ sudo nano 50-server.cnf
Pulling the changes from mgmt-a
The connection now works.
A firewall rule was added to allow connections from mgmt-a. It will need updating if other servers need access to db-a.
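Because 50-server.cnf lives under the mariadb module's files/ directory, the change reaches db-a through a Puppet file resource. The module's manifest is not shown here, so the sketch below assumes a plain file resource and a service named mariadb. The notify relationship is what restarts MariaDB so the new bind-address takes effect.

```puppet
# Hypothetical sketch of how the mariadb module might ship 50-server.cnf.
# Assumes a plain file resource and a systemd service named 'mariadb'.
file { '/etc/mysql/mariadb.conf.d/50-server.cnf':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  # maps to .../modules/mariadb/files/etc/mysql/mariadb.conf.d/50-server.cnf
  source => 'puppet:///modules/mariadb/etc/mysql/mariadb.conf.d/50-server.cnf',
  notify => Service['mariadb'],   # restart so the new bind-address is picked up
}

service { 'mariadb':
  ensure => running,
  enable => true,
}
```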
Paste the output of the manual check_mysql command. Does it show “DB OK”? If it fails, what error does it produce?
```
group-a@mgmt-a:$ sudo -u nagios /usr/lib/nagios/plugins/check_mysql -H db-a.oe2.org.nz -u nagios -p 'NagiosMonitor1'
Uptime: 516 Threads: 1 Questions: 66 Slow queries: 0 Opens: 33 Open tables: 26 Queries per second avg: 0.127|Connections=35c;;; Open_files=58;;; Open_tables=26;;; Qcache_free_memory=16759592;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=27c;;; Qcache_queries_in_cache=0;;; Queries=66c;;; Questions=66c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=516c;;;
```
The check succeeded. It does not print the literal text “DB OK”; instead it prints server statistics (uptime, threads, queries) followed by performance data after the |.
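If the check fails with “Access denied”, what does this tell you? How would you diagnose whether the issue is network, authentication, or the plugin itself?
An “Access denied” error means the network path is working: MariaDB accepted the connection and then rejected the login, so the problem is authentication (wrong password, or the nagios user not permitted to connect from mgmt-a).
Network: ping db-a.oe2.org.nz to confirm the host is reachable, then test the MariaDB port with check_tcp -H db-a.oe2.org.nz -p 3306, since ping cannot test a TCP port.
Authentication: verify the nagios user exists with a host part that matches mgmt-a, then log in manually with mariadb -h db-a.oe2.org.nz -u nagios -p.
Plugin: if a manual login with the same credentials works but check_mysql still fails, re-check the plugin's arguments and run it as the nagios user (sudo -u nagios ...) to reproduce exactly what Nagios runs.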
Step 3: Add the Nagios credentials file
Creating the credentials file for the nagios user
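The credentials file itself is only shown as a screenshot. One way to manage a MariaDB client options file for the nagios user from Puppet is sketched below; the path /etc/nagios4/nagios-mysql.cnf and the placeholder password are hypothetical. check_mysql can read a file like this through its -f option instead of taking the password on the command line.

```puppet
# Hypothetical path and contents: a MariaDB client options file for Nagios.
# Replace the placeholder password; avoid committing real secrets to the repo.
file { '/etc/nagios4/nagios-mysql.cnf':
  ensure  => file,
  owner   => 'nagios',
  group   => 'nagios',
  mode    => '0600',   # only the nagios user can read the password
  content => "[client]\nuser=nagios\npassword=CHANGE_ME\n",
}
```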
Step 4: Declare the MariaDB service in Puppet
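A declaration that would produce the mariadb-check block in the generated ppt_services.cfg, reconstructed from that output (the real manifest may differ). The check_command passes the user and password as !-separated arguments, so the password ends up in plain text in the generated file.

```puppet
# Reconstructed from the generated ppt_services.cfg; the real manifest may differ.
nagios_service { 'mariadb-check':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  hostgroup_name        => 'my-db-servers',
  service_description   => 'MariaDB',
  # user and password are passed to the command as $ARG1$ and $ARG2$
  check_command         => 'check_mysql_cmdlinecred!nagios!NagiosMonitor1',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  check_period          => '24x7',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_options  => 'w,u,c,r',
  notification_period   => '24x7',
}
```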
Task 8: Define the HTTP/HTTPS service check on mgmt-x
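A declaration that would produce the https-nagios block in ppt_services.cfg, reconstructed from the generated file. The -e "HTTP/1.1 401" argument makes check_http accept only a status line containing that string, so the check passes when the Nagios UI demands a login and fails on any other response, including a 200.

```puppet
# Reconstructed from the generated ppt_services.cfg; the real manifest may differ.
nagios_service { 'https-nagios':
  target                => '/etc/nagios4/conf.d/ppt_services.cfg',
  hostgroup_name        => 'my-mgmt-servers',
  service_description   => 'HTTPS Nagios Interface',
  # TLS 1.2 or newer, SNI, request /nagios4/ and expect a 401 status line
  check_command         => 'check_http! -S 1.2+ --sni -u /nagios4/ -e "HTTP/1.1 401"',
  check_interval        => '5',
  retry_interval        => '1',
  max_check_attempts    => '3',
  check_period          => '24x7',
  contact_groups        => 'admins',
  notification_interval => '30',
  notification_options  => 'w,u,c,r',
  notification_period   => '24x7',
}
```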
Validating all manifests and applying them
Verify all three service definitions were written
sudo nagios4 -v /etc/nagios4/nagios.cfg 2>&1 | tail -5
- Paste the last 5 lines of sudo nagios4 -v .... Does it say “Things look okay”? How many services are defined?
```
group-a@mgmt-a:/$ sudo nagios4 -v /etc/nagios4/nagios.cfg 2>&1 | tail -5
Total Warnings: 4
Total Errors: 0
```
Things do look okay: there are 0 errors, and the 4 warnings are only notification-related. There are 13 services defined.
- Run cat /etc/nagios4/conf.d/ppt_services.cfg and paste the output. How does Nagios represent a service applied to a host group vs one applied to a specific host?
```
group-a@mgmt-a:/$ sudo cat /etc/nagios4/conf.d/ppt_services.cfg
# HEADER: This file was autogenerated at 2026-03-30 22:12:04 +0000
# HEADER: by puppet. While it can still be managed manually, it
# HEADER: is definitely not recommended.

define service {
## --PUPPET_NAME-- (called '_naginator_name' in the manifest)    ssh-check
    check_command           check_ssh
    check_interval          5
    check_period            24x7
    contact_groups          admins
    hostgroup_name          my-ssh-servers
    max_check_attempts      3
    notification_interval   30
    notification_options    w,u,c,r
    notification_period     24x7
    retry_interval          1
    service_description     SSH
}

define service {
## --PUPPET_NAME-- (called '_naginator_name' in the manifest)    mariadb-check
    check_command           check_mysql_cmdlinecred!nagios!NagiosMonitor1
    check_interval          5
    check_period            24x7
    contact_groups          admins
    hostgroup_name          my-db-servers
    max_check_attempts      3
    notification_interval   30
    notification_options    w,u,c,r
    notification_period     24x7
    retry_interval          1
    service_description     MariaDB
}

define service {
## --PUPPET_NAME-- (called '_naginator_name' in the manifest)    https-nagios
    check_command           check_http! -S 1.2+ --sni -u /nagios4/ -e "HTTP/1.1 401"
    check_interval          5
    check_period            24x7
    contact_groups          admins
    hostgroup_name          my-mgmt-servers
    max_check_attempts      3
    notification_interval   30
    notification_options    w,u,c,r
    notification_period     24x7
    retry_interval          1
    service_description     HTTPS Nagios Interface
}
```
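Nagios marks the target with a different directive: a service applied to a host group uses hostgroup_name (all three services here do), while a service applied to a specific host uses host_name. A host_name service runs only on that host, whereas a hostgroup_name service creates one instance on every member of the group.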
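To see the host-level form, here is a hypothetical version of the HTTPS check scoped to mgmt-a alone. It is a sketch, not part of the lab: it assumes the stock generic-service template supplies the intervals, check period and notification settings. In the generated file this would appear with host_name mgmt-a.oe2.org.nz where the group version has hostgroup_name my-mgmt-servers.

```puppet
# Hypothetical alternative to the host-group version of https-nagios.
# Declare one or the other, not both, or mgmt-a would end up with two
# definitions of the same "HTTPS Nagios Interface" service.
# Assumes the 'generic-service' template provides the remaining settings.
nagios_service { 'https-nagios-mgmt-a':
  target              => '/etc/nagios4/conf.d/ppt_services.cfg',
  host_name           => 'mgmt-a.oe2.org.nz',   # single host, not a group
  use                 => 'generic-service',
  service_description => 'HTTPS Nagios Interface',
  check_command       => 'check_http! -S 1.2+ --sni -u /nagios4/ -e "HTTP/1.1 401"',
}
```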
Part 4: Interpret the Nagios Dashboard
Task 9: Navigate the web interface and interpret host and service states
- List all four hosts and their current status (PENDING/UP/DOWN/UNREACHABLE). Take a screenshot of the host list.
| Name | Status |
|---|---|
| app-a.oe2.org.nz | UP |
| backup-a.oe2.org.nz | UP |
| db-a.oe2.org.nz | UP |
| mgmt-a.oe2.org.nz | UP |
- Navigate to Services → Service Detail. How many service instances does the SSH check create? (Remember: it is applied to a host group with multiple members.) Take a screenshot.
It creates 4 SSH service instances, one for each host in my-ssh-servers.
- What status does the MariaDB check show? If it shows CRITICAL or UNKNOWN, paste the output from the “Status Information” column — this tells you exactly why it failed. How does this compare to your manual check_mysql test?
- What does the HTTPS check show? Is the Nagios web interface correctly returning HTTP 401? What would a status of CRITICAL on this check tell an on-call engineer?
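The check is passing, which means the web interface returned the expected HTTP/1.1 401: because of the -e option, any other status, including a 200, would make this check fail. A CRITICAL would tell an on-call engineer that the Nagios web interface is not responding as expected: it may be down, failing the TLS handshake, or serving pages without requiring a login.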