Ticket ID#346: System Hardening & Patch Management Implementation - GriffinKat/group-a GitHub Wiki
SSH Log Monitoring via Nagios (Puppet-Based)
Summary
As part of the system-wide security hardening tasks, I implemented log-based SSH brute-force detection with Nagios monitoring on all assigned servers. The goal was to detect repeated failed SSH login attempts by parsing /var/log/auth.log and to alert through Nagios.
This was automated using Puppet to:
- Distribute the monitoring script across all hosts
- Register the NRPE command
- Configure Nagios service checks
- Grant necessary permissions for log access
Implementation Steps
Puppet Module to Distribute Script
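Created a custom Puppet module, ssh_log_monitoring, holding the manifest (manifests/init.pp) and the plugin script (files/check_ssh_failed_logins.sh, the location that a puppet:///modules/ssh_log_monitoring/ source URL resolves to).
init.pp Manifest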
This ensures the plugin script is copied to the correct Nagios plugins path with appropriate ownership and permissions:
class ssh_log_monitoring {
  # Ensure the plugin script is deployed to the right location
  file { '/usr/lib/nagios/plugins/check_ssh_failed_logins.sh':
    ensure => file,
    mode   => '0755',
    owner  => 'nagios',
    group  => 'nagios',
    source => 'puppet:///modules/ssh_log_monitoring/check_ssh_failed_logins.sh',
  }
}
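After a Puppet run, the deployed file can be spot-checked from a shell on a managed host. The following is a minimal sketch, assuming GNU stat (as shipped on Debian/Ubuntu); the helper name verify_plugin_file is hypothetical and not part of the module.

```shell
#!/bin/bash
# Hypothetical helper: compare a file's owner, group and mode with expected values.
# Assumes GNU coreutils stat (the -c format option).
verify_plugin_file() {
    local path=$1 want_owner=$2 want_group=$3 want_mode=$4
    local actual
    # %U = owner name, %G = group name, %a = octal permission bits (e.g. 755)
    actual=$(stat -c '%U:%G:%a' "$path") || return 2
    if [ "$actual" = "$want_owner:$want_group:$want_mode" ]; then
        echo "OK - $path is $actual"
        return 0
    fi
    echo "MISMATCH - $path is $actual, expected $want_owner:$want_group:$want_mode"
    return 1
}

# Example call matching the manifest above (run on a managed host):
# verify_plugin_file /usr/lib/nagios/plugins/check_ssh_failed_logins.sh nagios nagios 755
```

The return codes (0 match, 1 mismatch, 2 file not readable) are a choice made for this sketch only.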
check_ssh_failed_logins.sh Script (Monitors /var/log/auth.log)
#!/bin/bash
# ==========================
# SSH Failed Login Monitor
# Monitors /var/log/auth.log for failed SSH login attempts in the last 5 minutes.
# Exits with:
#   0 (OK)       → No failed logins
#   1 (WARNING)  → Between 1 and MAX_ATTEMPTS-1 failed logins
#   2 (CRITICAL) → MAX_ATTEMPTS or more failed logins
# ==========================
# Maximum number of failed SSH login attempts allowed in 5 minutes before alerting CRITICAL
MAX_ATTEMPTS=4
# Count how many failed SSH logins occurred in the last 5 minutes
# Explanation:
# - the two grep calls keep only lines containing "sshd" and "Failed password"
# - date generates a syslog-style timestamp for 5 minutes ago
# - awk keeps lines that sort after that timestamp (a plain string comparison,
#   so it is only reliable while the month name in the log does not change)
# - wc -l counts the remaining lines
COUNT=$(sudo cat /var/log/auth.log | grep "sshd" | grep "Failed password" | awk -v d="$(date --date='5 minutes ago' '+%b %e %H:%M')" '$0 > d' | wc -l)
# Decision logic for Nagios-style exit codes
if [ "$COUNT" -ge "$MAX_ATTEMPTS" ]; then
    # Too many failed attempts → CRITICAL
    echo "CRITICAL - $COUNT failed SSH logins in last 5 minutes"
    exit 2
elif [ "$COUNT" -ge 1 ]; then
    # Some failed attempts → WARNING
    echo "WARNING - $COUNT failed SSH logins in last 5 minutes"
    exit 1
else
    # No failed attempts → OK
    echo "OK - No failed SSH logins in last 5 minutes"
    exit 0
fi
How the script works:
The check_ssh_failed_logins.sh script scans /var/log/auth.log for failed SSH login attempts within the last 5 minutes. It uses grep to select entries containing "sshd" and "Failed password", and awk to keep only those newer than the 5-minute cutoff. If the count reaches the threshold (MAX_ATTEMPTS, default: 4), the script returns CRITICAL to Nagios; 1 to 3 failed attempts return WARNING, and none returns OK. The script uses the standard Nagios plugin exit codes and requires sudo access to read the log file.
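The awk filter compares timestamps as plain strings, which stays chronological only while the month name is the same (for example, "Jul  1" sorts before "Jun 30"). One possible alternative, sketched here and not part of the deployed script, converts each timestamp to epoch seconds before comparing. It assumes GNU date and classic syslog timestamps ("Mon DD HH:MM:SS", no year); the function name count_recent_failures is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count sshd "Failed password" lines newer than WINDOW seconds.
# Assumes GNU date and syslog timestamps such as "Jun  5 12:34:10" (15 characters, no year).
count_recent_failures() {
    local logfile=$1 window=$2
    local now cutoff count=0 line stamp epoch
    now=$(date +%s)
    cutoff=$(( now - window ))
    while IFS= read -r line; do
        # The first 15 characters hold the timestamp
        stamp=${line:0:15}
        # date assumes the current year; skip lines it cannot parse
        epoch=$(date --date="$stamp" +%s 2>/dev/null) || continue
        # Keep entries inside the window; entries that parse as future dates
        # (e.g. December lines read in January) are ignored
        if [ "$epoch" -ge "$cutoff" ] && [ "$epoch" -le "$now" ]; then
            count=$(( count + 1 ))
        fi
    done < <(grep 'sshd' "$logfile" | grep 'Failed password')
    echo "$count"
}

# Example: failed logins in the last 5 minutes (300 seconds)
# count_recent_failures /var/log/auth.log 300
```

This spawns one date process per matching line, which is acceptable for a short window of failed-login entries but would be slow on a very large log.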
Register NRPE Command
Modified the NRPE config on each remote host via the nrpe Puppet module:
command[check_ssh_failed_logins]=/usr/lib/nagios/plugins/check_ssh_failed_logins.sh
Define Nagios Command for Local Host
To monitor the check_ssh_failed_logins.sh script directly on the Nagios server (localhost), I added a new command definition in commands.cfg:
define command {
    command_name    check_ssh_failed_logins
    command_line    /usr/lib/nagios/plugins/check_ssh_failed_logins.sh
}
Define Nagios Service Checks
In the nagios Puppet module, I added two nagios_service resources to monitor:
- Remote hosts (via NRPE)
- Localhost (direct script execution)
# SSH alerts for failed logins on remote hosts
nagios_service { "ssh-failed-login-alerts-remote":
  service_description   => "SSH Failed Login Alerts",
  hostgroup_name        => "remote-disks",
  check_command         => "check_nrpe!check_ssh_failed_logins",
  max_check_attempts    => 3,
  retry_interval        => 0.5,
  check_interval        => 1,
  check_period          => "24x7",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "w,u,c,r",
  contact_groups        => "admins,slackgroup",
  target                => "/etc/nagios4/conf.d/ppt_services.cfg",
  mode                  => "0644",
}
# SSH alerts for failed logins on localhost (mgmt-a)
nagios_service { "ssh-failed-login-alerts-local":
  service_description   => "SSH Failed Login Alerts",
  host_name             => "localhost",
  check_command         => "check_ssh_failed_logins",
  max_check_attempts    => 3,
  retry_interval        => 0.5,
  check_interval        => 1,
  check_period          => "24x7",
  notification_interval => 30,
  notification_period   => "24x7",
  notification_options  => "w,u,c,r",
  contact_groups        => "admins,slackgroup",
  target                => "/etc/nagios4/conf.d/ppt_services.cfg",
  mode                  => "0644",
}
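Nagios notifies contacts once a service reaches a HARD state, which happens after max_check_attempts consecutive non-OK results, with re-checks spaced by retry_interval. The arithmetic for the values above can be sketched as follows, assuming the Nagios default interval_length of 60 seconds per time unit and ignoring scheduling latency; the helper name seconds_to_hard_state is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: seconds from the first non-OK result to a HARD state.
# Assumes interval_length = 60 seconds per time unit unless a third argument is given.
seconds_to_hard_state() {
    local max_attempts=$1 retry_interval=$2 interval_length=${3:-60}
    # The first failing check is attempt 1; each further attempt waits one retry_interval.
    awk -v m="$max_attempts" -v r="$retry_interval" -v l="$interval_length" \
        'BEGIN { printf "%d\n", (m - 1) * r * l }'
}

# Values from the service definitions above: 3 attempts, 0.5-minute retries
seconds_to_hard_state 3 0.5   # prints 60
```

Under these assumptions, a burst of failed logins produces a notification about one minute after the first non-OK check, which is well inside the script's 5-minute window.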
Update Sudo Permissions via Puppet
Used the sudo Puppet module to edit /etc/sudoers, allowing the nagios user to run the script and read the log file without a password:
nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/check_ssh_failed_logins.sh, /usr/bin/cat /var/log/auth.log
This lets the Nagios agent run the script non-interactively, with passwordless sudo limited to these two commands.
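A sudoers rule only takes effect if the command paths it lists match the files actually deployed, so a quick existence check on those paths can catch a mismatch early. The following is a minimal sketch that parses a single NOPASSWD rule as plain text; it is not a sudoers parser, and the helper name check_sudoers_commands is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: check that every command path in a NOPASSWD rule exists and is executable.
# Simplified text parsing: assumes one rule per string and comma-separated commands.
check_sudoers_commands() {
    local rule=$1 cmds entry path missing=0
    local -a entries
    # Everything after "NOPASSWD:" is the command list
    cmds=${rule#*NOPASSWD:}
    IFS=',' read -ra entries <<< "$cmds"
    for entry in "${entries[@]}"; do
        # The first word of each entry is the command path; the rest are arguments
        read -r path _ <<< "$entry"
        if [ ! -x "$path" ]; then
            echo "missing or not executable: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run on a monitored host, with the rule copied from /etc/sudoers):
# check_sudoers_commands 'nagios ALL=(ALL) NOPASSWD: /usr/bin/cat /var/log/auth.log'
```

For a syntax check of the sudoers file itself, visudo -c remains the appropriate tool; this sketch only looks at whether the listed paths exist.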
Add nagios User to adm Group
To allow the nagios user read access to /var/log/auth.log:
sudo usermod -aG adm nagios
The group membership change took effect after a reboot on all hosts.
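Group membership can be confirmed from a shell once usermod has run. A running daemon keeps the group list it started with, so the change reaches the NRPE service only after that service is restarted or the host is rebooted. The check below is a minimal sketch; the helper name user_in_group is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if USER is listed as a member of GROUP.
user_in_group() {
    local user=$1 group=$2 g
    # id -nG prints the names of all groups the user belongs to, separated by spaces
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Example (on a monitored host):
# user_in_group nagios adm && echo "nagios is in the adm group"
```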
Apply Puppet Modules
All changes were applied using:
- On the localhost (mgmt-a):
  sudo /opt/puppetlabs/bin/puppet agent --test
- On the remote hosts (apps-a, db-a, backup-a):
  sudo /opt/puppetlabs/puppet/bin/puppet agent --server=mgmt-a --no-daemonize --verbose --onetime
Validation
- Simulated Failed Login Attempts:
  Repeatedly attempted SSH logins using invalid credentials on each monitored host. This populated /var/log/auth.log with multiple "Failed password" entries from sshd.
- Verified Using Command Line (From Nagios Server)
  - Ran the following commands from the Nagios server:
    /usr/lib/nagios/plugins/check_nrpe -H apps-a -c check_ssh_failed_logins
    /usr/lib/nagios/plugins/check_nrpe -H db-a -c check_ssh_failed_logins
    /usr/lib/nagios/plugins/check_nrpe -H backup-a -c check_ssh_failed_logins
  - Also ran the script directly for localhost:
    /usr/lib/nagios/plugins/check_ssh_failed_logins.sh
  - Output reflected the number of failed attempts, with the appropriate Nagios-style status.
- Verified Using Nagios Web UI
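The command-line verification above can be wrapped in a small loop that runs the NRPE check per host and translates each exit code into its Nagios status name. This is a sketch rather than the procedure actually used; the function names and the CHECK_NRPE override variable are hypothetical.

```shell
#!/bin/bash
# Map a Nagios plugin exit code to its status name.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Hypothetical wrapper: run the NRPE check against each host given as an argument
# and print one summary line per host. CHECK_NRPE can be overridden for testing.
check_all_hosts() {
    local check_nrpe=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}
    local host output rc
    for host in "$@"; do
        output=$("$check_nrpe" -H "$host" -c check_ssh_failed_logins 2>&1)
        rc=$?
        echo "$host: $(status_name "$rc") (exit $rc) - $output"
    done
}

# Example (from the Nagios server):
# check_all_hosts apps-a db-a backup-a
```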