Ticket ID#346: System Hardening & Patch Management Implementation - GriffinKat/group-a GitHub Wiki

SSH Log Monitoring via Nagios (Puppet-Based)

Summary

As part of the system-wide security hardening tasks, I was responsible for implementing log-based SSH brute-force detection using Nagios monitoring on all assigned servers. The goal was to detect repeated failed SSH login attempts by parsing /var/log/auth.log and alerting through Nagios.

This was automated using Puppet to:

  • Distribute the monitoring script across all hosts

  • Register the NRPE command

  • Configure Nagios service checks

  • Grant necessary permissions for log access


Implementation Steps

Puppet Module to Distribute Script

Created a custom Puppet module, ssh_log_monitoring, containing a manifest (manifests/init.pp) and the plugin script (files/check_ssh_failed_logins.sh).

init.pp Manifest

This manifest deploys the plugin script to the Nagios plugins directory (/usr/lib/nagios/plugins) with nagios:nagios ownership and mode 0755:

class ssh_log_monitoring {

  # Ensure the plugin script is deployed to the right location
  file { '/usr/lib/nagios/plugins/check_ssh_failed_logins.sh':
    ensure  => file,
    mode    => '0755',
    owner   => 'nagios',
    group   => 'nagios',
    source  => 'puppet:///modules/ssh_log_monitoring/check_ssh_failed_logins.sh',
  }
}
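After a Puppet run, one way to confirm that the plugin was deployed as declared is to compare its mode and ownership with init.pp. This is a minimal sketch assuming GNU coreutils stat; the function name check_plugin_file is hypothetical, and stat -c '%a' prints the mode without the leading zero (755 rather than 0755):

```bash
# Hypothetical helper (not part of the Puppet module).
# check_plugin_file PATH MODE OWNER GROUP
# Prints OK or a mismatch message; returns non-zero if the file is missing,
# not executable, or its mode/ownership differ from the expected values.
check_plugin_file() {
    local path=$1 want="$2 $3:$4" got
    if [ ! -x "$path" ]; then
        echo "MISSING or not executable: $path"
        return 1
    fi
    # %a = octal mode without leading zero, %U/%G = owner and group names
    got=$(stat -c '%a %U:%G' "$path")
    if [ "$got" != "$want" ]; then
        echo "MISMATCH: $path is '$got', expected '$want'"
        return 1
    fi
    echo "OK: $path ($got)"
}
```

On a monitored host this would be called as check_plugin_file /usr/lib/nagios/plugins/check_ssh_failed_logins.sh 755 nagios nagios.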


check_ssh_failed_logins.sh Script (Monitors /var/log/auth.log)

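#!/bin/bash

# ==========================
# SSH Failed Login Monitor
# Monitors /var/log/auth.log for failed SSH login attempts in the last 5 minutes.
# Assumes traditional syslog timestamps at the start of each line, e.g. "Jan  5 12:03:11".
# Exits with:
#   0 (OK)       → No failed logins
#   1 (WARNING)  → Between 1 and MAX_ATTEMPTS-1 failed logins
#   2 (CRITICAL) → MAX_ATTEMPTS or more failed logins
#   3 (UNKNOWN)  → The log file could not be read
# ==========================

# Maximum number of failed SSH login attempts allowed in 5 minutes before alerting CRITICAL
MAX_ATTEMPTS=4
LOG_FILE=/var/log/auth.log

# If the log cannot be read (e.g. the sudo rule does not match), report UNKNOWN
# instead of silently counting 0 and returning OK.
# sudo -n fails immediately rather than prompting for a password (NRPE has no terminal).
if ! sudo -n cat "$LOG_FILE" > /dev/null 2>&1; then
    echo "UNKNOWN - Cannot read $LOG_FILE"
    exit 3
fi

# Minute stamps covering the window (now back to 5 minutes ago) in syslog's 12-character
# "%b %e %H:%M" prefix format, joined with "|", e.g. "Jan  5 12:03|Jan  5 12:02|..."
# LC_ALL=C keeps month abbreviations in English, as syslog writes them.
STAMPS=$(for i in 0 1 2 3 4 5; do LC_ALL=C date --date="$i minutes ago" '+%b %e %H:%M'; done | paste -sd '|')

# Count how many failed SSH logins occurred in the last 5 minutes:
# - grep keeps lines containing "sshd" and "Failed password"
# - awk keeps lines whose first 12 characters equal one of the minute stamps
#   (an exact match; comparing whole lines as strings breaks at month boundaries,
#   because month names do not sort in calendar order, e.g. "Jan" > "Feb")
# - wc -l counts the remaining lines
COUNT=$(sudo -n cat "$LOG_FILE" | grep "sshd" | grep "Failed password" \
    | awk -v stamps="$STAMPS" 'BEGIN { split(stamps, s, "|"); for (i in s) ok[s[i]] = 1 }
                               (substr($0, 1, 12) in ok)' \
    | wc -l)

# Decision logic for Nagios-style exit codes
if [ "$COUNT" -ge "$MAX_ATTEMPTS" ]; then
    # Too many failed attempts → CRITICAL
    echo "CRITICAL - $COUNT failed SSH logins in last 5 minutes"
    exit 2
elif [ "$COUNT" -ge 1 ]; then
    # Some failed attempts → WARNING
    echo "WARNING - $COUNT failed SSH logins in last 5 minutes"
    exit 1
else
    # No failed attempts → OK
    echo "OK - No failed SSH logins in last 5 minutes"
    exit 0
fi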


How the script works:

The check_ssh_failed_logins.sh script scans /var/log/auth.log for failed SSH login attempts within the last 5 minutes. It uses grep to select entries containing "sshd" and "Failed password", and awk to keep only those that fall within the 5-minute window. If the count reaches the threshold (MAX_ATTEMPTS, default 4), it returns CRITICAL to Nagios; between 1 and MAX_ATTEMPTS-1 failed attempts (1 to 3 with the default) return WARNING, and none returns OK. The exit codes follow the Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL), and the script requires sudo access to read the log file.
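To check the time-window and threshold logic without generating real login attempts, it can be split into functions and fed synthetic log lines. This is a sketch, not the deployed script: count_failed_logins and nagios_status are hypothetical names, and it assumes GNU date and traditional syslog timestamps (e.g. Nov 14 22:12:01). It selects the window by exact minute prefixes rather than by string comparison, since month abbreviations do not sort in calendar order:

```bash
# Sketch of the check's logic as two hypothetical functions (not the deployed script).
# Assumes GNU date and traditional syslog timestamps such as "Nov 14 22:12:01".

# count_failed_logins NOW_EPOCH WINDOW_MINUTES < log
# Counts sshd "Failed password" lines whose minute (first 12 characters, "%b %e %H:%M")
# falls between NOW_EPOCH and WINDOW_MINUTES minutes before it, inclusive.
count_failed_logins() {
    local now=$1 window=$2 stamps
    stamps=$(for ((i = 0; i <= window; i++)); do
        LC_ALL=C date --date="@$((now - i * 60))" '+%b %e %H:%M'
    done | paste -sd '|')
    grep "sshd" | grep "Failed password" \
        | awk -v stamps="$stamps" 'BEGIN { split(stamps, s, "|"); for (i in s) ok[s[i]] = 1 }
                                   (substr($0, 1, 12) in ok)' \
        | wc -l
}

# nagios_status COUNT MAX_ATTEMPTS
# Prints a status line and returns the matching Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL).
nagios_status() {
    local count=$1 max=$2
    if [ "$count" -ge "$max" ]; then
        echo "CRITICAL - $count failed SSH logins"
        return 2
    elif [ "$count" -ge 1 ]; then
        echo "WARNING - $count failed SSH logins"
        return 1
    fi
    echo "OK - No failed SSH logins"
    return 0
}
```

Passing the reference time as an epoch keeps the window reproducible: with TZ=UTC, count_failed_logins 1700000000 5 counts entries stamped from Nov 14 22:08 through Nov 14 22:13.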


Register NRPE Command

Modified the NRPE config on each remote host via the nrpe Puppet module:

command[check_ssh_failed_logins]=/usr/lib/nagios/plugins/check_ssh_failed_logins.sh
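A sketch for confirming that the command line actually landed in the NRPE configuration before restarting the daemon. It assumes the Debian/Ubuntu layout (/etc/nagios/nrpe.cfg plus /etc/nagios/nrpe.d/*.cfg) and the nagios-nrpe-server service name; the helper nrpe_command_defined is hypothetical:

```bash
# Hypothetical helper: succeeds if any FILE contains an active (uncommented)
# command[NAME]= definition. -s hides errors for missing files; with -q, a match
# in any file gives exit status 0.
nrpe_command_defined() {
    local name=$1
    shift
    grep -qsE "^[[:space:]]*command\[${name}\]=" "$@"
}

# Example (assumed Debian/Ubuntu paths and service name):
#   nrpe_command_defined check_ssh_failed_logins /etc/nagios/nrpe.cfg /etc/nagios/nrpe.d/*.cfg \
#       && sudo systemctl restart nagios-nrpe-server
```

NRPE has to re-read its configuration (for example through a service restart) before the new command can be called with check_nrpe.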



Define Nagios Command for Local Host

To monitor the check_ssh_failed_logins.sh script directly on the Nagios server (localhost), I added a new command definition in commands.cfg:

define command {
    command_name    check_ssh_failed_logins
    command_line    /usr/lib/nagios/plugins/check_ssh_failed_logins.sh
}



Define Nagios Service Checks

In the nagios Puppet module, I added two nagios_service resources to monitor:

  • Remote hosts (via NRPE)

  • Localhost (direct script execution)

  # SSH alerts for failed logins on remote-hosts
  nagios_service { "ssh-failed-login-alerts-remote":
    service_description     => "SSH Failed Login Alerts",
    hostgroup_name          => "remote-disks",
    check_command           => "check_nrpe!check_ssh_failed_logins",
    max_check_attempts      => 3,
    retry_interval          => 0.5,
    check_interval          => 1,
    check_period            => "24x7",
    notification_interval   => 30,
    notification_period     => "24x7",
    notification_options    => "w,u,c,r",
    contact_groups          => "admins,slackgroup",
    target                  => "/etc/nagios4/conf.d/ppt_services.cfg",
    mode                    => "0644",
  }

  # SSH alerts for failed logins on localhost (mgmt-a)
  nagios_service { "ssh-failed-login-alerts-local":
    service_description     => "SSH Failed Login Alerts",
    host_name               => "localhost",
    check_command           => "check_ssh_failed_logins",
    max_check_attempts      => 3,
    retry_interval          => 0.5,
    check_interval          => 1,
    check_period            => "24x7",
    notification_interval   => 30,
    notification_period     => "24x7",
    notification_options    => "w,u,c,r",
    contact_groups          => "admins,slackgroup",
    target                  => "/etc/nagios4/conf.d/ppt_services.cfg",
    mode                    => "0644",
  }



Update Sudo Permissions via Puppet

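Used the sudo Puppet module to add a rule to /etc/sudoers that lets the nagios user run the script and read the log file without a password:

nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/check_ssh_failed_logins.sh, /usr/bin/cat /var/log/auth.log

This allows non-interactive execution of the check by the Nagios agent. Because NRPE and the local Nagios command run the script directly as the nagios user (not through sudo), it is the cat entry that the script's own sudo cat call relies on; the entry for the script itself only applies if the script is invoked via sudo.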
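A mistyped command path in a sudoers rule is still valid syntax; the rule simply never matches what is actually run. The sketch below flags command paths in NOPASSWD: rules that are not executable on the host. The helper name is hypothetical, and it only handles the simple one-line form used here (comma-separated commands, each an absolute path optionally followed by arguments, no aliases):

```bash
# Hypothetical helper (not part of the sudo Puppet module): prints command paths in
# NOPASSWD: rules that are not executable on this host and returns 1 if any exist.
# Simplified parsing: one rule per line, comma-separated commands, no aliases,
# no escaped commas.
sudoers_missing_commands() {
    local file=$1 missing=0 rule cmd path rest
    local -a cmds
    while IFS= read -r rule; do
        IFS=',' read -ra cmds <<< "$rule"
        for cmd in "${cmds[@]}"; do
            read -r path rest <<< "$cmd"    # first word is the command; the rest are arguments
            if [ -n "$path" ] && [ ! -x "$path" ]; then
                echo "not executable: $path"
                missing=1
            fi
        done
    done < <(sed -n 's/^[^#]*NOPASSWD:[[:space:]]*//p' "$file")
    return "$missing"
}
```

Run as root (sudoers is not world-readable), e.g. sudoers_missing_commands /etc/sudoers. visudo -c checks the syntax of the sudoers files, and sudo -l -U nagios lists the rules that apply to the nagios user.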


Add nagios User to adm Group

/var/log/auth.log is group-owned by adm on Debian/Ubuntu, so the nagios user was added to that group to give it read access:

sudo usermod -aG adm nagios

Group membership is only picked up by processes started after the change, so it took effect once all hosts had been rebooted.
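id -nG reads the group database, so it confirms that usermod took effect but not that an already-running NRPE daemon has picked up the new group, which is why the hosts were rebooted afterwards. A minimal check; in_group is a hypothetical helper name:

```bash
# Hypothetical helper: succeeds if USER's group list (from the group database,
# not from any running process) includes GROUP.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qx -- "$2"
}
```

For example, in_group nagios adm && sudo -u nagios test -r /var/log/auth.log confirms both the membership and the resulting read access. For a process that is already running, the Groups: line in /proc/<pid>/status lists its numeric supplementary group IDs, which can be compared with the output of getent group adm.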


Apply Puppet Modules

All changes were applied by running the Puppet agent:

  • On the localhost (mgmt-a):
    sudo /opt/puppetlabs/bin/puppet agent --test
  • On the remote hosts (apps-a, db-a, backup-a):
    sudo /opt/puppetlabs/puppet/bin/puppet agent --server=mgmt-a --no-daemonize --verbose --onetime
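puppet agent --test turns on --detailed-exitcodes, so its exit status separates "nothing to change" from "changes applied" and from failures; the --onetime form used on the remote hosts only does this if --detailed-exitcodes is added explicitly. A small sketch that translates those codes; the function name describe_puppet_exit is hypothetical:

```bash
# Hypothetical helper: translate a puppet agent exit status produced with
# --detailed-exitcodes (implied by --test) into a short description.
describe_puppet_exit() {
    case "$1" in
        0) echo "no changes" ;;
        1) echo "run failed or was not attempted" ;;
        2) echo "changes applied" ;;
        4) echo "resource failures" ;;
        6) echo "changes applied, with resource failures" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# Example on a remote host (flags as above, plus --detailed-exitcodes):
#   sudo /opt/puppetlabs/puppet/bin/puppet agent --server=mgmt-a --no-daemonize --verbose --onetime --detailed-exitcodes
#   describe_puppet_exit $?
```

In a script, exit codes 0 and 2 both mean the run succeeded.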



Validation

  • Simulated failed login attempts:

    Repeatedly attempted SSH logins with invalid credentials on each monitored host, which populated /var/log/auth.log with multiple "Failed password" entries from sshd.

  • Verified using the command line on the Nagios server:

    • Ran the following commands:

      /usr/lib/nagios/plugins/check_nrpe -H apps-a -c check_ssh_failed_logins
      /usr/lib/nagios/plugins/check_nrpe -H db-a -c check_ssh_failed_logins
      /usr/lib/nagios/plugins/check_nrpe -H backup-a -c check_ssh_failed_logins
      
    • Also ran the script directly for localhost:

      /usr/lib/nagios/plugins/check_ssh_failed_logins.sh
      
  • The output reflected the number of failed attempts, with the appropriate Nagios-style status.

  • Verified the service status in the Nagios web UI.
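The three check_nrpe calls can be wrapped in a loop that prints one line per host. This is a sketch; the function names nrpe_ssh_check and check_all_hosts are hypothetical, and the check function is passed in as an argument so the loop can be exercised with a stub:

```bash
# Hypothetical wrapper around the NRPE call used in the validation step.
nrpe_ssh_check() {
    /usr/lib/nagios/plugins/check_nrpe -H "$1" -c check_ssh_failed_logins
}

# check_all_hosts CHECK_FN HOST...
# Runs CHECK_FN for each host, prints "host: <plugin output> (exit N)",
# and returns 1 if any host was not OK.
check_all_hosts() {
    local fn=$1 host out rc failed=0
    shift
    for host in "$@"; do
        out=$("$fn" "$host")
        rc=$?
        printf '%s: %s (exit %d)\n' "$host" "$out" "$rc"
        [ "$rc" -eq 0 ] || failed=1
    done
    return "$failed"
}

# Example (on the Nagios server):
#   check_all_hosts nrpe_ssh_check apps-a db-a backup-a
```

To produce matching entries without real login attempts, logger -p auth.info -t sshd "Failed password for invalid user test from 127.0.0.1 port 22 ssh2" writes a line containing both "sshd" and "Failed password", assuming rsyslog routes the auth facility to /var/log/auth.log (the Ubuntu default).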


Sub-ticket for related tasks: https://github.com/GriffinKat/group-a/wiki/Sub%E2%80%90Ticket-ID-%23351-%E2%80%90-%23346:-System-Hardening-&-Patch-Management-Implementation

Ticket reference: https://rt.dataraster.com/Ticket/Display.html?id=346