Ticket #929: Create a Nagios Check to Monitor SSH Logs - KeegMitch/Operations-Engineering-group-c GitHub Wiki

Ticket: #929

Create script for SSH failed logins on all 4 servers

  1. Run this command on each of the 4 servers to create the script: sudo vim /usr/lib/nagios/plugins/check_ssh_logins.sh

  2. Add the following to the script:

#!/bin/bash

# Define variables
LOG_FILE="/var/log/auth.log"
THRESHOLD=5
TIME_PERIOD="10 minutes"

# Get the current time and time period ago in seconds since epoch
CURRENT_TIME=$(date +%s)
TIME_PERIOD_AGO=$(date --date="$TIME_PERIOD ago" +%s)

# Count failed login attempts in the specified time period
FAILED_ATTEMPTS=$(sudo awk -v current_time="$CURRENT_TIME" -v time_period_ago="$TIME_PERIOD_AGO" '
BEGIN { count=0 }
/Failed password for/ {
    # Build an epoch time from the syslog timestamp ("Mon DD HH:MM:SS", requires gawk).
    # The log omits the year, so the current year is assumed.
    month = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3;
    split($3, hms, ":");
    log_time = mktime(strftime("%Y") " " month " " $2 " " hms[1] " " hms[2] " " hms[3]);

    # Count only if the log time is within the specified period
    if (log_time >= time_period_ago && log_time <= current_time) {
        count++;
    }
}
END {
    print count
}' "$LOG_FILE")

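# Debug output (optional). Nagios uses the first line of plugin output as the
# status text, so remove or comment out these lines once the check works.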
echo "Current Time: $(date -d @$CURRENT_TIME)"
echo "Time Period Ago: $(date -d @$TIME_PERIOD_AGO)"
echo "Failed Attempts: $FAILED_ATTEMPTS"

# Map the count to a Nagios status: UNKNOWN if it is not a number,
# CRITICAL at or above the threshold, OK at zero, WARNING otherwise
if [[ ! "$FAILED_ATTEMPTS" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN: Unable to determine failed login attempts"
    exit 3
elif [ "$FAILED_ATTEMPTS" -ge "$THRESHOLD" ]; then
    echo "CRITICAL: $FAILED_ATTEMPTS failed login attempts in the last $TIME_PERIOD"
    exit 2
elif [ "$FAILED_ATTEMPTS" -eq 0 ]; then
    echo "OK: No failed login attempts in the last $TIME_PERIOD"
    exit 0
else
    echo "WARNING: $FAILED_ATTEMPTS failed login attempts in the last $TIME_PERIOD"
    exit 1
fi
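
The counting step is the part of the script most likely to go wrong, so it can help to try it in isolation. The following is a minimal sketch of the same idea written with GNU date instead of awk; it is not the script above. It assumes syslog-style timestamps ("Mon DD HH:MM:SS") and GNU date, and the helper name count_recent_failures and the sample log lines are made up for illustration.

```shell
#!/bin/bash
# Sketch only: count "Failed password" lines whose timestamp falls in a window.
# Assumes GNU date and syslog-style timestamps ("Mon DD HH:MM:SS ...").
# count_recent_failures is a hypothetical helper, not part of the original script.
count_recent_failures() {
    local log_file="$1" window_start="$2" window_end="$3"
    local month day clock rest log_time count=0

    while read -r month day clock rest; do
        [[ "$rest" == *"Failed password for"* ]] || continue
        # The log has no year; GNU date assumes the current one
        log_time=$(date -d "$month $day $clock" +%s 2>/dev/null) || continue
        if (( log_time >= window_start && log_time <= window_end )); then
            count=$((count + 1))
        fi
    done < "$log_file"

    echo "$count"
}

# Demo against a throwaway log: one entry 1 minute old, one entry 1 hour old
now=$(date +%s)
demo_log=$(mktemp)
for age in 60 3600; do
    stamp=$(LC_ALL=C date -d "@$((now - age))" '+%b %e %H:%M:%S')
    echo "$stamp host sshd[123]: Failed password for root from 192.0.2.1 port 22 ssh2" >> "$demo_log"
done

# Count failures in the last 10 minutes (600 seconds)
count_recent_failures "$demo_log" "$((now - 600))" "$now"
rm -f "$demo_log"
```

This version starts one date process per matching line, so it is slower than awk on a large auth.log; it is meant for checking the window logic, not for replacing the plugin.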

To avoid an UNKNOWN status on your Nagios server, allow the nagios user to run the script's sudo commands without a password:

  1. Open the sudoers configuration for editing: sudo visudo -f /etc/sudoers.d/nagios
  2. Add the following line to allow the nagios user to run the required commands without a password: nagios ALL=(ALL) NOPASSWD: /usr/bin/awk, /bin/date, /usr/lib/nagios/plugins/check_ssh_logins.sh
  3. After saving the sudoers configuration, test the script as the nagios user: sudo -u nagios /usr/lib/nagios/plugins/check_ssh_logins.sh
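
If the test run still asks for a password, it can be worth confirming that the sudoers entry covers each command. A rough sketch, assuming the drop-in file path used above; has_nopasswd_entry is a hypothetical helper and only does a text match, so it does not replace a real syntax check.

```shell
#!/bin/bash
# Sketch: confirm a sudoers drop-in grants the nagios user passwordless access
# to a given command. has_nopasswd_entry is a hypothetical helper.
has_nopasswd_entry() {
    local sudoers_file="$1" command_path="$2"
    # Match lines such as: nagios ALL=(ALL) NOPASSWD: /usr/bin/awk, /bin/date
    grep -E '^nagios[[:space:]]+.*NOPASSWD:' "$sudoers_file" 2>/dev/null \
        | grep -qF -- "$command_path"
}

# The file is normally readable by root only, so run this script with sudo
SUDOERS_FILE="${1:-/etc/sudoers.d/nagios}"
for cmd in /usr/bin/awk /bin/date; do
    if has_nopasswd_entry "$SUDOERS_FILE" "$cmd"; then
        echo "ok: $cmd"
    else
        echo "missing: $cmd"
    fi
done
```

For a syntax check of the file itself, sudo visudo -cf /etc/sudoers.d/nagios parses it without opening an editor, and sudo -l -U nagios lists what the nagios user is allowed to run.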

image

Apply to all the nrpe.cfg files

On the mgmt, db, app, and backup servers:

  1. Open the NRPE configuration: sudo vim /etc/nagios/nrpe.cfg

  2. Add this line to your other nrpe commands: command[check_ssh_logins]=/usr/lib/nagios/plugins/check_ssh_logins.sh

image

  3. On the mgmt server, also add the line above to /etc/puppetlabs/code/modules/nrpe/files/nrpe.cfg

image

Note: If your Nagios check comes up UNKNOWN with a message along the lines of "check_ssh_logins is not defined", double-check the nrpe.cfg in the nrpe Puppet module (/etc/puppetlabs/code/modules/nrpe/files/nrpe.cfg, not the one inside /etc/nagios) and make sure the check command is defined there. Otherwise the check won't work properly.

  4. Apply the changes: sudo /opt/puppetlabs/puppet/bin/puppet agent --test or our alias test_puppet_agent
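
Because the command has to be defined in two different nrpe.cfg files on mgmt, a quick check of both can save a round of debugging. A minimal sketch, assuming the two paths named above; nrpe_command_defined is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: report which nrpe.cfg files define a given NRPE command.
# nrpe_command_defined is a hypothetical helper.
nrpe_command_defined() {
    local cfg_file="$1" command_name="$2"
    # Anchored at line start so commented-out definitions do not count
    grep -qE "^[[:space:]]*command\[$command_name\]=" "$cfg_file" 2>/dev/null
}

for cfg in /etc/nagios/nrpe.cfg /etc/puppetlabs/code/modules/nrpe/files/nrpe.cfg; do
    if nrpe_command_defined "$cfg" check_ssh_logins; then
        echo "defined in $cfg"
    else
        echo "NOT defined in $cfg"
    fi
done
```

On servers other than mgmt the second path will not exist and is reported as not defined, which is expected. To test end to end from the Nagios server, check_nrpe can be called by hand with -H for the host and -c check_ssh_logins for the command; its install path varies by system, so none is shown here.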

Make Nagios check inside puppet module

  1. Open the nagios Puppet manifest: sudo vim /etc/puppetlabs/code/modules/nagios/manifests/config.pp

  2. Add a new Nagios hostgroup for failed SSH logins:

nagios_hostgroup {"Check-SSH-Logins":
  target => "/etc/nagios3/conf.d/ppt_hostgroups.cfg",
  mode => "0444",
  alias => "Check failed ssh logins",
  members => "db-c, mgmt-c, app-c, backup-c",
}

  3. Add a new NRPE service check:

nagios_service { "ssh_failed_logins":
  service_description => "SSH Failed Logins",
  hostgroup_name => "Check-SSH-Logins",
  target => "/etc/nagios3/conf.d/ppt_services.cfg",
  check_command => "check_nrpe!check_ssh_logins",
  max_check_attempts => 3,
  retry_check_interval => 1,
  normal_check_interval => 5,
  check_period => "24x7",
  notification_interval => 30,
  notification_period => "24x7",
  notification_options => "w,u,c,r",
  contact_groups => "slackgroup",
  mode => "0444",
}

  4. Apply the changes: sudo /opt/puppetlabs/puppet/bin/puppet agent --test or the alias test_puppet_agent

  5. Restart nagios: sudo systemctl restart nagios3 (we have an alias for this called restart_nagios)

Your output in Nagios should look like this:

image
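
The state Nagios displays comes from the plugin's exit code, following the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). As a rough aid for manual testing, the sketch below runs the plugin if it is present and prints the state that exit code corresponds to; nagios_state is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: translate a plugin exit code into the state Nagios displays.
# nagios_state is a hypothetical helper for manual testing.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;  # 3, and anything outside the expected range
    esac
}

# Run the plugin only if it is installed on this host
PLUGIN="/usr/lib/nagios/plugins/check_ssh_logins.sh"
if [ -x "$PLUGIN" ]; then
    "$PLUGIN"
    rc=$?
    echo "Nagios state: $(nagios_state "$rc")"
fi
```

Running it as the nagios user (sudo -u nagios) gives the closest match to what NRPE will see.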