Watchdogs - MailCleaner/MailCleaner8 GitHub Wiki
MailCleaner has a number of Watchdogs which report common errors to the system administrator via an optional Home->System warnings field, which appears if any problems exist, and via e-mail at the Configuration->General settings->Periodic tasks->Run daily tasks at time, if you have configured a Configuration->General settings->Defaults->System support address.
If you are using MailCleaner Enterprise Edition, these reports will also be sent to MailCleaner so that our staff can periodically review the warnings on your behalf. Note that there are also a few extra watchdogs that are only included with Enterprise Edition.
Below is a list of the current Watchdogs, along with recommendations on possible resolutions.
Notes:
- Many of the watchdogs check a daemon's status by monitoring the current day's logs. If you fix a problem, this may not be noticed until the next day, when no errors remain in the current log file.
- The watchdog report is generated at most once every 15 minutes, so you will not see resolved warnings disappear immediately.
- If you would like to refresh all of the watchdogs immediately, you can use:
/usr/mailcleaner/bin/watchdog/watchdogs.pl All ; /usr/mailcleaner/bin/watchdog/watchdogs_report.sh
- If you do not care about a particular watchdog, it can be disabled to avoid future alerts by creating a .disabled file in the watchdog configuration directory, like:
touch /usr/mailcleaner/etc/watchdog/MC_mod_detect_F2B.disabled
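As a minimal sketch of the .disabled convention described above, the marker file could be managed with two small helpers. The function names and the WATCHDOG_CONF_DIR variable are hypothetical, and removing the marker to re-enable a watchdog is an assumption rather than documented behaviour.

```shell
# Hypothetical helpers; the default directory is the one named in the text above.
WATCHDOG_CONF_DIR="${WATCHDOG_CONF_DIR:-/usr/mailcleaner/etc/watchdog}"

# disable_watchdog NAME [DIR]: create NAME.disabled so the watchdog is skipped.
disable_watchdog() {
    touch "${2:-$WATCHDOG_CONF_DIR}/$1.disabled"
}

# enable_watchdog NAME [DIR]: remove the marker file again
# (assumed to re-enable the watchdog).
enable_watchdog() {
    rm -f "${2:-$WATCHDOG_CONF_DIR}/$1.disabled"
}
```

For example, `disable_watchdog MC_mod_detect_F2B` is equivalent to the touch command shown above.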
This indicates an issue with one of the DKIM keys on your machine.
Note: This watchdog is brand new and an exception was not made for a blank version of the 'default.pkey' file. If you have an 'invalid' message for this key file, check to see if that file is blank before reading further:
cat /var/mailcleaner/spool/tmp/mailcleaner/dkim/default.pkey
If it is, you can ignore the warning for that file. This watchdog will be patched shortly to ignore a blank version of this file if you have not yet generated one.
One or both of the following issues can appear:
- Short DKIM key length:
The first indicates that those keys are shorter than 1024 bits which is the recommended standard. Some services are beginning to reject, ignore, or otherwise penalize messages signed with an out-dated (short) key. To resolve this issue for 'default.pkey' you need to generate a new key with:
Configuration->SMTP->DKIM->Generate new private key...
For any domain-specific key you need to generate a new one from:
Configuration->Domains->[select domain]->Outgoing Relay->Generate new private key...
- Invalid DKIM key:
This error indicates that the private key file cannot actually be read/decoded by OpenSSL. The files are located in the directory:
/var/mailcleaner/spool/tmp/mailcleaner/dkim/
You can see what OpenSSL sees by running:
openssl rsa -in /var/mailcleaner/spool/tmp/mailcleaner/dkim/<domain.pkey> -noout -text
The Watchdog is specifically looking for the line containing:
Private-Key: (N bit)
where N is the length. If you see this error, it is likely that OpenSSL will output an error, having not read the key at all. Otherwise, it is possible that the output does not include this line, but this is not a case that is known to us. You will probably need to generate a new key as described in the other case.
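The check described above can be approximated by hand. The following is a sketch, not the watchdog's actual code: the function names are hypothetical, and the 1024-bit default simply mirrors the threshold mentioned in this section. Some OpenSSL versions print extra text on this line (for example an "RSA " prefix or ", 2 primes" after the bit count), so the pattern only anchors on "Private-Key: (N bit".

```shell
# dkim_key_bits: read `openssl rsa -noout -text` output on stdin and print N
# from the first "Private-Key: (N bit" line; prints nothing if no such line.
dkim_key_bits() {
    sed -n 's/.*Private-Key: (\([0-9][0-9]*\) bit.*/\1/p' | head -n 1
}

# dkim_key_status BITS [MIN]: classify a key length against a minimum
# (default 1024, the threshold named in this section).
dkim_key_status() {
    if [ -z "$1" ]; then
        echo "invalid"
    elif [ "$1" -lt "${2:-1024}" ]; then
        echo "short"
    else
        echo "ok"
    fi
}
```

Usage, with the key path from above: `dkim_key_status "$(openssl rsa -in /var/mailcleaner/spool/tmp/mailcleaner/dkim/default.pkey -noout -text 2>/dev/null | dkim_key_bits)"`.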
This indicates that the last time the automatic update script (/root/Updater4MC/updater4mc.sh) ran, it exited with an error showing that the Git repository couldn't be synced due to local corruption. You can try to diagnose the issue by moving to the MailCleaner repository and checking the status and/or attempting to pull changes:
cd /usr/mailcleaner
git status
git pull
This test simply reports whether the version of the host is seen to be Community. This watchdog is only downloaded when a machine is registered for Enterprise and will report if the machine that had been registered now appears to be using Community. Some events, such as running a 'reset' on the Git tree can cause a registered appliance to revert to Community.
If you believe the appliance should still be registered, log in to the web interface of the affected host and use the Configuration->Base System->Registration form to register the host again. If this does not resolve the issue, you can contact support.
If you intentionally unregistered the appliance, it is possible that this script failed to be removed. You can simply delete the script and its configuration file:
rm /usr/mailcleaner/bin/watchdog/MC_mod_detect_Community.sh /usr/mailcleaner/etc/watchdog/MC_mod_detect_Community.conf
This indicates that either the Fail2Ban server is not running at all, or there are too many processes running (i.e. there is not exactly 1). The former is much more likely. You can start it with:
/usr/mailcleaner/etc/init.d/fail2ban start
We are working on a solution to automatically start the service if it is not currently running. In the unlikely case that there are multiple fail2ban processes running, using 'restart' instead of 'start' should kill all of them and start only one.
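The start-versus-restart decision described above can be written as a small lookup. This is only an illustration: the function name is hypothetical, and the process name used in the usage line (fail2ban-server) is an assumption about how the daemon appears in the process table.

```shell
# suggest_init_action COUNT: map a process count to the init script argument
# recommended above (start when none are running, restart when several are).
suggest_init_action() {
    case "$1" in
        0) echo "start" ;;
        1) echo "none" ;;
        *) echo "restart" ;;
    esac
}
```

For example, `suggest_init_action "$(pgrep -c -f fail2ban-server)"` prints the argument to pass to /usr/mailcleaner/etc/init.d/fail2ban, or "none" when exactly one process is running.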
Checks to see if the Kaspersky license is expired, if installed. Contact your sales representative if your license has expired. If you trialled Kaspersky or used it for a time but no longer wish to use it you can uninstall the package with:
apt-get remove kaspersky-64-2.0
Searches for MessageSniffer logs without a current database.
If you are still subscribed to MessageSniffer, re-download with:
cd /opt/messagesniffer/share/snf-server
wget http://www.sortmonster.net/Sniffer/Updates/`echo "select licenseid from MessageSniffer;" | mc_mysql -m mc_config | tail -n 1`.snf --user=sniffer --password=ki11sp8m
If you are no longer subscribed to MessageSniffer, uninstall the package and remove it from the database:
apt-get remove mc-messagesniffer
echo "delete from prefilter where name = 'MessageSniffer'; delete from MessageSniffer;" | mc_mysql -m mc_config
In either case, restart mailscanner:
/usr/mailcleaner/etc/init.d/mailscanner restart
Checks for redundant PrefTDaemon processes
Restart the Preferences daemon to kill any orphaned processes. This can be done from Monitoring->Status (you will need to expand the Status column with "Show more..."), or with the command:
/usr/mailcleaner/etc/init.d/preftdaemon restart
NOTE: An update to the init scripts should have removed the chance of seeing this error during normal operation. If you are regularly seeing this watchdog it is likely that your system has not fully updated.
Detects the unsuccessful installation of the new Python libraries being used to develop new tools. These libraries are not yet used for any component of the system other than Fail2Ban, so it is not critical to fix at this time. We will monitor the reports that we receive as it nears broader usage.
If you would like to try installing these libraries manually to remove the error, you can do so with:
/usr/mailcleaner/install/install_pyenv_3-7-7.sh
pip3 install --upgrade pip
pip3 install mailcleaner-library==1.0.2 --index https://repository.mailcleaner.net/python/ --extra-index https://pypi.org/simple/
Start Fail2Ban to confirm that this was successful ('ok' means success, even if there are warnings):
/usr/mailcleaner/etc/init.d/fail2ban start
If successful, remove the log that originally spawned this watchdog:
rm /var/mailcleaner/log/mailcleaner/install_pyenv.log
Checks for redundant Spam_Handler processes
Restart the SpamHandler daemon to kill any orphaned processes. This can be done from Monitoring->Status (SpamHandler is identified as "Filtering Engine"), or with the command:
/usr/mailcleaner/etc/init.d/spamhandler restart
NOTE: An update to the init scripts should have removed the chance of seeing this error during normal operation. If you are regularly seeing this watchdog it is likely that your system has not fully updated.
Checks for redundant StatsDaemon processes
Restart the StatsDaemon to kill any orphaned processes. This can be done from Monitoring->Status, or with the command:
/usr/mailcleaner/etc/init.d/statsdaemon restart
NOTE: An update to the init scripts should have removed the chance of seeing this error during normal operation. If you are regularly seeing this watchdog it is likely that your system has not fully updated.
Reports that the summaries could not be sent because the database could not be reached on the previous day. This normally indicates that the Summaries have conflicted with a MailCleaner restart. This will happen if the summaries are sent too soon after the nightly updates at 10:30pm, or if they are still running prior to the updates. Consider changing the daily task timing from Configuration->General Settings->Periodic Tasks.
Less severe than detect_bad_git. This simply reports that there is something in your Git tree that has diverged from origin/master. This will happen if you or a MailCleaner staff member is testing experimental commits or making other changes that have been committed but not merged upstream. You will have to navigate to /usr/mailcleaner and reset the tree to match origin/master for this to disappear.
Reports if the / or /var partitions are over 85% used.
You may choose to clear space, reduce the retention time for quarantined items, or increase the size of your disk.
Same as previous except that it is based on the inode count (available unique files) instead of the actual bytes used.
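To see the same numbers the two disk watchdogs report on, the output of df can be parsed directly. The helpers below are a sketch with hypothetical names. They assume the POSIX output format of `df -P` (bytes) or `df -Pi` (inodes, a GNU extension), where the fifth column holds the usage percentage.

```shell
# usage_percent: read `df -P` or `df -Pi` output on stdin and print the
# usage percentage of the first filesystem line, without the % sign.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold PERCENT [LIMIT]: succeed when PERCENT is above LIMIT
# (default 85, the limit named above).
over_threshold() {
    [ "$1" -gt "${2:-85}" ]
}
```

For example: `for p in / /var; do pct=$(df -P "$p" | usage_percent); over_threshold "$pct" && echo "$p is at ${pct}%"; done`. Replace `df -P` with `df -Pi` for the inode variant.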
This watchdog should be deprecated. So long as your machine is fully up-to-date, this can only mean that it apparently failed to be removed from your appliance. Simply remove it with:
rm /usr/mailcleaner/bin/watchdog/MC_mod_exim_4.92.sh /usr/mailcleaner/etc/watchdog/MC_mod_exim_4.92.conf
This watchdog should also be deprecated. Upon updating for Exim 4.96, the name of the watchdog was changed to exim_current, see below. If you still see this error your machine needs to be updated.
Checks to make sure that you are using the latest supported version of Exim.
There must be something blocking your update. Run /root/Updater4MC/updater4mc.sh and watch the output, or check the output from the last automatic attempt in /var/mailcleaner/log/mailcleaner/updater4mc.log.0. It will include detailed instructions. Usually this will look like:
ATTENTION: Cannot install Exim 4.98
Your existing exim configuration template{s):
/usr/mailcleaner/etc/exim/exim_stage1.conf_template_4.97
contain modifications.
The modifications will not be carried over to the new template files automatically.
Please port your modifications to version of each of these files ending in '_4.98' or confirm that you are okay abandoning these modifications.
Once you are statisfied with the state of the '_4.98' templates run the following to force this update:
bash /root/Updater4MC/updates/91_mc_exim_4.98.update --force
Follow those instructions, or roll back your customizations and run Updater4MC again.
The last automatic update was unable to pull changes to the git tree because there are conflicting changes that could not be stashed and restored. This is probably because you've modified files that have since been changed upstream. Manually resolving the merge conflict is generally the only solution. You can find which files cause the issue from the last update log (/root/Updater4MC/updater_.log) which should contain:
error: The following untracked working tree files would be overwritten by merge:
followed by the list of files. Once these files are known, you can use the 'git diff' command to find out what changes you made:
cd /usr/mailcleaner
git diff path/to/file
You can then reset the file(s):
git reset -- path/to/file
then try manually running the updater again:
/root/Updater4MC/updater4mc.sh
and reapply your modifications to the updated file, if possible/necessary.
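To pull the list of affected files out of the update log, something like the following could be used. It is a sketch: the function name is hypothetical, and it assumes the log captures Git's message verbatim, with the file paths indented on the lines that follow the error.

```shell
# conflicting_files LOGFILE: print the paths Git lists after the
# "would be overwritten by merge:" error line, one per line.
conflicting_files() {
    awk '
        /would be overwritten by merge:/ { grab = 1; next }
        grab && /^[ \t]/ { sub(/^[ \t]+/, ""); print; next }
        { grab = 0 }
    ' "$1"
}
```

Each printed path can then be passed to `git diff` from /usr/mailcleaner as described above.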
In order to allow immediate SSH access to a new appliance, MailCleaner VA images ship with a generic set of SSH host keys. These keys are the same for every installation. This creates a potential for a man-in-the-middle attack to be able to read SSH session information in plain text if it could intercept the connections using this known set of host keys. Given that the MailCleaner firewall only allows SSH connections from your local IP network by default, this threat should be limited to compromised or malicious devices within the same LAN as your MailCleaner machine(s).
These keys should be rotated out for a unique set on your appliance by patch '92', so if you are seeing this watchdog, it appears that your machine has not yet installed that update successfully, or the change was reverted.
You can resolve this by running the following on all of your machines:
rm /etc/ssh/ssh_host_*
ssh-keygen -A
Note: You may find that the -A flag is not supported by your version of ssh-keygen. If this is the case, you will need to run:
apt-get update
apt-get dist-upgrade -y
then try again.
One consequence is that, upon your next login, you will be warned that the host keys of the MailCleaner machine have changed. This is to be expected; follow the instructions in that prompt to remove the old key from your clients' known_hosts files, then approve the new key on your next connection.
Reports missing or excessive internal keys in /root/.ssh/authorized_keys
You can generate, propagate and install missing keys with: /usr/mailcleaner/bin/internal_access -gpi
If there are multiple, you should clean all but the last 'mailcleaner-internal' from /root/.ssh/authorized_keys
Warning: Making these changes can cause you to lose access to the appliance, so be careful that you take a backup of the file first and ensure that connections can still be established before logging out of the existing session.
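Before editing the file, it can help to count the internal keys and take the backup recommended above. The helpers below are a sketch with hypothetical names. They only assume that internal keys can be recognised by the 'mailcleaner-internal' label mentioned above.

```shell
# count_internal_keys [FILE]: count lines mentioning 'mailcleaner-internal'.
count_internal_keys() {
    grep -c 'mailcleaner-internal' "${1:-/root/.ssh/authorized_keys}" || true
}

# backup_authorized_keys [FILE]: copy the file aside with a timestamp suffix
# and print the path of the backup.
backup_authorized_keys() {
    src="${1:-/root/.ssh/authorized_keys}"
    dst="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && echo "$dst"
}
```

A count of 0 corresponds to the "missing" case and a count above 1 to the "excessive" case.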
Checks for modifications to: /usr/mailcleaner/etc/exim/exim_stage1.conf_template
Historically, many clients have customized this template in order to change Exim settings not available in the UI. Unfortunately, this file is tracked by Git, so changes would be overwritten any time that an upstream change to the file was published. This alert allows us to know who will be impacted by an upstream change.
Currently, this customization is rarely necessary as there are dedicated files to hold customization to all of the most frequently modified settings in the /usr/mailcleaner/etc/exim/stage1 directory. These will not be overwritten since the upstream copy should never change.
Reports when there are a greater number of messages in each of the Exim queues than are expected during normal operation. Reasons for queuing vary. Open the queued item list from Monitoring->Status by clicking on the queue count. If the issue is resolved, you can try to flush the queue directly:
https://support.mailcleaner.net/boards/3/topics/49
If you have several machines and only one with a backlog, you may wish to temporarily stop the Inbound MTA on that machine until it clears.
There are 2 errors you will see for this report:
- "Port 25 authentication is open (not used today)"
This simply indicates that SMTP authentication has not been strictly prohibited on port 25, but this is not being actively used. Best practice is to disable authentication on this port, but it is enabled by default to support legacy installations. You can disable this setting from:
Configuration->SMTP->SMTP Checks->Block authenticated relaying on port 25
- "Port 25 authentication is in use ( occurences)"
This indicates that not only is port 25 authentication enabled, but you have users who are successfully using this port for authenticated relaying. Best practice would be to locate who by searching for:
"Accepting authenticated session from .* on port 25"
in:
/var/mailcleaner/log/exim_stage1/mainlog
have those users change to port 587, then close the port as mentioned above when there are no more users on port 25.
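The log search described above can be wrapped in a small helper. This is a sketch with a hypothetical function name. The pattern is the one quoted above, with one added assumption: a trailing `([^0-9]|$)` so that other ports beginning with 25 (such as 2525) are not matched.

```shell
# port25_auth_lines [LOGFILE]: print the log lines that record an
# authenticated session on port 25.
port25_auth_lines() {
    grep -E 'Accepting authenticated session from .* on port 25([^0-9]|$)' \
        "${1:-/var/mailcleaner/log/exim_stage1/mainlog}"
}
```

Running `port25_auth_lines` with no argument searches the current mainlog. The matching lines show which clients still need to move to port 587.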
Checks if the Slave DB is in sync. It can fall out of sync for a number of reasons during normal operations. The database synchronization has been updated to automatically restore itself during the main Cron tasks. If you see this persist, you can manually run the resync command:
/usr/mailcleaner/bin/resync_db.sh
It is likely that you will be told that the DB is already synced, indicating that this was one of the transient issues mentioned.
Checks to ensure that TesseractOcr dependencies have been installed. These will be necessary when the TesseractOcr module is added to SpamAssassin. The dependencies require about 250MB, so the upgrade is scheduled to avoid installing them if there won't be room to spare.
Check to see the available disk space:
df -h /
If you have at least 400MB, you can install the dependencies manually:
apt-get install tesseract-ocr libopencv-dev libswitch-perl
Reports trouble with the last automatic update. It will report:
- if the update exited unsuccessfully
- if the main /etc/mailcleaner.conf file is invalid and thus cannot provide the necessary details to do an update
- if it started running over 1 hour ago and never finished.
You can try manually running the updater (/root/Updater4MC/updater4mc.sh) or inspect the last log (/root/Updater4MC/updater_.log).
May appear alongside other watchdogs with a better description of the issue.
Checks for TLS issues during the update process. This is likely to be an issue on our end. Contact us if you see this error.
Watchdogs are added and removed regularly. Check to make sure you are on the latest version because you could be seeing a watchdog that is no longer relevant. If you do believe that you are on the latest version and see an error that isn't listed, report it to us as we may be required to update the documentation. Thanks!
If you are considering writing your own custom module, please consider submitting it as a Pull Request to our GitHub page. The documentation that follows will instruct you to use the 'CUSTOM_mod_' naming convention. These will be included in the WebUI but will not be reported to us, even if you use Enterprise Edition, so it is a way to have modules that only exist and report locally. This will also ensure that you don't risk having a naming conflict with watchdogs that exist in the Git tree or that would be fetched from our Enterprise servers. If you are satisfied with the results of the custom module and believe that it will be useful for others, please rename it to 'MC_mod_' and submit a Pull Request with that name.
You can create your own watchdogs that will be displayed in the WebUI. To do this, copy one of the 'MC_mod_SKELETON.pl' or 'MC_mod_SKELETON.sh' scripts with a filename starting with 'CUSTOM_mod_' then change this file to accomplish whatever task you need it to check. The output file should contain:
A text description of the error (optional)
RC : 0
EXEC : 0
where RC is the return code (0 meaning no error, any other value meaning there was an error). If you are going to submit a module to GitHub, please ensure that there is a comment indicating the significance of the return code, or that the significance is obvious from context. This will allow us to better monitor user reports. EXEC should simply be the execution time in seconds as presented in the example code. If one is to be provided, the text description should be printed first on one line. The order of the other two does not matter.
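The three-line format can be produced with a helper like the one below. This is a sketch, not the code from the SKELETON scripts: the function name is hypothetical, it prints to standard output, and how the result reaches the output file is left to the skeleton you copied.

```shell
# watchdog_report RC START_EPOCH [DESCRIPTION]
# Prints the optional description first, then the RC and EXEC lines.
# START_EPOCH is the value of `date +%s` taken when the check began.
watchdog_report() {
    rc="$1"
    start="$2"
    desc="${3:-}"
    if [ -n "$desc" ]; then
        echo "$desc"
    fi
    echo "RC : $rc"
    echo "EXEC : $(( $(date +%s) - start ))"
}
```

A check would record `start=$(date +%s)` before doing its work, then call, for example, `watchdog_report 1 "$start" "Example problem detected"`.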
Once the script is complete, create an accompanying config file in /usr/mailcleaner/etc/watchdog/ with the same name, but with the extension replaced by '.conf'. The contents of this file should be:
TAGS=oneday
EXEC_MODE=Parralel
The TAGS are the list of groups that this watchdog will be included in. The built-in tags are:
- dix - Run every 10 minutes
- oneday - Run 3 times a day (used to be once)
- all - (Redundant) to be included with manual invocations of 'watchdogs.pl All'
You may use your own TAG value for custom modules if you do want to run on a different schedule. These existing TAGS are run by the root user's Crontab. Edit with:
crontab -e
The EXEC_MODE can be:
- Parralel - All run at once and before sequential processes.
- Sequence - Run one after another in alphabetical order (useful if an existing watchdog might cause an error if it has yet to complete).
The config file can also have a TIMEOUT value (in seconds) if you anticipate that it could stall.
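Putting the config keys together, the file for a custom module could be generated as sketched below. The function name and the module name in the usage line are hypothetical. Writing the timeout as a `TIMEOUT=` line is an assumption that it follows the same KEY=value form as the other two settings.

```shell
# write_watchdog_conf NAME TAGS EXEC_MODE [TIMEOUT] [DIR]
# Writes NAME.conf into DIR (default: the watchdog configuration directory).
write_watchdog_conf() {
    conf="${5:-/usr/mailcleaner/etc/watchdog}/$1.conf"
    {
        echo "TAGS=$2"
        echo "EXEC_MODE=$3"
        # TIMEOUT is optional; only write it when a value was given.
        if [ -n "${4:-}" ]; then
            echo "TIMEOUT=$4"
        fi
    } > "$conf"
}
```

For example, `write_watchdog_conf CUSTOM_mod_example oneday Sequence 30` creates /usr/mailcleaner/etc/watchdog/CUSTOM_mod_example.conf with a 30 second timeout.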