Training of Spam Filter - Udera/vexim2 GitHub Wiki

SpamAssassin includes a Bayes filter that can be used after some training with spam and ham mails. More information can be found in the official documentation. We want to use a system-wide setup for all users.

After an initial training with 100 mails, the filter needs continuous training, which we want to simplify with some scripts. Every user can create the training folders train_spam and train_ham. False-negative mails (spam that was not detected) go into train_spam, and false-positive mails (ham that was marked as spam) go into train_ham. A script collects all these mails and passes them to sa-learn.
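How the training folders look on disk depends on the mail server. The following is a minimal sketch, assuming a Maildir++ layout in which each folder is a dot-prefixed directory containing cur, new and tmp. The function name make_train_folders and the example path are hypothetical and only illustrate what the collecting script looks for:

```shell
#!/bin/sh
# Create the two training folders inside one user's Maildir.
# Assumption: Maildir++ layout, i.e. folders are ".name" directories
# that each hold cur/, new/ and tmp/.
make_train_folders() {
    maildir=$1
    for folder in train_spam train_ham; do
        mkdir -p "$maildir/.$folder/cur" \
                 "$maildir/.$folder/new" \
                 "$maildir/.$folder/tmp"
    done
}

# Example (hypothetical path):
# make_train_folders /var/vmail/example.org/alice/Maildir
```

Normally users create these folders from their mail client; the sketch only shows the resulting directories.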

sa-learn documentation
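To see how far the training has progressed, sa-learn --dump magic prints the counters of the Bayes database. The helper below is a sketch that picks the nspam and nham lines out of that output. The function name bayes_counts is made up, and it assumes the usual column layout in which the count is the third field and the counter name is the last one:

```shell
#!/bin/sh
# Read `sa-learn --dump magic` output on stdin and print
# "nspam <count>" and "nham <count>".
# Assumption: the count is the third whitespace-separated field and the
# counter name is the last field of the line.
bayes_counts() {
    awk '$NF == "nspam" || $NF == "nham" { print $NF, $3 }'
}

# Usage: sa-learn --dump magic | bayes_counts
```

SpamAssassin only applies the Bayes score once enough spam and ham has been learned; the thresholds are controlled by bayes_min_spam_num and bayes_min_ham_num (see the official documentation).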

Plan for the script:

  • collect mails from folders train_spam and train_ham
  • pass through sa-learn
  • delete files
  • put script into cronjob
  • information for admin
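For the "information for admin" item, one option is to report how many mails are waiting in the training folders before they are learned. This is only a sketch: count_pending is a hypothetical helper, and it assumes that mails sit in the cur and new subdirectories of folders whose names end in train_spam or train_ham:

```shell
#!/bin/sh
# count_pending <root> <folder name>
# Print the number of mail files in cur/ and new/ of every folder below
# <root> whose name ends in <folder name>. Files in tmp/ and index files
# in the folder itself are not counted.
count_pending() {
    find "$1" -type f \( -path "*$2/cur/*" -o -path "*$2/new/*" \) \
        | wc -l | tr -d ' '
}

# Example:
# echo "pending: spam=$(count_pending /var/vmail train_spam)" \
#      "ham=$(count_pending /var/vmail train_ham)"
```

When such a line is printed from a cron job, cron usually mails the output to the owner of the crontab, provided local mail delivery is set up.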

Create folders for the scripts and the learn mails:

mkdir -p /var/vmail/{scripts,ham-learn,spam-learn}

Create the script /var/vmail/scripts/spam-learn.sh:

#!/bin/sh
# Use mails in train_spam and train_ham to train SpamAssassin.
# All mails are deleted from these folders afterwards.
# Note: folder paths containing whitespace are not handled by the loops below.
learn="/usr/local/bin/sa-learn"
SpamLearnDirs=`find /var/vmail/ -name "*train_spam" -type d`
HamLearnDirs=`find /var/vmail/ -name "*train_ham" -type d`
SpamDirs=`find /var/vmail/ -name "*Spam" -type d`

# Learn undetected spam, then empty the training folders
for spamdir in $SpamLearnDirs; do
    $learn --spam "$spamdir/cur"
    $learn --spam "$spamdir/new"
    rm -f "$spamdir"/cur/*
    rm -f "$spamdir"/new/*
done

# Learn wrongly flagged ham, then empty the training folders
for hamdir in $HamLearnDirs; do
    $learn --ham "$hamdir/cur"
    $learn --ham "$hamdir/new"
    rm -f "$hamdir"/cur/*
    rm -f "$hamdir"/new/*
done

# Delete mails in the Spam folders that are older than 15 days
for dspamdir in $SpamDirs; do
    find "$dspamdir/cur" -type f -mtime +15 -exec rm {} \;
    find "$dspamdir/new" -type f -mtime +15 -exec rm {} \;
done
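The script above deletes the training mails even when sa-learn fails, for example because the Bayes database is locked. A more cautious variant, sketched here under the assumption that sa-learn exits with a non-zero status on errors, removes the mails only after a successful run. The function learn_and_clean is hypothetical and not part of the original script:

```shell
#!/bin/sh
# learn_and_clean <sa-learn command> <--spam|--ham> <maildir folder>
# Feed cur/ and new/ of one folder to sa-learn and delete the mails only
# if the learn command reported success.
learn_and_clean() {
    cmd=$1
    kind=$2
    folder=$3
    for sub in cur new; do
        [ -d "$folder/$sub" ] || continue
        if "$cmd" "$kind" "$folder/$sub"; then
            # Remove the learned mails but keep the directory itself
            find "$folder/$sub" -type f -exec rm -f {} +
        else
            echo "learning failed for $folder/$sub, keeping mails" >&2
        fi
    done
}

# Example:
# learn_and_clean /usr/local/bin/sa-learn --spam /path/to/Maildir/.train_spam
```

Using find with -exec also avoids the "argument list too long" error that rm -f dir/* can hit when a folder holds very many mails.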

Make the script executable and create a cron job for the vmail user:

chmod +x /var/vmail/scripts/spam-learn.sh
sudo -u vmail crontab -e

Add this line to the crontab (runs every 6 hours, at minute 10):

10 */6 * * * /var/vmail/scripts/spam-learn.sh
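If learning a large batch ever takes longer than the interval between two cron runs, the runs could overlap. One common way to guard against that is a lock directory, since mkdir either creates the directory or fails atomically. The wrapper below is a sketch under that assumption; run_locked and the lock path are hypothetical names:

```shell
#!/bin/sh
# run_locked <lock directory> <command> [args...]
# Run the command only if the lock directory can be created. The lock is
# removed again afterwards and the command's exit status is returned.
run_locked() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still active ($lock exists), skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lock"
    return $status
}

# Example (hypothetical lock path):
# run_locked /var/vmail/scripts/spam-learn.lock /var/vmail/scripts/spam-learn.sh
```

If a run is killed before it can remove the lock, the directory stays behind and has to be deleted by hand.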