Custom SpamC Rules and Adjustments - SpamTagger/SpamTagger-Plus GitHub Wiki

For a description of many of the most commonly hit built-in rules, see this Wiki.

Adjust the score for an existing rule

We strongly suggest that you take a statistical approach to adjusting SpamC scores. Lowering the score of a SpamC rule may seem like a good idea if it was a major contributor to a false-positive, but it may be unwise to do so before knowing how often it also contributes to catching true-positives. If you lower the score, you may find that you now have a new false-negative problem.

For Enterprise Edition, we provide curated rules and scores. We adjust them in small increments over a long period of time, as we see them contribute more or less to the various spam and non-spam reports that we receive, in order to find a good balance. However, your mail can be very specific, so you do have the option to adjust the scores on your own.

Finding the rule that needs to be disabled/adjusted

If a message was delivered to the inbox, you can find which rules hit either by:

  • Opening the email headers and looking for the X-MailCleaner-SpamCheck header. This lists all of the rules that hit (for SpamC and Newsl), the score that each rule applied, and the gap between the required score (which should always be 5) and the actual sum of the scores of the rules that hit.
  • In Management->Tracing you can search for the message and click the magnifying glass icon when you find it. This will show all of the logs including a line in the Filtering Engine section with the same information as the header which was just discussed.

If a message was trapped as spam, you can open the message preview with the envelope icon in the spam quarantine and expand the headers section to find the X-MailCleaner-SpamCheck header there. You can correlate the rule IDs with their descriptions in the Rules score section (SpamC only; Newsl descriptions are not yet shown there).

Disable/adjust a rule globally

If you want to disable a rule, it is best to set its score to 0.001. This stops it from contributing to catching messages, while it still shows up in the filtering results for later investigation. If, after the modification, you see this rule hitting on many false-negatives, then you will know it was useful after all.

If you do want to disable the rule completely, you would instead set the score to 0.0, which will cause it not to be evaluated. Note that if there are other meta rules (discussed below) which depend on this rule, then it will not be possible for them to trigger either. This is another good reason to leave it with a score of 0.001.

To do either of these actions globally, you will need to create a score adjustment in a custom file in /usr/mailcleaner/share/spamassassin (/usr/mailcleaner/share/newsld/siteconfig for newsletter rules).

These files are read in lexical order (numbers, then letters), and the last instance of score RULE_NAME is the one used. It is therefore important that your custom file comes very late in that order (in particular, after 99_mc_custom_rules.cf for Enterprise Edition). All of our provided rule files are prefixed with a number for sorting purposes, so we will assume that you are using a file like my_custom_scores.cf to ensure that yours comes last.

You must not apply the changes within a file tracked by this Git repository or provided with Enterprise Edition, since these will be overwritten, or your changes could block updates.

The contents of this file should look like:

score NON_SCORING_RULE 0.001
score COMPLETELY_DISABLED 0.0
score REDUCED_SCORE 1.0

where the ALL_CAPS names are the IDs of the existing rules.

Disable/adjust for a particular domain or user

In the web interface there are options to override the scores of existing rules for a recipient domain. See that link for more details.

It is possible to mimic what the domain override mechanism is doing in order to apply the override for a specific recipient address, for a specific sending domain, etc. This falls into the general discussion of meta rules below.

Creating new SpamC rules

Custom rules should be created carefully. Too many rules, or clumsy rules with complicated Regular Expressions, can slow down the filtering process and degrade your MailCleaner's performance. A carelessly written rule can also hit many messages that it was not intended to hit. This can result in many more false-positives (or false-negatives, for a rule with a negative score).

As mentioned in the previous section, you will need to create your rules in a custom file in /usr/mailcleaner/share/spamassassin (/usr/mailcleaner/share/newsld/siteconfig for newsletter rules). These files are read in lexical order. If you will be creating meta rules that depend on other existing rules, use a file that sorts after the file where those rules are defined.

Rule formatting

Official SpamAssassin Documentation

General rules follow the structure:

type     RULE_NAME    /search pattern/
describe RULE_NAME    Human-readable description of what it is looking for
score    RULE_NAME    1.0

Replace type with the actual rule type. The primary options are:

  • body - If you are searching for content in the human-readable body of the email.
  • rawbody - If you are searching for content in the raw body of the email. Transfer encodings such as base64 are decoded, but HTML tags and line breaks are still present (e.g. HTML markup, or unicode patterns like \u1234).
  • header - If you are searching for content within any of the headers. See below for more.
  • meta - If you want to combine results from other rules that have hit to synthesize a new rule. See more below.

describe and score should be written literally, and the amount of whitespace between the fields does not matter.
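
For illustration, the three lines can be combined into a complete rule as follows. This is a minimal sketch: the rule name, phrase and score are hypothetical. The trailing i after the closing / is the standard Perl regex modifier that makes the match case-insensitive.

# Hypothetical rule: name, phrase and score are illustrative only
body     MY_GIFT_CARD_REQUEST    /please buy gift cards/i
describe MY_GIFT_CARD_REQUEST    Body asks the recipient to buy gift cards
score    MY_GIFT_CARD_REQUEST    1.0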

The pattern you are searching for in anything but meta is a Regular Expression (RegEx). In most cases, if you are just searching for text, you can just place that exact text between the /s and it will work:

body    MY_RULE    /I'm looking for this sentence/

However, some special characters have specific purposes in RegEx. Most commonly:

  • . matches any single character
  • ^ matches the start of the line
  • $ matches the end of the line
  • ( ) creates a search clause
  • | acts as "or" within a search clause
  • [ ] creates a "class" (a list of characters, any of which can match)
  • * matches zero or more of the previous character
  • + matches one or more of the previous character
  • / denotes the start and end of the regular expression
  • \ acts as the escape character to use any of the other special characters literally

Escaping these characters so that they are treated literally would look like this:

body    LITERAL_RULE    /\*\* Offering now for \$1,000 \(USD\) \*\*/

Further RegEx examples follow. We also recommend that you test your RegEx against the content of an actual email before saving any rules in MailCleaner.
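
As a sketch of how several of the special characters above can be combined, the hypothetical rule below would hit on text such as "Overdue amount of $250". The rule name, wording and score are assumptions chosen only to illustrate the syntax.

# Hypothetical rule: name, pattern and score are illustrative only
#   (overdue|past due)  matches either phrase
#   \$                  matches a literal dollar sign
#   [0-9]+              matches one or more digits
body     MY_OVERDUE_AMOUNT    /(overdue|past due) amount of \$[0-9]+/i
describe MY_OVERDUE_AMOUNT    Mentions an overdue amount in dollars
score    MY_OVERDUE_AMOUNT    0.5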

body and rawbody

These rules are fairly straight-forward. Follow the formatting described above.
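
The difference between the two types can be sketched with a pair of hypothetical rules (the names, patterns and scores are assumptions). body sees the text as a reader would, with HTML tags removed. rawbody still contains the HTML markup, so it can match tags and attributes.

# Hypothetical rules: names, patterns and scores are illustrative only
# body: matches the rendered text, with HTML tags already removed
body     MY_VERIFY_ACCOUNT    /click here to verify your account/i
describe MY_VERIFY_ACCOUNT    Asks the reader to click to verify an account
score    MY_VERIFY_ACCOUNT    0.5

# rawbody: HTML markup is still present, so a tag attribute can be matched
rawbody  MY_TINY_FONT         /<font[^>]+size=["']?1\b/i
describe MY_TINY_FONT         HTML uses a font of size 1
score    MY_TINY_FONT         0.5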

header

header rules require one extra element, the actual header you would like to search:

header    SUBJECT_RULE    Subject =~ /search pattern/
header    SENDER_RULE     From =~ /.*\@domain\.com/

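Here we have the first simple RegEx example. .* will match "any character" (.) any number of times (*), followed by @domain.com. The dot in domain\.com is escaped so that it only matches a literal dot, and the @ is escaped as \@, as is common practice in SpamAssassin rules. In other words, it will hit for any sender at that domain.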
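
A slightly stricter version of these header rules might look like the following sketch. The rule names, patterns and scores are hypothetical. The :addr suffix on From is SpamAssassin's header modifier for matching only the email address part of the header, and $ then anchors the match to the end of that address.

# Hypothetical rules: names, patterns and scores are illustrative only
# ^ anchors to the start of the Subject, \s* allows leading spaces
header   MY_SUBJ_URGENT_INVOICE   Subject =~ /^\s*(urgent|overdue) invoice/i
describe MY_SUBJ_URGENT_INVOICE   Subject starts with an urgent or overdue invoice notice
score    MY_SUBJ_URGENT_INVOICE   0.5

# Only the address is tested, and it must end in @domain.com
header   MY_FROM_DOMAIN_COM       From:addr =~ /\@domain\.com$/i
describe MY_FROM_DOMAIN_COM       Sender address is at domain.com
score    MY_FROM_DOMAIN_COM       0.5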

meta

meta rules use boolean logic and arithmetic on the results of other rules to apply a new rule (and score). For example, if the following rules already exist:

header    SUBJECT_RULE    Subject =~ /search pattern/
body      BODY_RULE       /search pattern/

and you think that an additional score should be applied when both of those rules hit, you would create the rule:

meta    SUBJECT_AND_BODY    ( SUBJECT_RULE && BODY_RULE )

If those first two rules exist, as well as the rule:

body    BODY_RULE2    /another option/

and you'd like to hit if the subject hits and either or both (inclusive OR) of the bodies hit, you would create the rule:

meta    SUB_AND_BODIES    ( SUBJECT_RULE && ( BODY_RULE || BODY_RULE2 ) )

You can also perform arithmetic with the rules, where the value of each rule in this context is simply 1 or 0 if it hit or did not hit. So you could have it hit if 2 or more of the rules hit:

meta    TWO_RULES    ( SUBJECT_RULE + BODY_RULE + BODY_RULE2 >= 2 )

You could also use this to make the previous inclusive OR rule into an exclusive OR (to hit if either of the body rules hits, but not if both hit):

meta    SUB_AND_ONE_BODY    ( SUBJECT_RULE && ( BODY_RULE + BODY_RULE2 == 1 ) )

In an arithmetic comparison, you can also provide a weight to rules if you would like for one to be more important than others, like:

meta    ARITHMETIC_RULE    ( (2.0 * SUBJECT_RULE) + (1.5 * BODY_RULE) + (0.5 * BODY_RULE2) >= 2 )

The result of a comparison such as >= 2 is still just true (1) or false (0). You can therefore nest these comparisons within other boolean operations, or even within other arithmetic operations.

The brackets denote the order of operations, the same as normal arithmetic.

In many cases, you will create rules that serve only as components of a meta rule and do not need to do anything on their own. You should prefix their names with __ and not assign them a score. With that prefix, SpamAssassin knows that it should evaluate the rule even though it has no score, and it also omits the rule from the list of rules that hit. Without the prefix, a rule that is given no score defaults to a score of 1.0.
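
A minimal sketch of that pattern follows, with hypothetical names, patterns and score. The two __ sub-rules carry no score and do not appear in the list of hits. Only the meta rule that combines them contributes to the spam score.

# Hypothetical rules: names, patterns and score are illustrative only
# Sub-rules: no score, hidden from the list of rules that hit
header   __MY_SUBJ_INVOICE     Subject =~ /invoice/i
body     __MY_BODY_PAY_NOW     /pay (now|today|immediately)/i
# Only this combined rule adds to the score
meta     MY_INVOICE_PAY_NOW    ( __MY_SUBJ_INVOICE && __MY_BODY_PAY_NOW )
describe MY_INVOICE_PAY_NOW    Invoice subject combined with a demand for immediate payment
score    MY_INVOICE_PAY_NOW    1.5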

Examples

Ignore a rule for a specific recipient

Make a rule that matches the recipient:

header   __RCPT_BOB_DOMAIN_COM    To =~ /bob\@domain\.com/i

Make a meta rule that hits when that recipient hits, and the rule in question hits:

meta     NO_FREEMAIL_FOR_BOB    ( MC_FREEMAIL && __RCPT_BOB_DOMAIN_COM )
score    NO_FREEMAIL_FOR_BOB    -2.0
describe NO_FREEMAIL_FOR_BOB    Negate MC_FREEMAIL for bob@domain.com

In this example, we are assuming that the original score for MC_FREEMAIL was 2.0 and are negating it with the same negative score, as discussed with the overrides for recipient domains.

Ignore a rule for a specific sender address

This works the same as the previous, except that we need the first rule to match the sender. However, note that the From header is easily spoofed, so you may wish to match the "Envelope" sender in the Received header:

header   __ENVF_BOB_DOMAIN_COM    Received =~ /from.*bob\@domain\.com/i
meta     NO_FREEMAIL_FOR_BOB      ( MC_FREEMAIL && __ENVF_BOB_DOMAIN_COM )
score    NO_FREEMAIL_FOR_BOB      -2.0
describe NO_FREEMAIL_FOR_BOB      Negate MC_FREEMAIL for bob@domain.com

or, to hit on either the From header or the envelope sender:

header   __FROM_BOB_DOMAIN_COM    From =~ /bob\@domain\.com/i
header   __ENVF_BOB_DOMAIN_COM    Received =~ /from.*bob\@domain\.com/i
meta     NO_FREEMAIL_FOR_BOB      ( MC_FREEMAIL && ( __ENVF_BOB_DOMAIN_COM || __FROM_BOB_DOMAIN_COM ) )
score    NO_FREEMAIL_FOR_BOB      -2.0
describe NO_FREEMAIL_FOR_BOB      Negate MC_FREEMAIL for bob@domain.com
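
The same approach can be extended to an entire sending domain, one of the cases mentioned in the section on per-domain and per-user adjustments. In this sketch the domain, rule names and score are hypothetical. As noted above, the From header is easily spoofed, so consider also matching the Received header. The :addr suffix makes SpamAssassin test only the address part of the From header.

# Hypothetical rules: domain, names and score are illustrative only
# $ anchors the match to the end of the sender address
header   __FROM_DOMAIN_COM            From:addr =~ /\@domain\.com$/i
meta     NO_FREEMAIL_FROM_DOMAIN_COM  ( MC_FREEMAIL && __FROM_DOMAIN_COM )
score    NO_FREEMAIL_FROM_DOMAIN_COM  -2.0
describe NO_FREEMAIL_FROM_DOMAIN_COM  Negate MC_FREEMAIL for any sender at domain.com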

Testing rules

To test whether a new rule would have hit the email that you are targeting, copy that email to your MailCleaner server, or find its path in the Quarantine (/var/mailcleaner/spam/...). You can then run the email directly against a specific rule file (and the standard rules) using:

/usr/local/bin/spamassassin -p rules_file.cf < email.eml

Or you can simulate the real MailCleaner scan by testing against all of the rules:

/usr/local/bin/spamassassin --siteconfigpath=/usr/mailcleaner/share/spamassassin/ -t -d < email.eml

LOTS_OF_MONEY rules

There are several rules associated with the main rule LOTS_OF_MONEY. Emails quoting specific high-value sums of money are generally considered suspicious, so these are usually very good rules to have. However, you may have one or more people in the company who regularly deal with invoices or other emails that end up being hit by these rules. To address this, there is a built-in (but hidden) feature.

On each node of a cluster, add the recipient addresses or domains for the users who are encountering this problem into this file: /usr/mailcleaner/share/spamassassin/mails_without_LOM. This will cause 2.0 points to be removed any time one of these rules hits for one of those recipients. This is a little simpler than having to create your own custom rules or overrides.

Applying the new rules

If you have a cluster, make sure that you synchronize your custom file between all machines.

The new rules will not take effect until the SpamC daemon (spamd) has been restarted. You can do this from Monitoring->status by clicking the restart icon next to "SpamAssassin Daemon" in the services list, or with:

/usr/mailcleaner/etc/init.d/spamd restart