Common SpamC Rules - SpamTagger/SpamTagger-Plus GitHub Wiki

Note: If you would like to override the score of a built-in rule or create your own, see this guide.

This article lists some of the SpamC rules that hit most often, with a brief explanation of each.

Ideally, every rule would have an explanation. That explanation would be defined by the describe line in the rule definition file and shown to the user in the Rules score section of the message preview. However, many built-in rules have no description, or a short sentence is not enough to convey what they actually look for.

Note that if these descriptions are insufficient, or you don't find the rule you are looking for, you can always go straight to the source and find the rule definitions.

MailCleaner includes several upstream rule sets and plugins which are located in /var/lib/spamassassin/3.004000/updates_spamassassin_org/.

We also provide our own rules and plugins in /usr/mailcleaner/share/spamassassin, with additional rules for Enterprise Edition customers.

You should be able to use grep -R RULE_NAME /var/lib/spamassassin/3.004000/updates_spamassassin_org/ /usr/mailcleaner/share/spamassassin to locate the rule definition. If it is a meta rule, this may require you to search further for the component rules.
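If grep is not convenient, the same search can be scripted. The following is a minimal Python sketch that does two things:

- It lists every line mentioning a rule as a whole word, roughly what `grep -Rw RULE_NAME dir...` prints.
- When the rule is defined with SpamAssassin's `meta RULE expression` syntax, it extracts the component rule names so you can search for those next.

The directory list comes from this article. The helper names are hypothetical, and the script treats any identifier in a meta expression as a rule name.

```python
import os
import re

# Directories named in this article; adjust them for your installation.
RULE_DIRS = [
    "/var/lib/spamassassin/3.004000/updates_spamassassin_org/",
    "/usr/mailcleaner/share/spamassassin",
]

# Any identifier in a meta expression is treated as a rule name
# (operators and numeric constants do not match this pattern).
IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def find_lines(rule, dirs=RULE_DIRS):
    """Return (path, line_number, text) for each line that mentions `rule` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(rule) + r"\b")
    hits = []
    for top in dirs:
        for root, _, files in os.walk(top):
            for name in sorted(files):
                path = os.path.join(root, name)
                try:
                    with open(path, encoding="utf-8", errors="replace") as fh:
                        for number, line in enumerate(fh, 1):
                            if pattern.search(line):
                                hits.append((path, number, line.rstrip("\n")))
                except OSError:
                    continue  # unreadable file: skip it
    return hits


def meta_components(rule, dirs=RULE_DIRS):
    """If `rule` is defined as `meta RULE expression`, return the rule names used in it."""
    names = set()
    for _, _, line in find_lines(rule, dirs):
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "meta" and parts[1] == rule:
            names.update(IDENTIFIER.findall(parts[2]))
    return names
```

For example, `find_lines("DKIM_SIGNED")` returns the matching lines. Calling `meta_components` on a meta rule gives the next names to look up.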

Rules change too quickly for us to maintain a complete, up-to-date list, so this page covers only the rules we are asked about most often.

Upstream rules

These rules are provided by the upstream SpamAssassin libraries, installed within the relevant Perl modules (nested in /usr/local/share/perl).

AXB_X_FF_SEZ_S

This rule applies when a message contains an X-Forefront-Antispam-Report header. More information about why this header is added is available here.

DCC_CHECK

The DCC or Distributed Checksum Clearinghouse is a system of servers collecting and counting checksums of millions of mail messages. The counts can be used by SpamAssassin to detect and reject or filter spam.

Because simplistic checksums of spam can be easily defeated, the main DCC checksums are fuzzy and ignore aspects of messages. The fuzzy checksums are changed as spam evolves.
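To illustrate the idea behind a fuzzy checksum, here is a minimal Python sketch. It is not DCC's actual algorithm. It discards parts of the message that spammers commonly vary (case, digits, punctuation, whitespace) before hashing, so trivially altered copies produce the same checksum. It then counts reports per checksum, as a clearinghouse would.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body: str) -> str:
    """Checksum of a normalized body, so that trivially varied copies collide.

    Illustrative normalization only, not DCC's algorithm: lowercase the text and
    keep only the letters a-z, discarding digits, punctuation and whitespace.
    """
    letters = re.sub(r"[^a-z]", "", body.lower())
    return hashlib.sha256(letters.encode("ascii")).hexdigest()


def count_reports(bodies) -> Counter:
    """Count how many reported messages share each checksum."""
    return Counter(fuzzy_checksum(body) for body in bodies)
```

Two copies of an offer that differ only in prices and order numbers share a checksum, so their count grows with each report. A high count marks the message as bulk mail.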

DEAR_SOMETHING

This detects subjects or message bodies that begin with a salutation such as "Dear Mister". This is rarely used in ham and corresponds to specific spam waves.

DKIM_ADSP_ALL

The sender's domain says that it uses DKIM on all email, but no valid signature was found. That suggests that the message might not have originated with the purported sender.

DKIM_SIGNED

Gives minor points to DKIM-signed messages. If the DKIM signature is valid, those points will be nullified by the DKIM_VALID_AU rule.

DKIM_VALID_AU

The message has a valid DKIM or DK signature from the author's domain. You will generally see this in combination with several other DKIM rules. When the DKIM signature is valid, the scores of these rules should cancel out.

DYN_RDNS_AND_INLINE_IMAGE

The mail contains an image attachment, and the message was received by the last trusted relay from an IP address with a reverse DNS name that suggests it is dynamically allocated.

FR_3TAG_3TAG

A 3-character HTML tag is opened and then closed immediately afterwards.

FUZZY_XPILL

The FuzzyOCR module detected that the message contains the name of a pharmaceutical product written in an obfuscated way.

HTML_FONT_FACE_BAD_BODY

The message contains a nonexistent font face definition.

HTML_IMAGE_RATIO_04

The HTML part of the message has a low ratio of text to image area. This may indicate a message that uses an image instead of words to sidestep text-based filtering.

HTTPS_HTTP_MISMATCH

This rule is triggered when a link's visible text is an HTTPS URL but its real target is a plain HTTP URL. For example:

<a href="http://spammersite.com/virus">https://www.email-service.com/login</a>
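The check can be sketched with Python's standard `html.parser`. This is an independent illustration, not SpamAssassin's code. It collects each `<a>` element's `href` and visible text, then flags links whose text starts with `https://` while the `href` starts with `http://`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def https_http_mismatches(html: str):
    """Return links whose text looks like an https:// URL but whose href is http://."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        (href, text)
        for href, text in parser.links
        if text.lower().startswith("https://") and href.lower().startswith("http://")
    ]
```

Applied to the example above, this returns the single pair `("http://spammersite.com/virus", "https://www.email-service.com/login")`.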

KHOP_BIG_TO_CC

The message was sent to a large number of recipients (To and Cc combined).
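A minimal Python sketch of the measurement counts distinct addresses across the To and Cc headers. The `MAX_RECIPIENTS` threshold is a hypothetical value chosen for illustration. The limit used by KHOP_BIG_TO_CC itself is not reproduced here.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical threshold for illustration only.
MAX_RECIPIENTS = 50


def visible_recipients(raw_message: str) -> int:
    """Count distinct addresses in the To and Cc headers combined."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return len({addr.lower() for _, addr in getaddresses(fields) if addr})


def big_to_cc(raw_message: str, limit: int = MAX_RECIPIENTS) -> bool:
    return visible_recipients(raw_message) > limit
```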

MIME_HEADER_CTYPE_ONLY

The message is malformed. Its Content-Type is something other than "text/plain", so its headers should conform to the MIME specification, but they do not. This suggests that the message was generated by a badly written mailout program rather than by a normal email client.
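One plausible reading of this check is sketched below in Python: a non-`text/plain` Content-Type present while the `MIME-Version` header is missing. Treating a missing `MIME-Version` as the deciding condition is our assumption for illustration. Consult the rule definition for the exact logic.

```python
from email.parser import HeaderParser


def ctype_without_mime_version(raw_headers: str) -> bool:
    """True if Content-Type is not text/plain but MIME-Version is absent.

    Assumed interpretation of the rule, for illustration only.
    """
    headers = HeaderParser().parsestr(raw_headers)
    ctype = headers.get("Content-Type")
    if ctype is None:
        return False
    media_type = ctype.split(";", 1)[0].strip().lower()
    return media_type != "text/plain" and headers.get("MIME-Version") is None
```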

MISSING_DATE

The Date header is missing.

MISSING_MID

The message does not contain a Message-ID header.
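Both MISSING_DATE and MISSING_MID are header-presence checks. The following Python sketch maps each required header to the rule name used on this page. It is a simplification: SpamAssassin's own checks may treat edge cases such as empty headers differently.

```python
from email import message_from_string

# Header name -> rule that fires when it is absent (lookups are case-insensitive).
REQUIRED = {"Date": "MISSING_DATE", "Message-ID": "MISSING_MID"}


def missing_header_rules(raw_message: str) -> list:
    msg = message_from_string(raw_message)
    return [rule for header, rule in REQUIRED.items() if msg.get(header) is None]
```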

MPART_ALT_DIFF

The message contains alternative parts, which are supposed to carry the same content so that the same text is displayed in plain-text or HTML mode. Here the two parts differ, which is usually a spam technique.
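To see what "the two parts differ" means in practice, here is a simplified Python sketch. It finds a `multipart/alternative` part, strips the tags from the HTML alternative, and reports the percentage of HTML words that are absent from the plain-text alternative. This word-set metric is our own illustration, not the calculation SpamAssassin performs.

```python
import re
from email import message_from_string
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _words(text):
    return set(re.findall(r"\w+", text.lower()))


def _decoded(part):
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", "replace")


def alternative_difference(raw_message: str) -> float:
    """Percentage of words in the HTML alternative that are missing from the text one.

    Returns 0.0 when there is no text/plain + text/html alternative pair.
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() != "multipart/alternative":
            continue
        texts = {}
        for sub in part.get_payload():
            ctype = sub.get_content_type()
            if ctype in ("text/plain", "text/html") and ctype not in texts:
                texts[ctype] = _decoded(sub)
        if len(texts) == 2:
            extractor = _TextExtractor()
            extractor.feed(texts["text/html"])
            extractor.close()
            html_words = _words(" ".join(extractor.chunks))
            if not html_words:
                return 0.0
            missing = html_words - _words(texts["text/plain"])
            return 100.0 * len(missing) / len(html_words)
    return 0.0
```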

PHP_ORIG_SCRIPT

Indicates that the email was sent by a PHP script. This is probably a poorly secured PHP server that is being exploited.

PYZOR_CHECK

Pyzor is a hash-sharing system: it detects messages whose signature is close to that of known spam.

RCVD_IN_****

A server that relayed the message is listed in an RBL (Realtime Blackhole List, also called a DNSBL). For example, RCVD_IN_BRBL_LASTEXT indicates that the last external IP address in the Received headers is listed in the Barracuda RBL (bb.barracudacentral.org).
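DNS-based lists are queried by reversing the IPv4 address's octets and appending the list's zone. The address counts as listed if the resulting name resolves, typically to a 127.0.0.x answer. The following minimal Python sketch builds that query name and does a blocking lookup. The zone comes from the example above. Real filters use asynchronous DNS with timeouts, and some lists restrict who may query them.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the zone, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Listed if the query name resolves; a lookup failure (e.g. NXDOMAIN) means not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

For example, `dnsbl_query_name("192.0.2.10", "bb.barracudacentral.org")` yields `10.2.0.192.bb.barracudacentral.org`.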

RDNS_DYNAMIC

The reverse DNS name of the sending server looks dynamic, that is, typical of a dynamically allocated IP address.

RDNS_NONE

The sending server has no reverse DNS (rDNS) name. MailCleaner checks that the sending server uses a "Full Circle DNS" name (forward-confirmed reverse DNS). This can be checked here.
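A "full circle" check takes three steps:

1. Look up the IP address's PTR name.
2. Resolve that name forward.
3. Confirm the original IP address is among the results.

The following minimal Python sketch passes the lookup functions in as parameters so the logic can be exercised without network access. It covers IPv4 only, because `socket.gethostbyname_ex` returns IPv4 addresses.

```python
import socket


def full_circle_dns(ip, reverse_lookup=socket.gethostbyaddr,
                    forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check; returns (hostname or None, passed)."""
    try:
        hostname = reverse_lookup(ip)[0]   # step 1: IP -> PTR name
    except OSError:
        return None, False                 # no reverse DNS at all
    try:
        addresses = forward_lookup(hostname)[2]  # step 2: name -> IPv4 addresses
    except OSError:
        return hostname, False
    return hostname, ip in addresses       # step 3: does it come full circle?
```

In production you would keep the default `socket` functions. The parameters exist only so that test doubles can replace them.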

SARE_ADLTSUB10

The subject contains a (possibly obfuscated) string based on the word "rape". Since obfuscation may be involved, it is sometimes hard to tell what triggered SpamC.

SINGLE_HEADER_1K SINGLE_HEADER_2K SINGLE_HEADER_3K SINGLE_HEADER_4K SINGLE_HEADER_5K

A single header contains between xK and (x+1)K characters. RFC 5322 limits each header line to 998 characters, and even that many is suspicious.
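The following Python sketch measures each header's unfolded value and names the matching SINGLE_HEADER_xK bucket. It makes two assumptions: 1K means 1024 characters, and only the value (not the field name) is counted. The upstream rules may count differently.

```python
import re
from email.parser import HeaderParser


def single_header_rules(raw_headers: str) -> list:
    """Return (header name, rule name) for each header value between 1K and 6K long."""
    hits = []
    for name, value in HeaderParser().parsestr(raw_headers).items():
        # Unfold continuation lines (a line break followed by space or tab).
        unfolded = re.sub(r"\r?\n(?=[ \t])", "", value)
        size_k = len(unfolded) // 1024
        if 1 <= size_k <= 5:
            hits.append((name, f"SINGLE_HEADER_{size_k}K"))
    return hits
```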

SUBJECT_NEEDS_ENCODING

The Subject: header line contains characters outside of the US-ASCII range that have not been encoded with Base64 or Quoted-Printable encoding. This violates the RFC standards for mail headers. Properly behaved MUAs would be expected not to do this.
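The condition and its fix can both be shown in a few lines of Python. The check below looks for any character outside US-ASCII in the raw Subject value, so it must run before a mail library decodes the header. The fix uses the standard library's `email.header.Header` to produce an RFC 2047 encoded-word, letting the library choose between Base64 and Quoted-Printable.

```python
from email.header import Header


def needs_encoding(raw_subject: str) -> bool:
    """True if the raw Subject value contains characters outside US-ASCII."""
    return any(ord(ch) > 127 for ch in raw_subject)


def encode_subject(subject: str) -> str:
    """Return the Subject as an RFC 2047 encoded-word using UTF-8."""
    return Header(subject, "utf-8").encode()
```

A well-behaved MUA sends the output of something like `encode_subject` (an `=?utf-8?...?=` string) rather than the raw UTF-8 text.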

SUBJ_ALL_CAPS

The mail subject is entirely in caps.
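In Python, the check takes one line plus a guard. The `min_letters` guard is a hypothetical addition so that very short subjects such as "OK" do not count. Subjects in scripts without letter case never match, because their letters are not uppercase.

```python
def subject_all_caps(subject: str, min_letters: int = 5) -> bool:
    """True if the subject has at least `min_letters` letters and all of them are uppercase."""
    letters = [ch for ch in subject if ch.isalpha()]
    return len(letters) >= min_letters and all(ch.isupper() for ch in letters)
```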

SUBJ_ILLEGAL_CHARS

The Subject header contains 8-bit and other illegal characters that should be MIME-encoded, as described in RFC 2047.

TVD_SPACE_RATIO_MINFP

This rule looks at the ratio of spaces to non-space characters in each paragraph. Messages whose paragraphs contain an unusually high proportion of spaces tend to be spam.
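The measurement itself is easy to reproduce in Python. The sketch below splits the body into blank-line-separated paragraphs and computes spaces divided by non-space characters for each one. It does not reproduce the paragraph splitting or thresholds of the actual rule.

```python
def space_ratios(body: str) -> list:
    """Ratio of spaces to non-space characters for each blank-line-separated paragraph."""
    ratios = []
    for paragraph in body.split("\n\n"):
        text = paragraph.replace("\n", " ")
        spaces = text.count(" ")
        others = len(text) - spaces
        if others:  # skip paragraphs made only of spaces
            ratios.append(spaces / others)
    return ratios
```

Ordinary prose gives ratios well below 1. Text padded with runs of spaces between characters pushes the ratio up.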

T_DKIM_INVALID

The message is DKIM-signed, but the signature is invalid.

T_FILL_THIS_FORM_SHORT

This rule detects messages containing a short form that asks for personal information.

URI_HEX

A URI hostname contains a long hexadecimal sequence.
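The following Python sketch makes two assumptions:

- The check applies to the URI's hostname.
- A "long" sequence is 16 or more hexadecimal characters. This minimum run length is a hypothetical value.

Neither the scope nor the length is taken from the rule definition.

```python
import re
from urllib.parse import urlsplit

# Hypothetical minimum run length; the real rule's pattern may differ.
HEX_RUN = re.compile(r"[0-9a-f]{16,}", re.IGNORECASE)


def hostname_has_long_hex(uri: str) -> bool:
    """True if the URI's hostname contains a long run of hexadecimal characters."""
    host = urlsplit(uri).hostname or ""
    return bool(HEX_RUN.search(host))
```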

URI_OBFU_WWW

A link contained in the mail is obfuscated.

MailCleaner rules (Community and Enterprise)

These rules are included within this repository (share/spamassassin; or /usr/mailcleaner/share/spamassassin on the appliance), and are used by all MailCleaner installations, unless they are overridden.

BOTNET_BADDNS

This rule indicates that the DNS configuration of the sending server is associated with a known botnet. This is a meta rule including a lot of different elements.

BOTNET_CLIENT

This rule adds points when several botnet-related rules have been hit.

BOTNET_CLIENTWORDS

The sending server's hostname contains strings suggesting that the message was sent from a client machine rather than from a real mail server.

BOTNET_IPINHOSTNAME

The hostname contains part of its own IP address.
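As a rough heuristic in Python, you can extract the numbers from the hostname and look for at least two consecutive octets of the IP address, in forward or reversed order. The `min_octets` value of 2 is a hypothetical choice. The Botnet plugin's own matching is more thorough than this sketch.

```python
import ipaddress
import re


def _contains_run(haystack, needle):
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def ip_in_hostname(ip: str, hostname: str, min_octets: int = 2) -> bool:
    """True if `min_octets` consecutive octets of `ip` appear as numbers in `hostname`."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Normalize zero-padded numbers such as "010" to "10".
    numbers = [str(int(d)) for d in re.findall(r"\d+", hostname)]
    for seq in (octets, octets[::-1]):  # forward and reversed forms
        for start in range(len(seq) - min_octets + 1):
            if _contains_run(numbers, seq[start:start + min_octets]):
                return True
    return False
```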

DC_IMAGE_SPAM_HTML

The mail has at least one large image attachment and a comparatively small amount of text.

DC_IMAGE_SPAM_TEXT

Possible image-only spam with little text.

DC_IMG_HTML_RATIO

Low ratio of body text to image pixel area.

DC_IMG_TEXT_RATIO

Low ratio of body text to image pixel area.

DC_PNG_MULTI_LARGO

The message has two or more inline PNG images covering a large area.

GENERIC_IXHASH

A fingerprint of the message is computed and checked against fingerprints of known spam. This is a network-based test.

Enterprise Edition only

These rules are included as part of the Enterprise Edition premium data feeds. Community Edition users will not see these rules. These rules are installed in the same directory (/usr/mailcleaner/share/spamassassin), in a different set of files.

MC_ADULT_BDY_COQUIN_EN

Looks in the body of the message for any of these words: horny, horniest, naughty, naughtiest, sluty, slutiest.

MC_ADULT_BDY_SEX

The body contains a word starting with "sex"

MC_ADULT_SUBJ_SEX

The subject contains a word starting with "sex"

MC_CONTAINS_ZERO1 MC_CONTAINS_ZERO2 MC_CONTAINS_ZERO3 MC_CONTAINS_ZERO4 MC_CONTAINS_ZERO5

These rules detect specific, often invisible, characters that are used to trick parsers and users. These characters are meant to circumvent anti-spam rules by causing an otherwise valid pattern not to match. For example, a zero-width space (U+200B) inserted in the middle of the word "viagra" makes the text no longer hit a simple match rule, while it still looks correct to the user.
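The trick, and a way to detect it, can be shown in Python with the standard `unicodedata` module. Flagging every character in the Unicode "format" category (Cf) is our choice for this sketch. That category includes the zero-width space and joiners, the word joiner, the byte-order mark and the soft hyphen. MailCleaner's rules may target a different set of characters.

```python
import unicodedata


def find_invisible(text: str) -> list:
    """Return (index, Unicode name) for each format-control character (category Cf)."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible(text: str) -> str:
    """Remove format-control characters so that simple patterns match again."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

With `word = "via\u200bgra"`, the naive test `"viagra" in word` is false. `find_invisible(word)` reports the zero-width space at index 3, and the test succeeds after `strip_invisible`.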

More information is available here.

MC_ESCURL

Detects bad characters in a URL in the message.

MC_FREEMAIL_BODY

Detects a "freemail" address in the body of a message. Freemail addresses come from providers where anyone can register without giving real information about themselves (for example gmail.com, yahoo.com, hotmail.com). Spam often includes such an address in the body and asks the recipient to reply to it.
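The following minimal Python sketch extracts email addresses from the body with a simple pattern and keeps those whose domain is on a freemail list. The three-domain list repeats the examples above and is deliberately short. Real freemail lists are far longer, and the address pattern is a simplification.

```python
import re

# Short, illustrative list based on the examples above.
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}

# Simplified address pattern; group 1 captures the domain.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def freemail_addresses(body: str) -> list:
    """Return every address in `body` whose domain is a known freemail provider."""
    return [m.group(0) for m in EMAIL.finditer(body) if m.group(1).lower() in FREEMAIL_DOMAINS]
```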

MC_KREDIT

The term "kredit" or "credit" is present in the body of the message.

MC_MAILTO_WITH_SUBJ_ORDER

Contains a mailto link that sends an email with the subject "order", "commande" or "bestellung".
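A link like this is a `mailto:` URL with a `subject` field. The following Python sketch parses the link with `urllib.parse` and looks for one of the three words in the decoded subject. Matching the word anywhere in the subject, rather than requiring an exact subject, is our assumption.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Words from the rule description; matched as whole words anywhere in the subject.
ORDER_WORD = re.compile(r"\b(?:order|commande|bestellung)\b", re.IGNORECASE)


def mailto_order_subject(href: str) -> bool:
    """True if `href` is a mailto: link whose subject mentions an order."""
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return False
    fields = {key.lower(): values for key, values in parse_qs(parts.query).items()}
    return any(ORDER_WORD.search(subject) for subject in fields.get("subject", []))
```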

MC_MESSAGESNIFFER

This rule gives a score when the message was identified as spam by our partner MessageSniffer.

MC_URI_EASYMONEY_LVL4

The message contains a phrase such as "claim your free copy" or "Check secret story". This rule detects phrases built on the pattern: one of (claim, see, check) + "your" + one of (free, full, secret) + one of (copy, story).
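The pattern described above translates into a single regular expression. This Python version is our transcription, not the rule's actual definition. It treats "your" as optional, which is an assumption made so that the second quoted example ("Check secret story") also matches.

```python
import re

# Word lists from the rule description; "your" is optional here (an assumption).
EASYMONEY = re.compile(
    r"\b(?:claim|see|check)\s+(?:your\s+)?(?:free|full|secret)\s+(?:copy|story)\b",
    re.IGNORECASE,
)
```

For example, `EASYMONEY.search("Click here to claim your free copy today")` finds a match, while an ordinary sentence such as "Please check your calendar" does not.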