Common SpamC Rules - SpamTagger/SpamTagger-Plus GitHub Wiki
Note: If you would like to override the score of a built-in rule or create your own, see this guide.
This article lists some of the most common SpamC rules that you will see hit, with a brief explanation of each.
Ideally, every rule would have an explanation, as defined in the describe section of the rule definition file and as presented to the user in the message preview from the Rules score section. However, many built-in rules do not include these descriptions, or a short sentence is not enough to convey what they actually look for.
Note that if these descriptions are insufficient, or you don't find the rule you are looking for, you can always go straight to the source and find the rule definitions.
MailCleaner includes several upstream rule sets and plugins, which are located in /var/lib/spamassassin/3.004000/updates_spamassassin_org/.
We also provide our own rules and plugins in /usr/mailcleaner/share/spamassassin, with more for Enterprise Edition customers.
You should be able to use grep -R RULE_NAME /var/lib/spamassassin/3.004000/updates_spamassassin_org/ /usr/mailcleaner/share/spamassassin to locate the rule definition. If it is a meta rule, this may require you to search further for the component rules.
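The same search can be sketched in Python, which may be handy if you want to post-process the matches. This is only an illustration: the function name find_rule and the RULE_DIRS list are hypothetical, and the directories are simply the two paths mentioned above. Unlike plain grep, it matches the rule name as a whole word, so searching for DKIM_VALID will not also return DKIM_VALID_AU.

```python
import re
from pathlib import Path

# Search roots copied from the paths mentioned in this article (adjust as needed).
RULE_DIRS = [
    "/var/lib/spamassassin/3.004000/updates_spamassassin_org/",
    "/usr/mailcleaner/share/spamassassin",
]


def find_rule(rule_name, roots=RULE_DIRS):
    """Return (path, line_number, line) for every line mentioning rule_name."""
    # \b treats "_" as a word character, so the match is on the full rule name.
    pattern = re.compile(r"\b" + re.escape(rule_name) + r"\b")
    hits = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # skip directories that do not exist on this host
        for path in sorted(root_path.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable file, ignore it
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_rule("MISSING_DATE"):
        print(f"{path}:{number}: {line}")
```

For a meta rule, the matching line lists the component rule names, and each of those can be passed to the same function in turn.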
Rules change much too quickly for us to maintain a full up-to-date list, so these are just those that we see the most questions about.
Upstream rules
These rules are provided by the upstream SpamAssassin libraries, installed within the relevant Perl modules (nested in /usr/local/share/perl).
AXB_X_FF_SEZ_S
This rule applies when a message contains an X-Forefront-Antispam-Report header. More information on why this header was added is available here.
DCC_CHECK
The DCC or Distributed Checksum Clearinghouse is a system of servers collecting and counting checksums of millions of mail messages. The counts can be used by SpamAssassin to detect and reject or filter spam.
Because simplistic checksums of spam can be easily defeated, the main DCC checksums are fuzzy and ignore aspects of messages. The fuzzy checksums are changed as spam evolves.
DEAR_SOMETHING
This detects subjects or messages that begin with something like "Dear Mister". This is rarely used in ham and corresponds to specific spam waves.
DKIM_ADSP_ALL
The sender's domain says that it uses DKIM on all email, but no valid signature was found. That suggests that the message might not have originated with the purported sender.
DKIM_SIGNED
Gives minor points to DKIM-signed messages. If the DKIM signature is valid, those points will be nullified by DKIM_VALID_AU.
DKIM_VALID_AU
Message has a valid DKIM or DK signature from author's domain. You will generally see this in combination with several other DKIM rules. When the DKIM is valid, all of the rules should cancel out.
DYN_RDNS_AND_INLINE_IMAGE
The mail contains an image attachment, and the message was received by the last trusted relay from an IP address with a reverse DNS name that suggests it is dynamically allocated.
FR_3TAG_3TAG
A 3-character HTML tag is opened and then closed immediately afterwards.
FUZZY_XPILL
The FuzzyOCR module detected that the message contains the name of a pharmaceutical product written in an obfuscated way.
HTML_FONT_FACE_BAD_BODY
The mail contains a nonexistent font face definition.
HTML_IMAGE_RATIO_04
The HTML has a low ratio of text to image area. This may indicate a message using an image instead of words in order to sidestep text-based filtering.
HTTPS_HTTP_MISMATCH
This rule is triggered when a link presents its text as an HTTPS link while the real target is HTTP (not S). For example:
<a href="http://spammersite.com/virus">https://www.email-service.com/login</a>
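As an illustration of the check described above, here is a minimal Python sketch using the standard html.parser module. It is not the SpamAssassin implementation; the class and function names are hypothetical, and it only compares the scheme of the visible link text against the scheme of the href.

```python
from html.parser import HTMLParser


class LinkMismatchParser(HTMLParser):
    """Collect <a> links whose text claims https:// but whose href is http://."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>
        self.mismatches = []   # list of (href, visible_text)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if (text.lower().startswith("https://")
                    and self._href.lower().startswith("http://")):
                self.mismatches.append((self._href, text))
            self._href = None


def find_https_http_mismatches(html):
    parser = LinkMismatchParser()
    parser.feed(html)
    parser.close()
    return parser.mismatches


if __name__ == "__main__":
    sample = ('<a href="http://spammersite.com/virus">'
              'https://www.email-service.com/login</a>')
    print(find_https_http_mismatches(sample))
```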
KHOP_BIG_TO_CC
The mail was sent to a large number of recipients (To and Cc).
MIME_HEADER_CTYPE_ONLY
The mail is malformed: it specifies a Content-Type other than "text/plain", so its headers should conform to the MIME specification, but they do not. This suggests that the message was generated by a badly written mailout program rather than by a normal email client.
MISSING_DATE
The date header is missing.
MISSING_MID
The mail doesn't contain a Message-ID header.
MPART_ALT_DIFF
The mail contains alternative parts, which are supposed to be identical so that the same text is displayed in text or HTML mode. Here the two parts are different, which is most often a spam technique.
PHP_ORIG_SCRIPT
Identifies that the email came from a PHP script. This is probably the result of a poorly secured PHP server being exploited.
PYZOR_CHECK
Pyzor is a hash-sharing system: it detects mail whose signature closely matches that of known spam.
RCVD_IN_****
A server which relayed the message is listed in an RBL (Real-time Blackhole List). For example, RCVD_IN_BRBL_LASTEXT indicates that the last external IP in the Received headers is listed in the Barracuda RBL (bb.barracudacentral.org).
RDNS_DYNAMIC
The full-circle DNS name used by the sending server suggests a dynamically allocated address.
RDNS_NONE
MailCleaner checks that the sending server is using a "Full Circle DNS" name, meaning that its IP address resolves to a hostname which resolves back to the same IP address. This rule hits when no reverse DNS name is found. This can be checked here.
SARE_ADLTSUB10
The mail subject contains a (possibly obfuscated) string based on the word "rape". Because obfuscation may be involved, it is sometimes hard to find out what triggered SpamC.
SINGLE_HEADER_1K SINGLE_HEADER_2K SINGLE_HEADER_3K SINGLE_HEADER_4K SINGLE_HEADER_5K
A single header contains between xK and (x+1)K characters. Single headers should be limited to a max of 998 characters, and even that many is suspicious.
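The bucketing behind these five rule names can be sketched as simple integer division. This is a hedged illustration only: the function name is hypothetical, and it assumes that "K" means 1024 characters, which the text above does not specify. Headers of 6K or more return None because the list above does not describe them.

```python
def single_header_rule(raw_header, kilo=1024):
    """Map one raw header to a SINGLE_HEADER_xK rule name, or None.

    Assumption: "K" is 1024 characters; pass kilo=1000 to try the other reading.
    """
    x = len(raw_header) // kilo  # number of whole "K" units in this header
    if 1 <= x <= 5:
        return f"SINGLE_HEADER_{x}K"
    return None


if __name__ == "__main__":
    long_header = "X-Custom: " + "a" * 2500
    print(single_header_rule(long_header))
```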
SUBJECT_NEEDS_ENCODING
The Subject: header line contains characters outside of the US-ASCII range that have not been encoded with Base64 or Quoted-Printable encoding. This violates the RFC standards for mail headers. Properly behaved MUAs would be expected not to do this.
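To make the distinction concrete, the sketch below checks a raw Subject value for bytes outside the US-ASCII range and shows how a well-behaved sender could encode the same subject with Python's standard email.header module. The function names are hypothetical and this is not how SpamAssassin itself performs the test.

```python
from email.header import Header


def subject_needs_encoding(raw_subject: bytes) -> bool:
    """True if the raw header value contains any byte outside US-ASCII."""
    return any(byte > 0x7F for byte in raw_subject)


def encode_subject(subject: str) -> str:
    """Return the subject as an ASCII-only MIME encoded-word string."""
    return Header(subject, "utf-8").encode()


if __name__ == "__main__":
    raw = "Café offer".encode("utf-8")       # what a misbehaving sender emits
    print(subject_needs_encoding(raw))        # raw 8-bit bytes are present
    print(encode_subject("Café offer"))       # ASCII-only encoded form
```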
SUBJ_ALL_CAPS
The mail subject is entirely in caps.
SUBJ_ILLEGAL_CHARS
The Subject header contains 8-bit and other illegal characters that should be MIME encoded, as described in RFC 2047.
TVD_SPACE_RATIO_MINFP
This rule looks at the ratio of spaces to non-spaces in each paragraph. Messages with an unusually high proportion of spaces are apparently more likely to be spam.
T_DKIM_INVALID
The mail is DKIM-signed, but the signature is invalid.
T_FILL_THIS_FORM_SHORT
This rule detects mails including a short form asking for personal information.
URI_HEX
A URI contains a long hexadecimal sequence.
URI_OBFU_WWW
A link contained in the mail is obfuscated.
MailCleaner rules (Community and Enterprise)
These rules are included within this repository (share/spamassassin; or /usr/mailcleaner/share/spamassassin on the appliance), and are used by all MailCleaner installations, unless they are overridden.
BOTNET_BADDNS
This rule indicates that the DNS configuration of the sending server is associated with a known botnet. This is a meta rule including a lot of different elements.
BOTNET_CLIENT
This rule adds points when several botnet-related rules have been hit.
BOTNET_CLIENTWORDS
The sending server's hostname contains strings suggesting that the mail was sent by an email client instead of a real mail server.
BOTNET_IPINHOSTNAME
The hostname contains part of its own IP address.
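A rough way to picture this check is to look for octets of the IP address among the numbers that appear in the hostname. The sketch below is an assumption-laden illustration, not the plugin's actual logic: the function name and the min_octets threshold of 2 are hypothetical, and it only handles dotted IPv4 addresses with decimal octets separated by non-digits in the hostname.

```python
import re


def hostname_contains_ip(hostname, ip, min_octets=2):
    """True if at least min_octets octets of ip appear as numbers in hostname."""
    octets = ip.split(".")
    # Every run of digits in the hostname, normalised to drop zero padding.
    tokens = {str(int(t)) for t in re.findall(r"\d+", hostname)}
    matched = sum(1 for octet in octets if octet in tokens)
    return matched >= min_octets


if __name__ == "__main__":
    print(hostname_contains_ip("host-192-0-2-45.dyn.example.net", "192.0.2.45"))
    print(hostname_contains_ip("mail.example.com", "192.0.2.45"))
```

Requiring more than one octet avoids flagging ordinary names such as mx1.example.com, where a single digit happens to coincide with one octet.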
DC_IMAGE_SPAM_HTML
The mail has at least one large image attachment and a comparatively small amount of text.
DC_IMAGE_SPAM_TEXT
Possible Image-only spam with little text.
DC_IMG_HTML_RATIO
Low body to pixel area ratio
DC_IMG_TEXT_RATIO
Low body to pixel area ratio
DC_PNG_MULTI_LARGO
Message has 2+ inline png covering lots of area
GENERIC_IXHASH
A fingerprint of the mail is computed and checked against fingerprints of known spam. This is a network-based test.
Enterprise Edition only
These rules are included as part of the Enterprise Edition premium data feeds. Community Edition users will not see these rules. These rules are installed in the same directory (/usr/mailcleaner/share/spamassassin), in a different set of files.
MC_ADULT_BDY_COQUIN_EN
Looks in the body of the mail for a word in the list: horny, horniest, naughty, naughtiest, sluty, slutiest.
MC_ADULT_BDY_SEX
The body contains a word starting with "sex"
MC_ADULT_SUBJ_SEX
The subject contains a word starting with "sex"
MC_CONTAINS_ZERO1 MC_CONTAINS_ZERO2 MC_CONTAINS_ZERO3 MC_CONTAINS_ZERO4 MC_CONTAINS_ZERO5
These rules detect the use of specific, often invisible, characters that are usually used to trick parsers and users. These characters are meant to circumvent anti-spam rules by causing an otherwise valid pattern not to match. For example, a zero-width space (U+200B) could be added to the middle of the word "viagra". The word would still display as "viagra" to the user, but it would no longer hit a simple match rule.
More information is available here.
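The effect of a zero-width character on a simple match can be demonstrated with a few lines of Python. This is only a sketch: the names are hypothetical, and the set of characters below is an assumed selection of common zero-width code points, not the list that the MC_CONTAINS_ZERO rules actually check.

```python
import unicodedata

# Assumed set of zero-width code points, for illustration only.
ZERO_WIDTH_CHARS = "\u200b\u200c\u200d\u2060\ufeff"


def find_zero_width(text):
    """Return (index, Unicode name) for each zero-width character in text."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(text) if ch in ZERO_WIDTH_CHARS]


def strip_zero_width(text):
    """Remove the zero-width characters so plain matching works again."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH_CHARS)


if __name__ == "__main__":
    obfuscated = "via\u200bgra"                      # displays like "viagra"
    print("viagra" in obfuscated)                    # simple match fails
    print(find_zero_width(obfuscated))
    print("viagra" in strip_zero_width(obfuscated))  # match works after cleanup
```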
MC_ESCURL
Detects bad characters in a URL of the message.
MC_FREEMAIL_BODY
Detects the use of a "freemail" address in the body of a message. Freemail addresses belong to services where anyone can easily register without giving real information about themselves (for example: gmail.com, yahoo.com, hotmail.com). Spam often contains such addresses in the body and asks the recipient to reply to them.
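A simplified version of this idea is to extract email addresses from the body and compare their domains against a freemail list. The sketch below is hypothetical: the function name, the deliberately loose address pattern, and the three-domain list (taken from the examples above) are illustrative assumptions, and the real rule's domain list is not reproduced here.

```python
import re

# Only the example domains named in this article; the real list is longer.
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}

# Loose pattern for illustration: local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def freemail_addresses(body):
    """Return the addresses in body whose domain is in FREEMAIL_DOMAINS."""
    return [m.group(0) for m in EMAIL_RE.finditer(body)
            if m.group(1).lower() in FREEMAIL_DOMAINS]


if __name__ == "__main__":
    text = "Please reply to john.doe@gmail.com, not to sales@example.com."
    print(freemail_addresses(text))
```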
MC_KREDIT
The term "kredit" or "credit" is present in the body of the mail.
MC_MAILTO_WITH_SUBJ_ORDER
Contains a mailto link (a link to send an email) with the subject order/commande/bestellung.
MC_MESSAGESNIFFER
This rule gives a score when the message was identified as spam by our partner MessageSniffer.
MC_URI_EASYMONEY_LVL4
(MailCleaner rule) The message contains a sentence like "claim your free copy" or "Check secret story". This rule detects sentences built on the pattern: one of the words (claim, see, check) + "your" + one of the words (free, full, secret) + one of the words (copy, story).
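The word pattern described above translates naturally into a regular expression. The following Python sketch is an approximation written from that description, not the actual rule definition; the names are hypothetical. It follows the stated pattern literally, so the word "your" is required, and case is ignored.

```python
import re

# Built from the description above: verb + "your" + adjective + noun.
EASYMONEY_RE = re.compile(
    r"\b(?:claim|see|check)\s+your\s+(?:free|full|secret)\s+(?:copy|story)\b",
    re.IGNORECASE,
)


def matches_easymoney(text):
    """True if text contains a phrase following the described word pattern."""
    return EASYMONEY_RE.search(text) is not None


if __name__ == "__main__":
    print(matches_easymoney("Claim your free copy today!"))
    print(matches_easymoney("Thanks for your order."))
```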