Improving the effectiveness and accuracy of SpamAssassin by configuring RBL and WL rule scores on Debian

SpamAssassin is often less effective than desired against non-English spam, and when you start using the Bayesian rules you do not yet have enough spam and ham samples for training.
You can also think of SpamAssassin as multilayer protection, and you can tune each layer to improve the results.
On the spammers' side, think of sequential waves of spam, each lasting from minutes to hours.
Your e-mail address may be targeted in any given wave. The first wave is the worst, because its messages are still unknown to the submission RBL and URIBL, spamtraps and rules, and your e-mail addresses may be scheduled for its very first minutes.
Each protection layer starts dealing with the spam from a given wave onwards.
Each layer adds to the overall effectiveness and accuracy but, broadly speaking, each successive one takes longer to come into play and is less accurate.
The layers are: Bayesian filters; submission RBL and URIBL; then RBL, URIBL and WL; SpamAssassin rules; automatic whitelists (automatic scoring of each e-mail address and IP pair based on its ham/spam history, in SpamAssassin lingo); and in-house rules.
Obviously this is an oversimplification. The borders are not so clear.
While you still don't have enough spam and ham samples to train the Bayesian filter, your first line of defense will be the antispam blacklists and whitelists.
Submission RBL and URIBL are quick to include new spam, but they are noisy (false positives). SpamAssassin under Usermin uses the following submission RBL and URIBL: SpamCop, Pyzor and Razor2. SpamCop is used for reading only; Pyzor and Razor2 are used for both submitting and reading.
It may take some time for the spam servers, botnets and spamvertized sites to reach the spam submission blacklists, spamtraps and spamvertized URI blacklists.
The more conservative and less automatic a blacklist is, the more accurate it is and the fewer false positives it produces, but the longer it takes for a spam source to be listed on it. Accuracy statistics for each list are published at:
https://www.intra2net.com/en/support/antispam/index.php
False positives are more harmful than false negatives, since you could lose a business message or another important one.
So our focus will be on the RBL and WL with near zero false positives, the highest quality block/allow lists available.
We will also reduce the scores of the lists that are low quality for Brazilian Portuguese spam.
The rule scores are defined in 50_scores.cf:
less  /var/lib/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf
score URIBL_PH_SURBL 0 0.001 0 0.610 # n=0 n=2
The four values are the scores used, in this order, for: local tests only, network tests, Bayes with local tests only, and Bayes with network tests.
"Net" means you have network tests enabled; "local" means you do not.
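To make the four-column layout concrete, here is a minimal Python sketch (not part of SpamAssassin) that parses a `score` line and picks the value for a given setup. The function names `parse_score_line` and `effective_score` are made up for this illustration. It assumes plain numeric scores only, and relies on the documented behaviour that a line with a single value uses that value in all four cases.

```python
def parse_score_line(line):
    """Parse 'score NAME v1 [v2 v3 v4]' into (name, [four floats])."""
    fields = line.split("#", 1)[0].split()  # drop the trailing comment
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError("not a score line: %r" % line)
    name = fields[1]
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        values = values * 4  # one value applies to all four score sets
    elif len(values) != 4:
        raise ValueError("expected 1 or 4 scores: %r" % line)
    return name, values


def effective_score(values, use_bayes, use_net):
    """Pick the column: local, net, bayes, bayes+net (in that order)."""
    index = (2 if use_bayes else 0) + (1 if use_net else 0)
    return values[index]


name, values = parse_score_line(
    "score URIBL_PH_SURBL 0 0.001 0 0.610 # n=0 n=2")
print(name, effective_score(values, use_bayes=True, use_net=True))
print(name, effective_score(values, use_bayes=False, use_net=True))
```

With Bayes and network tests both enabled, the fourth column (0.610) is the one that counts for this rule; a line such as `score RCVD_IN_XBL 10` would yield 10 in every case.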
https://mail-archives.apache.org/mod_mbox/spamassassin-users/201112.mbox...
These scores are generated by the SpamAssassin project, which uses data sets of spam and ham samples to train a single-layer neural network (a Perceptron) and separate data sets to verify the effectiveness of the resulting scores.
The scores are the lowest ones that are still enough to yield about 1 false positive in every 2500 samples.
http://spamassassin.apache.org/full/3.0.x/dist/masses/README.perceptron
https://en.wikipedia.org/wiki/Perceptron
This SUGGESTION uses a simple approach to rule scores: one score per rule, applied under all conditions. Native rule scores are below 5. Our previous Automatic WhiteList (auto score) setup could reach -14 when it started collecting statistics.
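Why the size of a single score matters can be sketched in a few lines of Python. SpamAssassin adds up the scores of every rule that hits and compares the total with `required_score`, which defaults to 5.0. The `classify` function and the small `scores` table below are illustrative only; the three values are the ones suggested in this article.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def classify(hits, scores, required=REQUIRED_SCORE):
    """Sum the scores of the rules that hit and compare with the threshold."""
    total = sum(scores[rule] for rule in hits)
    return total, total >= required


# Illustrative subset of single-value scores suggested in this article.
scores = {"RCVD_IN_XBL": 10.0, "URIBL_BLACK": 3.5, "BAYES_999": 1.5}

print(classify(["URIBL_BLACK"], scores))               # one medium list alone
print(classify(["URIBL_BLACK", "BAYES_999"], scores))  # two rules combined
print(classify(["RCVD_IN_XBL"], scores))               # one very high quality list
```

A native score below 5 never marks a message as spam by itself, while a score of 10 on a very high quality list does so with a single hit. That is the design choice behind the large values used for the near zero false positive lists.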
This is the third article in a series on different antispam protection improvements for SpamAssassin deployments on Debian. The improvements are cumulative.
As you gather more spam and ham samples, you can comment out the custom scores of the lower quality RBL and URIBL, so that they fall back to the defaults shipped with the SpamAssassin rule updates.
We suggest waiting at least 3 months before you start commenting out custom scores and going back to the native SpamAssassin scores. Also, keep evaluating the results on YOUR servers every day until you feel comfortable with them.
Also read the other antispam and SpamAssassin articles linked under "Related Content" on this site.


# AFM 20150728 read updated inaccuracy statistics at https://www.intra2net.com/en/support/antispam/index.php
# AFM 20150728 http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_current_release_3.4.x/rules/50_scores.cf
# AFM 20150729 scores with conditions http://www.futurequest.net/docs/SA/
# score RCVD_IN_SORBS_DUL 3 ##sent directly from dynamic IP address
##score RCVD_IN_SORBS_WEB 3 ##abusable web server
# score RCVD_IN_SORBS_SMTP 3
# score RCVD_IN_SORBS_HTTP 3
##AFM 20150728 ## very high quality rbl http://www.spamhaus.org
score RCVD_IN_XBL 10
score RCVD_IN_PBL 10
score RCVD_IN_SBL 10
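score __RCVD_IN_ZEN 10 ##name starts with __, so it is a sub-rule; SpamAssassin does not score sub-rules, so this line probably has no effect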
##AFM 20140728 ##some spamhaus very high quality uribl https://svn.apache.org/repos/asf/spamassassin/trunk/rules/25_uribl.cf
score URIBL_DBL_SPAM 10 ##spamhaus dbl uri rbl
score RCVD_IN_SBL_CSS 10 ##spamhaus auto-expire ip range rbl
score URIBL_SBL 10
score URIBL_SBL_A 10
score URIBL_DBL_PHISH 10
score URIBL_DBL_MALWARE 10
score URIBL_DBL_BOTNETCC 10
score URIBL_DBL_ABUSE_SPAM 10
##score URIBL_DBL_ABUSE_REDIR 1
score URIBL_DBL_ABUSE_PHISH 10
score URIBL_DBL_ABUSE_MALW 10
score URIBL_DBL_ABUSE_BOTCC 10
score RCVD_IN_BRBL_LASTEXT 7 ##high quality http://www.barracudacentral.org/
score RCVD_IN_RP_RNBL 5 ##quality list open relay Return Path http://senderscore.org
# score RAZOR2_CF_RANGE_E8_51 3
# score RAZOR2_CHECK 3
# score RAZOR2_CF_RANGE_51_100 2
# score RAZOR2_CF_RANGE_E8_51_100 3
score URIBL_BLACK 3.5 ##quality uribl http://www.uribl.com
score RCVD_IN_PSBL 7 ##a spamtrap passive rbl with self service removal
# score URIBL_RED 3
##score PYZOR_CHECK 3
# score RCVD_IN_BL_SPAMCOP_NET 3 ##user submission, 24h automatic delisting
##AFM 20150728 BAYES_999 only triggers together with BAYES_99, and SA SUMS both scores
# score BAYES_99 3.5
score BAYES_999 1.5
score RCVD_IN_DNSWL_MED 0 ##low quality whitelist
# score RCVD_IN_SORBS_SOCKS 3 ##open socks proxy server
# score RCVD_IN_SORBS_DUL 4
##score URIBL_GREY 3 ##doubtful uribl, leave scoring to SA experts
score URIBL_SC_SURBL 5 ##high quality spamvertized uribl http://www.surbl.org/ http://wiki.apache.org/spamassassin/Rules/URIBL_SC_SURBL https://www.intra2net.com/en/support/antispam/blacklist.php_dnsbl=URIBL_SC_SURBL.html
score URIBL_MW_SURBL 7 ##malware spreading uri
score URIBL_JP_SURBL 5 ##good quality uribl http://www.surbl.org/lists#jp
score URIBL_PH_SURBL 7 ##very high quality uribl http://www.surbl.org/lists#ph
score URIBL_WS_SURBL 5 ##high quality uribl list http://www.surbl.org/lists#ws
score URIBL_AB_SURBL 5
## score URI_OBFU_WWW 3 ##obfuscated URI
## score URIBL_RHS_DOB 3 ##day old URI sending mass mail
## score URIBL_DBL_ABUSE_REDIR 1 ##abused legit redirector like bit.ly
score RCVD_IN_DNSWL_LOW 0 ##is a low quality white list; you must enable the SA plugin to report spam, and remove their whitelisting score
score RCVD_IN_DNSWL_NONE 0 ##is a low quality white list
##score RCVD_IN_MSPIKE_BL 2
# score RCVD_IN_MSPIKE 1 ##it sums with other mspike levels
##score RCVD_IN_MSPIKE_L3 3
##score RCVD_IN_MSPIKE_L4 3
# score RCVD_IN_MSPIKE_L5 4.9
score RCVD_IN_MSPIKE_WL 0 ##is a low quality white list
score RCVD_IN_MSPIKE_H2 -0.001 ##is a low quality white list
score RCVD_IN_MSPIKE_H3 -0.001 ##is a low quality white list
score RCVD_IN_MSPIKE_H4 -0.001 ##is a low quality white list
score RCVD_IN_MSPIKE_H5 -0.001 ##is a low quality white list
score RCVD_IN_IADB_LISTED 0 ##is a low quality white list http://www.isipp.com/iadb.php
score RCVD_IN_IADB_SPF 0 ##is a low quality white list http://www.isipp.com/iadb.php
score RCVD_IN_IADB_DK 0 ##is a low quality white list http://www.isipp.com/iadb.php
score URI_WP_DIRINDEX 5 ##compromised WordPress site, possibly malware
score RCVD_IN_IADB_RDNS 0 ##is a low quality white list http://www.isipp.com/iadb.php
score RCVD_IN_IADB_SENDERID 0 ##is a low quality white list http://www.isipp.com/iadb.php
score RCVD_IN_IADB_VOUCHED 0 ##is a low quality white list http://www.isipp.com/iadb.php
#score URI_PHISH 5 ##Phishing using web form
score RP_MATCHES_RCVD 0 ##low quality whitelist rule
score RCVD_IN_RP_CERTIFIED 0 ##low quality whitelist
score RCVD_IN_RP_SAFE 0 ##low quality whitelist
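
A long list of overrides like this one is easy to get subtly wrong, for example by scoring the same rule twice or by scoring a sub-rule (a rule whose name starts with `__`, which SpamAssassin evaluates but does not score). The Python sketch below is a hypothetical helper, not a SpamAssassin tool; it only looks at active (uncommented) `score` lines, and `SAMPLE` is a made-up excerpt. It complements, and does not replace, the syntax check done by `spamassassin --lint`.

```python
from collections import Counter


def active_score_rules(config_text):
    """Return the rule names of all uncommented 'score' lines, in order."""
    rules = []
    for raw in config_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if len(fields) >= 3 and fields[0] == "score":
            rules.append(fields[1])
    return rules


def review(config_text):
    """Report rules scored more than once and scores set on sub-rules."""
    rules = active_score_rules(config_text)
    counts = Counter(rules)
    duplicates = sorted(r for r, n in counts.items() if n > 1)
    subrules = sorted(r for r in counts if r.startswith("__"))
    return duplicates, subrules


# Made-up excerpt used only to exercise the helper.
SAMPLE = """\
score RCVD_IN_XBL 10
score __RCVD_IN_ZEN 10
score URIBL_DBL_SPAM 10 ##spamhaus dbl uri rbl
# score RAZOR2_CHECK 3
score URIBL_DBL_SPAM 10
"""

duplicates, subrules = review(SAMPLE)
print("scored more than once:", duplicates)
print("sub-rules with a score:", subrules)
```

To check a real file, read it with `open(path).read()` and pass the text to `review`; the path depends on where you keep your local overrides.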




Bibliography
less  /var/lib/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf
https://mail-archives.apache.org/mod_mbox/spamassassin-users/201112.mbox...
scores
https://github.com/apache/spamassassin/blob/a7013d199051976ab3a67451b19d...
https://github.com/apache/spamassassin/blob/7d78a4bba51014e30c86a713e997...
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_uri_tests.cf
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_dnsbl_tests.cf
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/50_scores.cf
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/73_sandbox_man...
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/STATISTICS-set...
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/STATISTICS-set...
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/STATISTICS-set...
https://svn.apache.org/repos/asf/spamassassin/trunk/rules/STATISTICS-set...
http://wiki.apache.org/spamassassin/Rules/
http://spamassassin.apache.org/full/3.0.x/dist/masses/README.perceptron
