SPAM Assassin

Fri May 10 17:37:29 UTC 2024

Wow! Plenty to digest here! Thanks Bob!

~Michael

------ Original Message ------
From "Bob Proulx" <bob at proulx.com>
To nclug at nclug.org
Date 5/10/2024 11:25:18 AM
Subject Re: SPAM Assassin

>Hello Michael,
>
>Michael D Wailes wrote:
>>  My inbox is getting bombed with SPAM right now -- so bad that I've had to
>>  set the SPAM Assassin Threshold Score to 1 and am still getting 40-60 emails
>>  a day that are slipping through.
>
>Ouch!
>
>First, something is not working right with your SpamAssassin because a
>threshold of 1 should tag almost all emails as spam.  These days
>almost every email will have at least one point attached to it.  So if
>that isn't working I would debug things and try to figure that out.
>
>To debug SpamAssassin I run it on a single message with debug on and
>then look at what it is doing.  It's somewhat tedious but it's what I
>do.
>
>     | spamassassin -d -t -D all 2>&1 | less
>
>I do this from within a mutt running in a terminal so I end up in a
>terminal less.  But if you are using a graphical client then save the
>message to a file and then run spamassassin on the file.
>
>     spamassassin -d -t -D all < messagefile 2>&1 | less
>
>Pay particular attention to the trusted networks.  Is this working
>correctly for your network?
>
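
To make the single-message check above easier to repeat across several messages, here is a minimal Python sketch that pulls the verdict, score, and rule names out of the X-Spam-Status header that SpamAssassin adds. The function name parse_spam_status and the sample message are made up for illustration, and the header is assumed to use the stock "Yes, score=... required=... tests=..." layout; a customized header would need a different parser.

```python
import re
from email import message_from_string


def parse_spam_status(raw_message: str) -> dict:
    """Extract verdict, score, threshold and rule names from X-Spam-Status."""
    header = message_from_string(raw_message)["X-Spam-Status"]
    if header is None:
        raise ValueError("no X-Spam-Status header; was the message scanned?")
    # Unfold the header: continuation lines begin with whitespace.
    flat = " ".join(header.split())
    verdict, _, rest = flat.partition(", ")
    # Folding can leave "RULE_A, RULE_B"; rejoin the comma-separated list.
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(item.split("=", 1) for item in rest.split() if "=" in item)
    tests = [t for t in fields.get("tests", "").split(",") if t and t != "none"]
    return {
        "is_spam": verdict == "Yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests,
        "bayes_fired": any(t.startswith("BAYES_") for t in tests),
    }


# Hypothetical sample, shaped like a scanned message with a folded header.
SAMPLE = (
    "X-Spam-Status: Yes, score=7.3 required=1.0 tests=BAYES_99,HTML_MESSAGE,\n"
    "\tRDNS_NONE autolearn=no version=3.4.6\n"
    "Subject: hypothetical sample\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(parse_spam_status(SAMPLE))
```

A message that slips through with a threshold of 1 should show a score below 1.0 here, and the tests list shows which rules fired. If no BAYES_* rule ever appears in that list, that is a hint worth following up in the full debug output.
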
>What MTA are you using?  I am using Postfix and I have a number of
>anti-spam features in my configuration.  I'll share my full
>configuration here and then break it down somewhat.
>
>     inet_protocols = ipv4
>     smtpd_milters = unix:/var/run/opendkim/opendkim.sock
>     non_smtpd_milters = unix:/var/run/opendkim/opendkim.sock
>     smtpd_discard_ehlo_keywords = silent-discard, dsn, chunking
>     smtpd_data_restrictions = reject_unauth_pipelining
>     header_checks =
>             regexp:/etc/postfix/backscatter_header_checks.regexp,
>             regexp:/etc/postfix/sender_checks.regexp
>     smtpd_recipient_restrictions =
>             permit_mynetworks,
>             permit_sasl_authenticated,
>             reject_unauth_destination,
>             check_client_access hash:/etc/postfix/client-allow,
>             reject_invalid_hostname,
>             reject_non_fqdn_hostname,
>             reject_non_fqdn_sender,
>             reject_non_fqdn_recipient,
>             reject_unknown_sender_domain,
>             reject_unknown_recipient_domain,
>             check_recipient_access regexp:/etc/postfix/ext-access.regexp,
>             check_recipient_access hash:/etc/postfix/recipient-access,
>             check_recipient_access regexp:/etc/postfix/recipient-access.regexp,
>             check_helo_access hash:/etc/postfix/helo-access,
>             check_client_access hash:/etc/postfix/client-access,
>             check_sender_access hash:/etc/postfix/sender-access,
>             reject_rbl_client zen.spamhaus.org=127.0.0.[2..11],
>             reject_rhsbl_sender dbl.spamhaus.org=127.0.1.[2..99],
>             reject_rhsbl_helo dbl.spamhaus.org=127.0.1.[2..99],
>             reject_rhsbl_reverse_client dbl.spamhaus.org=127.0.1.[2..99],
>             warn_if_reject reject_rbl_client zen.spamhaus.org=127.255.255.[1..255]
>
>Don't just use that directly.  All of those files listed are files
>that I have on my system that you won't have on your system.  You
>would need to understand them and create them custom for your system.
>But there are a few things that I would immediately recommend.
>
>     inet_protocols = ipv4
>
>I only use IPv4 for SMTP for email.  Eventually it will be required
>that IPv6 be used but as of today that is not required and using IPv6
>requires some additional special handling.  Google is much more strict
>with incoming IPv6 for example due to the additional spam load.
>Easier to avoid it for the moment.
>
>     smtpd_milters = unix:/var/run/opendkim/opendkim.sock
>     non_smtpd_milters = unix:/var/run/opendkim/opendkim.sock
>
>These attach a configured OpenDKIM daemon to Postfix.  I use it to
>DKIM sign my outgoing email but it also verifies incoming email. I
>don't block due to invalid DKIM but I do use it with other rules to
>score incoming email.
>
>     smtpd_discard_ehlo_keywords = silent-discard, dsn, chunking
>     smtpd_data_restrictions = reject_unauth_pipelining
>
>These are needed due to recent abuse attacks.  Safe to add.
>
>     smtpd_recipient_restrictions =
>             permit_mynetworks,
>             permit_sasl_authenticated,
>             reject_unauth_destination,
>
>These are in a good order and safe to add.
>
>             check_client_access hash:/etc/postfix/client-allow,
>
>I have a file specifically allowing certain clients by IP address
>that I never want to block.  In that file I list those as OK and any
>mail from them is accepted.  This must come first because some of
>these internal systems violate the restrictions that I include next
>and must still be allowed.
>
>             reject_invalid_hostname,
>             reject_non_fqdn_hostname,
>             reject_non_fqdn_sender,
>             reject_non_fqdn_recipient,
>             reject_unknown_sender_domain,
>             reject_unknown_recipient_domain,
>
>Those are all good and safe to add.
>
>             check_helo_access hash:/etc/postfix/helo-access,
>
>I found a lot of spammers tried to spoof my own email server.  Really?
>This is in my helo-access file.
>
>     # Reject anybody that HELO's as being in our own domains.
>     # Since this occurs after permit_mynetworks this does not
>     # reject local clients.
>     proulx.com      REJECT  You are not proulx.com.
>
>     # Somebody HELO'ing as 'localhost'?  Won't hit because localhost is not a FQDN.
>     localhost       REJECT  You are not localhost.
>
>     # Somebody HELO'ing as our IP address?
>     198.99.81.74   REJECT  You are not 198.99.81.74
>
>Then I use a Makefile to always keep helo-access.db up to date with
>respect to the source helo-access file.  But it can be created manually
>with:
>
>     postmap helo-access
>
>Continuing on...
>
>             reject_rbl_client zen.spamhaus.org=127.0.0.[2..11],
>             reject_rhsbl_sender dbl.spamhaus.org=127.0.1.[2..99],
>             reject_rhsbl_helo dbl.spamhaus.org=127.0.1.[2..99],
>             reject_rhsbl_reverse_client dbl.spamhaus.org=127.0.1.[2..99],
>             warn_if_reject reject_rbl_client zen.spamhaus.org=127.255.255.[1..255]
>
>These require that the system is running its own nameserver.  DNS
>queries to Spamhaus are rate limited and if the ISP's or another large
>shared nameserver is used then the lookups will be blocked by that rate
>limit.  I install bind9 and use it as a local caching nameserver, which
>allows the above to be used.  My /etc/resolv.conf has this.
>
>     search proulx.com
>     nameserver 127.0.0.1
>
>With a local nameserver the Spamhaus checks can be used and those are
>by far the biggest help in blocking incoming spam.  Highly
>recommended.  Don't think twice about it.  Just do it.
>
>The three lines for the Spamhaus DBL, though, I find block only a very
>few emails a month.  They just don't provide much grip.  But ZEN is a
>serious help.  Highly recommended.
>
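
For anyone unsure what the reject_rbl_client lines above actually ask the nameserver, here is a minimal Python sketch of the mechanics. It only builds the query name and classifies a reply; it performs no DNS lookup. The function names are hypothetical, and treating 127.255.255.x replies as "lookup-error" reflects my understanding that Spamhaus uses that range to signal a problem with the query (such as going through a large shared resolver) rather than a listing.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name a DNSBL lookup asks for: reversed octets + zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def reply_in_range(reply: str, prefix: str, low: int, high: int) -> bool:
    """Mimic a filter such as 127.0.0.[2..11] on the returned A record."""
    if not reply.startswith(prefix):
        return False
    last = reply[len(prefix):]
    return last.isdigit() and low <= int(last) <= high


def classify(reply: str) -> str:
    """Sort a DNSBL reply the way the two ZEN lines above would treat it."""
    if reply_in_range(reply, "127.0.0.", 2, 11):
        return "listed"          # matches zen.spamhaus.org=127.0.0.[2..11]
    if reply_in_range(reply, "127.255.255.", 1, 255):
        return "lookup-error"    # matches the warn_if_reject line
    return "ignored"


if __name__ == "__main__":
    print(dnsbl_query_name("198.99.81.74"))
    print(classify("127.0.0.4"), classify("127.255.255.254"))
```

Resolving the generated name with a tool such as dig against the local caching nameserver is a quick way to see whether the replies fall in the listing range or the error range.
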
>This does not remove the need to run SpamAssassin or other anti-spam
>after this point.  But it's the first stage in the pipeline of mail.
>
>For the mailing lists I am using SpamAssassin and a bunch of
>customization.  I still recommend that as a general statement.  But
>for my own personal email box I have actually dropped SpamAssassin
>from it entirely!  I am using *only* the CRM114 discriminator now and
>it is doing very well for me.  But that's probably due to my
>customized email handling situation with custom procmail rules.  But
>regardless it is still necessary to have a good SpamAssassin
>installation.  Debug why it is failing for you.  It's necessary.
>
>SpamAssassin depends heavily upon the Bayes machine learning engine.
>I suspect it is the problem and is not working.  It is critical that
>the Bayes engine be trained on email.  The best training is to train
>on error.
>
>As email is classified as spam it goes into my spam folder.  I review
>the spam folder every day looking for misclassified messages.  If I
>find one then I remove it from the spam folder and send it through
>SpamAssassin for training as non-spam.
>
>     sa-learn --ham
>
>As email is classified as non-spam it goes into my inbox.  As I find
>spam in my inbox I remove it from there and send it through
>SpamAssassin for training as spam.
>
>     sa-learn --spam
>
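
With a graphical mail client the misfiled messages usually have to be saved to a file first. As a minimal sketch, assuming they were exported to an mbox file, the training step could be scripted like this. The helper names and the file name are hypothetical; the --ham, --spam, and --mbox options are the ones sa-learn documents.

```python
import subprocess


def sa_learn_argv(kind: str, path: str, mbox: bool = False) -> list:
    """Build the sa-learn command line for training on a file."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    argv = ["sa-learn", "--" + kind]
    if mbox:
        # Tell sa-learn the file holds many messages in mbox format.
        argv.append("--mbox")
    argv.append(path)
    return argv


def train(kind: str, path: str, mbox: bool = False) -> None:
    """Run sa-learn; raises CalledProcessError if it exits non-zero."""
    subprocess.run(sa_learn_argv(kind, path, mbox), check=True)


if __name__ == "__main__":
    # Hypothetical file of good mail rescued from the spam folder.
    print(sa_learn_argv("ham", "rescued.mbox", mbox=True))
```

Run it as the same user whose Bayes database is used when mail is scanned, otherwise the training lands in a different database.
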
>In SpamAssassin itself I turn off the Bayes expiration that otherwise
>runs during message handling because it takes a long time.  I have this
>in my ~/.spamassassin/user_prefs file.
>
>     # Stop SA from running bayes expires (takes long time) during message
>     # handling.  But now must have cronjob run sa-learn --force-expire!
>     bayes_auto_expire 0
>
>And then I have a personal cronjob that runs the expiration twice a day.
>
>     0 1,13 * * *    test -d $HOME && sa-learn --force-expire >/dev/null
>
>Some other important things to configure about SpamAssassin.
>
>     # Default is 150000 tokens which on this busy system is one day.
>     # Increase and then check if the tokens save more.
>     bayes_expiry_max_db_size 500000
>
>Give it a larger token database.  This helped quite a bit for me.
>
>     trusted_networks 198.99.81.74/22
>     internal_networks 198.99.81.74/22
>     trusted_networks 192.168.0.0/16
>     internal_networks 192.168.0.0/16
>     whitelist_bounce_relays joseki.proulx.com
>
>These above make sense for my network and everything depends upon what
>you have on your network.  While you are debugging you will see if the
>SpamAssassin DNSBL rules are working correctly.  You might need to
>make adjustments to trusted_networks, internal_networks, and for any
>whitelist_bounce_relays that you have in your setup.
>
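
When adapting the trusted_networks lines above, it helps to see exactly which addresses a CIDR entry covers. This minimal Python sketch uses the standard ipaddress module and assumes SpamAssassin masks off the host bits the way ordinary CIDR handling does; the helper names are hypothetical.

```python
import ipaddress

# The two entries from the configuration above.
TRUSTED = ["198.99.81.74/22", "192.168.0.0/16"]


def covering_networks(entries):
    """Normalize entries; strict=False masks off any host bits."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_trusted(ip: str, entries=TRUSTED) -> bool:
    """True if ip falls inside any of the configured networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in covering_networks(entries))


if __name__ == "__main__":
    for net in covering_networks(TRUSTED):
        print(net, "covers", net[0], "through", net[-1])
    print(is_trusted("198.99.83.1"), is_trusted("198.99.84.1"))
```

Checking each relay address from a message's Received headers this way shows which hops SpamAssassin would treat as trusted, which in turn decides which hop gets looked up in the DNSBL rules.
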
>>  I've attempted to also blacklist the domains that are consistently sneaking
>>  through but don't think I have those set correctly. I'm hoping someone here
>>  can offer some insight.
>>
>>  Most of these spammers are using sub-domains such as, m.domain.com, so I've
>>  been setting the domain in the blacklist filter like this:
>>  *domain.com
>>
>>  Shouldn't that cover any and all traffic from the specified domain?
>
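
On the *domain.com question: assuming the filter follows SpamAssassin's blacklist_from syntax, patterns are file-glob style, where * matches any run of characters and ? matches one character, compared against the whole sender address. The Python sketch below only approximates that matching to show what a pattern would cover; it is not SpamAssassin's own matcher, and the addresses are made-up examples.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a file-glob address pattern into a case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")      # any run of characters
        elif ch == "?":
            parts.append(".")       # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, address: str) -> bool:
    """True if the whole address matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(address) is not None


if __name__ == "__main__":
    for pat in ("*domain.com", "*@domain.com", "*.domain.com"):
        for addr in ("news@m.domain.com", "news@domain.com", "bob@otherdomain.com"):
            print(pat, addr, matches(pat, addr))
```

Under these assumptions *domain.com does cover m.domain.com, but it also catches unrelated domains that merely end in the same letters, so a pair such as *@domain.com and *.domain.com is tighter. If mail still gets through, the address the spam actually carries in its sender headers may simply be a different one.
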
>I don't do any blacklisting in my .spamassassin/user_prefs rules.  I
>do all of my additional email filtering in my ~/.procmailrc rules,
>which include an extensive allowlist.
>
>My philosophy for my own filtering is that when I find things I want
>in my spam folder I put them into an allowlist in my procmailrc file.
>That means that things that fall through are more likely to be spam
>because they are not in my allowlist.  Then I feed all of that more
>likely spam to SpamAssassin/CRM114 and let it sort out the remaining
>email.  This works well for me.
>
>I still must review my spam folder every day.  It's pretty easy to
>scan through the sea of spam that is filed there, and my eye can pick
>out the odd non-spam message that comes through every so often.  Also,
>if I sign up for something and don't see its email in my inbox, I
>always suspect it went to my spam folder.  I look there and only need
>to check the most recent messages to find it easily.  YMMV.
>
>Bob