SPAM Assassin

Fri May 10 17:25:18 UTC 2024

Hello Michael,

Michael D Wailes wrote:
> My inbox is getting bombed with SPAM right now -- so bad that I've had to
> set the SPAM Assassin Threshold Score to 1 and am still getting 40-60 emails
> a day that are slipping through.

Ouch!

First, something is not working right with your SpamAssassin because a
threshold of 1 should tag almost all emails as spam.  These days
almost every email will have at least one point attached to it.  So if
that isn't working I would debug things and try to figure that out.

To debug SpamAssassin I run it on a single message with debug output
turned on and then read through what it is doing.  It's somewhat
tedious but it works.

    | spamassassin -d -t -D all 2>&1 | less

I do this from within mutt running in a terminal so I end up in less
in that terminal.  But if you are using a graphical client then save
the message to a file and then run spamassassin on the file.

    spamassassin -d -t -D all < messagefile 2>&1 | less

Pay particular attention to the trusted networks.  Is this working
correctly for your network?
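
The debug output is long, so it can help to cut it down to the parts
that matter for this problem.  Here is a minimal sketch, assuming the
debug lines are tagged with a channel name after "dbg:" (such as
received-header, dns and bayes).  The function name sa_debug_filter
is made up for illustration, and the exact wording of the debug lines
varies between SpamAssassin versions.

```shell
#!/bin/sh
# sa_debug_filter (hypothetical helper name): read SpamAssassin debug
# output on stdin and keep only the lines from the channels that deal
# with relay trust, DNS lookups and the Bayes database.
sa_debug_filter() {
    grep -E 'dbg: (received-header|dns|bayes):'
}
```

Pipe the output of the debug command above through sa_debug_filter
before less.  The received-header lines show which relays were treated
as trusted or internal.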

What MTA are you using?  I am using Postfix and I have a number of
anti-spam features in my configuration.  I'll share my full
configuration here and then break it down somewhat.

    inet_protocols = ipv4
    smtpd_milters = unix:/var/run/opendkim/opendkim.sock
    non_smtpd_milters = unix:/var/run/opendkim/opendkim.sock
    smtpd_discard_ehlo_keywords = silent-discard, dsn, chunking
    smtpd_data_restrictions = reject_unauth_pipelining
    header_checks =
            regexp:/etc/postfix/backscatter_header_checks.regexp,
            regexp:/etc/postfix/sender_checks.regexp
    smtpd_recipient_restrictions =
            permit_mynetworks,
            permit_sasl_authenticated,
            reject_unauth_destination,
            check_client_access hash:/etc/postfix/client-allow,
            reject_invalid_hostname,
            reject_non_fqdn_hostname,
            reject_non_fqdn_sender,
            reject_non_fqdn_recipient,
            reject_unknown_sender_domain,
            reject_unknown_recipient_domain,
            check_recipient_access regexp:/etc/postfix/ext-access.regexp,
            check_recipient_access hash:/etc/postfix/recipient-access,
            check_recipient_access regexp:/etc/postfix/recipient-access.regexp,
            check_helo_access hash:/etc/postfix/helo-access,
            check_client_access hash:/etc/postfix/client-access,
            check_sender_access hash:/etc/postfix/sender-access,
            reject_rbl_client zen.spamhaus.org=127.0.0.[2..11],
            reject_rhsbl_sender dbl.spamhaus.org=127.0.1.[2..99],
            reject_rhsbl_helo dbl.spamhaus.org=127.0.1.[2..99],
            reject_rhsbl_reverse_client dbl.spamhaus.org=127.0.1.[2..99],
            warn_if_reject reject_rbl_client zen.spamhaus.org=127.255.255.[1..255]

Don't just use that directly.  All of those files listed are files
that I have on my system that you won't have on your system.  You
would need to understand them and create them custom for your system.
But there are a few things that I would immediately recommend.

    inet_protocols = ipv4

I only use IPv4 for SMTP for email.  Eventually it will be required
that IPv6 be used but as of today that is not required and using IPv6
requires some additional special handling.  Google is much more strict
with incoming IPv6 for example due to the additional spam load.
Easier to avoid it for the moment.

    smtpd_milters = unix:/var/run/opendkim/opendkim.sock
    non_smtpd_milters = unix:/var/run/opendkim/opendkim.sock

These attach a configured OpenDKIM daemon to Postfix.  I use it to
DKIM sign my outgoing email but it also verifies incoming email. I
don't block due to invalid DKIM but I do use it with other rules to
score incoming email.

    smtpd_discard_ehlo_keywords = silent-discard, dsn, chunking
    smtpd_data_restrictions = reject_unauth_pipelining

These are needed due to recent abuse attacks.  Safe to add.

    smtpd_recipient_restrictions =
            permit_mynetworks,
            permit_sasl_authenticated,
            reject_unauth_destination,

This is a good order and safe to add.

            check_client_access hash:/etc/postfix/client-allow,

I have a file specifically allowing certain clients by IP address
that I never want to block.  In that file I list those as OK and any
mail from them is accepted.  This must come first because some of
these internal systems violate the restrictions that I include next
and would otherwise be rejected.

            reject_invalid_hostname,
            reject_non_fqdn_hostname,
            reject_non_fqdn_sender,
            reject_non_fqdn_recipient,
            reject_unknown_sender_domain,
            reject_unknown_recipient_domain,

Those are all good and safe to add.

            check_helo_access hash:/etc/postfix/helo-access,

I found a lot of spammers tried to spoof my own email server.  Really?
This is in my helo-access file.

    # Reject anybody that HELO's as being in our own domains.
    # Since this occurs after permit_mynetworks this does not
    # reject local clients.
    proulx.com      REJECT  You are not proulx.com.

    # Somebody HELO'ing as 'localhost'?  Won't hit because localhost is not a FQDN.
    localhost       REJECT  You are not localhost.

    # Somebody HELO'ing as our IP address?
    198.99.81.74   REJECT  You are not 198.99.81.74

Then I use a Makefile to always keep helo-access.db up to date with
regard to the source helo-access file.  But you can create it manually
with this.

    postmap helo-access
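
If you don't want a Makefile, the same "rebuild only when the source
changed" idea can be sketched in shell.  This is only an illustration
and not the Makefile mentioned above.  The function names map_is_stale
and rebuild_map are made up, it assumes the map type is hash (so the
compiled file is the source name plus .db), and it relies on the -nt
file test which bash and dash both support.

```shell
#!/bin/sh
# map_is_stale FILE (hypothetical helper): succeed if FILE.db is
# missing or older than FILE.
map_is_stale() {
    src="$1"
    db="$1.db"
    [ ! -e "$db" ] || [ "$src" -nt "$db" ]
}

# rebuild_map FILE (hypothetical helper): run postmap only when needed.
rebuild_map() {
    if map_is_stale "$1"; then
        postmap "hash:$1"
    fi
}
```

For example "rebuild_map /etc/postfix/helo-access" run as root after
editing the file.  Afterward "postmap -q localhost
hash:/etc/postfix/helo-access" should print the REJECT line if the
map was built correctly.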

Continuing on...

            reject_rbl_client zen.spamhaus.org=127.0.0.[2..11],
            reject_rhsbl_sender dbl.spamhaus.org=127.0.1.[2..99],
            reject_rhsbl_helo dbl.spamhaus.org=127.0.1.[2..99],
            reject_rhsbl_reverse_client dbl.spamhaus.org=127.0.1.[2..99],
            warn_if_reject reject_rbl_client zen.spamhaus.org=127.255.255.[1..255]

These require that the system is running its own nameserver.  Spamhaus
rate limits DNS queries, and if the ISP's or some other large shared
nameserver is used then the queries will be refused because that
nameserver is over the limit.  I install bind9 and use it as a local
caching nameserver, which allows the above to be used.  My
/etc/resolv.conf has this.

    search proulx.com
    nameserver 127.0.0.1

Using a local nameserver the Spamhaus checks can be used and those are
by far the biggest help in blocking incoming spam.  Highly
recommended.  Don't think twice about it.  Just do it.
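
To check whether your resolver is actually getting usable answers
from Spamhaus, you can query the test address 127.0.0.2, which should
always be listed.  Here is a rough sketch using dig.  The function
names are made up for illustration.  It assumes that answers in
127.0.0.x mean "listed" and that answers in 127.255.255.x are error
codes (for example because the query came through a public resolver
or was over the rate limit), which is the same split the
restrictions above rely on.

```shell
#!/bin/sh
# reverse_ipv4 ADDR: reverse the octets, 127.0.0.2 -> 2.0.0.127
reverse_ipv4() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 }'
}

# classify_zen_answer ANSWER: interpret one A record from a ZEN lookup.
classify_zen_answer() {
    case "$1" in
        127.255.255.*) echo "error" ;;       # query refused, not a listing
        127.0.0.*)     echo "listed" ;;
        "")            echo "not-listed" ;;  # no A record returned
        *)             echo "unknown" ;;     # e.g. a dig timeout message
    esac
}

# check_zen ADDR (hypothetical helper): look ADDR up in ZEN using dig
# and report how the first answer is classified.
check_zen() {
    name="$(reverse_ipv4 "$1").zen.spamhaus.org"
    answer=$(dig +short "$name" A | head -n 1)
    echo "$1: $(classify_zen_answer "$answer") $answer"
}
```

Running "check_zen 127.0.0.2" should report listed when things are
working.  If it reports error then the nameserver in use is probably
the problem and a local caching nameserver is the fix.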

The three lines for the Spamhaus DBL though I find block only a few
emails a month.  They just don't provide much grip.  But ZEN is a
serious help.  Highly recommended.

This does not remove the need to run SpamAssassin or other anti-spam
after this point.  But it's the first stage in the pipeline of mail.

For the mailing lists I am using SpamAssassin and a bunch of
customization.  I still recommend that as a general statement.  But
for my own personal email box I have actually dropped SpamAssassin
from it entirely!  I am using *only* the CRM114 discriminator now and
it is doing very well for me.  But that's probably due to my
customized email handling situation with custom procmail rules.  But
regardless it is still necessary to have a good SpamAssassin
installation.  Debug why it is failing for you.  It's necessary.

SpamAssassin depends heavily upon the Bayes machine learning engine.
I suspect it is the problem and is not working.  It is critical that
the Bayes engine be trained on email.  The best training is to train
on error.

As email is classified as spam it goes into my spam folder.  I review
the spam folder every day looking for misclassified messages.  If I
find one then I remove it from the spam folder and send it through
SpamAssassin for training as non-spam.

    sa-learn --ham

As email is classified as non-spam it goes into my inbox.  As I find
spam in my inbox I remove it from there and send it through
SpamAssassin for training as spam.

    sa-learn --spam
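
Both commands take the message on standard input or as a file name,
for example "sa-learn --spam messagefile".  Keep in mind that with
the default settings SpamAssassin does not use Bayes at all until it
has learned at least 200 spam and 200 non-spam messages.  Here is a
small sketch to check the counts.  The function names are made up,
and it assumes the "sa-learn --dump magic" output has the count in
the third column and the name (nspam or nham) as the last word on the
line.

```shell
#!/bin/sh
# bayes_count NAME (hypothetical helper): read "sa-learn --dump magic"
# output on stdin and print the count for NAME (nspam or nham).
bayes_count() {
    awk -v key="$1" '$NF == key { print $3 }'
}

# bayes_ready NSPAM NHAM (hypothetical helper): succeed if both counts
# reach the default minimum of 200 learned messages.
bayes_ready() {
    [ "$1" -ge 200 ] && [ "$2" -ge 200 ]
}
```

For example:

    nspam=$(sa-learn --dump magic | bayes_count nspam)
    nham=$(sa-learn --dump magic | bayes_count nham)
    bayes_ready "$nspam" "$nham" || echo "Bayes needs more training"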

In SpamAssassin itself I turn off the automatic Bayes expiration that
runs during message handling because it takes a long time.  I have
this in my ~/.spamassassin/user_prefs file.

    # Stop SA from running bayes expires (takes long time) during message
    # handling.  But now must have cronjob run sa-learn --force-expire!
    bayes_auto_expire 0

And then I have a personal cronjob that runs the expiration twice a day.

    0 1,13 * * *    test -d $HOME && sa-learn --force-expire >/dev/null

Some other important things to configure about SpamAssassin.

    # Default is 150000 tokens which on this busy system is one day.
    # Increase and then check if the tokens save more.
    bayes_expiry_max_db_size 500000

Give it a larger token database.  This helped quite a bit for me.

    trusted_networks 198.99.81.74/22
    internal_networks 198.99.81.74/22
    trusted_networks 192.168.0.0/16
    internal_networks 192.168.0.0/16
    whitelist_bounce_relays joseki.proulx.com

These above make sense for my network and everything depends upon what
you have on your network.  While you are debugging you will see if the
SpamAssassin DNSBL rules are working correctly.  You might need to
make adjustments to trusted_networks, internal_networks, and for any
whitelist_bounce_relays that you have in your setup.

> I've attempted to also blacklist the domains that are consistently sneaking
> through but don't think I have those set correctly. I'm hoping someone here
> can offer some insight.
>
> Most of these spammers are using sub-domains such as, m.domain.com, so I've
> been setting the domain in the blacklist filter like this:
> *domain.com
>
> Shouldn't that cover any and all traffic from the specified domain?

I don't do any blacklisting in my .spamassassin/user_prefs rules.  I
do all of my additional email filtering in my ~/.procmailrc rules,
where I have an extensive set of allowlist rules.

My philosophy for my own filtering is that when I find things I want
in my spam folder I put them into an allowlist in my procmailrc file.
That means that anything that falls through there is more likely to
be spam because it is not in my allowlist.  Then I feed all of that
more-likely-to-be-spam mail to SpamAssassin/CRM114 and let it sort
out the remaining email.  This works well for me.

I still must review my spam folder every day.  It's pretty easy to
scan through the sea of spam that is filed there.  And then my eye can
pick out the odd non-spam message that comes through there every so
often.  Also if I am signing up for something and it sends me email
and I don't see it in my inbox then I always suspect it went to my
spam folder.  I look there and only need to look at the most recent
messages to find those easily.  YMMV.

Bob