[NCLUG] SpamAssassin Testimonials?

Bob Proulx bob at proulx.com
Wed Jul 23 22:19:28 MDT 2003


Sean Reifschneider wrote:
> Bob Proulx wrote:
> >SA folks are hoping the Bayesian Inferencing engine will compensate
> >here.  I am less confident in the current Bayes implementation.
> 
> I've been plugging stuff into the Bayes stuff, but I'm not sure it's
> been helping.

What I don't like about the current implementation is that it is only
looking at individual words.  If "mortgage" is seen, and it has learned
that 95% of the messages containing that word are spam, then it will
lean toward calling the message spam.  I would like to see a Markov
chain used instead to find the probability that certain words together,
such as "make money fast", are more likely to be spam.  (And by just
mentioning these words I wonder how many list members won't ever see
this message because it will be filed as spam. :-)
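
To make the difference concrete, here is a minimal Python sketch.  It
is not SpamAssassin's Bayes code and it is not a full Markov chain
either; it only compares scoring single words against scoring adjacent
word sequences (n-grams).  The function names (ngrams, train,
spam_ratio) and the training messages are made up for illustration,
and the score is a simple smoothed frequency.

```python
from collections import Counter

def ngrams(text, n):
    """Return the list of n-word sequences in text (n=1 gives plain words)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def train(messages, n):
    """Count in how many messages each n-gram appears."""
    counts = Counter()
    for msg in messages:
        counts.update(set(ngrams(msg, n)))
    return counts

def spam_ratio(token, spam_counts, ham_counts):
    """Smoothed fraction of the sightings of token that were in spam."""
    s = spam_counts[token]
    h = ham_counts[token]
    return (s + 1) / (s + h + 2)

# Made-up training data, for illustration only.
spam = ["make money fast", "make money fast from home"]
ham = ["fast way to make dinner", "send money to mom", "make it fast"]

spam1, ham1 = train(spam, 1), train(ham, 1)
spam3, ham3 = train(spam, 3), train(ham, 3)

# Each word alone also shows up in ordinary mail, so it scores near 0.5.
word_scores = {w: spam_ratio(w, spam1, ham1) for w in ["make", "money", "fast"]}

# The three words together only ever appeared in spam.
phrase_score = spam_ratio("make money fast", spam3, ham3)
```

With this toy data the phrase scores higher than any of its words
taken alone, which is the effect I would hope to get from looking at
words together.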

> I used to use a Razor as one of the settings, and I think
> that helped a lot.  I need to look into re-enabling that again.

It seems that some people get good use out of razor and others don't.
I don't.  When I looked at messages which were NOT tagged by SA but
were seen in Razor that number was around 1.5%.  That was just not
enough to really get excited about.  And Razor by itself had too many
false positives to use it as the primary method of tagging.  It seems
that some people will always file non-spam messages into it.

DCC is better because it makes no claim about spamminess.  All you know
with DCC is that someone else got the same message as you got.  That
is a good indicator for some things like directed email, but obviously
everyone on a mailing list got the same mailing list message.
Basically it takes the false positive question out of the equation
entirely.  If it was in DCC then you know someone else saw the same
message and it is some type of bulk mailing.  You have to decide after
that if it is spam or not.  All of the controversy about trusting the
reporter to only report spam, and also the problem that one person's
junk is another person's treasure, which continues to hound Razor, is
completely gone with DCC.  It does not address the issue of spamminess.

Also, when using the checksum methods, someone has to see the spam
first so that it can get filed.  There will always need to be someone
who is on the front line.  If you are the first person to receive the
spam then it won't be in any checksum database.
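
The two points above, that a checksum database only tells you a
message is bulk and that the first recipient gets no help, can be
sketched in a few lines of Python.  This is a toy in-memory counter,
not the DCC protocol; the class name BulkCounter and the exact-hash
normalization are assumptions for illustration.  As far as I know the
real DCC uses fuzzier checksums and a network of servers.

```python
import hashlib
from collections import Counter

class BulkCounter:
    """Toy stand-in for a shared checksum database (hypothetical, in-memory)."""

    def __init__(self):
        self.seen = Counter()

    @staticmethod
    def checksum(body):
        # Collapse whitespace and case so trivial differences hash the same.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, body):
        """Record one sighting and return how many earlier sightings existed."""
        key = self.checksum(body)
        previous = self.seen[key]
        self.seen[key] += 1
        return previous

db = BulkCounter()
first = db.report("Meeting moved to 3pm,  see you there")   # front line
second = db.report("meeting moved to 3pm, see you there")
third = db.report("Meeting moved to 3pm, see you there")
```

The first report finds nothing in the database, and later reports
only learn a count.  Nothing here says whether the message is spam;
that decision is still yours.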

> I've been training all the spam that's been getting through for nearly a
> month now and it hasn't helped as much as I'd hoped.  I took a look at
> the spam that was getting through and dropping it from 6.5 to 5 really
> wouldn't have gotten rid of that much.  Maybe only 20%.  3 to 3.5 would
> get rid of about 65%, but a surprising number of the messages were coming
> in between 0 and 1, about 15%...

Would it be possible for you to craft improved rules that would catch
the class of spam which is getting through?  If those were suggested
back upstream to the SA developers they could be added to the current
checks.  Right now all of the rules are frozen and they are tweaking
scores in preparation for the upcoming SA-2.60 release.  This is a
cyclic lifecycle of big changes, little changes, freeze and tweak,
release, repeat.  At this moment things are almost ready for release.

> I'm ok with pushing it down fairly far, because users will now get
> immediate feedback if their mail has been marked as spam and can cause
> it to get delivered to me fairly easily.  So far over the last 2 days,
> 1417 messages have been delivered through and 2966 have been held in the
> spampit.  No pitted messages have been confirmed.

Impressively large numbers.  Are you using any RBLs too?  If not then
I suggest using a list of open relays and blocking them as well.  That
might cut your numbers down significantly.  With open relay lists
there is very little collateral damage since the tests are fully
automated.

I also use bl.spamcop.net, which lists reported UBE.  Historically
lists like this have been controversial.  But it works very well and
spamcop has been doing a reasonably good job of it overall.  They ask
for donations to support the cause and I am a subscriber since they
work so well for me.

Here are last week's stats of mail blocked using these on my server.
Also, order matters.  The first one in the list that matches stops
the search through the rest of the list.  Looking at this I should
reorder my checks.

     14 relays.ordb.org
    948 bl.spamcop.net
     65 list.dsbl.org
     52 relays.osirusoft.com
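
For anyone who has not set one of these up, here is a rough Python
sketch of how a DNS blocklist check works and why order matters.  It
is not my mail server's configuration.  The function names are made
up, the resolver is passed in so it can be swapped for a test stub,
and the zone names are simply the ones from the table above, which
may or may not still be in service when you read this.

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the zone returns an address for ip; a failed lookup counts as not listed."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # Includes socket.gaierror (no such name) and, in this sketch, DNS errors.
        return False

def first_listing(ip, zones, resolve=socket.gethostbyname):
    """Return the first zone that lists ip, or None.  Later zones are not queried."""
    for zone in zones:
        if is_listed(ip, zone, resolve):
            return zone
    return None

# Hit counts copied from the table above; sort so the busiest list is tried first.
hits = {
    "relays.ordb.org": 14,
    "bl.spamcop.net": 948,
    "list.dsbl.org": 65,
    "relays.osirusoft.com": 52,
}
reordered = sorted(hits, key=hits.get, reverse=True)
```

Putting the list with the most hits first saves lookups on most
rejected mail.  Keep in mind the counts themselves are skewed by the
current order, since a message stopped by an earlier list is never
tested against the later ones.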

> >b) If you get email from your stock broker or an automated account it
> >will never be acked.
> 
> That's true.  If my stock broker sends me a message that gets caught by
> SpamAssassin, they probably have bigger problems than just having to
> confirm the message.

I disagree.  But it all depends upon your attitude toward email.  If
you count it as opportunistic only then you only lose out on the
opportunity and no big deal.  But if you count on it for more than
that then you are still out the message.

> >c) This tends to interact badly with mailing
> >lists (seen it several times now) so you need to make sure they are

> something anyway.  Plus, it's only SPAMMY messages that would trigger
> it, so subscribe confirmations should get through no problem, etc.

Actually the problem is that a list message will look spammy, TMDA
will send a confirmation message to the mailing list, and everyone on
the list will gripe about it.  It is at least as bad as sending
"testing, ignore" messages to mailing lists.
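
One way to avoid challenging list traffic is to look at the headers
before sending any confirmation.  The Python sketch below is only a
heuristic and is not how TMDA itself is configured; the function
names and the sample messages are hypothetical.  It relies on the
List-Id, List-Unsubscribe and List-Post headers that list software
commonly adds, plus the conventional Precedence header.

```python
from email import message_from_string

LIST_HEADERS = ("List-Id", "List-Unsubscribe", "List-Post")
BULK_PRECEDENCE = {"bulk", "list", "junk"}

def looks_like_list_mail(raw_message):
    """Heuristic: True if the headers suggest the message came via a mailing list."""
    msg = message_from_string(raw_message)
    if any(msg.get(name) for name in LIST_HEADERS):
        return True
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in BULK_PRECEDENCE

def should_send_confirmation(raw_message):
    """Only challenge mail that does not look like list traffic."""
    return not looks_like_list_mail(raw_message)

# Hypothetical sample messages.
list_mail = (
    "From: someone@example.org\n"
    "List-Id: <nclug.example.org>\n"
    "Subject: [NCLUG] make money fast\n"
    "\n"
    "body\n"
)
direct_mail = (
    "From: someone@example.org\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
```

Not every list sets these headers, so this only cuts down on the
problem rather than eliminating it.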

> >by replying (with a tmda confirmation) the ack will most likely be
> >undeliverable and will add to the mail congestion.
> 
> Maybe if the congestion problem gets bad enough it'll cause something to
> actually be done about providers that are harboring spammers.

But the congestion will be on your mail servers.  If you can actually
get back to the spammer I am surprised.

> Goodbye Asia.  ;-)

Of course Asia is a huge source of spam.  But a lot of that spam is
still originating in the US from spammers who are using open relays in
Asia.  As an example of the problem, a (trying to remember) Korean
school system recently set up all of their computer systems as copies
of the same magic image.  Tragicomically the image was configured as
an open relay, which meant there were a few thousand open relays, all
identical and easy to find.

> >f) If the reply works then you are likely to get on more spam lists.
> 
> Maybe.  Some spammers are using VERPs and the like so that replies to
> the sender address cause you to get taken off the list.

I suggested that myself once on one of the spam discussion lists.  The
general consensus, after the hysterical laughter died out, was that
spammers never take anyone off of their lists.  I think I agree since
I am still seeing addresses in my logs which were on machines I
decommissioned many years ago.  I had 244 of those last week alone.

> but I'm just not concerned with trying to stay off the lists.  I
> believe that's a futile endeavor.

Agreed.

Really the program we are looking for is not SpamAssassin but instead
SpammerAssassin, which would be much more effective at dealing with
the problem.  And it would be consistent with the new US world policy
too.

Bob


