[NCLUG] SpamAssassin Testimonials?

Mon Jul 21 22:59:02 MDT 2003

Sean Reifschneider wrote:
> I just added a front-end to SpamAssassin to help with blocking spam.  I
> had my threshold cranked up to 6.5 so that fewer legitimate messages get
> caught.  However, lately I've been getting about 30 messages a day that
> get past SpamAssassin (out of about 300 I get a day).

That is not surprising given your higher threshold.  And unfortunately
SA comes in only one size, and it is not one size fits all.  You might
need to adjust the rules for yourself and run your own mass checks
across your own corpus of spam and non-spam.  I wish that were easier
than it is.  And of course the SA folks are hoping the Bayesian
inference engine will compensate here.  I am less confident in the
current Bayes implementation.
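
To make the "own corpus" idea concrete, here is a rough sketch of the
kind of tally one could run over hand-sorted mail once each message has
a score.  This is not SA's mass-check tooling; the corpus values and
the error_counts helper are made up for illustration.

```python
# Hypothetical hand-labeled corpus: (score, is_spam).
# The numbers are invented for illustration only.
CORPUS = [
    (0.3, False), (1.8, False), (4.2, False), (5.6, False),
    (4.6, True), (5.4, True), (6.1, True), (7.9, True), (12.5, True),
]


def error_counts(corpus, threshold):
    """Return (false_positives, false_negatives) at the given threshold."""
    false_pos = sum(1 for score, is_spam in corpus
                    if score >= threshold and not is_spam)
    false_neg = sum(1 for score, is_spam in corpus
                    if score < threshold and is_spam)
    return false_pos, false_neg


# Compare the thresholds mentioned in this thread.
for threshold in (4.0, 5.0, 6.5):
    fp, fn = error_counts(CORPUS, threshold)
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

With real mail the interesting part is how quickly each count moves as
the threshold moves.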

[Sean of course knows this and is making a compromise.  But for those
on the list who might not and are trying to decide whether to use SA,
SA is tuned so that messages scoring 4.9 or less are expected to be
non-spam and messages scoring 5.0 or more are expected to be spam.  It
uses a learning algorithm across a large corpus of past spam/non-spam.
It churns and learns and adjusts the scores given to indicators in the
message until non-spam scores less than 5 and spam scores 5 or higher.
That particular value is a magical value because the learning
algorithm was trained for it.  Other values are not magic.  And the
relationship is not linear.  Therefore changing the threshold can cause
disproportionate changes in the amount of spam or the amount of
non-spam which is tagged.]
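
For anyone who has not looked inside, the tagging step described above
amounts to adding up the scores of the indicators that fired and
comparing the total to the threshold.  A minimal sketch, assuming
invented rule names and scores (these are not SA's real rules):

```python
# Hypothetical rule names and scores, not SpamAssassin's real rule set.
RULE_SCORES = {
    "HTML_ONLY_BODY": 2.1,
    "SUBJECT_ALL_CAPS": 1.4,
    "MENTIONS_FREE_OFFER": 1.7,
    "FROM_KNOWN_CONTACT": -1.0,  # negative scores pull toward non-spam
}


def total_score(hits):
    """Sum the scores of every rule that matched the message."""
    return round(sum(RULE_SCORES[name] for name in hits), 1)


def is_tagged(hits, threshold=5.0):
    """A message is tagged as spam when its total reaches the threshold."""
    return total_score(hits) >= threshold


hits = ["HTML_ONLY_BODY", "SUBJECT_ALL_CAPS", "MENTIONS_FREE_OFFER"]
print(total_score(hits), is_tagged(hits), is_tagged(hits, threshold=6.5))
```

The same message is tagged at 5.0 but slips through at 6.5, which is
the trade being made when the threshold is raised.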

> Today I worked on building and setting up a system that will do TMDA
> sorts of things with mail marked as spam.  After some testing, I plan to
> crank the threshold down to around 4.

I would still stick with 5 but I would also engage the Bayesian
engine.  I get a lot of spam as well, although not that much: around
30-50 a day.  But very few are getting past SA, around 2-3 a week.

> The idea is that mail marked as spam will get saved off and a reply sent
> to the sender address with a special key that needs to be used to
> deliver the message through.  The sender can then send another message
> to a special address to unlock their original message and let it
> through.
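
One way the hold-and-unlock step described above could be sketched is
with a keyed hash, so the key cannot be guessed without the secret.
This is only an assumption about how such a front-end might work, not
a description of TMDA or of Sean's system; SECRET, make_key, hold and
release are all hypothetical names.

```python
import hashlib
import hmac

# Hypothetical private value; a real setup would keep this out of the code.
SECRET = b"replace-with-a-private-random-value"

# In-memory store of held mail: message_id -> (sender, raw message).
held = {}


def make_key(message_id, sender):
    """Derive the unlock key for one held message from the secret."""
    data = f"{message_id}|{sender.lower()}".encode("utf-8")
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()[:16]


def hold(message_id, sender, raw):
    """Save a message marked as spam; return the key to mail to the sender."""
    held[message_id] = (sender, raw)
    return make_key(message_id, sender)


def release(message_id, key):
    """Return the held message if the key matches, otherwise None."""
    entry = held.get(message_id)
    if entry is None:
        return None
    sender, raw = entry
    expected = make_key(message_id, sender)
    # Constant-time comparison of the supplied key against the expected one.
    if not hmac.compare_digest(expected.encode("utf-8"), key.encode("utf-8")):
        return None
    del held[message_id]
    return raw
```

A sender who replies with the right key gets the original message let
through; anything else stays held.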

This really has an attraction to it.  I am interested in your
experience with it.  After you have some usage I hope you will report
back to the list.

> So, this is kind of the best of both worlds...  Mail that isn't spammy
> goes right through, but mail that is caught can be acked by the sender
> to allow it through.

Unfortunately there is a worst of both worlds factor as well.  a) TMDA
really has an annoyance factor.  Many on the net are vocal about this.
b) If you get email from your stock broker or an automated account it
will never be acked.  c) This tends to interact badly with mailing
lists (seen it several times now) so you need to make sure they are
whitelisted appropriately, which is tedious up-front work.  d) TMDA
recognizes itself and won't have a problem if you are starting contact
with another TMDA user.  But TMDA can't possibly recognize all
possible competing TMDA-style implementations and there is no
standard.  e) The reply (the TMDA confirmation request) sent in
response to spam will most likely be undeliverable and will add to the
mail congestion.  f) If the reply is delivered then you are likely to
get on more spam lists.

Of these b) is probably the most serious.  There will be times when
you need to accept an email from your grandmother even though it looks
like spam and you won't be able to convince her to ack it.  f) may
mean that once committed to TMDA you can't stop, or you will be
getting even more spam afterward.
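
Points b) and c) suggest checking a whitelist and looking for
mailing-list headers before any challenge goes out.  A rough sketch of
that ordering, assuming a hypothetical disposition function and
made-up whitelist addresses (List-Id and Precedence are real headers,
but treating them this way is only one possible policy):

```python
# Hypothetical whitelist of senders who will never ack a challenge.
WHITELIST = {"broker@example.com", "grandma@example.net"}

# Headers that usually indicate list or bulk mail.
LIST_HEADERS = ("list-id", "precedence")


def disposition(sender, headers, score, threshold=5.0):
    """Decide what to do with a message: deliver, hold, or challenge."""
    if score < threshold:
        return "deliver"      # not spammy, goes right through
    if sender.lower() in WHITELIST:
        return "deliver"      # known sender, never challenged
    names = {name.lower() for name in headers}
    if any(name in names for name in LIST_HEADERS):
        return "hold"         # list mail: keep it, but do not reply to the list
    return "challenge"        # save it and send the sender a key


print(disposition("grandma@example.net", {}, 7.2))
print(disposition("stranger@example.org", {"List-Id": "<nclug>"}, 6.0))
print(disposition("stranger@example.org", {}, 6.0))
```

The whitelist still has to be built by hand, which is the tedious
up-front work mentioned in c).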

> Unfortunately, the amount of spam that efm and I get really prevents
> us from combing through the messages marked as spam to find the
> non-spam.

I rarely go through the trash myself.  Only if I think I might have
something there do I browse it.  I have been fortunate that my false
positive rate has been extremely low.  Only people who rudely sent me
HTML mail have been in that list.  (Web pages were not meant to be
used as email.  Obviously I crank up HTML mail as a high spam
indicator, which it almost always is.  I realize some
non-techies like the flashiness available in html mail.  But I won't
hide my opinion that they are wrong to be using it like that.)

> I had tried to set up TMDA so that it would do this for me, but it
> wasn't really designed to function that way in the environment I'm using
> it in.

This comment leads me to believe you are doing your own TMDA
implementation.  True or false?  Or are you connecting the well-known
TMDA (http://tmda.net) with SpamAssassin (http://spamassassin.org)?

Bob