It is funny that people say no one has written software that does Bayesian filtering in order to reject spam at the SMTP level, just when I felt like writing up my mimedefang setup. I am of the school of thought that the only acceptable method of dealing with spam is to reject it at the SMTP protocol level. To accomplish this, I run crm114 and spamassassin from my mimedefang-filter, which runs as a milter while my sendmail is processing mail. I also do conditional grey listing; more on that below.

The reason for running two different spam detection mechanisms is that while one or the other is often fooled and mis-categorizes mail, the score or confidence level of such a mis-categorization is low, and the mechanism that is not fooled enables the combined system to do the right thing in most cases. I quarantine and keep even the messages I reject for about a month, and do spot checks. Out of a volume of roughly 800 mails a day, about one or two spam messages a day get through, and I never rejected a mail I would manually classify as ham, neither in the 6 months in which I double checked every decision taken by my system, nor in the spot checks I continue to perform. While this is not as good a record as the author of crm114 reports, it is good enough for me.
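
The way a confident classifier can outvote a fooled one can be sketched roughly as follows. This is only an illustration, not what my filter literally does: it assumes each mechanism's result has been normalized to a signed confidence between -1 and 1 (positive meaning spam), and the names Verdict, combine, classifier_a and classifier_b are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One classifier's opinion (hypothetical structure).

    confidence is assumed to be normalized to [-1.0, 1.0];
    positive means spam, negative means ham.
    """
    name: str
    confidence: float


def combine(verdicts):
    """Sum the signed confidences so that a confident classifier
    outweighs one that was fooled with low confidence."""
    return sum(v.confidence for v in verdicts)


# Made-up numbers: one classifier is fooled and weakly says "ham",
# the other is confident that the message is spam.
fooled = Verdict("classifier_a", -0.1)
confident = Verdict("classifier_b", 0.8)
combined = combine([fooled, confident])

# The combined score stays on the spam side of zero.
print("spam" if combined > 0 else "ham")
```

A plain sum is the simplest possible combination; any weighting between the two mechanisms would be a tuning decision.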

The grey listing: I try to be conservative, and lean towards not discarding any legitimate email at the expense of not trapping all spam, and I do not discard unread any mail that I have accepted based on an automated decision. As a result, I was still getting more spam than I cared for. So, I added the grey listing layer, but only for mail that is neither unambiguously spam or a virus (which is rejected) nor unambiguously ham (which is accepted for delivery). Most of the spam that was slipping through came from this grey area, so grey listing applies only to these mails.
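
The three-way split described above might look something like this. It is a sketch under assumptions: the score is taken to be a single combined value where positive means spam, and the two thresholds are invented for illustration rather than taken from my filter.

```python
REJECT, ACCEPT, GREYLIST = "reject", "accept", "greylist"


def route(score, is_virus=False, spam_threshold=0.5, ham_threshold=-0.5):
    """Decide what to do with a message at SMTP time.

    score: assumed combined classifier score, positive means spam.
    The threshold values are hypothetical defaults.
    """
    if is_virus or score >= spam_threshold:
        return REJECT      # unambiguous spam or virus: refuse it outright
    if score <= ham_threshold:
        return ACCEPT      # unambiguous ham: deliver without delay
    return GREYLIST        # the grey area: hand over to grey listing


print(route(0.9), route(-0.9), route(0.1))
```

Only the last branch ever costs a legitimate sender any delay.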

Grey listing happens on a triplet of the class C network (/24) of the sender's IP address, the sender, and the recipient. A grey listed email is told that the server is encountering processing problems, and is temporarily rejected with a 4xx temporary failure code for 30 minutes. After that, retries are allowed through for 4 hours, and a successful retry white-lists the triplet for 35 days (which catches monthly announcement messages). Known spam resets the white-listing, and so does retrying past the window.
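
The bookkeeping for this can be sketched as below. It is a minimal in-memory illustration, not my actual filter: the class and method names are hypothetical, it handles IPv4 only, it reads "past the window" as more than 30 minutes plus 4 hours after the first attempt, and it does not extend the white-listing on later accepted mail.

```python
import ipaddress

DELAY = 30 * 60                    # initial temporary rejection: 30 minutes
WINDOW = 4 * 60 * 60               # retries are honoured for 4 hours after that
WHITELIST_TTL = 35 * 24 * 60 * 60  # white-listing lasts 35 days


def triplet(ip, sender, recipient):
    """Key on the /24 network of the sender's IP, the sender, and the recipient."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return (str(net.network_address), sender.lower(), recipient.lower())


class Greylist:
    def __init__(self):
        self.first_seen = {}         # triplet -> time of first attempt
        self.whitelisted_until = {}  # triplet -> expiry of white-listing

    def check(self, ip, sender, recipient, now):
        """Return "accept" or "tempfail" for a grey-area message at time now (seconds)."""
        key = triplet(ip, sender, recipient)
        if self.whitelisted_until.get(key, 0) > now:
            return "accept"
        self.whitelisted_until.pop(key, None)   # drop an expired entry, if any
        first = self.first_seen.get(key)
        if first is None or now - first > DELAY + WINDOW:
            self.first_seen[key] = now          # new triplet, or retried too late
            return "tempfail"
        if now - first < DELAY:
            return "tempfail"                   # retried too soon
        del self.first_seen[key]
        self.whitelisted_until[key] = now + WHITELIST_TTL
        return "accept"

    def saw_spam(self, ip, sender, recipient):
        """Known spam from a triplet wipes its state, including white-listing."""
        key = triplet(ip, sender, recipient)
        self.first_seen.pop(key, None)
        self.whitelisted_until.pop(key, None)
```

A real setup would keep this state in a database that survives restarts and would expire stale first-attempt records.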

Doing it this way, known good correspondents suffer no delay, unambiguous mail from strangers flows right through, and I keep retraining my filters on the results, so my discriminator keeps improving.

My filter is available for examination. See also the older posting; in the time honoured tradition of /., this posting is a duplicate.


