"Instead of using pattern matching and a set of human generated rules SpamProbe relies on a Bayesian analysis of the frequency of words used in spam and non-spam emails received by an individual person. The process is completely automatic and tailors itself to the kinds of emails that each person receives"

It's based on this article, which describes a Bayesian filter algorithm:

Here is a LinuxJournal article that describes SpamProbe and how to set it up:

That's basically what I followed. The only real difference in my .procmailrc is that with MH folders the destination mailbox names must end in '/.' (a slash followed by a period), which is how procmail recognizes an MH folder.
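As a rough illustration, the MH variant of such a recipe might look like the sketch below. It is modeled on the recipe commonly published with SpamProbe, not copied from a real .procmailrc. It assumes spamprobe and formail are on procmail's PATH and that the spam folder is named spam; none of that is stated above.

```procmail
# Score the incoming message; "spamprobe receive" prints SPAM or GOOD plus a score.
:0
SCORE=| spamprobe receive

# Record the verdict in a header (the header name X-SpamProbe is an assumption).
:0 wf
| formail -I "X-SpamProbe: $SCORE"

# File spam into an MH folder: note the trailing "/." on the destination.
:0 a
* ^X-SpamProbe: SPAM
spam/.
```

Everything not caught by the last recipe falls through to normal delivery.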

The trick is to train it properly at the start. After that, it needs almost no maintenance. I pushed almost 500 MB of mail archives into it, using a command like this to find all the messages in my MH folders, handle spaces in folder names, and ignore spam folders:

find . -type f ! -path '*spam*' -print0 | xargs -0 spamprobe good

Then go into your spam folder and train on everything in it as spam:

spamprobe spam *
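Before pushing a large archive in, it may be worth previewing which files will be treated as good mail. The sketch below applies the same rule as the training command above (regular files whose path does not contain "spam") but only lists them. The function name list_ham_candidates is hypothetical and not part of SpamProbe.

```shell
#!/bin/sh
# Dry run: print the message files that would be handed to "spamprobe good",
# one per line, without touching the SpamProbe database.
# list_ham_candidates is a hypothetical helper name.
list_ham_candidates() {
    # $1 = top of the MH mail tree (defaults to the current directory).
    # Skip every path containing "spam", as the training command does.
    find "${1:-.}" -type f ! -path '*spam*' -print
}

# Example use, run from the top of the mail tree:
#   list_ham_candidates . | wc -l    # how many messages would be trained
#   list_ham_candidates . | head     # eyeball the first few paths
```

If a folder you expected is missing from the list, check whether its name happens to contain "spam", since the filter matches anywhere in the path.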

If you're using mbox folders (the default for UW-IMAP and the /var/mail mail spools), simply add -m before the good or spam command:

spamprobe -m good /var/mail/$USER
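With several mbox files to train from, a small loop saves retyping. This is only a sketch: train_mboxes is a hypothetical helper, and the SPAMPROBE variable is an assumed override that lets you substitute echo for the real binary to do a dry run.

```shell
#!/bin/sh
# train_mboxes KIND FILE...
# Feed each mbox file to "spamprobe -m KIND", where KIND is good or spam.
# Hypothetical helper; SPAMPROBE is an assumed override for dry runs.
train_mboxes() {
    kind=$1
    shift
    for box in "$@"; do
        # Skip anything that is not a readable regular file.
        [ -f "$box" ] && [ -r "$box" ] || continue
        # Stop at the first failure so a problem is not silently skipped.
        ${SPAMPROBE:-spamprobe} -m "$kind" "$box" || return 1
    done
}

# Example use:
#   SPAMPROBE=echo train_mboxes good /var/mail/$USER   # dry run, prints the arguments
#   train_mboxes good /var/mail/$USER                  # real training run
```

Quoting "$box" keeps file names with spaces intact, the same concern the find command handles for MH folders.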


See also SpamFiltering, BogoFilter, ContentFiltering