2003-05-17, 19:04 #1
Free Bayesian spam filtering for Outlook (Outlook 2000 and 2002 (XP))
I get anywhere from 20 to 50 spam messages per day, so I've been looking at, and trying, a lot of potential solutions. I've come to favor the Bayesian approach, a nice introduction to which (A Plan for Spam by Paul Graham) can be found at http://www.paulgraham.com/spam.html.
I tried to use Spammunition (http://www.upserve.com/spammunition/default.asp) for a while. It's freeware, and though it did its job quite well (after training, it correctly filtered about 85%-90% of my spam messages with no false positives), it's written for Outlook 2000 and tends to lock up Outlook 2002. Those of you running Outlook 2000 might want to give it a try.
What I'm using now, and it seems to work fine, is also freeware (GNU License). It's a combination of POPFile (http://popfile.sourceforge.net/) and Outclass (http://www.vargonsoft.com/Outclass/). POPFile is a Bayesian mail classification utility for POP3-based email accounts, and Outclass provides POPFile's functionality natively to Outlook. Outclass is what enables you to use POPFile's capabilities to classify Outlook email, including IMAP-based and Exchange-based email.
If you're going to try this, install POPFile first. When given the option, deselect "run automatically" because you never actually need to run POPFile on its own. It just has to be installed on your machine so that Outclass can use its routines. Then go ahead and install Outclass. It will attach itself to Outlook and be good to go. The next time you start Outlook, Outclass will ask you to specify your inbox and spam folders. You can then proceed to train it with collections of good and bad messages. If you already have a folder full of spam that you've previously identified, this part is a snap.
My experience with Outclass so far is that it works quite well. If anything, it's a little better than Spammunition at identifying spam, but it also mis-classified a few good messages at first - mostly tech newsletters to which I subscribe. Of course, the Bayesian classifier refines its probabilities each time you correct it, so that it only gets better over time. Now, after 2 weeks of use, it hardly ever makes a mistake.
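For anyone curious what "refines its probabilities each time you correct it" looks like in practice, here is a toy sketch in Python. To be clear, this is not POPFile's or Outclass's actual code (POPFile is a separate program with its own internals). It is only a minimal naive Bayes word counter, and the class name TinyBayes, its methods, and the sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy two-bucket word-count classifier. Illustrative only; not POPFile's code."""

    BUCKETS = ("good", "spam")

    def __init__(self):
        # Per-bucket word counts and per-bucket message counts.
        self.words = {b: Counter() for b in self.BUCKETS}
        self.messages = {b: 0 for b in self.BUCKETS}

    @staticmethod
    def tokenize(text):
        # Lowercase, then pull out runs of letters, digits, $, ' and -.
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, text, bucket):
        # Training just adds the message's words to the chosen bucket.
        self.words[bucket].update(self.tokenize(text))
        self.messages[bucket] += 1

    def score(self, text, bucket):
        # Log of (bucket prior * product of word likelihoods), add-one smoothed.
        total_msgs = sum(self.messages.values())
        logp = math.log((self.messages[bucket] + 1) / (total_msgs + len(self.BUCKETS)))
        vocab = len(set().union(*self.words.values()))
        total_words = sum(self.words[bucket].values())
        for word in self.tokenize(text):
            logp += math.log((self.words[bucket][word] + 1) / (total_words + vocab + 1))
        return logp

    def classify(self, text):
        # Highest score wins; on an exact tie, "good" wins because it is listed first.
        return max(self.BUCKETS, key=lambda b: self.score(text, b))


f = TinyBayes()
f.train("free money offer click now", "spam")
f.train("cheap offer click here free", "spam")
f.train("meeting agenda for monday", "good")
f.train("lunch on monday with the team", "good")

newsletter = "free newsletter: new tech offer this week"
print(f.classify(newsletter))   # "spam": the words "free" and "offer" tip it

# Correcting the mistake is just more training, this time in the right bucket.
f.train(newsletter, "good")
print(f.classify(newsletter))                    # "good"
print(f.classify("tech newsletter this week"))   # "good": similar mail benefits too
print(f.classify("free money click now"))        # "spam": real spam is still caught
```

The point of the sketch is that a correction is nothing special: it adds the message's words to the bucket you chose, which shifts the odds for every later message that shares those words. That is why a handful of re-classified newsletters is usually enough to stop the false positives.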
Since you'll want to check every so often for mis-classified messages, both for the purpose of recovering good ones and for training the classifier, I offer one additional suggestion that I have found helpful. I set the view in my spam folder to "unread messages". Then each day (maybe once or twice a day at the beginning) I look over the messages, re-classify any good ones, and use Edit|Mark All As Read to make the others disappear. That helps keep the process tractable.
Regards to all - I've gotten a lot of help from the Lounge, and it's nice to be able to give something back.