[GUFSC] HOWTO: Busting Spam with Bogofilter, Procmail and Mutt

Rafael R Obelheiro gufsc@das.ufsc.br
Mon, 18 Nov 2002 17:46:06 -0200


[http://www.linuxjournal.com/article.php?sid=6439]

HOWTO: Busting Spam with Bogofilter, Procmail and Mutt
Posted on Monday, November 11, 2002 by Nick Moffitt

   Make the latest spam-fighting software train itself while you read
   your mail normally with mutt.

Eric S. Raymond's bogofilter is a fast Bayesian spam filter that
implements the algorithm described in Paul Graham's A Plan For Spam.
To make it easy for all mutt users on my server to use it, I put the
following macros into the system-wide mutt configuration file,
/etc/Muttrc:

  s (save) is bound to run bogofilter -N before saving
  r, g and l (individual reply, group reply and list reply) are bound
    to run bogofilter -n before replying
  X is bound to run bogofilter -S before deleting

  macro index s "<enter-command>unset wait_key\n<pipe-entry>bogofilter -N\n<enter-command>set wait_key\n<save-entry>"
  macro pager s "<enter-command>unset wait_key\n<pipe-entry>bogofilter -N\n<enter-command>set wait_key\n<save-entry>"

  macro index r "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<reply>"
  macro pager r "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<reply>"

  macro index g "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<group-reply>"
  macro pager g "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<group-reply>"

  macro index l "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<list-reply>"
  macro pager l "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<list-reply>"

  macro index X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -S\n<enter-command>set wait_key\n<delete-message>"
  macro pager X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -S\n<enter-command>set wait_key\n<delete-message>"


You also can place these macros in your personal .muttrc file. The
logic for this setup goes like this: if you're saving a message, that
means it's worthwhile to you. Thus, we run bogofilter -N, which adds
the words in the message to the good list and subtracts them from the
bad.

If you're replying to a message in any way, it is also not spam. You
obviously wouldn't be replying to spam, because that only begets more
spam! So we simply add it to the good list with bogofilter -n. Then
comes the new key, X. Note that this is shift-X, not lowercase x. It
is a special "delete as spam" key. It runs bogofilter -S, which adds
words to the spam list and subtracts them from the good list, because
the assumption is that you're marking spams that bogofilter missed.
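
As a rough illustration of what the four registration flags do to the
two word lists, here is a minimal Python sketch. It is a toy model, not
bogofilter's actual code: the WordLists class, the whitespace tokenizer
and the rule that counts never drop below zero are all assumptions made
for the example. The flag meanings follow the descriptions above; check
the man page of your installed version, since they may differ.

```python
from collections import Counter


class WordLists:
    """Toy model of a good list and a spam list (not bogofilter's code)."""

    def __init__(self):
        self.good = Counter()
        self.spam = Counter()

    def register(self, message, flag):
        """Apply one registration flag, as the article describes them."""
        # Crude whitespace tokenizer, for illustration only.
        words = Counter(message.lower().split())
        if flag == "-n":      # add to the good list
            self.good.update(words)
        elif flag == "-s":    # add to the spam list
            self.spam.update(words)
        elif flag == "-N":    # add to good, subtract from spam
            self.good.update(words)
            self.spam.subtract(words)
        elif flag == "-S":    # add to spam, subtract from good
            self.spam.update(words)
            self.good.subtract(words)
        else:
            raise ValueError("unknown flag: " + flag)
        # Assumption: counts never go below zero; unary plus drops
        # zero and negative entries from a Counter.
        self.good = +self.good
        self.spam = +self.spam


lists = WordLists()
lists.register("cheap pills now", "-n")   # missed spam, counted as good
lists.register("cheap pills now", "-S")   # X key: move it to the spam side
print(lists.good["cheap"], lists.spam["cheap"])   # prints: 0 1
```

The last three lines mimic a spam that slipped through: it was first
counted as good, and the X key then cancels that and counts it as spam.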

Here's how I use these keys. First of all, I put the following three
stanzas into my .procmailrc file, to run bogofilter on all incoming
mail:

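  # Pass every message through bogofilter: -p writes the message back
  # out with an X-Bogosity header, -u registers it according to the
  # verdict, and -e keeps the exit code at 0 for spam and non-spam.
  :0fw
  | bogofilter -u -e -p

  # If bogofilter failed, exit with a temporary failure (75) so the
  # mail is queued for retry rather than delivered unfiltered.
  :0e
  { EXITCODE=75 HOST }

  # File the mail in inboxes/zztrash if it's spam.
  :0:
  * ^X-Bogosity: Yes, tests=bogofilter
  inboxes/zztrash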

This means that all mail gets filtered through bogofilter and, because
of the -u flag, it reinforces itself: every message bogofilter judges
to be spam gets added to the spam list, and every message it judges to
be good gets added to the good list. If spam evolves, the filter will
catch up with it as time goes on.
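
The filing decision made by the last recipe can be sketched in Python
as follows. This only mirrors the header test; it does not call
bogofilter or procmail. The function name choose_mailbox and the
fallback mailbox name "inbox" are hypothetical, while the header text
and inboxes/zztrash come from the recipe above.

```python
from email import message_from_string

SPAM_BOX = "inboxes/zztrash"   # destination used in the recipe
DEFAULT_BOX = "inbox"          # hypothetical name for normal delivery


def choose_mailbox(raw_message):
    """Return the mailbox for a message already tagged by bogofilter -p."""
    msg = message_from_string(raw_message)
    verdict = msg.get("X-Bogosity", "")
    # procmail conditions are case-insensitive by default, so compare
    # in lower case; a missing header falls through to normal delivery.
    if verdict.strip().lower().startswith("yes, tests=bogofilter"):
        return SPAM_BOX
    return DEFAULT_BOX


tagged = "X-Bogosity: Yes, tests=bogofilter\nSubject: hello\n\nbody\n"
print(choose_mailbox(tagged))   # prints: inboxes/zztrash
```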

Now I have put all caught spams into inboxes/zztrash, which is the
last mailbox I read. I read my normal inboxes, deleting uninteresting
but legitimate mail with the regular d key but zapping spam with X.
Remember, if something is in a normal mailbox, bogofilter must have
marked it as good, hence the -S to subtract from the good list and add
to the spam list.

Every mail I reply to receives extra reinforcement on the good list.
It was added once because it wasn't caught as spam, but it'll get
added again because it caught my attention enough to warrant a
response.

Once I hit the zztrash folder, I check for any mail misclassified as
spam. I simply save them to the folders where they were supposed to
go! This runs them through bogofilter -N, which removes them from the
spam list and places them on the good list.

I have found that after only a couple of days of mail, the system is
already catching on to patterns in spam. I find myself correcting it
less and less, as the self-reinforcement keeps improving its guesses.

The setup comes with the caveat that the registration performed by the
macros is done in addition to whatever bogofilter did when invoked
from .procmailrc. For example, saving recognized non-spam means that
three things have happened:

  1. All words in the mail were added to the non-spam list when it was
     processed.

  2. These words are then deleted from the spam word list, even though
     the mail was never added there.

  3. The mail is again added to the non-spam list.

This is actually a desired, or at least acceptable, result in my eyes.
If I save a mail, it is something that is really worth my while. The
belt-and-suspenders approach to marking it as non-spam, then, is fine
with me.
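
The double registration can be traced with a few lines of Python. This
is a simplified model under stated assumptions: the two Counter objects
stand in for the word lists, the sample message is made up, and counts
are assumed never to go below zero.

```python
from collections import Counter

good, spam = Counter(), Counter()
words = Counter("meeting moved to friday".split())   # made-up message

# Step 1: procmail ran bogofilter -u, which judged the mail non-spam
# and added its words to the good list.
good.update(words)

# Step 2: the s key runs bogofilter -N, which subtracts the words from
# the spam list even though they were never added there.
spam.subtract(words)
spam = +spam   # assumption: drop zero and negative counts

# Step 3: the same -N run adds the words to the good list again.
good.update(words)

print(good["meeting"], spam["meeting"])   # prints: 2 0
```

Each word of the saved mail ends up counted twice on the good list,
which is the extra weight described above.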

Of course, you can always change .procmailrc to run bogofilter without
-u to remove the feedback loop effects. That leaves the mutt key
bindings as the only source of registration. In that case, the -N and
-S switches in the macros should be changed to -n and -s,
respectively.
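
One way to avoid editing ten nearly identical macro lines by hand is
to generate them. The following Python sketch prints the macros with -n
and -s in place of -N and -S; the BINDINGS table and the macro_lines
function are hypothetical helpers, and the macro text itself is copied
from the listing earlier in this article.

```python
# (key, bogofilter flag, mutt function to run afterwards)
BINDINGS = [
    ("s", "-n", "save-entry"),
    ("r", "-n", "reply"),
    ("g", "-n", "group-reply"),
    ("l", "-n", "list-reply"),
    ("X", "-s", "delete-message"),
]


def macro_lines(bindings):
    """Build one index macro and one pager macro for each binding."""
    lines = []
    for key, flag, function in bindings:
        for menu in ("index", "pager"):
            # "\\n" emits a literal backslash-n, which mutt reads as Enter.
            lines.append(
                f'macro {menu} {key} "<enter-command>unset wait_key\\n'
                f'<pipe-entry>bogofilter {flag}\\n'
                f'<enter-command>set wait_key\\n<{function}>"'
            )
    return lines


for line in macro_lines(BINDINGS):
    print(line)
```

The output can be pasted into /etc/Muttrc or a personal .muttrc in
place of the original macros.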

See the bogofilter man page for a complete list of bogofilter options.
I encourage you all to play with bogofilter!