Time for an update I guess...
I've experimented with two systems so far.
The first one that I tried was CRM114
http://sourceforge.net/projects/crm114
Its important benefit is that it comes with pre-populated spam/non-spam databases, meaning you don't need to train it with your own data before it becomes useful.
It's not just its name that makes CRM114 very odd (it is named after a comms system in Dr. Strangelove). It has a bizarre language for its configuration that makes obscure Perl look normal...
CRM114 was very effective at spotting spam. In the 12 hours or so that I used it, only a handful of the hundreds of spam messages I get a day slipped through (and that was without me training it at all with my own archive of messages). More importantly, it didn't falsely identify any real messages as spam.
I had to stop using it though, because it has one major flaw as it comes out of the box: it scans all the data in every attachment in every message. I happened to be sending myself a couple of messages with 1MB+ attachments when I realised that CRM114 was spending five minutes trying to scan each of these large messages before procmail timed it out and delivered the message anyway. In the process it was holding up the mail queue for that user.
It is probably possible to tweak CRM114 to ignore huge attachments, but I have no desire to add another very obscure language to my existing list of obscure languages just so that I can filter my email...
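One way round this without touching CRM114's own language would be to check the message size before the filter ever runs. Here is a rough sketch in Python, assuming a hypothetical wrapper script that the message is piped through first. The 256 KB limit and the names MAX_SCAN_BYTES and should_scan are made up for illustration and are not anything CRM114 provides:

```python
# Hypothetical size guard: nothing here is part of CRM114 itself.
MAX_SCAN_BYTES = 256 * 1024  # assumed cut-off, tune to taste


def should_scan(raw_message: bytes, limit: int = MAX_SCAN_BYTES) -> bool:
    """Return True if the message is small enough to hand to the filter."""
    return len(raw_message) <= limit
```

Anything over the limit would simply be delivered unscanned. procmail can also test message size in the recipe itself with a size condition (for example `* < 256000`), which may well be the simpler route.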
So I moved on to bogofilter
http://bogofilter.sourceforge.net/
Bogofilter does not come with any pre-filled databases, so the first thing I did was feed it my archive of spam/non-spam messages. Then I fed it some messages to filter.
The good news is that bogofilter doesn't struggle with large messages, because it ignores encoded attachments (on the basis that spam messages will get caught by the data in the headers/any plain text anyway).
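To illustrate the idea (this is not bogofilter's actual code, just a sketch of the same principle using Python's standard email module), you can pull out the headers and plain-text parts of a message and leave the encoded attachments alone. The function name filterable_text is my own invention:

```python
from email import message_from_string


def filterable_text(raw_message: str) -> str:
    """Collect the headers and any plain-text parts, skipping attachments."""
    msg = message_from_string(raw_message)
    # Headers first: a lot of spam gives itself away here.
    chunks = [f"{name}: {value}" for name, value in msg.items()]
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() != "text/plain":
            continue  # skip images, archives and other encoded attachments
        if part.get_content_disposition() == "attachment":
            continue  # skip plain-text files sent as attachments too
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in the message: fall back to UTF-8.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```

A 1MB base64 attachment then costs nothing to skip, because it is never decoded or tokenised.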
When it started out it wasn't as accurate as CRM114: about 10% of my messages were getting classified wrongly. But I kept training it, telling it which messages it had classified wrongly, and it quickly got better.
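For anyone wanting to script the corrections, here is a small sketch in Python. The helper names are hypothetical, and the flags are the ones I understand bogofilter's man page to document (-s and -n to register a message as spam or non-spam, -Ns and -Sn to undo an earlier registration first), so check them against your installed version before relying on this:

```python
import subprocess

# Flags as I understand them from the bogofilter man page (an assumption,
# verify locally): -s registers spam, -n registers non-spam, and the -N/-S
# prefixes first unregister a message that was previously registered.
PLAIN_FLAGS = {"spam": "-s", "ham": "-n"}
REREGISTER_FLAGS = {"spam": "-Ns", "ham": "-Sn"}


def correction_command(true_label: str, was_registered: bool = False) -> list:
    """Build the bogofilter command that files a message under true_label."""
    flags = REREGISTER_FLAGS if was_registered else PLAIN_FLAGS
    return ["bogofilter", flags[true_label]]


def correct(raw_message: bytes, true_label: str, was_registered: bool = False) -> None:
    """Pipe one misclassified message into bogofilter with the right flag."""
    subprocess.run(
        correction_command(true_label, was_registered),
        input=raw_message,
        check=True,
    )
```

The was_registered switch only matters if the message had already been added to the word lists under the wrong label, for example when running bogofilter with automatic updating turned on.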
After a couple of days it is now only letting about 5% of my spam slip through, and it hasn't put a real message in the spam bin in the last two days. Hopefully it will improve as I continue to train it, but it is looking very promising already. It is certainly working far better than SpamAssassin did when I tried it; I was getting far too many false positives with that.
I didn't try dbacl in the end. I can't remember why, so I'll have to give it a go at some point.
P.S. I'm not running this directly on my normal email at the moment. I have an alias set up to forward all my email to another account that I can experiment with. I won't start using it for real until I have run it for a few weeks and am comfortable that it is not throwing away real messages, and even then I'll check the spam bin before hitting delete...