It seems the spammers now have the edge over Gmail’s antispam algorithms. It appears to be down to spam that contains lots and lots of random text, with an image containing the actual ad. (If enough spammers generate random text for long enough, will one of them eventually send Shakespeare to somebody?)
I’m not sure how many have been arriving, but a large number are certainly sneaking through into my Inbox. Counting the ones I then throw into the Spam folder by hand, it now holds over 6500 spam messages, going back about a month (which is how long Gmail holds them before auto-purging).
That makes over 200 per day, which includes those sent to my old email accounts that now get redirected to Gmail. Perhaps 20 of those are arriving in the Inbox. Of course, I’d rather spam got through to the Inbox than have any false positives go to the Spam folder.
It’s probably not helped by Google Groups refusing to obfuscate the From address when posting onto Usenet. I don’t know how many addresses still get harvested off Usenet, since most users know full well to munge their addresses, but I bet it’s quite a few.
I seem to be getting about 20-30 a day, but they’re all being filtered into the spam thingie…
Bayesian filtering is being given a workout, and it seems it’s failing. But these emails have the image in common, so perhaps that can be used as an indicator of being spam. And perhaps it’s time to check images to see if they contain text (expensive, I know) and use that as another indicator of spamminess.
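To make the "image as an indicator" idea a bit more concrete, here is a rough Python sketch. It is only a guess at how such a signal might be bolted on, not how Gmail (or any real filter) works. The function names, the additive bonus and the 0.25 weight are all made up for illustration, and the text score is assumed to come from some existing Bayesian filter. It doesn't attempt the expensive part, checking whether the image actually contains text.

```python
from email.message import Message
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def has_image_part(msg: Message) -> bool:
    """Return True if any MIME part of the message is an image."""
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def adjusted_spam_score(text_score: float, msg: Message,
                        image_bonus: float = 0.25) -> float:
    """Nudge a text-based spam score upward when the message carries an image.

    text_score is assumed to be a 0..1 score from an existing Bayesian
    filter; image_bonus is an arbitrary, hypothetical weight.
    """
    bonus = image_bonus if has_image_part(msg) else 0.0
    return min(1.0, text_score + bonus)


# Toy messages: one plain text, one with random text plus an attached "image".
text_only = MIMEText("Lunch on Friday?")

with_image = MIMEMultipart()
with_image.attach(MIMEText("random words to confuse the filter"))
# The bytes are a placeholder, not a real picture; the subtype is given explicitly.
with_image.attach(MIMEImage(b"not a real image", _subtype="png"))

print(adjusted_spam_score(0.5, text_only))   # 0.5
print(adjusted_spam_score(0.5, with_image))  # 0.75
```

On its own this would punish every holiday snap anyone sends me, so it would only make sense combined with other signals, such as a pile of random text that the Bayesian filter can't make up its mind about.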