Category Archives: CMS

Pirates! Spammers! Gyroscopes! Bandwidth thieves!

This is officially getting ridiculous. Not only are my blogs getting a lot of comment spam, but my personal blog site is burning huge amounts of bandwidth, as particular (I assume zombie) hosts hit the site.

Below are the top ten bandwidth users of danielbowen.com for June:

Top 10 of 15312 Total Sites By KBytes
 #   Hits      %  Files      %  KBytes      %  Visits      %  Hostname
 1  14380  4.10%   3801  1.77%  111235  2.22%     159  0.24%  host-148-244-150-58.block.alestra.net.mx
 2  17558  5.01%   3191  1.48%   99441  1.98%     157  0.24%  host-207-248-240-119.block.alestra.net.mx
 3   3927  1.12%   3640  1.69%   75989  1.51%       3  0.00%  csr010.goo.ne.jp
 4   3062  0.87%   2797  1.30%   74881  1.49%     171  0.26%  rrcs-24-97-174-130.nys.biz.rr.com
 5   3057  0.87%   2200  1.02%   62547  1.25%     392  0.60%  msnbot.msn.com
 6   2691  0.77%   2248  1.04%   60684  1.21%     153  0.23%  64.124.85.78.become.com
 7   2256  0.64%   2082  0.97%   56383  1.12%     124  0.19%  98-101-196-200.linkexpress.com.br
 8   2146  0.61%   2033  0.94%   51665  1.03%     279  0.43%  dsl-250-198.monet.no
 9   2001  0.57%   1755  0.82%   47605  0.95%      23  0.04%  host133.sprintnetops.net
10   1686  0.48%   1571  0.73%   35979  0.72%     325  0.50%  corporativos

It’s not like this site is hosting pr0n or something; there’s just no reason why any single host would need to grab 110MB of traffic in a single month. In total, traffic topped 4GB for the month, which is ludicrous for a diary site with a few photos on it. 4GB is actually my monthly limit. Thankfully my web ISP isn’t too strict about charging extra for hitting that, but if this keeps up there’s always the risk that it’ll start costing me real money.

As a result I’ve started a list of bandwidth hogs’ IP addresses, which I’m putting in the .htaccess file. Anything with lots of hits and grabbing above about 5MB per month is going onto the list, and the list is being duplicated (manually, unfortunately) across to the other WordPress sites that I run.
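For anyone who would rather generate the block than type it, here is a rough Python sketch. It assumes the older Apache Order/Allow/Deny access directives (Apache 2.4 replaced them with Require), and the addresses in the usage note are documentation placeholders, not the real offenders. The function name is made up for illustration.

```python
def build_deny_block(ip_addresses):
    """Return an .htaccess fragment that refuses the given IP addresses.

    Assumes the older Apache Order/Allow/Deny directives are available.
    """
    lines = ["Order Allow,Deny", "Allow from all"]
    seen = set()
    for ip in ip_addresses:
        # Skip duplicates but keep the order the addresses were listed in.
        if ip not in seen:
            seen.add(ip)
            lines.append(f"Deny from {ip}")
    return "\n".join(lines) + "\n"
```

Calling build_deny_block(["192.0.2.10", "198.51.100.7"]) gives a fragment that can be pasted into each site's .htaccess, which at least takes the retyping out of the manual duplication.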

Inspection of the access_log is particularly enlightening, with at present a staggering number of requests coming in with a referer at poker-related sites. Of the 6665 hits in the file for today (covering about 13 hours) there are 674 from texasholdemcenteral.com (note the wonky spelling) and 1212 from sportscribe.com. All of these too are now being blocked with a 403 (forbidden) via .htaccess.
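A tally like that can be pulled out of the log with a few lines of Python. This is only a sketch: it assumes the log is in Apache's "combined" format (where the referer is the second-last quoted field), and the function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"$')


def count_referer_hosts(lines):
    """Count hits per referring hostname; lines without a referer are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # not a combined-format line
        host = urlparse(match.group(1)).hostname
        if host:  # a referer of "-" has no hostname
            counts[host] += 1
    return counts
```

Feeding it the open access_log file and calling most_common(10) on the result lists the worst referers first.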

Sigh. I suppose it’s just too much to expect people to play nice?

.htaccess extract – Feel free to copy for your own site to block miscreants.

Using Atomz free search with WordPress

I’ve set up the Atomz free search to index both my old site toxiccustard.com and my personal blog at danielbowen.com together. Atomz allows you to specify multiple entry points for its crawler, putting all the specified sites into the one index.

Given the free search only allows 750 documents in its index, the trick with WordPress is to stop it indexing individual blog entries and have it index the monthly pages instead. This is done using the URL Masks feature, so for instance with my blog structure of danielbowen.com/year/month/day/entry-slug I specify

exclude regexp http://www.danielbowen.com/..../../../*

The other ones I’ve excluded are RSS feeds (which it chokes on, and wastes processing time on), comments and category URLs.

exclude http://www.danielbowen.com/category/*
exclude http://www.danielbowen.com/comments/*
exclude regexp http://www.danielbowen.com/*/feed/*

This keeps my current total number of pages (both domains together) down to 519, which is pretty good, and well under the 750 limit for the freebie version.
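To make the intent of the first mask concrete, here is the same idea as a Python regular expression. This is not Atomz syntax (how Atomz interprets its masks is not covered here), just a stricter illustration of "match year/month/day/slug, leave the monthly pages alone". The function name and test URLs are invented.

```python
import re

# Matches individual entries (/year/month/day/slug) but not monthly archives.
ENTRY_URL = re.compile(r"^http://www\.danielbowen\.com/\d{4}/\d{2}/\d{2}/.+")


def is_individual_entry(url):
    """True if the URL looks like a single blog entry rather than an archive page."""
    return ENTRY_URL.match(url) is not None
```

A URL with a day and slug after the month is treated as an entry and would be excluded; a URL that stops at the month is not, so the monthly page stays in the index.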

It’s also handy in that the crawler logs broken links. I’ve got quite a few that have shown up as I move my old blog archives into WordPress, so I can just work through the list and fix them.

Smoke me a kipper…

About to upgrade this blog to WordPress 1.5.

11:40pm. Done. The main difference noticeable to readers will be that your comments automatically go to moderation if you’ve never left a comment before.

One thing notable to us authors is that the top of the admin pages looks a bit screwy in Firefox (but okay in IE). Not sure why that is, because WP1.5 doesn’t do that on my other blogs… something to look at when I have more time.

Recent spam stopping techniques

Okay, two techniques, one that’s going to be compromised sooner, one that’s going to be compromised later:

  1. A hidden field that must be supplied
  2. A JavaScript client-server MD5 one-way hash

I don’t see the second as a viable solution because it demands JavaScript (precluding certain users), and the first will be bested by the spammers when it becomes economically viable. I guess whether it’s adopted here depends on the implementation cost.
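As a rough idea of what the server side of each technique might look like, here is a Python sketch. The field names (hidden_token, challenge_md5) and the exact scheme are assumptions for illustration; real plugins may do it quite differently. In the second technique the browser script would compute the MD5 of a value the server put in the form, and the server checks the answer.

```python
import hashlib
import secrets


def new_challenge():
    """Random value the server embeds in the comment form."""
    return secrets.token_hex(16)


def expected_response(challenge):
    """What the browser-side script is assumed to send back: MD5 of the challenge."""
    return hashlib.md5(challenge.encode("utf-8")).hexdigest()


def passes_hidden_field(form, issued_value):
    # Technique 1: the hidden field must be present and carry the issued value.
    return form.get("hidden_token") == issued_value


def passes_hash_check(form, challenge):
    # Technique 2: the client must return the one-way hash of the challenge.
    return form.get("challenge_md5") == expected_response(challenge)
```

A bot that posts straight to the comment script without loading the form supplies neither value, which is the whole point; a bot that scrapes the form first beats the hidden field, and one that runs the script beats the hash.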

Moving WordPress to a new server

I moved my diary WordPress installation yesterday from an old WP1.2.2 installation to a brand new shiny WP1.5 database and URL. Here are the steps, in summary:

  1. We don’t want to lose any comments so get into phpMyAdmin and shut down comments/trackbacks on the old blog, by running this SQL:
    UPDATE wp_posts SET comment_status = 'closed', ping_status = 'closed'
  2. Then export the database, with Complete Data Inserts turned on. Get the dump down into a text file (there’s probably an automatic way, but I just copy/pasted into my preferred editor — Ultraedit)
  3. Do whatever replacements are needed on the data. I replaced all the toxiccustard.com/diary URLs with danielbowen.com ones, for instance. Be sure to change the setting in wp_options that specifies the site (WP) URL, ’cos you won’t be able to log on if you don’t: the logon code will throw you over to the old blog. There’s another setting called Blog address which will also need changing if you’re coming off WP1.5.
  4. My export seemed to add extraneous escape characters in odd places. For instance a double quote (") in the database came out with two backslashes in front of it (\\"). I did some replacing to turn \\" into \", and similarly with single quotes; they all need only one backslash.
  5. Create the new database, with whatever database user WP will be using, and plug the details into your wp-config.php
  6. Run the export SQL into the new database, by copy/pasting into phpMyAdmin. I did it table-by-table so I could catch and correct any problems easily. I was especially wary of the wp_posts table, which had almost 700 rows, most with very long data. But as it turns out it all went very smoothly, with no problems whatsoever.
  7. Time to upload all the WP files into the new web server. Because I was moving from WP1.2.2 to 1.5, there were some steps to follow first for migrating the old template. All pretty straightforward really. Then run the WP wp-admin/upgrade.php to make sure the tables are all up to date with the latest design.
  8. Log onto WordPress and go through the config screens to make sure it’s all okay. Things to watch out for include the timezone (if different on the new server), setting your preferred template, activating any plugins you want, and setting the new file upload directory (on which you’ll need to set permissions).
  9. Check out the Permalinks. Set it up, then copy what it tells you to your .htaccess file. (The WP1.5 version wouldn’t actually work for me. For now I’m still using the WP1.2 version until I figure it out.)
  10. Check how the blog looks to the outside world. Post a test post and comment, just to check it all works. If not, go back and correct where applicable.
  11. Re-enable (selected) comments on the new blog:
    UPDATE wp_posts SET comment_status = 'open', ping_status = 'open'
  12. Insert an .htaccess redirect on the old site to point people over to the new:
    Redirect /olddirectory http://yoursite.com/newdirectory/

And presto! Done!

(Okay, I had some further hassles with some old HTML and broken image links mixing it up with WordPress, but that’s my problem, not yours!)
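Steps 3 and 4 were done by hand in a text editor, but the same clean-up could be scripted. A minimal Python sketch, assuming the dump is plain text and that the only escaping problem is a doubled backslash in front of a quote; the function name is made up.

```python
import re


def clean_dump(sql_text, old_url, new_url):
    """Apply the step 3 and step 4 fixes to an exported SQL dump."""
    # Step 3: point every stored URL at the new site.
    sql_text = sql_text.replace(old_url, new_url)
    # Step 4: collapse a doubled backslash before a quote (\\" or \\') to one.
    return re.sub(r"\\\\(['\"])", r"\\\1", sql_text)
```

For this move it would be called with "toxiccustard.com/diary" as the old URL and "danielbowen.com" as the new one. It is worth diffing the result against the original dump before loading it, since a blanket replace can catch more than intended.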

WordPress 1.5

WordPress 1.5 came out overnight. Well actually it came out on Valentine’s Day, but they didn’t announce it until a few hours ago. From the sounds of it, there’s been a lot of work done on the template system and comment control, plus a way to make non-dated pages run in the system (ooh, getting more CMS-ey). All sounds rather good to me, and I’ll be checking it out and (all being well) implementing it on the blogs I run directly.

Spam Karma

Well after deleting what seems like hundreds of bloody comment and trackback spams over the past week, I’ve installed Spam Karma (billed as a “fearless Spam Killing Machine”) on this blog. If it’s successful, I’ll be installing it on my other WordPress blogs.

It includes blacklists, captcha or email verification for suspicious comments, a myriad of settings, all that good stuff. For now I’ve set it to “lenient” mode until I get a feel for how strict it is. Feel free to leave junk comments here to see how it goes. (But beware of deliberately leaving spammy comments: for all I know it may decide to blacklist your IP address!)

PS. Tuesday 21:25. The manual install as in the ReadMe worked fine for me, except that you can’t get to the config page through the menus, you have to activate it from the plugins page, then go to the URL it quotes. (This is apparently a known thing with WP1.2, but I guess it applies to WP1.2.2 as well, which we’re running here. Presumably it doesn’t apply to the current nightly builds or to the future 1.5.)

Also be sure to try the test captcha page (linked off the config page) to make sure that bit works (eg that the correct PHP libraries are there somewhere). If they’re not, I guess you need to hassle your ISP. Works fine for me.

PS. Wednesday 21:15. There is a hitch: the e-mail it sends out summarising what it’s done is encoded with something. I think this is an incompatibility with the PHP setup on my ISP… the same thing happened with WordPress 1.2’s password reminder messages. I’ll have to dig around for a fix.

It should also be noted that Tony has tried to plonk it onto a blog he runs, and is having some issues. So it’s not all beer and skittles.

On the bright side, it tells me it caught 20 spam comments in the last 24 hours. I certainly haven’t seen any get let through.

PS. Thursday 20:05. Some are getting through, but evidently nowhere near the total number being caught. Hmmm.

Comment spam vs nofollow

More comment spam hitting us at the moment, but curiously the comments don’t seem to have URLs with them, so I’m not sure what the point is. They’re all purporting to be from non-English-speaking e-mail addresses, and many in broken English, with a generic compliment about how marvellous your web site is. Odd.

Meanwhile, Google have come up with a new rel="nofollow" attribute for links to help fight comment spam. And they’ve got a bunch of blogging heavyweights to back it, too, including MT/TypePad, Blogger (duh), MSN Spaces and the WordPress gang, which might well cover a good proportion of blogs running today.
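In practice this means the blog software adds the attribute to every link in a comment before displaying it. A rough Python sketch of the idea follows; it is a regex-based illustration with an invented function name, not how any of the named platforms actually implement it, and a proper HTML parser would be more robust.

```python
import re

# An opening <a ...> tag; \b stops it matching tags such as <abbr>.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Add rel="nofollow" to every link that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # leave an existing rel attribute alone
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(fix, html)
```

Search engines that honour the attribute then give the linked site no credit for the link, which removes the spammer's incentive (in theory).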

Now, W3C ratification, anybody? Oh pah, who cares?

A few brief things

Some people aren’t so happy about Google Suggest… certainly not Eric Rice, who gets his name listed with words like “child molester”. Wouldn’t be delighted about that, myself. (via the G’Day World podcast)

New version of WordPress (minor fixes). Hopefully it fixes the thing where, if you forget your password and need it mailed to you, it sends it in some incomprehensible encoding format that can’t be read… at least not on any web or Windows email client I have access to.

New version for Trillian (major new release). Haven’t had the chance to try it yet… no time Bellamy, no time.

WordPress siteurl/path bug

Today geekrant.org’s stylesheet was fading in and out of existence. Well, to be precise, the path to it got screwed up a bit, because somehow it thought it was in a directory called (deep breath):

http://www.geekrant.org/wp-login.php/wp-images/smilies/ wp-images/smilies/wp-images/smilies/wp-images/smilies/ wp-images/smilies/wp-images/smilies/wp-images/smilies/ wp-images/smilies/wp-images/smilies/wp-images/smilies/ wp-images/smilies/wp-images/smilies/wp-images/smilies/ wp-images/smilies/wp-images/smilies/wp-images/smilies/ wp-images/smilies/

rather than the much more succinct (and correct):

http://www.geekrant.org/

This appears to be caused by a bug in WordPress 1.2.1: under some circumstances, when a registered user goes to log in and a particular browser/server configuration is present (it looks like something to do with proxies), WordPress thinks its directory has moved and tries to compensate. It’s detailed in the WordPress support forums. If anybody’s having problems with it, the fix is to manually correct the siteurl setting in the wp_options table (it’s the first row), then get into wp-login.php and comment out the two lines following

// If someone has moved WordPress let’s try to detect it

…because really, if someone’s moved it, they should have done it properly and updated the siteurl setting themselves.
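The database half of that fix is a one-row update. Here is a sketch in Python, using an in-memory-friendly SQLite connection purely as a stand-in for the real MySQL database. It assumes wp_options stores settings in option_name and option_value columns, and the function name is invented.

```python
import sqlite3


def reset_siteurl(conn, correct_url):
    """Set the siteurl option back to the correct address; returns rows changed."""
    cursor = conn.execute(
        # Assumed column names: option_name holds the key, option_value the setting.
        "UPDATE wp_options SET option_value = ? WHERE option_name = 'siteurl'",
        (correct_url,),
    )
    conn.commit()
    return cursor.rowcount
```

The same UPDATE statement can be pasted into phpMyAdmin against the live database, with the correct address in place of the parameter.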

See, not even WordPress is perfect. But it does have a strong user community, open source code that’s not too confusing to dabble in even for PHP-newbies like me, and a straightforward database structure holding all its stuff together. And that counts for a lot, I think.

Protect WordPress against comment spam

I was asked to go step-by-step through how to protect WordPress from the current rash of spam comment attacks, so here it is. It’s fairly easy to get them to go into the moderation queue, but it’s a pain having to continually clear it out.

The way the current attacks (hold ’em poker and so on) are working is to attack a file called wp-comments-post.php which does the grunt-work of posting comments into the database… if this isn’t there, they can’t do it.

So first rename wp-comments-post.php to something else. Doesn’t really matter what, as long as it doesn’t clash with anything else. eg xyz.php. (It’s not ever seen by users so it really could be called anything without confusing people, though you might want to avoid confusing yourself if you later can’t remember what it is.)

Then you need to edit the files that call wp-comments-post.php so that they call xyz.php instead. These are:

  • wp-comments.php
  • wp-comments-popup.php
  • wp-comments-reply.php

Save all those files to your server, and make sure the original wp-comments-post.php file is deleted, and then you should be done. Post a comment yourself to make sure it works.

For now it seems to stop the spammers… no doubt in future they’ll figure out something more advanced (like scanning the <form> code to figure out the name of the post file), but it should stop them for a little while at least.
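If you have several blogs to do, the rename-and-edit steps can be scripted. A rough Python sketch, assuming you run it against a local copy of the WordPress directory before uploading, and that the three files refer to the handler by its plain filename. The function name is made up.

```python
from pathlib import Path

# The files named above that post to the comment handler.
CALLERS = ["wp-comments.php", "wp-comments-popup.php", "wp-comments-reply.php"]


def rename_comment_handler(wp_dir, new_name, old_name="wp-comments-post.php"):
    """Rename the comment handler and update the files that refer to it.

    Returns the names of the caller files that were changed.
    """
    wp_dir = Path(wp_dir)
    (wp_dir / old_name).rename(wp_dir / new_name)
    changed = []
    for name in CALLERS:
        path = wp_dir / name
        if not path.exists():
            continue
        # Work on raw bytes so the file's original encoding is left untouched.
        data = path.read_bytes()
        if old_name.encode() in data:
            path.write_bytes(data.replace(old_name.encode(), new_name.encode()))
            changed.append(name)
    return changed
```

Calling rename_comment_handler("path/to/wordpress", "xyz.php") does the lot in one go. Post a test comment afterwards, as above, to make sure nothing was missed.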