I’ve been getting whacked with a lot of referrer spam lately. A LOT of referrer spam. A LOT.
If you’re unfamiliar with this particular bit of pernicious stupidity, referrer spam is where a spammer will hit your site for the sole purpose of sticking their own web site into your referrer logs so it looks like there’s a link from their site to yours. At first glance this would seem completely moronic — only the site admin is ever going to see any of this — but some sites publish their referrer lists, so the spammer is presumably hoping to find one of these by flooding other people’s sites with links to their own.
Usually the referrer spam on my site is innocuous: I get 50 hits from some casino or porn site once and then they go away. But in the last week I’ve been getting hundreds and hundreds of hits from one site that uses multiple domain names, each a variation on a Windows software theme, enough that there are no real referrers in my top 150.
Enough is enough.
IP banning in Movable Type only works on comments and trackbacks; it doesn’t work for regular hits. Fortunately, there is the wonder of the .htaccess file, and it works for any kind of web site, blog or otherwise. The .htaccess file is used generically to make web server configuration changes on a per-directory basis. Using the Limit directive you can restrict access to your web site, or any part of it, by IP address or by regular expression.
Here’s the basic template for restricting access by IP:
<Limit GET>
  order allow,deny
  # referrer spammers die die die
  deny from XXX.XXX.XXX.XXX
  # allow everyone else
  allow from all
</Limit>
You’d replace the XXX part with the evil spammer’s IP, of course, and you can use the three-byte version (XXX.XXX.XXX) to block an entire class C. Be careful if you do that, though: a class C covers 256 addresses, and you can block folks you don’t mean to.
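As a concrete sketch of the two forms, here is the same block with one full address and one three-byte prefix. Both addresses are placeholders taken from the ranges reserved for documentation, not real spammers:

```apache
<Limit GET>
  order allow,deny
  # a single address (placeholder)
  deny from 203.0.113.5
  # every address from 192.0.2.0 through 192.0.2.255 (placeholder prefix)
  deny from 192.0.2
  # allow everyone else
  allow from all
</Limit>
```

The prefix line can also be written in CIDR form as deny from 192.0.2.0/24, which makes the size of the range a little more obvious.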
If you’d rather block by keyword (poker, casino, porn, viagra, blah blah blah), you can use a regular expression (pattern matching) filter with two additional lines, one SetEnvIfNoCase and one deny:
<Limit GET>
  order allow,deny
  # referrer spammers die die die
  deny from XXX.XXX.XXX.XXX
  SetEnvIfNoCase Referer ".*bestpokersite.com" BadReferrer
  deny from env=BadReferrer
  # allow everyone else
  allow from all
</Limit>
If you don’t know regular expressions, you can just add more SetEnvIfNoCase lines with more hostname keywords to kill, one per line (the leading .* is harmless but optional, since the pattern is matched anywhere in the Referer header):
SetEnvIfNoCase Referer ".*buycialishere.com" BadReferrer
If you’re matching on keywords, more of the hostname is probably better than less, so you don’t unintentionally block innocent users. You never know when an actual reader might be legitimately coming from a poker site.
If you do know regular expression syntax, any regex is valid. SetEnvIfNoCase matches case-insensitively; use the SetEnvIf directive for case-sensitive regexes.
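For the regex-inclined, here is a rough sketch of a tighter pattern that folds the two example hostnames into one line. It anchors on the start of the Referer value and escapes the dots, so it only matches when the hostname itself is one of those domains (or a subdomain of one). The hostnames are just the examples used above:

```apache
# Match only the host part of the Referer: optional subdomain, then one of
# the two example domains, followed by a slash, a port colon, or the end.
SetEnvIfNoCase Referer "^https?://([^/]+\.)?(bestpokersite|buycialishere)\.com([/:]|$)" BadReferrer
```

It sets the same BadReferrer variable, so the existing deny from env=BadReferrer line picks it up unchanged.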
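When a request that is blocked in your .htaccess hits your site, it gets a 403 Forbidden response. The hit is still recorded in Apache’s access log (with a 403 status), but no content is served, and stats packages that only count successful requests will leave it out of your referrer list.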
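Whether a blocked request still lands in the access log depends on how logging is set up. One way to be sure it is left out, assuming you can edit the main server config (CustomLog is not allowed in .htaccess), is conditional logging keyed on the same variable. A sketch only; the log path and the combined format nickname are assumptions about your setup:

```apache
# In httpd.conf or a virtual host, not in .htaccess.
# Write a log entry only when BadReferrer is NOT set for the request.
# "logs/access_log" and "combined" are assumed names; use your own.
CustomLog logs/access_log combined env=!BadReferrer
```

This would replace the existing CustomLog line for that site rather than sit alongside it; otherwise the spam hits are still written by the original line.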
- Referer Spam Redux: As I was writing this yesterday, Kuro5hin posted this article on the same topic with much the same content. There are also some additional links here to public blacklists of referrer spam sites and other resources.
- Wired news article about referrer spam from 2002
- Apache .htaccess tutorial
- Apache Limit directive docs
- Apache SetEnvIf directive docs