MSNbot still overspidering
I've been monitoring the traffic to Archivist quite a bit recently. Archivist is a publicly searchable mailing list archive: you subscribe the system's email address to your mailing list, and all posts automagically appear on the site, threaded and searchable.
Because Archivist is basically a text-only site, the search engine robots love it, and the majority of the site's traffic comes from search engine referrals. And because of the archival nature of the site, most of its pages never change, so we send appropriate Last-Modified HTTP headers to aid caching and help keep bandwidth usage down.
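For anyone unfamiliar with how this kind of caching works, here's a minimal sketch in Python of the server side of a conditional request. It is not Archivist's actual code: the function name `conditional_response` is made up for illustration, and it assumes the page's modification time is known as a UTC datetime. A well-behaved robot sends back the Last-Modified value it saw as an If-Modified-Since header, and the server can answer with a bodiless 304 instead of the full page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware UTC datetime.
    `if_modified_since` is the raw request header value, or None.
    (Hypothetical helper, for illustration only.)
    """
    # HTTP dates only have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Nothing has changed: no body needs to be sent.
                return 304, headers

    # First visit, or the page changed since the robot last saw it.
    return 200, headers
```

A robot that replays the date gets the cheap 304; one that never sends If-Modified-Since gets the whole page every time, and that's where the bandwidth goes.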
Unfortunately, unlike all the other major robots, MSNBot completely ignores these headers and constantly re-indexes the same content over and over. It doesn't take long to find proof of this; here's the robot traffic from April '07:

So, over this time period MSN has made only about 50% more requests than Googlebot, but has used more than six times the bandwidth, which works out to roughly four times as much data per request. (The number after the + is the number of hits to the robots.txt file, for those who aren't familiar with AWStats.)
At the same time, MSN provides just 0.4% of the site's search engine referrals (Google is 97.6%). With numbers like these, it's hard to justify not blocking MSN completely.
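If it does come to that, one way to do it would be a robots.txt rule for the msnbot user agent. This is only a sketch and an assumption, not something that's in place on the site: the rules and the `/list/example` path below are hypothetical, and Python's standard `urllib.robotparser` is used just to check that the rules say what they're meant to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out msnbot, leave every other robot alone.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder path, not a real Archivist URL.
msn_allowed = parser.can_fetch("msnbot", "/list/example")
google_allowed = parser.can_fetch("Googlebot", "/list/example")

print("msnbot allowed:", msn_allowed)
print("Googlebot allowed:", google_allowed)
```

This only helps if the robot actually obeys robots.txt, of course.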
A blog!
It had to happen eventually. I'm one of those web developers who's had an unfinished personal site for... too long, about ten years now I think. My problem is that I don't like putting up sites that look ugly, but at the same time I have next-to-no design skills; so my own sites just go through this endless cycle of getting coded but with only half a design, and then abandoned until I have another half-design idea.
Hopefully what I have now is something that looks just about passable, but is functional enough for me to be able to post random things to. We'll see how it goes.