I've been monitoring the traffic to Archivist quite a bit recently. Archivist is a publicly searchable mailing list archive: you subscribe the system's email address to your mailing list, and all posts automagically appear on the site, threaded and searchable.
Because Archivist is basically a text-only site, the search engine robots love it, and the majority of the site's traffic comes from search engine referrals. And because of the archival nature of the site, most of the pages never change, so we send appropriate Last-Modified HTTP headers to aid caching and help keep bandwidth usage down.
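For anyone unfamiliar with how this kind of caching works, here's a minimal sketch in Python of the usual conditional GET exchange. It isn't Archivist's actual code: the function name `conditional_response` and its arguments are made up for illustration. The idea is that the server sends a Last-Modified header with each page, a well-behaved client sends that date back in If-Modified-Since on its next visit, and the server can then answer with an empty 304 Not Modified instead of the full page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    Hypothetical helper: `last_modified` is a timezone-aware UTC datetime,
    `if_modified_since` is the raw request header value (or None).
    """
    # HTTP dates only have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and since.tzinfo is not None and last_modified <= since:
            # The client's copy is still current: send headers only, no body.
            return 304, headers

    # No usable validator, or the page changed: send the full page.
    return 200, headers
```

A robot that replays the Last-Modified value it was given gets a 304 and costs almost no bandwidth; one that never sends If-Modified-Since gets the full page every time.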
Unfortunately, unlike all the other major robots, MSNBot completely ignores these headers and is constantly indexing the same content over and over again. It doesn't take long to find proof of this; here's the robot traffic from April '07:
So, over this time period MSN has made only about 50% more requests than Googlebot, but has used more than six times the bandwidth, which works out to roughly four times as much data per request. (The number after the + is the number of hits to the robots.txt file, for those who aren't familiar with AWStats.)
At the same time MSN provides just 0.4% of the site's search engine referrals (Google is 97.6%). With numbers like this, it's hard to justify not blocking MSN completely.
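If we did go down that route, the block itself would just be a couple of lines in robots.txt. Here's a rough sketch that checks such a rule with Python's standard `urllib.robotparser`. It assumes MSNBot matches the user-agent token `msnbot`; the `allowed` helper and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: shut out msnbot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, not a real Archivist page.
    page = "http://example.com/thread/1"
    print("msnbot:", allowed("msnbot", page))
    print("Googlebot:", allowed("Googlebot", page))
```

This only helps with robots that actually honour robots.txt, of course.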