In her post Blog Activity, Julia Lerman writes:

There must be a few people who have their aggregators set to check rss feeds every 10 seconds or something. I very rarely look at my stats because they don't really tell me much. But I have to say I was a little surprised to see that there were over 14,000 hits to my website today (from 12am to almost 5pm).

So where do they come from?

10,000+ are from NewzCrawler then a whole lot of other aggregators and then a small # of browsers. 

This problem is caused by a phenomenon originally pointed out by Phil Ringnalda in his post What's Newzcrawler Doing? and expanded on in my post News Aggregators As Denial of Service Clients. Basically:

According to the answer on the NewzCrawler support forums, when NewzCrawler updates a channel that supports wfw:commentRss, it first updates the main feed and then updates the comment feeds. Repeatedly downloading the RSS feed for the comments on each entry in my blog when the user hasn't requested them is unnecessary and, quite frankly, wasteful.
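To make the alternative concrete, here is a minimal Python sketch of a more polite aggregator: it records each item's wfw:commentRss URL when it reads the main feed, but downloads a comment feed only when the reader actually asks for it, and revalidates later requests with a conditional GET. The `fetch` callable, the `LazyCommentFetcher` class, and the helper names are hypothetical; this is not how NewzCrawler or any other aggregator is actually implemented.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

# Namespace of the wfw:commentRss element.
WFW_NS = "http://wellformedweb.org/CommentAPI/"

# Hypothetical transport: (url, request_headers) -> (status, response_headers, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], str]]


def comment_feed_urls(rss_xml: str) -> Dict[str, str]:
    """Map each item's <link> to its wfw:commentRss URL without fetching anything."""
    root = ET.fromstring(rss_xml)
    urls = {}
    for item in root.iter("item"):
        link = item.findtext("link")
        comment_rss = item.findtext(f"{{{WFW_NS}}}commentRss")
        if link and comment_rss:
            urls[link.strip()] = comment_rss.strip()
    return urls


class LazyCommentFetcher:
    """Downloads a comment feed only on request and revalidates it with conditional GET."""

    def __init__(self, fetch: Fetch):
        self._fetch = fetch
        self._validators: Dict[str, Dict[str, str]] = {}  # url -> saved ETag/Last-Modified
        self._cache: Dict[str, str] = {}  # url -> last body received

    def comments_for(self, comment_rss_url: str) -> str:
        request_headers = {}
        saved = self._validators.get(comment_rss_url, {})
        if "ETag" in saved:
            request_headers["If-None-Match"] = saved["ETag"]
        if "Last-Modified" in saved:
            request_headers["If-Modified-Since"] = saved["Last-Modified"]

        status, headers, body = self._fetch(comment_rss_url, request_headers)
        if status == 304 and comment_rss_url in self._cache:
            return self._cache[comment_rss_url]  # unchanged: reuse the cached copy

        self._cache[comment_rss_url] = body
        self._validators[comment_rss_url] = {
            k: v for k, v in headers.items() if k in ("ETag", "Last-Modified")
        }
        return body
```

Nothing is downloaded for an entry's comments until `comments_for` is called, and repeat visits cost the server a 304 rather than a full XML file.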

I recently upgraded my web server to Windows 2003 Server because of problems with Windows XP's limit on the number of outgoing connections. However, I noticed that my web server was still getting overloaded with requests during peak traffic hours. Checking my server logs, I found that another aggregator, Sauce Reader, has joined NewzCrawler in its extremely rude bandwidth-hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP conditional GET for comments feeds, so every hour I'm serving dozens of XML files to each NewzCrawler and Sauce Reader user subscribed to my RSS feed.
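Finding the culprits in the server logs is mostly a matter of tallying requests and bytes by user agent. The sketch below assumes IIS's W3C extended log format with the `cs(User-Agent)` and `sc-bytes` fields enabled; the function name is hypothetical, and it reads the `#Fields:` directive to locate columns rather than assuming a fixed layout.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_user_agents(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count requests and response bytes per user agent in a W3C extended log."""
    hits, sent_bytes = Counter(), Counter()
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # The directive names the columns of the data lines that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blank lines, and data before #Fields
        record = dict(zip(fields, line.split()))
        # IIS writes spaces inside the user agent as '+'; undo that for readability.
        agent = record.get("cs(User-Agent)", "-").replace("+", " ")
        hits[agent] += 1
        if record.get("sc-bytes", "-").isdigit():
            sent_bytes[agent] += int(record["sc-bytes"])
    return hits, sent_bytes
```

Sorting `hits.most_common()` or `sent_bytes.most_common()` then shows which clients account for the bulk of the traffic.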

I'm really irritated by this behavior and have considered banning Sauce Reader and NewzCrawler from fetching RSS feeds on my blog, since they contribute significantly to bringing down my site on weekday mornings when people first fire up their aggregators at work or at home. Instead, I'll probably end up patching my local install of dasBlog to support HTTP conditional GET for comments feeds when I get some free time. In the meantime, I've tweaked some options in IIS that should reduce the number of times the server becomes inaccessible due to being flooded with HTTP requests.
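HTTP conditional GET is what lets a server answer a request for an unchanged comments feed with a bodyless 304 Not Modified instead of resending the whole XML file. The following is a hypothetical Python sketch of that decision logic, not dasBlog's code (dasBlog runs on ASP.NET); deriving the ETag from a SHA-1 of the feed bytes is an assumption chosen for illustration.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Hypothetical ETag scheme: a quoted SHA-1 of the feed bytes."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(body: bytes, last_modified: datetime,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, response body) for a feed request.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since when both are sent.
        # Weak comparison: ignore a W/ prefix on the client's tags.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""
        return 200, headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""

    return 200, headers, body
```

In a real handler, `last_modified` would presumably come from the timestamp of the newest comment on the entry.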

This doesn't mean I think this feature of the aforementioned aggregators is something that should be encouraged. I just don't want to punish readers of my blog because of decisions made by the authors of their news reading software.


Friday, September 24, 2004 2:10:11 PM (GMT Daylight Time, UTC+01:00)
You should ban them. The users should change aggregators. We should all ban them. This is unacceptable.

If I use a piece of software that causes grief, I expect that I would suffer the consequences, not the owner of the sites that I visit/read.
Friday, September 24, 2004 3:03:59 PM (GMT Daylight Time, UTC+01:00)
I once wrote a post on the XML behind the iTunes Music Store and linked to a sample. NewzCrawler decided that an XML file linked from a blog entry constituted a resource that should be crawled till the end of time.
Friday, September 24, 2004 4:23:22 PM (GMT Daylight Time, UTC+01:00)
I've been trying out Sauce Reader for the last 2 days or so (I know I shouldn't say this on Dare's blog, but it really is quite nice). It defaults to downloading feeds every 4 hours (which is nice); however, I am going to turn off the "Get Comments" feature, and hopefully this problem will go away for anyone I'm hitting every 4 hours.

Maybe I'll just switch back to NewsGator.
Friday, September 24, 2004 6:13:38 PM (GMT Daylight Time, UTC+01:00)
Mark, you should try RSS Bandit. It's very well behaved.
Friday, September 24, 2004 6:56:58 PM (GMT Daylight Time, UTC+01:00)
That's interesting info; I'll have to look at my access logs to see if that's contributing to my going over my bandwidth limits for the last 2 months. Incidentally, Sauce Reader appears to check robots.txt, so you could try banning those user agents from everything but the main RSS feed. I don't think I'd ban them outright, except as an emergency measure.
Friday, September 24, 2004 7:38:49 PM (GMT Daylight Time, UTC+01:00)
This robots.txt entry should work for Sauce Reader:

User-agent: Sauce Reader
Disallow: /weblog/CommentView.aspx
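One way to sanity-check an entry like this is to run it through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the example.com URLs are placeholders. Note that this particular parser matches user agents by case-insensitive substring and other crawlers may match differently, that robots.txt is purely advisory, and that the check says nothing about which URLs Sauce Reader actually requests for comment feeds.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the comment above, verbatim.
ROBOTS_TXT = """\
User-agent: Sauce Reader
Disallow: /weblog/CommentView.aspx
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"  # placeholder host
    print(allowed("Sauce Reader", base + "/weblog/CommentView.aspx?guid=123"))  # False
    print(allowed("Sauce Reader", base + "/weblog/rss.xml"))  # True
    print(allowed("RSS Bandit", base + "/weblog/CommentView.aspx?guid=123"))  # True
```

Because a `Disallow` rule is a path prefix, the query string on `CommentView.aspx` does not get around it, while user agents not named in any entry remain unrestricted.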

Saturday, September 25, 2004 9:34:04 AM (GMT Daylight Time, UTC+01:00)
I think the suggestions to ban come from unhappy competitors. One of the reasons people use aggregators like NewzCrawler is the ability to see new comments arrive in the feed without any additional action on their part. Wfw:CommentRSS is not a scalable solution, and if you see its limits, simply don't use it. Personally, I use the annotate:reference extension and am quite happy with it. NewzCrawler supports all the traffic-saving techniques: conditional GETs, gzip encoding, RFC 3229 with the "feed" instance manipulation method, adjustable feed update periods, and so on.
Today content providers have many options to save traffic, so it is not right to blame aggregators.
Saturday, September 25, 2004 4:24:38 PM (GMT Daylight Time, UTC+01:00)
>I think the suggestions to ban come from unhappy competitors.

Nope, they come from an unhappy website operator whose site becomes inaccessible at certain times of the day because of your inconsiderately written application.

>Wfw:CommentRSS is not a scalable solution, and if you see its limits, simply don't use it.

It was a scalable solution until applications like yours started to abuse it.

>Today content providers have many options to save traffic, so it is not right to blame aggregators.

When aggregators act like malicious Web clients, there is every reason to blame them.
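The NewzCrawler comment above lists RFC 3229 delta encoding with the "feed" instance manipulation among its traffic-saving techniques. Roughly, a client that already holds a copy of the feed sends its last ETag in `If-None-Match` together with `A-IM: feed`, and the server may reply `226 IM Used` with only the entries added since that ETag. Below is a minimal Python sketch of the server side; the `FeedEntry` type, the `respond` function, and the scheme of using the newest entry's id as the ETag are all hypothetical simplifications, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FeedEntry:
    entry_id: str  # hypothetical: unique id per entry
    xml: str


def respond(entries: List[FeedEntry],
            request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], List[FeedEntry]]:
    """Answer a feed request, honoring A-IM: feed when possible.

    Assumes `entries` is non-empty and ordered oldest to newest, and that the
    feed's ETag is simply the quoted id of its newest entry.
    """
    current_etag = f'"{entries[-1].entry_id}"'
    headers = {"ETag": current_etag}
    client_etag = request_headers.get("If-None-Match")

    if client_etag == current_etag:
        return 304, headers, []  # nothing new since the client's copy

    # Instance manipulations the client accepts, ignoring any ;q= parameters.
    a_im = request_headers.get("A-IM", "")
    accepted = {im.split(";")[0].strip().lower() for im in a_im.split(",")}
    known_etags = [f'"{e.entry_id}"' for e in entries]

    if "feed" in accepted and client_etag in known_etags:
        # Send only the entries newer than the one the client last saw.
        newer = entries[known_etags.index(client_etag) + 1:]
        headers["IM"] = "feed"
        return 226, headers, newer

    return 200, headers, entries  # full feed
```

A client whose ETag the server no longer recognizes simply falls back to the full feed, so the delta path is an optimization rather than a requirement.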