September 12, 2004
@ 02:17 AM

In his post "Full text RSS on MSDN gets turned off", Robert Scoble writes:

Steve Maine: what the hell happened to blogs.msdn.com?

RSS is broken, is what happened. It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour).

Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terabytes of bandwidth were being used up by RSS.

So, they are trying to attack the problem by making the feeds lighter weight. I don't like the solution (I've unsubscribed from almost all weblogs.asp.net feeds because they no longer provide full text) but I understand the issues.

This is becoming a broken record. Every couple of months, some web site that hasn't properly prepared for the bandwidth consumed by a popular RSS feed complains loudly, and the usual suspects chime in that RSS is broken. This time the culprit is Weblogs @ ASP.NET, and their mistake was not providing HTTP compression to clients speaking HTTP 1.0. This meant that they couldn't get the benefits of HTTP compression when talking to popular aggregators like Straw, FeedDemon, SharpReader, NewsGator and RSS Bandit. No wonder their bandwidth usage was so high.
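
To make the compression point concrete, here is a minimal Python sketch. It is not the actual Weblogs @ ASP.NET configuration: the feed content, the example.org URLs and the names `build_feed` and `negotiate_gzip` are all made up for illustration. The only thing it tries to show is that the decision to compress can key off the Accept-Encoding request header rather than the HTTP version the client speaks.

```python
import gzip
from typing import Dict, Tuple


def build_feed(item_count: int) -> bytes:
    """Build a synthetic RSS 2.0 document with full-text items (made-up content)."""
    items = "".join(
        "<item><title>Post {0}</title>"
        "<link>http://example.org/posts/{0}</link>"
        "<description>{1}</description></item>".format(
            n, "Full text of post number {0}. ".format(n) * 40
        )
        for n in range(item_count)
    )
    return (
        '<?xml version="1.0"?><rss version="2.0"><channel>'
        "<title>Example feed</title>" + items + "</channel></rss>"
    ).encode("utf-8")


def negotiate_gzip(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (response headers, payload), gzipping when the client lists gzip.

    Simplification: q-values such as "gzip;q=0" are ignored in this sketch.
    """
    offered = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in offered:
        return {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}, gzip.compress(body)
    return {}, body


feed = build_feed(25)
headers, payload = negotiate_gzip(feed, "gzip, deflate")
print("uncompressed bytes:", len(feed))
print("gzip bytes:        ", len(payload))
```

The synthetic feed is far more repetitive than real posts, so the ratio it prints overstates what a real feed would see; real savings depend on the content.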

But let's ignore that the site wasn't configured to use all the bandwidth-saving capabilities of HTTP. Instead, let's assume Weblogs @ ASP.NET had done all the right things but was still consuming too much bandwidth. Mark Nottingham covered this ground in his post The Syndication Sky is Falling!

A few people got together in NYC to talk about Atom going to the W3C this morning. One part of the minutes of this discussion raised my eyebrows a fair amount;

sr: […] Lots of people are saying RSS won’t scale. Somebody is going to say I told you so.
bw: Werner Vogels at Cornell has charted it out. We're at the knee of the curve. I don’t think we have 2 years.
sr: I have had major media people who say, until you solve this, I’m not in.
bw: However good the spec is, unless we deal with the bag issues, it won’t matter. There are fundamental flaws in the current architecture.

Fundamental flaws? Wow, I guess I should remind the folks at Google, Yahoo, CNN and my old colleagues at Akamai that what they’re doing is fundamentally flawed; the Web doesn’t scale, sorry. I guess I’ll also have to tell the people at the Web caching workshops that what they do is futile, and those folks doing Web metrics are wasting their time. What a shame...

Bad Reasons to Change the Web Architecture

But wait, there’s more. "Media people" want to have their cake and eat it too. It’s not good enough that they’re getting an exciting, new and viable (as compared to e-mail) channel to eyeballs; they also have to throw their weight around to reduce their costs with a magic wand. What a horrible reason to foist new protocols, new software, and added complexity upon the world.

The amusing new wrinkle is that everybody's favorite leader of the "RSS is broken, let's start all over" crowd, Sam Ruby, has decided it is time to replace blogs pinging weblogs.com when they update and using HTTP to fetch RSS feeds. Hopefully, this will be more successful than his previous attempts to replace RSS and the various blogging APIs with Atom. It's been over a year and all we have to show from the creation of Atom is yet another crufty syndication format with the promise of one more incompatible one on the way.

Anyway, the point is that RSS isn't broken. After all, it is just an XML file format. If anything is broken, it is using HTTP for fetching RSS feeds. But then again, do you see people complaining that HTTP is broken and needs to be replaced every time some poor web site suffers the Slashdot Effect? If you are running a popular web site, you will need to spend money to afford the traffic. AOL.com, Ebay.com and Microsoft.com are all serving terabytes of content each month. If they were serving these with the same budget that I have for serving my website, these sites would roll over and die. Does this mean we should replace using web browsers and HTTP for browsing the Web and resort to using BitTorrent for fetching HTML pages? It definitely would reduce the bandwidth costs of sites like AOL.com, Ebay.com and Microsoft.com.

The folks paying for the bandwidth that hosts Weblogs @ ASP.NET (the ASP.NET team, not MSDN as Scoble incorrectly surmises) decided they had reached their limits and reduced the content of the feeds. It's basically a non-story. The only point of interest is that if they had announced this internally with enough warning, folks would have advised them to turn on HTTP compression for HTTP 1.0 clients before resorting to crippling the RSS feeds. Act in haste, repent at leisure.


 

Sunday, September 12, 2004 3:16:51 AM (GMT Daylight Time, UTC+01:00)
RSS isn't broken, and HTTP as a method of distributing RSS items isn't "broken", but it's not efficient. You can make it a little more efficient by compressing and other tricks but it's just not a good way to widely distribute information.

If IRC depended on a single server, or if Usenet depended on a single server, they'd have collapsed long ago. Propagating information, rather than having only one source for it, is an efficient way to scale.

A syndication protocol that kept RSS items but propagated them through an NNTP style network would eliminate the "RSS takes too much bandwidth" argument.
Sunday, September 12, 2004 9:21:54 AM (GMT Daylight Time, UTC+01:00)
When I heard that the site in question is aggregating all its blogs into one feed, I realized it had nothing to do with the way RSS works. They were abusing it, thwarting 304 processing, because the feed would always be dirty, always appear to have changed. The problem is that the site that forced the change is almost certainly one I'm not interested in. The person who made the decision to make one giant feed DOESN'T UNDERSTAND WEBLOGS. Please don't create a giant Usenet group, create individual independent publications. Scoble says this himself. Why does he blame RSS? He's got the journalist's impulse and he can't say Microsoft is broken or they'd fire him. ;->
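
A small simulation of the 304 point, as a sketch under stated assumptions: twelve hypothetical blogs each publish one post during a day, a reader polls hourly, and the server compares an ETag derived from the feed bytes against If-None-Match. The function names and the posting schedule are invented for illustration.

```python
import hashlib


def etag_for(body):
    """Derive a strong ETag from the feed bytes (one common approach, assumed here)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(body, if_none_match):
    """Answer 304 with no payload when the client's ETag still matches."""
    etag = etag_for(body)
    if if_none_match == etag:
        return 304, etag, b""
    return 200, etag, body


def count_full_downloads(snapshots):
    """Poll once per snapshot; count the polls answered with 200 instead of 304."""
    cached_etag = None
    full = 0
    for body in snapshots:
        status, cached_etag, _ = conditional_get(body, cached_etag)
        if status == 200:
            full += 1
    return full


HOURS = 24
BLOGS = 12  # hypothetical: blog b publishes its one post at hour 2 * b


def single_blog_snapshots(blog):
    """Hourly state of one blog's own feed."""
    return [
        ("blog %d posts: %d" % (blog, 1 if hour >= 2 * blog else 0)).encode()
        for hour in range(HOURS)
    ]


def combined_snapshots():
    """Hourly state of one giant feed that aggregates every blog."""
    per_blog = [single_blog_snapshots(b) for b in range(BLOGS)]
    return [b"|".join(feed[hour] for feed in per_blog) for hour in range(HOURS)]


print("own feed of blog 3:", count_full_downloads(single_blog_snapshots(3)))
print("aggregated feed:   ", count_full_downloads(combined_snapshots()))
```

A reader who only cares about blog 3 gets a 304 on most polls of that blog's own feed, while the aggregated feed changes whenever any blog posts and so has to be downloaded in full far more often.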
Sunday, September 12, 2004 9:36:15 AM (GMT Daylight Time, UTC+01:00)
Who put the bee in your bonnet? Sam isn't trying to replace pinging weblogs.com (which, if you think it works, you must not log how long it takes, and whether it succeeds or times out), he's trying to replace pinging weblogs.com and blo.gs and technorati.com and blogrolling.com and pubsub.com and syndic8.com and yahoo.com and having them all separately check your page, because you can't trust them to share or all accept your ping or do things on a reasonable schedule unless you ping them all.

Somehow I missed seeing this the last go-round, but: mnot saying that everything's fine because CNN and RSS both use HTTP is a bit off-kilter. If CNN had to use it like RSS publishers do to keep us happy, delivering a single page with the full text of every current story on it so that most people will download the same story 14 or 29 or 49 times until CNN finally decides everyone's had a chance to read it and it falls off the bottom, they would be yelping too. They can get away with excerpts because there's no discontinuity between the excerpt and the full story, having to switch from one program to another, or one display style to another, and because they actually write their excerpts to suck people in. Because we're too lazy to write excerpts, and switching from reading RSS items to reading web pages is awkward, we can't.

I wouldn't be surprised if the folks at Weblogs @ ASP.NET would be happy to deliver the full content of each item to each reader, *once*. It's when you have to deliver it 24 times to each reader, at best, more if they are broken, that it gets painful.
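
Some back-of-the-envelope arithmetic for the "once versus 24 times" point, as a Python sketch. Every number below is hypothetical and chosen only to show the shape of the calculation; it models the worst case in which each hourly poll returns the complete feed, with no 304 responses and no compression.

```python
def polled_bytes_per_day(subscribers, polls_per_day, feed_bytes):
    """Bytes served when every poll returns the complete feed."""
    return subscribers * polls_per_day * feed_bytes


def once_only_bytes_per_day(subscribers, new_items_per_day, item_bytes):
    """Bytes served if each new item reached each reader exactly once."""
    return subscribers * new_items_per_day * item_bytes


# Hypothetical figures, not measurements from any real site.
SUBSCRIBERS = 10_000
POLLS_PER_DAY = 24       # hourly polling
ITEMS_IN_FEED = 15
ITEM_BYTES = 4_000       # one full-text item
NEW_ITEMS_PER_DAY = 2

polled = polled_bytes_per_day(SUBSCRIBERS, POLLS_PER_DAY, ITEMS_IN_FEED * ITEM_BYTES)
once = once_only_bytes_per_day(SUBSCRIBERS, NEW_ITEMS_PER_DAY, ITEM_BYTES)

print("full feed on every poll: %.1f GB/day" % (polled / 1e9))
print("each item delivered once: %.2f GB/day" % (once / 1e9))
print("ratio: %.0fx" % (polled / once))
```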
Sunday, September 12, 2004 3:55:29 PM (GMT Daylight Time, UTC+01:00)
Personally, I find the announcement that they're only going to release summaries in the RSS feeds a real disappointment. I don't have access to an always-on internet connection and really enjoy being able to read people's blogs in their entirety offline. While concerns to limit the amount of bandwidth usage are valid, those who release summary-only feeds are effectively blocking me from reading their content. Hopefully HTTP compression (or some other bandwidth saving technique) will save the day.

Thanks, Dare, for clearing this up.

Sunday, September 12, 2004 6:53:31 PM (GMT Daylight Time, UTC+01:00)
Dare: you seem to be reaching awfully far to take a petty swipe at Sam. As Phil says above, blog pinging is _very_ broken. It's retarded to ping a dozen services and have them _each_ hit my site to verify, oh yeah, you really did change something.
Sunday, September 12, 2004 10:17:29 PM (GMT Daylight Time, UTC+01:00)
Dare,

I agree with you on being unhappy with Scoble's blanket statements.

Where I am puzzled is the attitude that I seem to gather from your posts. Maybe I am interpreting it wrong, but it seems as though you are uncomfortable with generating new ideas regarding RSS. I almost sense the "partisanship" of sorts regarding the standard. I hope I am wrong.

I am stating the obvious here, but idea generation is a necessary force that pushes us forward. Yes, only a small percentage of those new ideas are going to survive and/or be influential in producing new ideas, but we've got to have the constant stream of those crazy pipe dreams to pick from.
Sunday, September 12, 2004 11:51:22 PM (GMT Daylight Time, UTC+01:00)
I am the "bw" that Dare quotes as having said: "bw: However good the spec is, unless we deal with the bag issues, it won’t matter. There are fundamental flaws in the current architecture."

The "bag" I mentioned is an abbreviation for "Blogging Architecture Group", which I advocated for during the meeting as a group that would look at the unique way in which blogging, RSS and Atom stress the web. Just as the "TAG" has produced a general architecture for the Web, I propose that a "BAG" would determine if the special requirements of Blogging require any extensions to the general model. The MSDN experience shows us that there is such a need... I proposed not a replacement for the web, but a means of strengthening its ability to address our needs.

The MSDN experience shows us that for blogging to work well, bloggers should probably be more vigilant than some others in ensuring that they use well-understood mechanisms like conditional-get, compression, etc. It would be great if everyone did these things -- but it is especially useful for blogging applications. In addition, the particular usage patterns of blogging lead to much stronger than average motivation to exploit under-utilized specifications such as RFC3229 "Delta Encoding in HTTP." While RFC3229 would provide benefit to everyone, it is especially useful to bloggers. That is why we're encouraging its use and extension at this time. We're not inventing a great deal of new stuff -- just saying that we should use what we've got. For more info, see: http://bobwyman.pubsub.com/main/2004/09/if_you_must_pol.html
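
For readers who have not looked at RFC3229, the idea as applied to feeds can be sketched roughly as follows. This is a simplified illustration, not a conforming implementation: the function name `respond_with_delta` is invented, items are assumed to carry increasing integer ids, and the ETag is assumed to be nothing more than the newest item id. The 226 (IM Used) status and the A-IM and IM headers come from RFC3229; the "feed" instance-manipulation value is the extension being discussed for syndication.

```python
def respond_with_delta(items, if_none_match=None, a_im=None):
    """Return (status, headers, items_to_send) for a newest-first list of (id, text).

    Assumption for this sketch: the ETag is just the id of the newest item.
    """
    current_etag = '"%d"' % items[0][0] if items else '"0"'

    # Nothing new since the client's copy: an ordinary conditional GET answer.
    if if_none_match == current_etag:
        return 304, {"ETag": current_etag}, []

    # Client has an older copy and accepts a delta: send only the newer items.
    if if_none_match is not None and a_im == "feed":
        try:
            last_seen = int(if_none_match.strip('"'))
        except ValueError:
            last_seen = None  # unrecognised ETag: fall back to the full feed
        if last_seen is not None:
            newer = [item for item in items if item[0] > last_seen]
            return 226, {"ETag": current_etag, "IM": "feed"}, newer

    # No usable validator, or client did not ask for a delta: full response.
    return 200, {"ETag": current_etag}, list(items)


feed_items = [(3, "newest post"), (2, "middle post"), (1, "oldest post")]
print(respond_with_delta(feed_items))                   # first visit
print(respond_with_delta(feed_items, '"1"', "feed"))    # client last saw item 1
print(respond_with_delta(feed_items, '"3"', "feed"))    # client is up to date
```

A client that never sends A-IM still gets the ordinary 200 or 304 behaviour, which is what lets this approach be adopted incrementally on top of existing HTTP.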

In any case, the "fundamental" issues with RSS/Atom, etc. aren't related to the formats themselves. The issues are things like the fact that we poll for updates -- in whatever format they are delivered. In the RSS/Atom world, people use the language of publish/subscribe (an inherently "push" technology) yet they use the old and inefficient technology of polling. Similarly, we have a "pinging" architecture that forces every publisher to know every "subscriber" that might want to be notified of updates and the pinging mechanism (XML-RPC) can't even pass messages through most firewalls. These are real issues that need to be addressed. They are fundamental to what we do and are independent, for the most part, of the RSS/Atom formats.

On push vs. pull and pinging... Microsoft seems to think it makes sense for us to get "push" notifications of updates to Scoble's blog via Microsoft Alerts in MSN Messenger -- using a persistent non-HTTP protocol. Thus, Scoble can "ping" through firewalls to announce updates to his blog. If it is ok for Scoble to use "push", why do folk at Microsoft object when the rest of us want to use "push" as well? Do you advocate that Scoble should stop using Microsoft Alerts?

bob wyman
Tuesday, September 14, 2004 3:14:02 PM (GMT Daylight Time, UTC+01:00)
In a great tradition of American liberalism, I invoke my right to flip-flop a little bit. After reading your posts on atom-syntax list, I think I misunderstood your intentions. Therefore, I am going to withdraw my claim that you are a luddite :)

However, I have a bone or two to pick with the notion that caching could be considered as the ultimate solution to the problem. Categorically speaking, caching was invented and is only used to cover performance deficiencies of the implementation. Using caching is not the solution, it is the best proof that the implementation is flawed. Whether these flaws are rooted in current technology capability barriers or poor design, that's a different question.

Does this mean that we all need to abandon RSS and HTTP and start from scratch? No, and I believe that this is where the majority of your point is focused: there are means to address a specific problem and let's get it fixed without sitting and whining about how everything's broken.