A Critical Look at the RSS-Data Proposal

October 8, 2003

@ 12:58 AM

Why

The problem Jeremy Allaire is attempting to solve is how to extend the information provided in an RSS feed with custom information specific to a particular news source. Traditionally, RSS has been used to syndicate news items, but it quickly became obvious that RSS feeds and RSS readers are a good way to represent and consume periodically changing data from any information source. Various ways to provide custom information beyond the data traditionally found in an RSS feed (e.g. author information, publication date, title, description of content) were proposed, and the RSS world settled on namespace-based extensions. Below is an example of an RSS 2.0 item without namespace-based extensions:

    <item>
      <title>RSS Bandit 1.1.0.36 Released</title>
      <description>The latest version of RSS Bandit has been released, it fixes a number of bugs in previous releases and contains 6 new stylesheets for displaying feeds</description>
      <link>http://www.gotdotnet.com/Community/Workspaces/newsitem.aspx?id=cb8d3173-9f65-46fe-bf17-122e3703bb00&amp;newsId=1559</link>
    </item>

while the following fragment uses namespace-based extensions to transmit more information than can be expressed with the traditional RSS elements:

    <item xmlns:stats="http://www.example.com/downloadstatistics">
      <title>RSS Bandit 1.1.0.36 Released</title>
      <description>The latest version of RSS Bandit has been released, it fixes a number of bugs in previous releases and contains 6 new stylesheets for displaying feeds</description>
      <link>http://www.gotdotnet.com/Community/Workspaces/newsitem.aspx?id=cb8d3173-9f65-46fe-bf17-122e3703bb00&amp;newsId=1559</link>
      <stats:num-downloads>350</stats:num-downloads>
      <stats:rating>8</stats:rating>
    </item>

The above fragment adds information about the number of times the application has been downloaded as well as the score (out of 10) it received when it was reviewed by the editor(s) of the download site. An aggregator that understands elements from the "http://www.example.com/downloadstatistics" namespace could display the information in those elements in specific ways, while aggregators that do not could simply ignore them. The namespace-based approach is flexible because RSS readers can ignore elements they do not understand yet still display useful information about the news item. It is decentralized because anyone can come up with their own namespace-based extensions, which then sink or swim based on how much traction they get with content producers and aggregator authors.
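To make the "ignore what you don't understand" behavior concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The handler table, the display labels and the extra element in a hypothetical "http://www.example.com/unknown" namespace are assumptions made for illustration; a real aggregator would register handlers for whichever extensions it actually supports.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example above.
STATS_NS = "http://www.example.com/downloadstatistics"

# Trimmed version of the example item, plus an element from a hypothetical
# namespace that this aggregator knows nothing about.
ITEM_XML = """<item xmlns:stats="http://www.example.com/downloadstatistics">
  <title>RSS Bandit 1.1.0.36 Released</title>
  <stats:num-downloads>350</stats:num-downloads>
  <stats:rating>8</stats:rating>
  <other:tag xmlns:other="http://www.example.com/unknown">ignored</other:tag>
</item>"""

# Hypothetical handler table. ElementTree reports namespaced names in
# Clark notation: {namespace-uri}local-name.
HANDLERS = {
    f"{{{STATS_NS}}}num-downloads": lambda text: ("Downloads", int(text)),
    f"{{{STATS_NS}}}rating": lambda text: ("Rating (out of 10)", int(text)),
}


def render_extensions(item_xml: str) -> list[tuple[str, int]]:
    """Return display rows for known extension elements and skip everything else."""
    item = ET.fromstring(item_xml)
    rows = []
    for child in item:
        handler = HANDLERS.get(child.tag)
        if handler is None:
            # Unknown extension (or a core element such as <title>, which the
            # regular RSS code would handle): simply ignore it here.
            continue
        rows.append(handler(child.text.strip()))
    return rows


if __name__ == "__main__":
    for label, value in render_extensions(ITEM_XML):
        print(f"{label}: {value}")
```

Supporting a new extension means adding an entry to the handler table, which is exactly the update treadmill described next.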

The problem with this approach is that aggregators have to be continually updated to take advantage of new namespaced extensions as they appear, or else fall behind. Aggregator authors like Dave Winer have complained about the development treadmill this imposes on them. Thus many have tried to figure out whether aggregators could be made to automatically understand arbitrary namespaced extensions as they appear in feeds encountered on the Web. As both Jon Udell and I have pointed out in previous posts, this is not a solvable problem in the general case without solutions bordering on Artificial Intelligence and Semantic Web technologies.

Jeremy Allaire's proposal tries to solve this problem, or at least to propose a solution that is better than the status quo. Some also seem to believe that his proposal solves the problem of how to transmit programming language data structures across the network, which seems strange given that technologies such as XML-RPC and SOAP 1.1 already provide such functionality in a satisfactory manner.

How

The meat of Jeremy Allaire's proposal is excerpted below:

Here's what I think is necessary for RSS-Data, which is almost literally the XML-RPC data serialization model.

Same data model, including all elements such as <struct>, <array>, <boolean>, <dateTime>, <string>, <number>, <base64binary>, etc.
Unicode-based, fixing a known problem with XML-RPC
Time-zone aware, also fixing a known problem a variety of serialization approaches
RSS-Data could be used inside any RSS 2.0 element that can contain namespace extensions, including <item>, <channel>, and inside other custom namespaces. Likewise, other XML applications in need of a simple object data exchange format could use the <sdl> namespace to extend their applications.
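Since Allaire describes RSS-Data as almost literally the XML-RPC data serialization model, it may help to see what that model looks like. The sketch below uses Python's standard xmlrpc.client module to serialize a dict. Note that Python's XML-RPC marshaller emits <int> rather than the <number> element in Allaire's list, and that it produces plain, un-namespaced XML-RPC rather than anything in an "sdl" namespace, so this only approximates the proposal; it does not implement it.

```python
import xmlrpc.client

# A plain dict standing in for the custom data an RSS item might carry.
data = {"num-downloads": 350, "rating": 8}

# dumps() takes a tuple of parameters and, with no method name, returns just
# the <params> serialization; a dict becomes a <struct> of
# <member>/<name>/<value> entries.
payload = xmlrpc.client.dumps((data,))
print(payload)

# loads() parses it back into (params_tuple, method_name); method_name is
# None because no <methodName> was present.
params, method_name = xmlrpc.client.loads(payload)
print(params[0])  # the original dict comes back
```

A struct turns into a list of <member> elements, each holding a <name> and a typed <value>, which is the same shape as the sdl:member elements in the RSS-Data examples.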

Ignoring minor issues, such as the fact that Jeremy Allaire never actually provides a namespace URI for the "sdl" prefix, the proposal is fairly straightforward. Below is the example from the previous section, which used namespaced elements to convey additional information in the RSS item, converted to use the RSS-Data proposal:

    <item xmlns:sdl="http://www.example.com/sdl">
      <title>RSS Bandit 1.1.0.36 Released</title>
      <description>The latest version of RSS Bandit has been released, it fixes a number of bugs in previous releases and contains 6 new stylesheets for displaying feeds</description>
      <link>http://www.gotdotnet.com/Community/Workspaces/newsitem.aspx?id=cb8d3173-9f65-46fe-bf17-122e3703bb00&amp;newsId=1559</link>
      <sdl:member>
        <sdl:name>num-downloads</sdl:name>
        <sdl:value>
          <sdl:number>350</sdl:number>
        </sdl:value>
      </sdl:member>
      <sdl:member>
        <sdl:name>rating</sdl:name>
        <sdl:value>
          <sdl:number>8</sdl:number>
        </sdl:value>
      </sdl:member>
    </item>
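Reading the RSS-Data version back into a dictionary means walking the member/name/value nesting. Here is a minimal Python sketch, assuming the sdl prefix is bound to "http://www.example.com/sdl" (a placeholder, since the proposal specifies no URI) and handling only <sdl:number> values.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the RSS-Data proposal never specified one for "sdl".
SDL = "{http://www.example.com/sdl}"

ITEM_XML = """<item xmlns:sdl="http://www.example.com/sdl">
  <title>RSS Bandit 1.1.0.36 Released</title>
  <sdl:member>
    <sdl:name>num-downloads</sdl:name>
    <sdl:value><sdl:number>350</sdl:number></sdl:value>
  </sdl:member>
  <sdl:member>
    <sdl:name>rating</sdl:name>
    <sdl:value><sdl:number>8</sdl:number></sdl:value>
  </sdl:member>
</item>"""


def read_rss_data(item_xml: str) -> dict[str, float]:
    """Collect sdl:member name/value pairs; only <sdl:number> values are handled."""
    item = ET.fromstring(item_xml)
    result = {}
    for member in item.findall(f"{SDL}member"):
        name = member.findtext(f"{SDL}name")
        number = member.find(f"{SDL}value/{SDL}number")
        if name is None or number is None:
            continue  # struct, array, string, etc. are out of scope for this sketch
        text = number.text.strip()
        # Keep whole numbers as int and anything with a decimal point as float.
        result[name] = float(text) if "." in text else int(text)
    return result


print(read_rss_data(ITEM_XML))
```

The resulting keys are bare strings such as "rating". Unlike the namespaced elements earlier, nothing records which vocabulary a name came from.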

The above example uses syntax based on Les Orchard's post comparing an RSS 2.0 namespace extension with equivalent RSS-Data usage. Two things struck me immediately when I saw the examples. First, the RSS-Data approach is more verbose than it needs to be, especially since standardized W3C technologies already provide a mechanism for what it is trying to do. Second, adding datatype information to the embedded data adds very little, if anything.

What

The first problem I have with the RSS-Data proposal is that it is more verbose than it needs to be, which means more code is required to process RSS feeds that contain embedded RSS-Data elements. Type information can already be embedded in an XML document using the xsi:type attribute from the W3C XML Schema recommendation, as shown below:

    <item xmlns:stats="http://www.example.com/downloadstatistics"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <title>RSS Bandit 1.1.0.36 Released</title>
      <description>The latest version of RSS Bandit has been released, it fixes a number of bugs in previous releases and contains 6 new stylesheets for displaying feeds</description>
      <link>http://www.gotdotnet.com/Community/Workspaces/newsitem.aspx?id=cb8d3173-9f65-46fe-bf17-122e3703bb00&amp;newsId=1559</link>
      <stats:num-downloads xsi:type="xs:integer">350</stats:num-downloads>
      <stats:rating xsi:type="xs:integer">8</stats:rating>
    </item>

The example above, which uses xsi:type rather than Jeremy Allaire's RSS-Data proposal, is not only easier to process but also supports a richer set of built-in datatypes. The only benefit of Jeremy Allaire's proposal in this case is that, since libraries already exist that convert XML-RPC constructs into programming language objects, that code could be reused to process embedded RSS-Data elements and convert them into objects. This leads to my second problem with the proposal.
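For comparison, here is a minimal Python sketch that reads the xsi:type-annotated extension elements from the xsi:type example. Because the value of xsi:type is a QName such as xs:integer, its prefix has to be resolved against the namespace declarations, which ElementTree discards after parsing. The sketch collects them with iterparse "start-ns" events and, as a simplification, treats them as one flat mapping instead of tracking their scope. The CONVERTERS table is a hypothetical and deliberately small subset of the XML Schema built-in types.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"  # xsi:type in Clark notation
XS_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, deliberately small subset of XML Schema built-in types.
CONVERTERS = {
    "integer": int,
    "int": int,
    "double": float,
    "boolean": lambda s: s in ("true", "1"),
    "string": str,
}

ITEM_XML = """<item xmlns:stats="http://www.example.com/downloadstatistics"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <title>RSS Bandit 1.1.0.36 Released</title>
  <stats:num-downloads xsi:type="xs:integer">350</stats:num-downloads>
  <stats:rating xsi:type="xs:integer">8</stats:rating>
</item>"""


def parse_with_prefixes(xml_text: str):
    """Parse the document and return (root element, {prefix: namespace URI}).

    Simplification: all declarations are flattened into one dict, ignoring scope.
    """
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first element started is the document element
    return root, prefixes


def read_typed_extensions(item_xml: str) -> dict:
    """Map each xsi:type-annotated child's namespaced name to a converted value."""
    item, prefixes = parse_with_prefixes(item_xml)
    result = {}
    for child in item:
        qname = child.get(XSI_TYPE)
        if qname is None:
            continue  # untyped element (e.g. <title>): left to the normal RSS code
        prefix, _, local = qname.rpartition(":")
        if prefixes.get(prefix) != XS_NS or local not in CONVERTERS:
            continue  # not a supported XML Schema built-in type
        result[child.tag] = CONVERTERS[local]((child.text or "").strip())
    return result


print(read_typed_extensions(ITEM_XML))
```

Each value is keyed by the element's full namespaced name, for example "{http://www.example.com/downloadstatistics}rating". The type annotation comes along without giving up the vocabulary information that namespaces provide.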

It isn't clear what benefit one gains from being able to distinguish the type of some embedded data as opposed to its name. For instance, for my news aggregator RSS Bandit it is more important to know that the namespace URI of an extension element named comment is "http://wellformedweb.org/CommentAPI/" than to know that its type is <string> or even xs:anyURI. Also, because Jeremy Allaire's proposal provides no way to disambiguate vocabularies the way namespace names do, name collisions are likely. For instance, in the RSS world there are already at least two elements in common use with the same local name, slash:comments and comments (see Chris Sells' post on End to End comment support in RSS feeds for the difference between them).
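The collision problem is easy to demonstrate. The Python sketch below reads an item that carries both the un-namespaced RSS 2.0 <comments> element and slash:comments. It keys the data once by (namespace URI, local name) and once by local name alone, which is what any scheme that identifies extension data only by a bare name would have to do. The Slash namespace URI is the one commonly used for that module; the comments-page URL is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Slash module namespace as commonly used with slash:comments; the URL inside
# <comments> is a hypothetical placeholder.
ITEM_XML = """<item xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <title>RSS Bandit 1.1.0.36 Released</title>
  <comments>http://www.example.com/comments?id=1559</comments>
  <slash:comments>12</slash:comments>
</item>"""


def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (uri, local); uri is '' if absent."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


item = ET.fromstring(ITEM_XML)
by_qualified_name = {}  # keyed by (namespace URI, local name)
by_local_name = {}      # keyed by local name only, as a name-only scheme would be
for child in item:
    uri, local = split_tag(child.tag)
    by_qualified_name[(uri, local)] = child.text
    by_local_name[local] = child.text  # a later element silently overwrites an earlier one

print(by_qualified_name)
print(by_local_name)
```

The qualified dictionary keeps both values. The local-name dictionary silently loses the comments URL.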

The bottom line is that it is hard to see how Jeremy Allaire's proposal is any better than the status quo, and in many ways it is worse. Moreover, if one felt it was necessary to send type information along with the data in extension elements, it is possible to do so in a manner that is compatible with the status quo, keeping it flexible, decentralized and simple to use.

Categories: XML


Friday, 10 October 2003 02:23:40 (GMT Daylight Time, UTC+01:00)

I completely agree with all your points, Dare. I believe extensions through namespaces are a great addition to RSS 2.0, but one that was also possible with the W3C RSS 1.0...
I imagine an extensible plug-in architecture that would load plugins from URL-retrievable namespaces (that would be a convention, of course, such as "http://danielcazzulino.com.ar/rss/metadata"), and that would participate in the item's rendering process, either server-side or client-side in a news aggregator. The returned plug-in would have to conform to some interface and would be readily downloaded and executed (CAS approval assumed...).
For the server-side, I even imagine an Accept HTTP header that would tell it whether the client requesting the RSS feed supports a certain feature, for example, Accept: x-include. If such a header was not present, the server could pre-accumulate the XInclude-d fragments from other sources.

BTW: your server is sending me your IP address instead of the hostname, which makes it difficult to manage bookmarks or to post a comment a couple of hours later (if you reset your dynamic IP connection! ;))...

Daniel Cazzulino


Dare Obasanjo's weblog