Seeing Jon Udell's post about his difficulty getting the Google PR team to discuss the Google GData API reminded me that I needed to write down some of my thoughts on extending RSS and Atom, based on looking at GData. There are basically three approaches one can take when extending an XML syndication format such as RSS or Atom:

  1. Add extension elements in a different namespace: This is the traditional approach to extending RSS and it involves adding new elements as children of the item or atom:entry element which carry application/context specific data beyond that provided by the RSS/Atom elements. Microsoft's Simple Sharing Extensions, Apple's iTunes RSS extensions, Yahoo's Media RSS extensions and Google's GData common elements all follow this model.

  2. Provide links to alternate documents/formats as payload: This approach involves providing links to additional data or metadata from an item in the feed. Podcasting is the canonical example of this technique. One argument for this approach is that instead of coming up with extension elements that replicate existing file formats, one should simply embed links to files in the appropriate formats. This argument has been used in various discussions on syndicating calendar information (i.e. iCalendar payloads) and contact lists (i.e. vCard payloads). See James Snell's post Notes: Atom and the Google Data API for more on this topic.

  3. Embed microformats in [X]HTML content: A microformat is structured data embedded within another markup language (typically HTML/XHTML). This allows one to represent both human-readable data and machine-readable data in a single document. The Structured Blogging initiative is an example of this technique.
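
To make the three options concrete, here is a minimal sketch in Python that puts all three into one Atom entry carrying contact data and reads each one back. The `ex:` namespace, its element names, and the example.org URLs are hypothetical and invented for illustration; they are not taken from GData or any other real extension.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
EX = "http://example.org/contact-ext"  # hypothetical extension namespace

ENTRY = f"""<entry xmlns="{ATOM}" xmlns:ex="{EX}">
  <title>Contact: Jane Doe</title>
  <!-- Option 1: extension elements in a different namespace -->
  <ex:contact>
    <ex:name>Jane Doe</ex:name>
    <ex:email>jane@example.org</ex:email>
  </ex:contact>
  <!-- Option 2: link to the payload in its native format -->
  <link rel="enclosure" type="text/x-vcard" href="http://example.org/jane.vcf"/>
  <!-- Option 3: hCard microformat embedded in XHTML content -->
  <content type="xhtml">
    <div xmlns="{XHTML}">
      <div class="vcard">
        <span class="fn">Jane Doe</span>
        <a class="email" href="mailto:jane@example.org">jane@example.org</a>
      </div>
    </div>
  </content>
</entry>"""


def read_extension(entry):
    """Option 1: read the namespaced extension elements directly."""
    ns = {"ex": EX}
    contact = entry.find("ex:contact", ns)
    return {
        "name": contact.findtext("ex:name", namespaces=ns),
        "email": contact.findtext("ex:email", namespaces=ns),
    }


def read_enclosure(entry):
    """Option 2: return (type, href) of the linked payload, or None."""
    for link in entry.findall(f"{{{ATOM}}}link"):
        if link.get("rel") == "enclosure":
            return link.get("type"), link.get("href")
    return None


def first_by_class(root, name):
    """Return the first element at or below root whose class list has `name`."""
    for el in root.iter():
        if name in el.get("class", "").split():
            return el
    return None


def read_hcard(entry):
    """Option 3: pull hCard fields out of the XHTML content by class name."""
    content = entry.find(f"{{{ATOM}}}content")
    card = first_by_class(content, "vcard")
    return {
        "name": first_by_class(card, "fn").text,
        "email": first_by_class(card, "email").text,
    }


entry = ET.fromstring(ENTRY)
print(read_extension(entry))
print(read_enclosure(entry))
print(read_hcard(entry))
```

The sketch sidesteps the messy parts: the Option 2 reader stops at the URL and never fetches or parses the vCard, and the Option 3 reader only works because the content here is well-formed XHTML rather than escaped HTML.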

All three approaches have their pros and cons. Option #1 is problematic because it encourages a proliferation of duplicative extensions and may lead to fragmenting the embedded data into multiple unrelated elements instead of a single document/format. Option #2 requires RSS/Atom clients to either build parsers for non-syndication formats or rely on external libraries for consuming information in the feed. The problem with Option #3 above is that it introduces a dependency on an HTML/XHTML parser for extracting the embedded data from the content of the feed.

From my experience with RSS Bandit, I have a preference for Option #1 although there is a certain architectural purity with Option #2 which appeals to me. What do the XML syndication geeks in the audience think about this?


 

Categories: Syndication Technology

Monday, June 5, 2006 9:57:33 PM (GMT Daylight Time, UTC+01:00)
My preference is also for #1, but only when #2 isn't an option.
Monday, June 5, 2006 10:52:48 PM (GMT Daylight Time, UTC+01:00)
To what degree does the Atom format itself support microformats? (Microstandards?) Does it have the equivalent of class attributes, or alt, or rev, or rel, or anything else to overlay human-readable and machine-readable data? -m
Tuesday, June 6, 2006 8:09:55 AM (GMT Daylight Time, UTC+01:00)
Well, this comment form is prefixed with '(HTML not allowed)' so Dare's position is made even clearer.
A microformat is HTML. This is nice as it passes through well designed feed readers and ends up as HTML for the user, who can then use existing parsers or client side tools to extract it. Embedding calendar and contact info is already supported as hCalendar and hCard - have a look at http://kitchen.technorati.com/search/ to see what we're finding.
Micah: Atom supports embedding HTML (as does RSS), so you can use microformats in both within the content/description. The Universal Feed Parser already has a degree of microformat parsing included.
Tuesday, June 6, 2006 12:31:13 PM (GMT Daylight Time, UTC+01:00)
Kevin, I believe that saying "it passes through well designed feed readers" is slightly disingenuous. The issue is potentially much more complicated. At the moment I would expect some, but not all microformatted content to make it through to an aggregator from a feed intact.
Tuesday, June 6, 2006 12:58:54 PM (GMT Daylight Time, UTC+01:00)
Both #1 and #3; #2 gets away from the power of the independent, self-explanatory document. Please dig your head out from your academic sand pit (but keep up the nice blog, thanks).
Tuesday, June 6, 2006 3:07:32 PM (GMT Daylight Time, UTC+01:00)
Kevin,
You've confused me. My first thought is that it is the feed reader that is expected to process the microformats not the end user. From that perspective, I think it is easier for tools from NetNewsWire and Bloglines to mobile phone RSS readers to process XML extensions than microformats. This is the same reason I don't favor option #2.

Wednesday, June 7, 2006 3:48:16 AM (GMT Daylight Time, UTC+01:00)
How about 1, 2 and 3? For example, an RSS feed carrying contact data could use whatever namespaced extension(s) are popular, link to a .vcf enclosure and put hCard microformat data in a CDATA section within the description element, all at the same time. That way the consumer can use whichever method is possible or easiest to parse the data. If the client can't parse any of the formats, the data is still visible to the user via the microformat. Don't almost all RSS aggregators display HTML in item descriptions?

Microformats are XHTML, so parsing them should be pretty easy. If there were an XHTML-only equivalent of the description element, things would be even easier. As it is, I usually look through the description for known microformat markers, e.g. class='vcard', and if one is found, extract and parse it with an XML parser.
Wednesday, June 7, 2006 1:12:39 PM (GMT Daylight Time, UTC+01:00)
I'd prefer #3' or #1. I wouldn't choose #2, as it adds network latency in the long run: it breaks down if new metadata always means new, separate content. #1 is the most straightforward way, and that's probably the best for now. Both Atom and RSS are XML, so you can simply extend them the XML way without breaking anything (and the only cost is some extra latency).

I really like #3, but only when the data is embedded in XHTML and tagged with an XML namespace for each extension (hence #3-dash instead of #3). In XHTML, it's a no-brainer to filter out the unwanted parts. But today's microcontent gets around the XHTML restriction through creative use of HTML, so I'm concerned about namespace(?) collisions once it is widely adopted. The good thing about intermixing human-readable and machine-readable content is that you can tag human-targeted data more precisely, which means better machine support. I'd love to see future browser extensions enhance the browsing experience by machine-processing microcontent embedded in documents meant for humans.
Taisuke Yamada
Monday, June 12, 2006 12:31:45 AM (GMT Daylight Time, UTC+01:00)
I strongly prefer #3 (microformats). Option #1 uses namespaces, which I think is even more purist than #2 (associated resource). In all three cases, the extension requires specialized code in the aggregator or client to process -- only in #1 can you expect to get some value if the extension is passed through to HTML. The duplicative problem you cite for #1 also applies to #3.
Comments are closed.