September 26, 2004
@ 07:10 PM

As an author of a news reader that supports RSS and Atom, I often have to deal with feeds designed by the class of people Mark Pilgrim described in his post Why specs matter as assholes. These are people who

read specs with a fine-toothed comb, looking for loopholes, oversights, or simple typos.  Then they write code that is meticulously spec-compliant, but useless.  If someone yells at them for writing useless software, they smugly point to the sentence in the spec that clearly spells out how their horribly broken software is technically correct

This is the first in a series of posts highlighting such feeds as an example to others of how not to design syndication feeds for a website. Feeds in this series will often be technically valid RSS/Atom feeds that, for one or more reasons, cause unnecessary inconvenience to authors and users of news aggregators.

This week's gem is the Cafe con Leche RSS feed. Instead of pointing out what is wrong with this feed myself I'll let the author of the feed do so himself. On September 24th Elliotte Rusty Harold wrote

I've been spending a lot of time reviewing RSS readers lately, and overall they're a pretty poor lot. Latest example. Yesterday's Cafe con Leche feed contained this completely legal title element:

<title>I'm very pleased to announce the publication of XML in a Nutshell, 3rd edition by myself and W.
          Scott Means, soon to be arriving at a fine bookseller near you.
          </title>

Note the line break in the middle of the title content. This confused at least two RSS readers even though there's nothing wrong with it according to the RSS 0.92 spec. Other features from my RSS feeds that have caused problems in the past include long titles, a single URL that points to several stories, and not including more than one day's worth of news in a feed.

Elliotte is technically right. None of the RSS specs says that the <link> element in an RSS feed should be unique for each item, so he can reuse the same link for multiple items and still have a valid RSS feed. So why does this cause problems for RSS aggregators?

Consider the following RSS feed

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item 1</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now consider the same feed fetched a few hours later

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item one</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 3</title>
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 2</title>
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now how does the RSS aggregator tell whether the item titled "I am item one" is the item formerly titled "I am item 1" with a typo fixed, or a different item altogether? The simple answer is that it can't. A naive hack is to compare the content of the <description> element to see if it is the same, but that breaks as soon as a typo is fixed there or the content of the <description> is otherwise updated.

Every RSS aggregator has some sort of hack to deal with this problem. I describe them as hacks because there is no way that an aggregator can 100% accurately determine when items with the same link and no guid are the same item with content changed or different items. This means the behavior of different aggregators with feeds such as the Cafe con Leche RSS feed is extremely inconsistent.
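
To make the problem concrete, here is a minimal sketch in Python of one such hack. It is not how any particular aggregator works; the function names and the choice of hashing link plus title are my own assumptions for illustration. The identity of an item is its guid when one exists, otherwise a hash of its link and title.

```python
import hashlib
import xml.etree.ElementTree as ET

def item_key(item):
    """Return a best-effort identity for an <item> element (illustrative only)."""
    guid = item.findtext("guid")
    if guid and guid.strip():
        return "guid:" + guid.strip()
    # Fallback hack: no guid, so hash the link together with the title.
    link = (item.findtext("link") or "").strip()
    title = " ".join((item.findtext("title") or "").split())
    digest = hashlib.sha1((link + "\n" + title).encode("utf-8")).hexdigest()
    return "hash:" + digest

def item_keys(feed_xml):
    """Parse a feed and return the identity key of every item, in order."""
    root = ET.fromstring(feed_xml)
    return [item_key(item) for item in root.iter("item")]

FIRST_FETCH = """<rss version="0.92"><channel>
  <item><title>I am item 1</title><link>http://www.example.com/rssitem</link></item>
  <item><title>I am item 2</title><link>http://www.example.com/rssitem</link></item>
</channel></rss>"""

SECOND_FETCH = """<rss version="0.92"><channel>
  <item><title>I am item one</title><link>http://www.example.com/rssitem</link></item>
  <item><title>I am item 3</title><link>http://www.example.com/rssitem</link></item>
  <item><title>I am item 2</title><link>http://www.example.com/rssitem</link></item>
</channel></rss>"""

seen = set(item_keys(FIRST_FETCH))
new_keys = [key for key in item_keys(SECOND_FETCH) if key not in seen]
print(len(new_keys))  # 2
```

With this hack the second fetch yields two "new" items: "I am item 3", and "I am item one", even though the latter may just be item 1 with its title edited. Dropping the title from the hash does not help, because then all three items collapse into a single key.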

A solution to this problem is for Elliotte Rusty Harold to upgrade his RSS feed to RSS 2.0 and use guid elements to uniquely identify items.
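
As a rough sketch of what that buys an aggregator, the Python below compares two fetches of an RSS 2.0 version of the example feed. The guid values and the helper name titles_by_guid are hypothetical, made up for this illustration; only the guid element and its isPermaLink attribute come from RSS 2.0.

```python
import xml.etree.ElementTree as ET

def titles_by_guid(feed_xml):
    """Map each item's guid to its title; items without a guid are skipped."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid and guid.strip():
            result[guid.strip()] = item.findtext("title")
    return result

# Hypothetical guid values; any string that is unique per item would do.
BEFORE = """<rss version="2.0"><channel>
  <item><title>I am item 1</title><link>http://www.example.com/rssitem</link>
    <guid isPermaLink="false">example-item-1</guid></item>
  <item><title>I am item 2</title><link>http://www.example.com/rssitem</link>
    <guid isPermaLink="false">example-item-2</guid></item>
</channel></rss>"""

AFTER = """<rss version="2.0"><channel>
  <item><title>I am item one</title><link>http://www.example.com/rssitem</link>
    <guid isPermaLink="false">example-item-1</guid></item>
  <item><title>I am item 3</title><link>http://www.example.com/rssitem</link>
    <guid isPermaLink="false">example-item-3</guid></item>
  <item><title>I am item 2</title><link>http://www.example.com/rssitem</link>
    <guid isPermaLink="false">example-item-2</guid></item>
</channel></rss>"""

old, new = titles_by_guid(BEFORE), titles_by_guid(AFTER)
added = sorted(g for g in new if g not in old)
updated = sorted(g for g in new if g in old and new[g] != old[g])
print(added)    # ['example-item-3']
print(updated)  # ['example-item-1']
```

Here the aggregator can say with certainty that one item was added and one had its title edited, even though all three items share the same link.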


 

Sunday, 26 September 2004 19:49:24 (GMT Daylight Time, UTC+01:00)
He's not an asshole, that's giving him too much credit. He's an idiot. Sub-standard intelligence. Flunked Life 101.
Spanish Fly
Monday, 27 September 2004 03:29:30 (GMT Daylight Time, UTC+01:00)
I think the real problem is people who continue to use 0.92, old versions of RSS. W/ Atom creating a new syndication format every 3 months and RSS having already too many, we have to start saying it's wrong to use anything except 1.0 and 2.0 and 1.0 is on its way out.
Monday, 27 September 2004 07:13:46 (GMT Daylight Time, UTC+01:00)
Randy,
I totally disagree. The Cafe con Leche feed is a valid RSS 2.0 feed. Changing the value of the version attribute in the feed doesn't change the fact that it is an asshole feed.
Tuesday, 28 September 2004 06:09:51 (GMT Daylight Time, UTC+01:00)
I guess I'm also an asshole. Or maybe I'm an idiot. Sub-standard intelligence. Flunked Life 101. My rss feed DOES NOT have unique link-tags! I must have read specs with a fine-toothed comb, looking for loopholes, oversights, or simple typos! Actually I didn't read the specs that closely, and that, if anything, makes me an asshole. You suggest that I should spend my time guessing what parts of the spec you find difficult to implement. I, and every other rss-feed-supplier should save you the work of dealing with the spec as it's written. And what is the problem? If someone corrects a typo you get the same item twice, big deal. People don't go around fixing typos every five minutes. As an aside, my feed works with every aggregator I have tested, otherwise I wouldn't have offered a feed in the first place.
Wednesday, 29 September 2004 04:46:05 (GMT Daylight Time, UTC+01:00)
Come on, Dare. You could have communicated this issue with Elliotte privately or, at least, prevented this post from turning into a pissing contest between two well known professionals.

As to the 'title' problem, I informed Elliotte over private channel that 'title' is optional in RSS 0.92. Since he ran into the 'title' problem because he thought he had to fabricate one, I think the problem will be fixed soon.
Thursday, 07 October 2004 02:09:13 (GMT Daylight Time, UTC+01:00)
Sorry, Don,

It's an interesting point that titles are optional; but I still think they're useful and I'm not going to remove the titles from my feeds just because some RSS readers get thrown by long titles. I did add a call to normalize-space() in the stylesheet that generates the feeds to fix problems with readers that got confused by title elements with carriage returns and linefeeds, but I really shouldn't have had to do that. How much effort should I expend in order to work around broken readers that don't follow the spec? Should I avoid characters like é and 藤 just because they'll break some readers?