I recently found a complaint from Danny Glasser, a dev manager on my team, about how Netflix's RSS feeds appear in RSS Bandit. In his post Netflix sucks less? he wrote:

Netflix has recently created RSS feeds for subscribers' current queues and recent rental activity, so in theory I can exchange the URLs with friends and view their queues in an RSS aggregator.  I've been playing with this a bit and unfortunately it doesn't render particularly well in RSS Bandit.  It doesn't sort nicely and old entries aren't expired properly.  I'm not sure if this is true with other aggregators but I suppose I could ask Dare

I decided to take a look at the various Netflix RSS feeds and the problem became instantly obvious. Below is an excerpted version of the Netflix Top 100 RSS feed, which I'll use to discuss the various problems with syndicating lists in RSS.

<rss version="2.0">
  <channel>
    <title>Netflix Top 100</title>
    <ttl>20160</ttl>
    <link>http://www.netflix.com/Top100</link>
    <description>Top 100 Netflix movies, published every 2 weeks.</description>
    <language>en-us</language>
    <item>
      <title>1- Mystic River</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031232&amp;trkid=134852</link>
      <description><![CDATA[Three childhood friends, Sean (Kevin Bacon), Dave (Tim Robbins) and Jimmy (Sean Penn) are reunited in Boston 25 years later when they are linked together in the murder investigation of Jimmy's daughter. ]]></description>
    </item>
    <item>
      <title>2- The Last Samurai</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031274&amp;trkid=134852</link>
      <description><![CDATA[Tom Cruise stars as Captain Nathan Algren in this epic movie set in 1870s Japan. ]]></description>
    </item>
    <item>
      <title>3- Something's Gotta Give</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031278&amp;trkid=134852</link>
      <description><![CDATA[Sixty and still sexy, Harry (Jack Nicholson) is having the time of his life, wining, dining and bedding women half his age.]]></description>
    </item>
  </channel>
</rss>

There are several problems with the above feed. The first is the combination of two omissions: the feed provides no mechanism for uniquely identifying items using GUIDs, and it contains no dates. The problem manifests itself when the top 100 list is refreshed two weeks from now. Using the above feed as an example, imagine that a new entry becomes number 1, moving Mystic River and The Last Samurai one notch down. Now several things break at once.

The first problem is that the user has no way of grouping together the top 100 lists for each period, so I can't keep last month's top 100 list and this week's top 100 list in my aggregator in any meaningful way. Even if there were dates, the lack of GUIDs means that the aggregator will likely use the <link> element to uniquely identify each item when determining whether the user has seen it. This means that only new entrants to the list will be marked as unread, while movies that were already in the list and have been seen remain unhighlighted. I can see arguments for both viewpoints. On the one hand, Netflix may expect that the aggregator should always have 100 items in it, with only the new entrants being marked as unread and the positions of movies changing from week to week. On the other hand, a user may want to keep the top 100 list for each time period in their aggregator so they can see a timeline of the movie rankings. In that case, every two weeks there should be 100 new items waiting for the user.

Unfortunately neither of these happens in RSS Bandit or a number of other aggregators with Netflix's current implementation. Instead, old and new entries show up munged together with nothing to separate them by date, so users can't group by date.

Another problem is that the link to the movie's page is the only thing used to uniquely identify the item. So when the feed is fetched and the position of a movie has changed (i.e. the title has changed), instead of creating a new item RSS Bandit assumes it is a post whose title has been changed and simply updates the item in place. This makes sense in 99% of aggregator scenarios, where a changed title usually means a typo was fixed in a blog post. However, in the Netflix case this means a movie will always show up with its most recent position in the top 100 list. But once the movie leaves the list (i.e. is dropped from the feed), it remains in the aggregator at the last position it held in the feed.
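
To make the failure concrete, here is a minimal Python sketch of the link-keyed matching described above. It is not RSS Bandit's actual code; the function names, the dictionary layout and the week-two feed (including the new number 1 entry and its example.com link) are assumptions made up for illustration.

```python
MYSTIC = "http://www.netflix.com/MovieDisplay?movieid=60031232&trkid=134852"
SAMURAI = "http://www.netflix.com/MovieDisplay?movieid=60031274&trkid=134852"
GOTTA_GIVE = "http://www.netflix.com/MovieDisplay?movieid=60031278&trkid=134852"
NEW_MOVIE = "http://example.com/hypothetical-new-entry"  # made-up link


def item_key(item):
    # Prefer <guid> when the feed supplies one; otherwise fall back to <link>.
    return item.get("guid") or item["link"]


def merge_fetch(store, fetched_items):
    """Merge a freshly fetched feed into the local item store."""
    for item in fetched_items:
        key = item_key(item)
        if key in store:
            # Same key: treated as an edited post, so the title is
            # overwritten in place and the read state is kept.
            store[key]["title"] = item["title"]
        else:
            store[key] = {"title": item["title"], "read": False}
    # Items that no longer appear in the feed are never removed.
    return store


week1 = [
    {"title": "1- Mystic River", "link": MYSTIC},
    {"title": "2- The Last Samurai", "link": SAMURAI},
    {"title": "3- Something's Gotta Give", "link": GOTTA_GIVE},
]
# Two weeks later: a hypothetical new number 1 pushes everything down
# and Something's Gotta Give drops off the excerpt.
week2 = [
    {"title": "1- Hypothetical New Entry", "link": NEW_MOVIE},
    {"title": "2- Mystic River", "link": MYSTIC},
    {"title": "3- The Last Samurai", "link": SAMURAI},
]

store = merge_fetch({}, week1)
for entry in store.values():
    entry["read"] = True  # the user reads everything in week one
store = merge_fetch(store, week2)

for entry in store.values():
    print(entry["title"], "(read)" if entry["read"] else "(unread)")
```

After the second fetch only the new entrant is unread, the two surviving movies have silently changed position, and the dropped movie still claims position 3 alongside The Last Samurai.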

The second problem is that there is no way to tell the aggregator how to sort the list of movies. Sorting by title won't work because it will be an alphabetical sort; ditto for the description. Even if there were dates, using those for sorting wouldn't make much sense either. Ideally there would be some way for an item to specify its position relative to the other items in the same list at a given point in time. Again, this would require that dates be attached to the items in the feed.
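
A quick Python sketch of why the alphabetical sort falls over once the list goes past nine entries. The tenth title is invented for the example, and pulling the rank out of the title is only a workaround that assumes Netflix keeps its current "N- Title" format; it is not something RSS itself provides.

```python
import re

titles = [
    "3- Something's Gotta Give",
    "10- Hypothetical Tenth Entry",  # made-up title for illustration
    "1- Mystic River",
    "2- The Last Samurai",
]


def rank_of(title):
    # Parse the leading "N-" prefix; titles without one sort last.
    match = re.match(r"(\d+)-", title)
    return int(match.group(1)) if match else float("inf")


alphabetical = sorted(titles)          # plain string comparison
by_rank = sorted(titles, key=rank_of)  # numeric comparison on the prefix

print(alphabetical)
print(by_rank)
```

The plain sort puts entry 10 between entries 1 and 2, while the numeric key restores the intended order.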

There are a number of issues raised by the Netflix problem. One could look at the problem as an indication that there should be an item expiry mechanism in RSS, so the aggregator would know to dump the list every 2 weeks and refresh it with the new list. Others could argue that this could be solved by giving each item a unique ID independent of the movie and specifying its date as well as a sort position. This would allow the user to track changing lists over time even if the same item appears in the list multiple times.
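
Here is a rough Python sketch of the second option, assuming each item carried an ID unique to one appearance in one list, the date of that list, and an explicit position. The ID scheme, the field names, the dates and the new number 1 entry are all hypothetical; nothing like this exists in the Netflix feeds or in RSS today.

```python
from collections import defaultdict
from datetime import date


def make_entry(list_date, rank, movie_id, title):
    return {
        # Hypothetical ID: unique per appearance, not per movie.
        "id": f"top100:{list_date.isoformat()}:{movie_id}",
        "date": list_date,
        "rank": rank,
        "title": title,
    }


def group_by_list(entries):
    """Group entries by list date and order each list by rank."""
    lists = defaultdict(list)
    for entry in entries:
        lists[entry["date"]].append(entry)
    return {
        day: sorted(items, key=lambda e: e["rank"])
        for day, items in sorted(lists.items())
    }


# Made-up publication dates, two weeks apart; entries deliberately unordered.
entries = [
    make_entry(date(2005, 1, 15), 3, "60031274", "The Last Samurai"),
    make_entry(date(2005, 1, 1), 2, "60031274", "The Last Samurai"),
    make_entry(date(2005, 1, 15), 1, "00000000", "Hypothetical New Entry"),
    make_entry(date(2005, 1, 1), 1, "60031232", "Mystic River"),
    make_entry(date(2005, 1, 15), 2, "60031232", "Mystic River"),
]

for day, items in group_by_list(entries).items():
    print(day, [f"{e['rank']}- {e['title']}" for e in items])
```

Because the same movie gets a different ID in each list, an aggregator could keep both lists side by side and show how a movie's position changed between them.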

I don't think I've seen anyone raise any of the various problems with the Netflix feeds online. This is surprising since I'd be hard pressed to imagine how any aggregator does the 'right' thing with these feeds. More importantly the Netflix feeds show a significant hole in RSS as well as syndication formats like Atom whose primary goal seems to be RSS feature parity.

I'm going to bring this up on the RSS-AggDev mailing list and see what the other aggregator developers think about this problem.


 

Thursday, January 13, 2005 6:04:48 PM (GMT Standard Time, UTC+00:00)
I don't disagree there are still holes in Atom, but I don't think unique IDs and dates are among them. This is a good case though, I'll pass it on to the Atom list (holes there *can* be filled, because the spec isn't frozen). I suspect the best approach might be to add the ranking/week information as separate (extension) fields, to tell the aggregator exactly what is the 'right' thing to do.
Monday, January 17, 2005 7:45:15 PM (GMT Standard Time, UTC+00:00)
Yeah, I was rather excited when I saw Netflix was providing RSS feeds, but have not found them very useful for the very reasons you mention. It helps a bit to set the expiration to 30 days, but that still doesn't fix the ordering problem. Glad it's not just me who thinks this is an issue. I could see numbered lists being a very useful thing to be able to do in RSS and right now it seems tricky.

Friday, February 4, 2005 2:45:49 AM (GMT Standard Time, UTC+00:00)
Well, you could use dates to sort the lists, if the feed had dates that is, and (here's the hacky bit) if the dates were slightly different ... have #1 be the date at 9:01 AM, then #2 be the date at 9:02 AM, and so on.

You could sort on title too, since they have the ranking number as the first word. Naive alphabetic sorts would need zero padded numbers though. Smarter sorts would recognise they are numbers and sort the titles numerically for that first word, alphabetically otherwise.

Does RSS Bandit have a smart sort?