Over the past few weeks there have been a bunch of reports on internal mailing lists about problems with MSN Spaces RSS feeds and Bloglines. The specific problem is that every once in a while old posts containing photos are marked as new in Bloglines. There have also been some complaints indicating that this problem manifests itself in Newsgator as well.

After some investigation we discovered that this problem seemed to occur only in RSS items containing links to photos hosted on our storage servers, such as blog posts with photo attachments or photo albums. This led to a hunch that the problem only affected RSS readers that mark old posts as new if any content in the <description> element changes. Once this was confirmed, we had our answer. For certain reasons, the permalink URL to an image stored on our storage servers changes over time*. Whenever one of these URL changes takes place, RSS readers that detect changes to the content of the <description> element of an item will indicate that the post has been altered.
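Here is a minimal sketch of why this produces a false positive, assuming a reader that fingerprints the <description> text and compares it between fetches. How Bloglines actually detects changes isn't something I can state; the function name and the example.com URLs are made up for illustration.

```python
import hashlib


def description_fingerprint(description):
    """Hash the raw <description> text, as a naive change detector might."""
    return hashlib.sha1(description.encode("utf-8")).hexdigest()


# Same post fetched twice; only the (hypothetical) image URL has changed.
before = '<p>Beach trip</p><img src="http://storage.example.com/a1/photo.jpg">'
after = '<p>Beach trip</p><img src="http://storage.example.com/b7/photo.jpg">'

# The user never touched the post, but the fingerprints differ,
# so a reader using this check would flag the old post as new.
looks_new = description_fingerprint(before) != description_fingerprint(after)
print(looks_new)  # True
```

Any comparison over the full text of the description, hashed or not, behaves the same way, since the image URL is part of that text.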

A brief discussion with the folks behind Bloglines indicates that there isn't a straightforward solution to this problem. It is unlikely that they will change their RSS parsing code to deal with the idiosyncrasies of RSS feeds provided by MSN Spaces. Being the author of an RSS reader as well, I can understand not wanting to litter the code with special cases. Similarly, it is unlikely that in the short term we will change the behavior that causes URLs to images hosted on our servers to change.

After chatting with Mike and Jason about this, one of the solutions we came up with was to use the dcterms:modified element in our RSS feeds. The element would contain the date of the last user-directed change to the item; in this case the item would be a blog post or photo album. This means that RSS readers can simply test the value of the dcterms:modified element to determine if a post was changed by the user instead of performing inefficient textual comparisons of the contents of the post. In fact, the main reasons I don't provide support for detecting changes in RSS items in RSS Bandit are the high rate of false positives and the slowdowns caused by performing lots of text comparisons. Having this element in RSS feeds would make it a lot easier for me to support detecting changes to the contents of items in an RSS feed without degrading the user experience in the general case.
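To make the idea concrete, here is a rough sketch of how a reader could test dcterms:modified instead of comparing text. It is not code from RSS Bandit or any other reader. It assumes the element uses the http://purl.org/dc/terms/ namespace and carries an ISO 8601 date with a numeric offset; the helper names, sample items and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DCTERMS_NS = "http://purl.org/dc/terms/"  # assumed namespace for dcterms:modified


def item_modified(item):
    """Return the item's dcterms:modified value as a datetime, or None if absent."""
    text = item.findtext(f"{{{DCTERMS_NS}}}modified")
    return datetime.fromisoformat(text.strip()) if text else None


def user_changed(old_item_xml, new_item_xml):
    """True only if the newer copy reports a later user-directed change."""
    old = item_modified(ET.fromstring(old_item_xml))
    new = item_modified(ET.fromstring(new_item_xml))
    if old is None or new is None:
        return False  # element missing: this sketch makes no claim either way
    return new > old


TEMPLATE = (
    '<item xmlns:dcterms="http://purl.org/dc/terms/">'
    "<description>{body}</description>"
    "<dcterms:modified>{stamp}</dcterms:modified>"
    "</item>"
)

original = TEMPLATE.format(body="img http://storage.example.com/a1/photo.jpg",
                           stamp="2005-05-01T10:00:00+00:00")
# Image URL changed on the server side; the user did not edit the post.
relinked = TEMPLATE.format(body="img http://storage.example.com/b7/photo.jpg",
                           stamp="2005-05-01T10:00:00+00:00")
# The user edited the post, so the modified date moved forward.
edited = TEMPLATE.format(body="img http://storage.example.com/b7/photo.jpg (fixed typo)",
                         stamp="2005-05-03T09:30:00+00:00")

print(user_changed(original, relinked))  # False
print(user_changed(original, edited))    # True
```

The check is a single date comparison per item, so the cost does not grow with the size of the post the way a text comparison does.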

Of course, without RSS readers deciding to support the use of the dcterms:modified element in RSS feeds this will continue to be a problem. I need to send some mail to Mark Fletcher and the RSS-AggDev mailing list to see what people think about supporting this element as a way to get around the "bogus new items" problem.

* Note that this doesn't break links that reference that image with the old URL.


 

Tuesday, 03 May 2005 17:05:26 (GMT Daylight Time, UTC+01:00)
http://web.resource.org/rss/1.0/modules/dcterms/#modified

doesn't really jibe with the definition you gave. You could use atom:updated, which has exactly the semantics you want.

http://atompub.org/2005/04/18/draft-ietf-atompub-format-08.html#rfc.section.4.2.15
Tuesday, 03 May 2005 18:09:48 (GMT Daylight Time, UTC+01:00)
Sounds like we've had similar experiences, Dare. Like RSS Bandit, FeedDemon doesn't detect changes to descriptions. I experimented with this, but found that the vast majority of changes were simple typo/punctuation corrections, and given the performance hit I figured it wasn't worth it.

FeedDemon doesn't support dcterms:modified, but it should be fairly simple to do so since it already supports atom:modified. However, the definition of dcterms:modified states that it "should correspond to the HTTP 1.1 Last-Modified date, of the document the link element points to," which is different from your proposed usage. I personally don't have a problem with this and will support your usage of dcterms:modified if it goes “live,” but it may be a point of contention among the more anal-retentive in the RSS crowd :)
Tuesday, 03 May 2005 22:15:35 (GMT Daylight Time, UTC+01:00)
Glad to hear that you are working on this. I can report that it does indeed affect Newsgator as well. Since a new Newsgator version is in the works, hopefully it can be fixed there.
kip
Tuesday, 03 May 2005 23:30:05 (GMT Daylight Time, UTC+01:00)
If you are publishing RSS v2.0, why not use or otherwise encourage the use of item/pubDate?

Unless the actual post itself has changed, that value should remain the same, and therefore the readers that use that value will know nothing has changed. If the "internalized" links within a post have been changed, the publication date of a given post entry is not changing, right? If so, then advocating pubDate may be the best approach, instead of adding yet another tag which RSS readers need to monitor.

Maurice