December 14, 2006
@ 05:09 PM

I've noticed that some problems with viewing feeds of sites hosted on TypePad for the past few months in RSS Bandit. The problem was that every other post in a feed would display raw markup instead of correctly rendered HTML. I decided to look into the problem this morning and tracked down the problem. Take a look at http://blog.flickr.com/flickrblog/atom.xml. Here are relevant excerpts from the feed


<content type="html" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
&lt;div xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;&lt;p&gt;&&nbsp;&lt;/p&gt;
&lt;div style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://us.i1.yimg.com/us.yimg.com/i/ww/news/2006/12/12/gtfof.gif&quot; style=&quot;padding-bottom: 6px;&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;It&#39;s now easier than ever to spread joy this holiday season by giving the &lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;strong&gt;Gift of Flickr&lt;/strong&gt;&lt;/a&gt;. You can purchase a special activation code that you can give to anyone, whether or not they have an existing Flickr account. We&#39;ve even created a special Gift Certificate card that you can print out yourself, fold up and stuff in a stocking, under a tree or hidden away for after the candles are lit (of course, you can also send the gift code in an email).&lt;/p&gt;

&lt;p&gt;And it&#39;s even better to give the gift of Flickr since now your recipients will get &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;unlimited uploads&lt;/strong&gt;&lt;/a&gt; — the two gigabyte monthly limit is no more (&lt;em&gt;yep, pro users have no limits on how many photos they can upload&lt;/em&gt;)! At the same time, we&#39;ve upped the limit for free account members as well, from &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;20MB per month up to 100MB&lt;/strong&gt;&lt;/a&gt; (yep, five times more)!&lt;/p&gt;

&lt;p&gt;The Flickr team also wants to take this opportunity to thank you for a wonderful year and wish you and yours all the best of the season. Yay!&lt;/p&gt;&lt;/div&gt;
</content>
...
<content type="xhtml" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.flickr.com/photos/eye_spied/313572883/" title="Photo Sharing"><img width="500" height="357" border="0" src="http://static.flickr.com/117/313572883_8af0cddbc7.jpg" alt="Dec 2 2006 208 copy" /></a></p>

<p><a title="Photo Sharing" href="http://www.flickr.com/photos/mrtwism/71294604/"><img width="500" height="375" border="0" alt="riding" src="http://static.flickr.com/34/71294604_b887c01815.jpg" /></a></p>

<p>See more photos in the <a href="http://www.flickr.com/photos/tags/biggame/clusters/cal-berkeley-stanford/">"Berkeley," "Stanford," "big game" cluster</a>.</p>

<p>Photos from <a href="http://www.flickr.com/photos/eye_spied/" title="Link to caryniam's photos">caryniam</a> and <a title="Link to mrtwism's photos" href="http://www.flickr.com/photos/mrtwism/">mrtwism</a>.</p></div>
</content>

So the first mystery is solved. The reason some posts look OK and some don't is that for some reason TypePad seems to alternate between escaped HTML and well-formed XHTML as the content of an entry in the feed. When the feed uses well-formed XHTML the item looks fine but when it uses escaped HTML it looks like crap. The next question is why the items aren't rendered correctly when escaped HTML is used.

So I referred to section 3.1 of the Atom 0.3 specification and saw the following

3.1.2  "mode" Attribute

Content constructs MAY have a "mode" attribute, whose value indicates the method used to encode the content. When present, this attribute's value MUST be listed below. If not present, its value MUST be considered to be "xml".

"xml":
A mode attribute with the value "xml" indicates that the element's content is inline xml (for example, namespace-qualified XHTML).
"escaped":
A mode attribute with the value "escaped" indicates that the element's content is an escaped string. Processors MUST unescape the element's content before considering it as content of the indicated media type.
"base64":
A mode attribute with the value "base64" indicates that the element's content is base64-encoded [RFC2045]. Processors MUST decode the element's content before considering it as content of the the indicated media type.

To prevent aggregators from having to use their psychic powers to determine when an item contains plain text or escaped HTML, the Atom folks introduced a mode attribute that indicated whether the content should be treated as is or should be unescaped. As you can see the default value for this is not "escaped". Since the TypePad Atom feeds do not state that the HTML content is escaped then the aggregator is not expected to unescape the content before rendering it. Second mystery solved. Buggy feeds are the culprit. 

Even though these feeds are broken it is probably faster for me to special case feeds fromTypePad than trying to track down and convince the folks at SixApart that this is a bug worth fixing. This issue will be fixed in the next beta of the Jubilee release of RSS Bandit.


 

Thursday, December 14, 2006 7:33:22 PM (GMT Standard Time, UTC+00:00)
IME you shouldn't special case feeds in RssBandit. If feed readers become more error tolerant it will only lead to more feeds having errors. The same happened with HTML. Even if it is more work to inform SixApart it is the better solution.

Thanks,
Thomas Krause
Thomas Krause
Friday, December 15, 2006 3:31:23 AM (GMT Standard Time, UTC+00:00)
The 'type' attributes have the wrong values for Atom 0.3:

http://www.validome.org/rss-atom/validate?lang=en&url=http://blog.flickr.com/flickrblog/ato
m.xml&version=atom_0_3
Friday, December 15, 2006 9:32:45 PM (GMT Standard Time, UTC+00:00)
I encountered a similar problem in Snarfer earlier this year. After testing a bunch of other aggregators, and finding almost every one of them could handle the feed (RSS Bandit included), I figured it'd be easier to fix myself than try and persuade the publisher it was their fault.

If you're wondering why RSS Bandit worked then, but doesn't work in this case, the difference was that the content was CDATA escaped that time. It shouldn't make any difference, but I guess you handle the two forms of escaping differently.

And if you're worried about special casing, think of it this way: when the type is set to "text/html" it's technically impossible for the mode to be "xml", because HTML is not an XML format. Treating the mode as "escaped" is a reasonable recovery from such a situation, IMHO.
Saturday, December 16, 2006 11:22:18 PM (GMT Standard Time, UTC+00:00)
Sam,
How come those errors weren't pointed out by the validator at http://www.feedvalidator.org

James,
I came to the same conclusion when I decided to implement the fix.
Friday, December 29, 2006 12:53:19 PM (GMT Standard Time, UTC+00:00)
Thanks for this veryy good to read articles. I search somethink like this!

Thanks
Sunday, February 4, 2007 2:20:06 PM (GMT Standard Time, UTC+00:00)
Enjoyed browsing through the site. Keep up the good work. Greetings
Sunday, May 13, 2007 11:53:31 AM (GMT Daylight Time, UTC+01:00)
You might have come across the words above if you read blogs about search engines and search engine optimization. There’s a competition going on where the one who gets the highest rank for the keywords ‘V7ndotcom Elursrebmem’ wins a small prize and a lot of honour.

Comments are closed.