December 14, 2006
@ 05:09 PM

For the past few months I've noticed problems viewing feeds of sites hosted on TypePad in RSS Bandit: every other post in a feed would display raw markup instead of correctly rendered HTML. I decided to look into it this morning and tracked down the cause. Take a look at http://blog.flickr.com/flickrblog/atom.xml. Here are the relevant excerpts from the feed:


<content type="html" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
&lt;div xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;div style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://us.i1.yimg.com/us.yimg.com/i/ww/news/2006/12/12/gtfof.gif&quot; style=&quot;padding-bottom: 6px;&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;It&#39;s now easier than ever to spread joy this holiday season by giving the &lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;strong&gt;Gift of Flickr&lt;/strong&gt;&lt;/a&gt;. You can purchase a special activation code that you can give to anyone, whether or not they have an existing Flickr account. We&#39;ve even created a special Gift Certificate card that you can print out yourself, fold up and stuff in a stocking, under a tree or hidden away for after the candles are lit (of course, you can also send the gift code in an email).&lt;/p&gt;

&lt;p&gt;And it&#39;s even better to give the gift of Flickr since now your recipients will get &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;unlimited uploads&lt;/strong&gt;&lt;/a&gt; — the two gigabyte monthly limit is no more (&lt;em&gt;yep, pro users have no limits on how many photos they can upload&lt;/em&gt;)! At the same time, we&#39;ve upped the limit for free account members as well, from &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;20MB per month up to 100MB&lt;/strong&gt;&lt;/a&gt; (yep, five times more)!&lt;/p&gt;

&lt;p&gt;The Flickr team also wants to take this opportunity to thank you for a wonderful year and wish you and yours all the best of the season. Yay!&lt;/p&gt;&lt;/div&gt;
</content>
...
<content type="xhtml" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.flickr.com/photos/eye_spied/313572883/" title="Photo Sharing"><img width="500" height="357" border="0" src="http://static.flickr.com/117/313572883_8af0cddbc7.jpg" alt="Dec 2 2006 208 copy" /></a></p>

<p><a title="Photo Sharing" href="http://www.flickr.com/photos/mrtwism/71294604/"><img width="500" height="375" border="0" alt="riding" src="http://static.flickr.com/34/71294604_b887c01815.jpg" /></a></p>

<p>See more photos in the <a href="http://www.flickr.com/photos/tags/biggame/clusters/cal-berkeley-stanford/">"Berkeley," "Stanford," "big game" cluster</a>.</p>

<p>Photos from <a href="http://www.flickr.com/photos/eye_spied/" title="Link to caryniam's photos">caryniam</a> and <a title="Link to mrtwism's photos" href="http://www.flickr.com/photos/mrtwism/">mrtwism</a>.</p></div>
</content>

So the first mystery is solved. Some posts look OK and some don't because TypePad, for some reason, seems to alternate between escaped HTML and well-formed XHTML as the content of an entry in the feed. When the feed uses well-formed XHTML the item looks fine, but when it uses escaped HTML it looks like crap. The next question is why the items aren't rendered correctly when escaped HTML is used.
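The alternation is easy to confirm mechanically. Here is a minimal Python sketch (not RSS Bandit's actual code) that reports, for each content element, whether the body arrived as inline child elements or as one escaped string. The <feed> wrapper, the shortened sample bodies, and the function names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the feed; the <feed> wrapper and bodies are made up for illustration.
SAMPLE = """<feed>
  <content type="html">&lt;div&gt;&lt;p&gt;Gift of Flickr&lt;/p&gt;&lt;/div&gt;</content>
  <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>big game</p></div></content>
</feed>"""


def describe(content):
    """Classify a content element by how its markup was serialized."""
    if len(content) > 0:
        # Child elements exist, so the markup was written inline as XML.
        return "inline markup"
    text = (content.text or "").strip()
    if "<" in text:
        # The XML parser already decoded &lt; to <, so the text is HTML source.
        return "escaped markup"
    return "plain text"


def classify_feed(xml_text):
    """Return (type attribute, classification) for every content element."""
    root = ET.fromstring(xml_text)
    return [(c.get("type"), describe(c)) for c in root.iter("content")]


if __name__ == "__main__":
    for type_attr, kind in classify_feed(SAMPLE):
        print(type_attr, "->", kind)
```

A real feed declares a default namespace on its root element, so the tag passed to iter() would need to be namespace-qualified there.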

So I referred to section 3.1 of the Atom 0.3 specification and saw the following:

3.1.2  "mode" Attribute

Content constructs MAY have a "mode" attribute, whose value indicates the method used to encode the content. When present, this attribute's value MUST be listed below. If not present, its value MUST be considered to be "xml".

"xml":
A mode attribute with the value "xml" indicates that the element's content is inline xml (for example, namespace-qualified XHTML).
"escaped":
A mode attribute with the value "escaped" indicates that the element's content is an escaped string. Processors MUST unescape the element's content before considering it as content of the indicated media type.
"base64":
A mode attribute with the value "base64" indicates that the element's content is base64-encoded [RFC2045]. Processors MUST decode the element's content before considering it as content of the indicated media type.

To prevent aggregators from having to use their psychic powers to determine whether an item contains plain text or escaped HTML, the Atom folks introduced a mode attribute that indicates whether the content should be treated as is or should be unescaped. As you can see, the default value is "xml", not "escaped". Since the TypePad Atom feeds do not state that the HTML content is escaped, the aggregator is not expected to unescape the content before rendering it. Second mystery solved. Buggy feeds are the culprit.
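To make the consequence concrete, here is a minimal Python sketch of the three modes as the quoted section describes them. It is an illustration under stated assumptions, not RSS Bandit's implementation: the function name render_html is hypothetical, and decoding base64 payloads as UTF-8 is an assumption the quoted text does not make.

```python
import base64
import html
import xml.etree.ElementTree as ET


def render_html(content):
    """Return the markup to hand to an HTML renderer for one content element."""
    mode = content.get("mode", "xml")  # a missing attribute means "xml"
    if mode == "xml":
        # Text nodes are literal characters, so escape them again for display;
        # child elements are serialized back out as markup.
        out = html.escape(content.text or "", quote=False)
        for child in content:
            out += ET.tostring(child, encoding="unicode")
        return out
    if mode == "escaped":
        # The XML parser has already undone the escaping; the text is the markup.
        return content.text or ""
    if mode == "base64":
        # Assumption: the decoded bytes are UTF-8 text.
        return base64.b64decode(content.text or "").decode("utf-8")
    raise ValueError("unknown mode: %r" % mode)


if __name__ == "__main__":
    no_mode = ET.fromstring("<content>&lt;p&gt;hi&lt;/p&gt;</content>")
    escaped = ET.fromstring('<content mode="escaped">&lt;p&gt;hi&lt;/p&gt;</content>')
    print(render_html(no_mode))  # tags stay escaped, so the reader sees raw markup
    print(render_html(escaped))  # tags come through as real HTML
```

The first call is the TypePad case: with no mode attribute the escaped string is treated as literal text, which is why the markup shows up on screen.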

Even though these feeds are broken, it is probably faster for me to special-case feeds from TypePad than to track down and convince the folks at SixApart that this is a bug worth fixing. This issue will be fixed in the next beta of the Jubilee release of RSS Bandit.
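One way such a special case could look, sketched in Python. This is a guess at the shape of a workaround, not the fix that shipped: the function name effective_mode and the from_typepad flag are hypothetical, and how a feed gets recognized as TypePad-hosted is left to the caller.

```python
import xml.etree.ElementTree as ET


def effective_mode(content, from_typepad):
    """Pick the mode to apply to a content element, with a TypePad override."""
    mode = content.get("mode")
    if mode is not None:
        # The feed said what it meant; trust it.
        return mode
    text = content.text or ""
    if from_typepad and len(content) == 0 and "<" in text:
        # No mode, no child elements, but the text holds tags:
        # treat it as escaped HTML instead of the default.
        return "escaped"
    return "xml"  # the default when the attribute is absent


if __name__ == "__main__":
    entry = ET.fromstring("<content>&lt;p&gt;hi&lt;/p&gt;</content>")
    print(effective_mode(entry, from_typepad=True))
    print(effective_mode(entry, from_typepad=False))
```

Keeping the override behind a flag means feeds from other publishers are still handled by the default rule.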