January 13, 2004
@ 04:06 PM

Mark Pilgrim's recent post entitled "There are no exceptions to Postel's Law" implies, among other things, that news aggregators should process ill-formed XML feeds. His reasoning is that this is better for end users: they don't care about esoteric rules defined in the XML 1.0 recommendation, but they do care if they can't read the news from their favorite sites.

This has unleashed some feedback from XML standards folks, such as Tim Bray's "On Postel, Again" and Norman Walsh's "On Atom and Postel's Law", who argue that if a feed isn't well-formed XML then it is a fatal error. Aggregator authors have also gotten in the mix. Brent Simmons has posted a number of entries on the topic, where he mentions that NetNewsWire currently doesn't error on RSS feeds that are ill-formed XML if it can work around the error, but that he plans to change this for ATOM so that it errors on ill-formed feeds. Nick Bradbury has posted similar thoughts with regard to how FeedDemon has behaved in the past and will behave in future. On the other end of the spectrum is Greg Reinacker, the author of NewsGator, who has stated that NewsGator will process ill-formed RSS or ATOM feeds because he feels this is the best choice for his customers.

My thoughts on this matter are the same as Dave Winer's in his post "Postel's Law has two parts":

Personally I disagree with the first half of the law when applied to XML -- the idea that aggregators should bend over backwards to accept poorly formed XML. I always understood that XML was trying to do something different, as a response to the awful mess that HTML became because browser vendors adopted the first half of Postel's philosophy.

When I adopted XML, in 1997, as I understood it -- I signed onto the idea of rejecting invalid XML. It was considered a bug if you accepted invalid XML, not a bug if you didn't.

Brent Simmons, an early player in this market, says users are better served if he reads bad feeds, but when he does that, he's raising the barrier to entry, in undocumented ways that are hard to reproduce.

His interests are served by high barriers to entry, but the users do better if they have more choice.

Now, the users are happy as long as Brent is around to keep updating his aggregator to work around feed bugs, but he might move on, it happens for all kinds of reasons. It's better to insist on tight standards, so users can switch if they want to, for any reason; so that next year's feed will likely work with this year's aggregator, even if it doesn't dominate the market.

I yearn for just one market with low barriers to entry, so that products are differentiated by features, performance and price; not compatibility.

I work on the XML team at Microsoft and one of the things I have to do is coordinate with all the other teams using XML at Microsoft. The ability to consume and produce XML is or will be baked into a wide range of products including BizTalk, SQL Server, Word, Excel, InfoPath, Windows, and Visual Studio. This is in addition to the number of developer technologies for processing XML, from XQuery and XSLT to databinding XML documents to GUI components. In a previous post I mentioned my XML Litmus Test for deciding whether XML would benefit your project:

Using XML for a software development project buys you two things (a) the ability to interoperate better with others and (b) a number of off-the-shelf tools for dealing with format.

Encouraging the production and consumption of ill-formed XML damages both these benefits of using XML since interoperability is lost when different tools treat the same XML document differently and off-the-shelf tools can no longer be reliably used to process the documents of that format. This poisons the well for the entire community of developers and users.
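
To make the point about off-the-shelf tools concrete, here is a minimal sketch in Python (my choice of language for illustration; it is not one of the tools named in this post). The two sample feeds and the `feed_title` helper are hypothetical. The only difference between the feeds is an unescaped ampersand, which is enough to make the second one ill-formed, so a standard XML parser refuses it outright.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds. The second contains a bare "&", which the
# XML 1.0 recommendation does not allow in character data.
WELL_FORMED = '<rss version="2.0"><channel><title>News &amp; Views</title></channel></rss>'
ILL_FORMED = '<rss version="2.0"><channel><title>News & Views</title></channel></rss>'


def feed_title(feed_text):
    """Return the channel title, or None if the feed is not well-formed XML."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # A conforming parser treats a well-formedness error as fatal,
        # so there is no document tree to query at all.
        return None
    return root.findtext("channel/title")


if __name__ == "__main__":
    print(feed_title(WELL_FORMED))
    print(feed_title(ILL_FORMED))
```

An aggregator that wants to display the second feed anyway has to abandon the standard parser and write its own recovery logic, and two aggregators are unlikely to recover in exactly the same way.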

Developers and users of RSS or ATOM can't reap the benefits of the various Microsoft technologies and products (e.g. querying feeds using XQuery or storing feeds in SQL Server) if there is a proliferation of ill-formed feeds. So far this is not the case (ill-formed feeds are a minority), but every time an aggregator vendor decides to encourage content producers to generate ill-formed XML, by working around it and displaying the feed to the user with no visible problems, that is one more drop of cyanide in the well.