Mark Pilgrim has a post entitled The history of draconian error handling in XML, where he excerpts a couple of the discussions on the draconian error handling rules of XML. These rules state that if an XML processor encounters a syntax error in an XML document, it should stop parsing and indicate a fatal error, as opposed to muddling along or trying to fix up the error in some way. According to Tim Bray:

What happened was, we had a really big, really long, really passionate argument on the subject; the camps came to be called “Draconians” and “Tolerants.” After this had gone on for some weeks and some hundreds of emails, we took a vote and the Draconians won 7-4.

Reading some of the posts from 6 years ago on Mark Pilgrim's blog, it is interesting to note that most of the arguments on the side of the Tolerants are simply no longer relevant today, while the position of the Draconians turned out to be the reason for XML's current widespread success in the software marketplace.
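To make the draconian rule concrete, here is a minimal sketch using the XML parser in Python's standard library. The helper name `parse_strict` and the sample documents are made up for illustration; the point is only that a conforming parser reports a fatal error on ill-formed input instead of guessing what the author meant.

```python
import xml.etree.ElementTree as ET

def parse_strict(text):
    """Hypothetical helper: return (root, None) on success,
    or (None, message) when the document is not well-formed."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error;
        # it does not try to repair the document.
        return None, str(err)

well_formed = "<post><title>Draconian</title></post>"
ill_formed = "<post><title>Draconian</post>"  # <title> is never closed

for doc in (well_formed, ill_formed):
    root, error = parse_strict(doc)
    if error is None:
        print("parsed, root element:", root.tag)
    else:
        print("fatal error:", error)
```

A tolerant parser would have to choose some recovery for the second document (close the title implicitly, drop it, and so on), and nothing guarantees that two implementations would make the same choice.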

The original goal of XML was to create a replacement for HTML which allowed you to create your own tags yet have them work in some fashion on the Web (i.e. SGML on the Web). Time has shown that placing XML documents directly on the Web for human consumption just isn't that interesting to the general Web development community. Most content on the Web for human consumption is still HTML tag soup. Even when Web content claims to be XHTML, it often is really HTML tag soup, either because it isn't well-formed or because it is invalid according to the XHTML DTD. Even applications that represent data internally as XML tend to use XSLT to transform the content to HTML, as opposed to putting the XML directly on the Web and styling it with CSS. As I've mentioned before, the dream of the original XML working group of replacing HTML by inventing “SGML on the Web” is a failed dream. In hindsight, it doesn't seem that the choice of tolerant over draconian error handling would have made a difference to the lack of adoption of XML as a format for representing content targeted for human consumption on the Web today.

On the other hand, XML has flourished as a general data interchange format for machine-to-machine interactions in wide-ranging areas, from distributed computing and database applications to being a format for describing configuration files and business documents. There are a number of reasons for XML's rise to popularity:

  1. XML technologies and APIs enabled developers to process documents and data more easily and flexibly than previous formats and technologies did.
  2. The ubiquity of XML implementations and the consistency of the behavior of implementations across platforms.
  3. The fact that XML documents were fairly human-readable and seemed familiar to Web developers, since they used HTML-like markup.

Considering the above points, does it seem likely that XML would be as popular outside of its original [failed] design goal of being a replacement for HTML if the specification allowed parsers to pick and choose which parts of the spec to honor with regard to error recovery? Would XML Web Services be as useful for interoperability between platforms if different parser implementations could recover from syntax errors at will in a non-deterministic manner? Looking at some of the comments linked from Mark Pilgrim's blog, it does seem to me that a lot of the arguments on the side of the Tolerants came from the perspective of “XML as an HTML replacement” and don't stand up under scrutiny in today's world.

April 19, 1997. Sean McGrath: Re: Error Handling in XML

Programming languages that barf on a syntax error do so because a partial executable image is a useless thing. A partial document is *not* a useless thing. One of the cool things about XML as a document format is that some of the content can be recovered even in the face of error. Compare this to our binary document friends where a blown byte can render the entire content inaccessible.

Given that XML is used today for building documents that are effectively programs, such as XSLT, XAML and SVG, it does seem that the same rules that apply to partial programs should apply to XML as well.

May 7, 1997. Paul Prescod: Re: Final words, I think, on error handling

Browsers do not just need a well-formed XML document. They need a well-formed XML document with a stylesheet in a known location that is syntactically correct and *semantically correct* (actually applies reasonable styles to the elements so that the document can be read). They need valid hyperlinks to valid targets and pretty soon they may need some kind of valid SGML catalog. There is still so much room for a document author to screw up that well-formedness is a very minor step down the path.

I have to agree here with the spirit of the post [not the content, since it assumed that XML was going to primarily be a browser-based format]. It is far more likely, and more serious, that there are logic errors in an XML document than syntax errors. For example, there are more RSS feeds out there with dates that are invalid according to the RSS spec they support than there are ill-formed feeds. And in a number of these cases it is a lot easier to fix the common well-formedness errors than it is to fix violations of the spec (HTML in descriptions or titles, incorrect date formats, data other than email addresses in the <author> element, etc).
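The difference between a syntax error and a spec violation can be sketched in a few lines of Python. The feed below is made up for illustration, and `invalid_dates` is a hypothetical helper; it assumes an RSS 2.0 feed, where the spec requires RFC 822 dates in `<pubDate>`. The document parses without complaint, yet one of its dates is wrong.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Illustrative feed: well-formed XML, but the second date is not RFC 822.
FEED = """<rss version="2.0"><channel>
<title>Example</title>
<item><title>Good</title><pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate></item>
<item><title>Bad</title><pubDate>2002-09-07</pubDate></item>
</channel></rss>"""

def invalid_dates(feed_text):
    """Return the text of every <pubDate> that is not an RFC 822 date."""
    root = ET.fromstring(feed_text)  # raises ParseError only if ill-formed
    bad = []
    for pub in root.iter("pubDate"):
        try:
            parsedate_to_datetime(pub.text or "")
        except (TypeError, ValueError):
            # Older Python versions raise TypeError, newer ones ValueError.
            bad.append(pub.text)
    return bad

print(invalid_dates(FEED))
```

No XML parser, draconian or tolerant, will flag the second item; catching it takes a check written against the RSS spec itself.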

May 7, 1997. Arjun Ray: Re: Final words, I think, on error handling

The basic point against the Draconian case is that a single (monolithic?) policy towards error handling is a recipe for failure. ...

XML is many things, but I doubt that one could call it a failure except when it comes to its original [flawed] intent of replacing HTML. As a mechanism for describing structured and semi-structured content in a robust, platform-independent manner IT IS KING.

So why do I say everyone lost yet everyone won? Today most XML on the Web targeted at human consumption [i.e. XHTML] isn't well-formed, so in this case the Tolerants were right and the Draconians lost, since well-formed XML has been a failure on the human Web. However, in the places where XML is getting the most traction today, the draconian error handling rules promote interoperability and predictability, which is the opposite of what a number of the Tolerants expected would happen with XML in the wild.

