November 18, 2004
@ 07:12 PM

My XML 2004 talk, Designing XML Formats: Versioning vs. Extensibility, went over well yesterday. Lots of interesting questions were asked during the Q&A session for my talk and the following talk by Dave Orchard, Achieving Distributed Extensibility and Versioning.

One issue that came up during the discussions after our talks was the cost/benefit of using a mustUnderstand construct in an XML format, similar to the SOAP mustUnderstand attribute. The primary benefit of having such a construct is that it enables third parties to create mandatory extensions to an XML format. However, there are a number of costs to having such a construct:

  1. Entire Element or Document Must Be Read: A processor that just wants to extract a subset of the data in the document still has to parse the entire document and see if there are any mustUnderstand constructs before it can process the document. This increases the cost of processing instances of the format.
  2. Ambiguity as to what is Meant by 'Understand': The concept of what it means to "understand" an XML vocabulary is context specific. For example, should a stylesheet that pretty prints an XML document fail because the document contains a mustUnderstand construct that is not explicitly handled by the stylesheet? A mustUnderstand construct is particularly limiting since it forces all consumers to fail, even though some consumers could still use the document without explicitly understanding certain elements or attributes in it.
  3. Causes Confusion for Intermediaries: In certain cases, a format may be processed by an intermediary on the way to the client from the server. For example, HTTP requests often pass through proxy servers, and there are also web-based aggregators of RSS/Atom feeds such as Feedster & PubSub whose output can in turn be subscribed to by other aggregators. In such cases, it is ambiguous whether an intermediary is expected to fail when a construct it doesn't explicitly handle is labelled as mustUnderstand, or whether it is expected to pass the construct on, with that label, to third party aggregators. In fact, certain formats have separate mustUnderstand constructs for hop-to-hop versus end-to-end transmission for this reason.
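
To make the first cost concrete, here is a rough Python sketch of a consumer that only wants the item titles from a feed-like document. Everything in it is made up for illustration: the `http://example.org/ext-policy` namespace, the `mustUnderstand` attribute name, and the helper functions are assumptions, not part of SOAP, RSS, Atom, or any other real format. The point is just that the consumer has to walk every element looking for mandatory extensions before it can return anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and attribute name, invented for this sketch.
EXT_NS = "http://example.org/ext-policy"
MUST_UNDERSTAND = "{%s}mustUnderstand" % EXT_NS


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def unknown_mandatory_extensions(root, understood):
    """Walk every element; collect mandatory ones in namespaces we do not handle."""
    return [
        el.tag
        for el in root.iter()
        if el.get(MUST_UNDERSTAND) == "true"
        and namespace_of(el.tag) not in understood
    ]


def read_titles(xml_text, understood=frozenset({""})):
    """Return item titles, but only after the whole document has been checked."""
    root = ET.fromstring(xml_text)
    unknown = unknown_mandatory_extensions(root, understood)
    if unknown:
        raise ValueError("unhandled mandatory extensions: %s" % ", ".join(unknown))
    return [title.text for title in root.iter("title")]


# A tiny made-up document with one mandatory extension element.
FEED = """<feed xmlns:p="http://example.org/ext-policy" xmlns:x="http://example.org/x">
  <item><title>First</title></item>
  <item><title>Second</title><x:rights p:mustUnderstand="true">paid</x:rights></item>
</feed>"""
```

With the default arguments, `read_titles(FEED)` raises `ValueError` even though both titles are perfectly readable, which is also the second cost in miniature. Passing `understood={"", "http://example.org/x"}` lets the same call succeed.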

From my perspective, the cost of having a mustUnderstand construct is often not worth the benefits provided. This wasn't explicitly in my talk but is a conclusion I came to recently which I expanded upon during the Q&A session.