Designing XML Formats: Versioning vs. Extensibility

April 12, 2004

@ 03:33 AM

Many people designing XML formats whether for application-specific configuration files, website syndication formats or new markup languages have to face the problem of how to design their formats to be extensible and yet be resilient to changes due to changes to versions of the format. One thing I have noticed in talking to various teams at Microsoft and some of our customers is that many people think about extensibility of formats and confuse that for being the same as the versioing problem. I have written previously On Versioning XML Vocabularies in which I stated

At this point I'd like to note that this a versioning problem which is a special instance of the extensibility problem. The extensibility problem is how does one describe an XML vocabulary in a way that allows producers to add elements and attributes to the core vocabulary without causing problems for consumers that may not know about them. The versioning problem is specific to when the added elements and attributes actually are from a subsequent version of the vocabulary (i.e. a version 2.0 server talking to a version 1.0 client).

The problem with the above paragraph is that it it focuses on a narrow aspect of the versioning problem. A versioning policy should not only be concerned with when new elements and attributes are added to the format but also when existing ones are changed or even removed.

The temptation to think about versioning as a variation of the extensibility problem is due to the fact that the focus of the XML family of technologies has been about extensibility. As I wrote in my previous posting

One of the primary benefits of using XML for building data interchange formats is that the APIs and technologies for processing XML are quite resistant to additions to vocabularies. If I write an application which loads RSS feeds looking for item elements then processes their link and title elements using any one of the various technologies and APIs for processing XML such as SAX, the DOM or XSLT it is quite straightforward to build an application that processes said elements which is resistant to changes in the RSS spec or extensions to the RSS spec as the link and title elements always appear in a feed.

Similarly XML schema languages such as W3C XML Schema have a number of features that promote extensibility such as wildcards, substitution groups and xsi:type but few if any that target the versioning problem. I've written about a number of techniques for adding extensibility to XML formats using W3C XML Schema in my article W3C XML Schema Design Patterns: Dealing With Change but none so far on approaches to versioning in combination with your favorite XML schema language.

There are a number of things that could change about the constructs in a data transfer format including

New concepts are added (e.g. new elements or attributes added to format or new values for enumerations)
Existing concepts are changed (e.g. existing elements & attributes should be interpreted differently, added elements or attributes alter semantics of their parent/owning element)
Existing concepts are deprecated (e.g. existing elements & attributes should now issue warning when consumed by an application)
Existing concepts are removed (e.g. existing elements & attributes should no longer work when consumed by an application)

How all four of the above changes between versions of the XML format are handled should be considered when designing the format. Below are sample solutions for each of the aformentioned changes

New concepts are added: In some cases the new concepts are completely alien to those in the existing format. For example, the second version of XQueryX will most likely have to deal with the additions of data update commands such as insert or delete while the existing format only has query constructs. In such cases it is most prudent to eschew backwards compatibility by either changing the version number or namespace of XML format. On the other hand, if the new additions are optional or ignorable and the format has extensibility rules for items from a different namespace than that of the format itself then the new additions (elements and attributes) can be placed in a different namespace from that of the format. In more complex cases, it may be likely that there are some additions that cannot be ignored by older processors while others can. In such cases, serious consideration should be made for adding a concept similar to the mustUnderstand attribute in SOAP 1.1 where one can indicate which additions to the format are backwards compatible and which ones are not.

In the case of new possible values being added to an enumeration (e.g. a color attribute that had the option of being "red" or "blue" in version 1.0 of the format has "green" added as a possible value in future version of the format) the specification for the format needs to determine what the behavior of older processors should be when they see values they do not understand.
Existing concepts are changed: In certain cases the interpretation of an element or attribute may be changed across versions of a vocabulary. For example, the current working draft of XSLT 2.0 has a list of incompatibilities between it and XSLT 1.0 when the same elements and attributes are used in a stylesheet. In such cases it is most prudent to change the major version number of the format if one exists or change the namespace of the format otherwise. This means the format will not be backwards compatible.
Existing concepts are deprecated: Sometimes as a format evolves, one realizes that some concepts need to be reworked and replaced by improved implementations of these concepts. An example of this is the deprecation of the requiredRuntime element in favor of the supportedRuntime element in .NET Framework application configuration files. Format designers need to consider how to make such changes work in a backwards compatible manner. In the case of .NET Framework configuration files, both elements are used for applications targetting version 1.0 of the .NET Framework since the former is understood by the configuration engine while the latter is ignored.
Existing concepts are removed: Constructs may sometimes be removed from formats because they prove to be inappropriate or insecure. For example, the most recent draft of XHTML 2.0 removes familiar elements like img and br (descriptions of the backwards incompatible changes in XHTML 2.0 from XHTML 1.1 are available in the following articles by Mark Pilgrim, All That We Can Leave Behind and The Vanishing Image: XHTML 2 Migration Issues). This approach removes forwards compatibility and in such cases it is most prudent to either change the version number or namespace of the XML format.

This blog post just scratches the surface of what can be written about the various concerns when designing XML formats to be version resilient. There are a couple of issues as to how best to represent such changes in an XML schema and if one should even bother trying in certain cases. I'll endeavor to put together an artricle about this on MSDN in the next month or two.

Categories: XML

« Implementing Standards is a Game of Dice... | Home | The Purpose of Standards in the Software... »

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Designing XML Formats: Versioning vs. Extensibility - Dare Obasanjo's weblog