November 28, 2006
@ 08:56 PM

Tim Bray has a blog post entitled Choose RELAX Now where he writes

Elliotte Rusty Harold’s RELAX Wins may be a milestone in the life of XML. Everybody who actually touches the technology has known the truth for years, and it’s time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs. To Elliotte’s list of important XML applications that are RELAX-based, I’d add the Atom Syndication Format and, pretty soon now, the Atom Publishing Protocol. It’s a pity; when XSD came out people thought that since it came from the W3C, same as XML, it must be the way to go, and it got baked into a bunch of other technology before anyone really had a chance to think it over. So now lots of people say “Well, yeah, it sucks, but we’re stuck with it.” Wrong! The time has come to declare it a worthy but failed experiment, tear down the shaky towers with XSD in their foundation, and start using RELAX for all significant XML work.

In a past life I was the PM for XML schema technologies at Microsoft so I obviously have an opinion here. What Tim Bray and Elliotte Rusty Harold gloss over in their advocacy is that there are actually two reasons one would choose an XML schema technology. I covered both reasons in my article XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? for a few years ago. The relevant part of the article is excerpted below

As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.
  1. Describing and enforcing the contract between producers and consumers of XML documents: An XML schema ordinarily serves as a means for consumers and producers of XML to understand the structure of the document being consumed or produced. Schemas are a fairly terse and machine readable way to describe what constitutes a valid XML document according to a particular XML vocabulary. Thus a schema can be thought of as contract between the producer and consumer of an XML document. Typically the consumer ensures that the XML document being received from the producer conforms to the contract by validating the received document against the schema.

    This description covers a wide array of XML usage scenarios from business entities exchanging XML documents to applications that utilize XML configuration files.

  2. Creating the basis for processing and storing typed data represented as XML documents: As XML became popular as a way to represent rigidly structured, strongly typed data, such as the content of a relational database or programming language objects, the ability to to describe the datatypes within an XML document became important. This led to Microsoft's XML Data and XML Data-Reduced schema languages, which ultimately led to WXS. These schema languages are used to convert an input XML infoset into a type annotated infoset (TAI) where element and attribute information items are annotated with a type name.

    WXS describes the creation of a type annotated infoset as a consequence of document validation against a schema. During validation against a WXS, an input XML infoset is converted into a post schema validation infoset (PSVI), which among other things contains type annotations. However practical experience has shown that one does not need to perform full document validation to create type annotated infosets; in general many applications that use XML schemas to create strongly typed XML such as XML<->object mapping technologies do not perform full document validation, since a number of WXS features do not map to concepts in the target domain.

RELAX NG is good at #1 but not #2 which is by design. Most of the folks who are interested in XSD are either WS-* folks who are building toolkits that map XML on the wire to in-memory objects or database folks implementing XQuery who also have to deal with strongly typed data. Neither category of developers/vendors are interested in RELAX NG because it wasn't designed to meet their needs. On the other hand, if you are designing an XML format from scratch and need a language/toolkit for validating the structure and correctness of your documents you definitely need to strongly consider using RELAX NG over XSD.


Categories: XML
Tracked by: [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback] [Pingback]