Should You Choose RELAX Now? - Dare Obasanjo's weblog

November 28, 2006

@ 08:56 PM

Tim Bray has a blog post entitled Choose RELAX Now, in which he writes:

Elliotte Rusty Harold’s RELAX Wins may be a milestone in the life of XML. Everybody who actually touches the technology has known the truth for years, and it’s time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs. To Elliotte’s list of important XML applications that are RELAX-based, I’d add the Atom Syndication Format and, pretty soon now, the Atom Publishing Protocol. It’s a pity; when XSD came out people thought that since it came from the W3C, same as XML, it must be the way to go, and it got baked into a bunch of other technology before anyone really had a chance to think it over. So now lots of people say “Well, yeah, it sucks, but we’re stuck with it.” Wrong! The time has come to declare it a worthy but failed experiment, tear down the shaky towers with XSD in their foundation, and start using RELAX for all significant XML work.

In a past life I was the PM for XML schema technologies at Microsoft, so I obviously have an opinion here. What Tim Bray and Elliotte Rusty Harold gloss over in their advocacy is that there are actually two reasons one would choose an XML schema technology. I covered both reasons in my article XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? for XML.com a few years ago. The relevant part of the article is excerpted below:

As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.

1. Describing and enforcing the contract between producers and consumers of XML documents: An XML schema ordinarily serves as a means for consumers and producers of XML to understand the structure of the document being consumed or produced. Schemas are a fairly terse and machine-readable way to describe what constitutes a valid XML document according to a particular XML vocabulary. Thus a schema can be thought of as a contract between the producer and consumer of an XML document. Typically the consumer ensures that the XML document being received from the producer conforms to the contract by validating the received document against the schema.

This description covers a wide array of XML usage scenarios, from business entities exchanging XML documents to applications that utilize XML configuration files.

2. Creating the basis for processing and storing typed data represented as XML documents: As XML became popular as a way to represent rigidly structured, strongly typed data, such as the content of a relational database or programming language objects, the ability to describe the datatypes within an XML document became important. This led to Microsoft's XML Data and XML Data-Reduced schema languages, which ultimately led to WXS. These schema languages are used to convert an input XML infoset into a type annotated infoset (TAI) where element and attribute information items are annotated with a type name.

WXS describes the creation of a type annotated infoset as a consequence of document validation against a schema. During validation against a WXS, an input XML infoset is converted into a post schema validation infoset (PSVI), which among other things contains type annotations. However, practical experience has shown that one does not need to perform full document validation to create type annotated infosets; in general, many applications that use XML schemas to create strongly typed XML, such as XML<->object mapping technologies, do not perform full document validation, since a number of WXS features do not map to concepts in the target domain.
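
To make the two scenarios concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ORDER_SCHEMA` dictionary is a hypothetical toy "schema" (it is neither XSD nor RELAX NG), and `validate` and `annotate` are made-up helper names. The point is only that enforcing a contract and producing typed data are separate jobs, and that the second can be done without fully validating the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": child element name -> Python type.
# This is NOT XSD or RELAX NG, just an illustration of the two roles.
ORDER_SCHEMA = {"id": int, "price": float, "note": str}


def validate(xml_text, schema):
    """Scenario 1: enforce the contract.

    Return True only if the root has exactly the expected children,
    in schema order, and every value parses as its declared type.
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    if [child.tag for child in children] != list(schema):
        return False
    for child in children:
        try:
            schema[child.tag](child.text or "")
        except ValueError:
            return False
    return True


def annotate(xml_text, schema):
    """Scenario 2: produce typed data.

    Convert the children the schema knows about and skip the rest,
    without checking the overall structure of the document.
    """
    root = ET.fromstring(xml_text)
    typed = {}
    for child in root:
        if child.tag in schema:
            typed[child.tag] = schema[child.tag](child.text or "")
    return typed


doc = "<order><id>42</id><price>9.5</price><note>rush</note></order>"
loose = "<order><price>9.5</price><extra/><id>42</id></order>"

print(validate(doc, ORDER_SCHEMA))    # conforms to the contract
print(validate(loose, ORDER_SCHEMA))  # wrong order, unknown element
print(annotate(loose, ORDER_SCHEMA))  # typed values are still recoverable
```

The `loose` document fails the contract check, yet `annotate` still yields typed values for the elements it recognizes, which is roughly the position XML<->object mapping tools find themselves in.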

RELAX NG is good at #1 but not #2, which is by design. Most of the folks who are interested in XSD are either WS-* folks who are building toolkits that map XML on the wire to in-memory objects or database folks implementing XQuery who also have to deal with strongly typed data. Neither category of developers/vendors is interested in RELAX NG because it wasn't designed to meet their needs. On the other hand, if you are designing an XML format from scratch and need a language/toolkit for validating the structure and correctness of your documents, you should strongly consider using RELAX NG over XSD.

Categories: XML


Tuesday, 28 November 2006 22:50:42 (GMT Standard Time, UTC+00:00)

Hmm, but are there #2 use cases that aren't better served by JSON?

Robert Sayre

Tuesday, 28 November 2006 23:33:20 (GMT Standard Time, UTC+00:00)

>Hmm, but are there #2 use cases that aren't better served by JSON?

Probably. I pretty much advocate exposing XML and JSON when exposing services on the Web these days.

Dare Obasanjo

Wednesday, 29 November 2006 13:04:19 (GMT Standard Time, UTC+00:00)

...Except that we haven't agreed on a schema language for JSON yet (hint, hint).

Carsten Bormann

Tuesday, 05 December 2006 05:18:28 (GMT Standard Time, UTC+00:00)

Hm. I have to say I disagree with you there -- Relax-NG is -much- more typeful! For example, there have been several whole programming languages whose type systems are, essentially, RELAX-NG, which support not only subtyping but also polymorphism (aka generics) of strong XML types for functions, variables, etc.

(The underlying formalism that they all have in common is that of 'regular hedge grammars'.)

XQuery's formal type model is almost exactly that of RELAX-NG (cross-reference: the "XQuery 1.0 and XPath 2.0 Formal Semantics"). Microsoft's own upcoming LINQ (language integrated query) technologies have the same core data model as RELAX-NG, not XSD. (Note: not confirmed, but this is an educated best guess based on various LINQ postings.)

No, what XSD features is a seductive initial mapping to less advanced (less ivory-tower, but that is changing with the .NET platform advancing) OO object type systems: "Hey, ComplexTypes are object types!"

Unfortunately, XSD tries to tackle far more than simple mapping to OO classes, and so it fails at everything! WSDL.exe and every other 'object mapping' creation tool I've tried has had a horrendous number of corner cases that are in the standard, yet unsupported.

Anyway :) Not to slag off -too- much on XSD, but the only reasons you can say "It's better for mapping to strongly typed languages" are:

a) It has some trivial surface syntax, and widely used conventions (not "standards" per se), that make basic XSD -> OO type mapping fairly straightforward.

b) Everyone and their dog decided that (a) was a great idea and so XSD is used for OO type mapping all over the place.

RELAX-NG & its associated formalisms can, on the other hand, form the basis of a very powerful and expressive type system all on its own.

Dan S.

