XML the Data Format, Not

I read Clemens' original post on Saturday and agreed with a lot of it, disagreeing only with how he presented his argument. The point I agreed with most is that XML is not just a data format. XML is a family of technologies that make working with structured and semi-structured information much easier than it has ever been before. This harks back to my Why Use XML post from a few weeks back.

I'll go back to reasons #1 and #2 for why I believe people use XML. Reason (1) is that Everyone Else is Using It, which leads to easy transferability of skillset and interop at the syntax level. It is important to note that the interop story is all about sharing and understanding how to process UnicodeWithAngleBrackets.

Reason (2), the Huge Selection of Off-The-Shelf Tools, is a very compelling reason why people use XML. This reason has little to do with XML syntax itself and more to do with the fact that hundreds of individuals and corporations have built technologies and specifications around XML while jumping on the hype bandwagon. These off-the-shelf tools make XML very attractive to people who want to process structured and semi-structured data (RSS feeds, config files, database exports, wire transfer formats, etc.) but who may not be enamored with the syntax of XML or all its esoteric rules. For these people there is the Godsend that is the XML Infoset.

The XML infoset gives leeway to people to use alternate representations of XML by describing a logical (as opposed to physical) model for an XML document and opening the door to creating virtual XML views. Now people can get access to XML technologies like
  • Model Based APIs for in memory representation - CHECK [DOM]

  • Stream based APIs for fast processing - CHECK [Pull-based APIs, SAX]

  • Grammar languages for specifying valid content - CHECK [DTD, W3C XML Schema]

  • Query languages - CHECK [XQuery, XPath]

  • Ability to perform regexes against the structure of the content - CHECK [XPath, XSLT]

  • Ability to create fairly human readable serialization - CHECK [XML 1.0 serialization, looks good in IE]
without sacrificing themselves on the altar of angle brackets and Unicode text unless absolutely necessary. However, this is not an interop or data integration story unless everyone shares a common serialization of the infoset syntax and [some] semantics for exchanging data.
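To make the first two items on that list concrete, here is a minimal sketch in Python of the same question (what are the item titles in a feed?) answered once with a model-based API and once with a stream-based one. The feed content and the names FEED, titles_via_dom, TitleHandler and titles_via_sax are made up for illustration; only the standard library's xml.dom.minidom and xml.sax modules are used.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical feed used only for this illustration.
FEED = """<rss><channel>
  <item><title>Why Use XML</title></item>
  <item><title>XML the Data Format, Not</title></item>
</channel></rss>"""


def titles_via_dom(text):
    """Model-based: load the whole document into memory, then walk the tree."""
    doc = minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """Stream-based: react to parse events and keep only what is needed."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_via_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list of titles. The DOM version is shorter but holds the entire document in memory, while the SAX version never builds a tree at all, which is the trade-off the list above is pointing at.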

Now let's go back to Clemens Vasters' example with BizTalk Server. BizTalk supports various binary data transfer protocols as well as XML 1.0. However, BizTalk needs to match on parts of the input stream, transform it and/or specify its structure using some schema. All of these are capabilities that already exist in off-the-shelf XML tools which are infoset compliant. So BizTalk can use XPath to query a binary stream, XSLT to transform it or XML Schema to specify structure for it, while never having to resort to converting either the input or output stream to UnicodeWithAngleBrackets. Basically, they get the best of both worlds.
Note that I am not saying this is what BizTalk does. I don't work for or with them except tangentially, so I have no idea what they actually do. This is especially true given that I've never actually used BizTalk Server. Duh.
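To show what querying a binary stream with XML tools could look like, here is a rough sketch in Python. It is emphatically not how BizTalk works; the wire format, the element names and the helper view_as_tree are all invented for this example. It also builds the tree eagerly in memory rather than as a true virtual view, and ElementTree only implements a subset of XPath. The point is simply that no angle brackets are ever parsed or written.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical wire format (an assumption for this sketch): little-endian
# 4-byte unsigned order id, 8-byte double amount, 3-byte currency code.
RECORD = struct.Struct("<Id3s")


def view_as_tree(payload):
    """Expose packed binary records as an element tree without any XML text."""
    root = ET.Element("orders")
    for order_id, amount, currency in RECORD.iter_unpack(payload):
        order = ET.SubElement(
            root, "order", id=str(order_id), currency=currency.decode("ascii")
        )
        ET.SubElement(order, "amount").text = f"{amount:.2f}"
    return root


# Three records as they might arrive over the wire.
payload = (
    RECORD.pack(1, 19.99, b"USD")
    + RECORD.pack(2, 250.0, b"EUR")
    + RECORD.pack(3, 5.5, b"USD")
)

orders = view_as_tree(payload)

# Match on part of the input with an XPath expression, not custom byte parsing.
usd_ids = [o.get("id") for o in orders.findall("./order[@currency='USD']")]
```

Here usd_ids ends up holding the ids of the two USD orders. If an XML 1.0 serialization is ever needed, ET.tostring(orders) produces one, but the query above works without it.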
XML Everywhere. The cry of a new generation.



Andy Conrad didn't like the way I portrayed his position in my // Considered Dangerous post and wanted me to clarify it. This slipped my mind, but Andy has beaten me to the punch and done a great job of clarifying his position in his post Serendipity and the Sith Lord of XPath.



Disclaimer: The above comments do not represent the thoughts, intentions, plans or strategies of my employer. They are solely my opinion.

