December 16, 2003
@ 05:33 PM

The XML 2003 conference was a very interesting experience. Compared to the talks at XML 2002, I found the talks at XML 2003 to be of more interest and relevance to me as a developer building applications that utilize XML. The hallway and lunchtime conversations I had with various people were particularly valuable. Below are the highlights from the conversations I had with some XML luminaries at lunch and over drinks. Tomorrow I'll post about the various talks I attended.

CONVERSATIONS
James Clark: He gave two excellent presentations, one on his Namespace Routing Language (NRL) and the other about some of the implementation techniques used in his nxml-mode for Emacs. I asked whether the fact that he gave no talks about RELAX NG meant that he was no longer interested in the technology. He responded that there wasn't really anything more to do with the language besides shepherding it through the standardization process and evangelizing it. However, given how entrenched support for W3C XML Schema was with major vendors, evangelization was an uphill battle.

I pointed out that at Microsoft we use XML schema language technologies for two things:

    1. Describing and enforcing the contract between producers and consumers of XML documents.
    2. Creating the basis for processing and storing typed data represented as XML documents.

The only widely used XML schema language that fits the bill for both tasks is W3C XML Schema. However, W3C XML Schema is too complex and yet doesn't have enough features for the former, and has too many features which introduce complexity for the latter. In my ideal world, people would use something like RELAX NG for the former and XML-Data Reduced (XDR) for the latter. James asked if I saw value in creating a subset of RELAX NG which also satisfied the latter case, but I didn't think that there would be a compelling argument for people who've already baked W3C XML Schema into the core of their being (e.g. XQuery, XML Web Services, etc) to find interest in such a subset.

In fact, I pointed out that in designing for Whidbey (the next version of the .NET Framework) we had originally designed the architecture to have a pluggable XML type system so that one could potentially generate Post Schema Validation Infosets (PSVI), but we realized that this was a case of YAGNI. First of all, only one XML schema language exists that can generate PSVIs, so creating a generic architecture makes no sense when there is no other XML schema language that could be plugged in to replace W3C XML Schema. Secondly, one of the major benefits of this approach I had envisioned was that people would be able to plug their own type systems into XQuery. This turned out to be more complicated than I thought because XQuery has W3C XML Schema deeply baked into it, and it would take more than genericizing at the PSVI level to make it work (we'd also have to genericize operators, type promotion rules, etc). Once all that effort had been expended, any language that could be plugged in would have to act a lot like W3C XML Schema anyway. Basically, if some RELAX NG subset suddenly came into existence, it wouldn't add much that we don't already get from W3C XML Schema (except less complexity, but you could get the same by coming up with a subset of W3C XML Schema or following my various W3C XML Schema Best Practices articles on XML.com).

I did think that developers building applications on Microsoft platforms who needed more document validation features than W3C XML Schema offers would get some value from having access to RELAX NG tools. This would be nice to have but isn't a showstopper preventing development of XML applications on Microsoft platforms (translation: Microsoft won't be building such tools in the foreseeable future). However, if such tools existed I definitely would evangelize them to our users who needed more features than W3C XML Schema provides for their document validation needs.

Sam Ruby: I learned that Sam is on one of the "emerging technologies" groups at IBM. Basically he works on stuff that's about to become mainstream in a big way and helps it along the way. In the past this has included PHP, Open Source and Java (i.e. the Apache project), XML Web Services and now weblogging technologies. Given his track record I asked him to give me a buzz whenever he finds some new technology to work on. : )

I told him that I felt syndication formats weren't the problem with weblogging technologies and he seemed to agree, but pointed out that some of the problems they are trying to solve with ATOM make more sense in the context of using the same format for your blog editing/management API and archival format. There were also the various interpersonal conflicts and psychological baggage which need to be discarded to move the technology forward, and a clean break seems to be the best way. On reflection, I agreed with him.

I did point out that the top 3 problems I'd like to fix in syndication were one click subscription, subscription harmonization and adding calendar events to feeds. I mentioned that I should have RFCs for the first two written up over the holidays but the third is something I haven't thought about hard. Sam pointed out that instead of going the route of coming up with a namespaced extension element to describe calendar events in an RSS feed that perhaps a better option is the ATOM approach that uses link tags. Something like

   <link type="text/calendar" href="...">

In fact he seemed to have liked this idea so much it ended up in his presentation.
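As a rough sketch of what consuming such a link might look like, the snippet below scans a feed for link elements typed as text/calendar. The feed structure, element names and example.org URLs are all made up for illustration and are not taken from any ATOM draft; only the `type="text/calendar"` convention comes from the conversation above.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: element names and URLs are invented for this sketch.
FEED = """
<feed>
  <entry>
    <title>XML 2003 wrap-up</title>
    <link type="text/html" href="http://example.org/posts/1"/>
    <link type="text/calendar" href="http://example.org/posts/1.ics"/>
  </entry>
  <entry>
    <title>No event here</title>
    <link type="text/html" href="http://example.org/posts/2"/>
  </entry>
</feed>
"""


def calendar_links(feed_xml):
    """Return (entry title, href) pairs for every link typed text/calendar."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter("entry"):
        title = entry.findtext("title", default="")
        # ElementTree supports simple attribute predicates in its XPath subset.
        for link in entry.findall("link[@type='text/calendar']"):
            results.append((title, link.get("href")))
    return results


print(calendar_links(FEED))
```

An aggregator that doesn't understand calendar data can simply ignore the extra link, which is part of the appeal over a namespaced extension element.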

As Sam and I were finishing our meals, Sam talked about the immense effect that blogging has had on his visibility. Before blogging he was well known in tight-knit technical circles, such as amongst the members of the Apache project, but now he knows people from all over the world working at diverse companies and regularly has people go "Wow, you're Sam Ruby, I read your blog". As he said this, the guy sitting across from us at the table said "Wow, you're Sam Ruby, I read your blog". Sam turned to me and said "See what I mean?"

The power of blogging...

Eve Maler: I spoke to her about a talk I'd seen on UBL given by Eduardo Gutentag and Arofan Gregory where they talked about the benefits of putting the polymorphic features of W3C XML Schema to good use in business applications. The specific scenario they described was the following:

Imagine a small glue supplier that provides glue to various diverse companies such as a shoe manufacturer, an automobile manufacturer and an office supplies company. This company uses UBL to talk to each of its customers, who also use UBL, but since the types for describing purchase orders and the like are not specific enough for them, they use the type derivation features of W3C XML Schema to create specific types (e.g. a hypothetical LineItem type from UBL is derived to AutomobilePart or ShoeComponent by the various companies). However, the small glue company can handle all the new types with the same code if it uses type-aware processing such as the following XPath 2.0 or XQuery path expression, which matches all instances of the LineItem type:

element(*, LineItem)

The presenters then pointed out that there could be data loss if one of the customers extended the LineItem type by adding information that was pertinent to their business (e.g. priority, pricing information, preferred delivery options, etc) since such code would not know about the extensions.

This seems like a horrible idea and yet another reason why I view all the "object oriented" features of W3C XML Schema with suspicion.
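The data loss is easy to see in a small simulation. The sketch below is not XPath 2.0: it fakes `element(*, LineItem)` by reading `xsi:type` attributes and walking a hand-written derivation table, where a real processor would take type information from the schemas. The order document, the `BASE_OF` table and the field names are all hypothetical.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical derivation table (derived type -> base type); a schema-aware
# processor would get this from the schemas instead.
BASE_OF = {"AutomobilePart": "LineItem", "ShoeComponent": "LineItem"}

# Hypothetical order; the second item carries a customer-specific extension.
ORDER = """
<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Item xsi:type="LineItem"><Name>Wood glue</Name><Quantity>10</Quantity></Item>
  <Item xsi:type="AutomobilePart">
    <Name>Trim adhesive</Name><Quantity>5</Quantity><Priority>rush</Priority>
  </Item>
  <Note>not a line item</Note>
</Order>
"""


def derives_from(type_name, base):
    """True if type_name is base or is derived from it per BASE_OF."""
    while type_name is not None:
        if type_name == base:
            return True
        type_name = BASE_OF.get(type_name)
    return False


def line_items(order_xml):
    """Rough stand-in for element(*, LineItem): match by declared type,
    then read only the fields the base LineItem type is known to have."""
    root = ET.fromstring(order_xml)
    items = []
    for elem in root.iter():
        if derives_from(elem.get(XSI_TYPE), "LineItem"):
            items.append({
                "name": elem.findtext("Name"),
                "quantity": int(elem.findtext("Quantity")),
            })
    return items


print(line_items(ORDER))
```

Both items match, but the Priority element on the AutomobilePart never makes it into the result, because the glue company's code only knows about the base type.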

Eve agreed that it probably was a bad idea to recommend that people process XML documents this way, then stated that she felt that calling such processing "polymorphic" didn't sit right with her since true polymorphism doesn't require subtype relationships. I agreed and disagreed with her. There are at least four types of polymorphism in programming language parlance and the kind used above is subtype polymorphism. This is just one of the four types (the others being coercion, overloading and parametric polymorphism) but the behavior above is still polymorphism. From talking to Eve it seemed that she was more interested in parametric polymorphism because subtype polymorphism is not a loosely coupled approach. I pointed out that just using XPath expressions to match on predicates could be considered to be parametric polymorphism, since you are treating instances similarly even though they are of different types but satisfy the same constraints. I'm not sure she agreed with me. :)
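For readers who don't have the four kinds at their fingertips, here is a minimal Python sketch of each. The class and function names are invented for illustration, and since Python has no built-in overloading, `functools.singledispatch` stands in as an approximation of it.

```python
from functools import singledispatch
from typing import List, TypeVar

T = TypeVar("T")


# 1. Subtype polymorphism: code written against the base type also
#    accepts instances of derived types.
class LineItem:
    def describe(self) -> str:
        return "line item"


class ShoeComponent(LineItem):
    def describe(self) -> str:
        return "shoe component"


def label(item: LineItem) -> str:
    return item.describe()


# 2. Parametric polymorphism: one definition that works uniformly for
#    any element type T, with no subtype relationship required.
def first(items: List[T]) -> T:
    return items[0]


# 3. Overloading: one name, a separate implementation per argument type
#    (approximated here with singledispatch).
@singledispatch
def render(value) -> str:
    return "unknown"


@render.register
def _(value: int) -> str:
    return "int:" + str(value)


@render.register
def _(value: str) -> str:
    return "str:" + value


# 4. Coercion: the int operand is implicitly converted so that the
#    addition is carried out on floats.
total = 1 + 2.5

print(label(ShoeComponent()), first(["a", "b"]), render(7), total)
```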

Jon Udell: We discussed the online exchange we had about WinFS types and W3C XML Schema types. He apologized if he seemed to be coming on too strong in his posts, and I responded that of the hundreds of articles and blog posts I'd read about the technologies unveiled at the recent Microsoft Professional Developer's Conference (PDC), I'd only seen two people provide insightful feedback; his was the first and Miguel de Icaza's PDC writeup was the second.

Jon felt that WinFS would be more valuable as an XML database as opposed to an object oriented database (I think the terms he used were "XML store" and "CLR store"), especially given his belief that XML enables the "Universal Canvas". I agreed with him but pointed out that Microsoft isn't a single entity, and even though some parts may think that XML is one step closer to giving us a universal data interchange format and thus universal data access, there are others who see XML as "that format you use for config files" and express incredulity when they hear about things like XQuery because they wonder why anyone would need a query language for their config files. :)

Reading Jon's blog post about Word 11, XML and the Universal Canvas, it seems he's been anticipating a unified XML storage model for a while, which explains his disappointment that the WinFS unveiled at PDC was not it.

He also thought that the fact that so many people at Microsoft were blogging was fantastic.