December 26, 2003
@ 04:07 PM

Mark Pilgrim's most recent entry in his RSS feed contains the following text:

The best things in life are not things. (11 words)

Note: The "dive into mark" feed you are currently subscribed to is deprecated. If your aggregator supports it, you should upgrade to my Atom feed, which includes both summaries and full content.

A lot of the ATOM vs. RSS discussion has been mired in childishness and personality conflicts, with the main proponents of ATOM claiming that the creation of the ATOM syndication format will be a good thing for users of syndication and blogging software. Let's pretend this is true, and that the only people who have to bear the burden are aggregator authors like me who now have to add support for yet another syndication format. Let's see what my users get out of ATOM feeds compared to RSS feeds.

  1. Mark Pilgrim's ATOM feed: As I write this his feed contains the following elements per entry: id, created, issued, modified, link, summary, title, dc:subject and content. The aforementioned elements are equivalent to the guid, pubDate, issued, modified, link, description, title, dc:subject and content:encoded/xhtml:body that exist in RSS feeds today. In fact, an RSS feed with those elements and Mark Pilgrim's feed will be treated identically by RSS Bandit. The only problematic piece is that his feed contains three dates that express when the entry was issued, when it was modified and when it was created. Most puzzling is that the issued date is before its created date. I have no idea what this distinction means and quite frankly I doubt many people will care.

    Basically, it looks like Mark Pilgrim's ATOM feed doesn't give users anything they couldn't get from an equivalent RSS feed except the fact that they have to upgrade their news aggregators and deal with potential bugs in the implementations of these features [because there are always bugs in new features].
  2. LiveJournal's ATOM feeds: As I write this a sample feed from LiveJournal (in this case Jamie Zawinski's) contains the following elements per entry: id, modified, issued, link, title, author and content. The aforementioned elements are equivalent to guid, modified, issued, link, title, author/dc:author and content:encoded/xhtml:body. Comparing this feed to Mark Pilgrim's I already see a bunch of ambiguity which supposedly should not exist, since what ATOM supposedly gives consumers over RSS is that it will be better defined and less ambiguous. How are news aggregators supposed to treat the three date types defined in ATOM? In RSS I could always use the pubDate or dc:date; now I have to figure out which of <modified>, <issued> or <created> is the most relevant one to show the user (see the sketch after this list). Another point: what do I do if a feed contains <content rel="fragment"> and a <summary>? Which one do I show the user?
  3. Movable Type's ATOM feeds: As I write this the Movable Type ATOM template contains the following elements: id, modified, issued, link, title, author, dc:subject, summary and content. The aforementioned elements are equivalent to guid, modified, issued, link, title, author/dc:author, dc:subject, description and content:encoded/xhtml:body. Again, besides the weirdness with dates (and I suspect RSS Bandit will end up treating <modified> as equivalent to <pubDate>) there isn't anything users get from the ATOM feed that they don't get from the equivalent RSS feed. Interestingly, I'd expected that at least one of the first three sample ATOM feeds I took a look at would show me why it was worth spending a weekend or more implementing ATOM support in RSS Bandit.
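For what it's worth, below is a minimal sketch of the kind of date fallback an aggregator might use when faced with all three ATOM dates. The AtomEntry class here is hypothetical, and DateTime.MinValue stands in for an absent element:

using System;

class AtomEntry {
  // hypothetical holder for the three ATOM dates;
  // DateTime.MinValue means the element was absent from the feed
  public DateTime Created, Issued, Modified;
}

class DateHeuristic {
  // prefer <modified> (treated like RSS's pubDate), then <issued>, then <created>
  static DateTime GetDisplayDate(AtomEntry e) {
    if (e.Modified != DateTime.MinValue) return e.Modified;
    if (e.Issued != DateTime.MinValue) return e.Issued;
    return e.Created;
  }

  static void Main() {
    AtomEntry e = new AtomEntry();
    e.Issued = new DateTime(2003, 12, 26);
    Console.WriteLine(GetDisplayDate(e)); // falls back to <issued>
  }
}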

The fundamental conceit of the ATOM effort is that they think writing specifications is easy. Many of its proponents deride RSS for being ambiguous and not well defined, yet they are producing a more complex specification with more significant ambiguities in it than I've seen in RSS. I actually have a mental list of significant issues with ATOM that I haven't even posted yet; the ones I mentioned above were just from glancing at the aforementioned feeds. My day job involves reading or writing specs all day. Most of the specs I read were produced either by the W3C or by folks within Microsoft. Every one of them contains contradictions and ambiguities, and lacks crucial information for determining behavior in edge cases. Some are better than others but none of them is ever well-defined enough. Every spec has errata.

The ATOM people seem to think that if a simple spec like RSS can have ambiguities, they can fix it with a more complex spec. Anyone who actually does this stuff for a living will tell you that just leads to more complex ambiguities to deal with, not fewer.

I wish them luck. As I implement their spec I at least hope that some of these ATOM supporters get a clue and actually use some of the features of ATOM that RSS users have enjoyed for a while but that are lacking in all of the feeds I linked to above, such as the ATOM equivalent to wfw:commentRss. It's quite irritating to be able to read the comments on any .TEXT or dasBlog weblog in my news aggregator but then have to navigate to the website when I'm reading a Movable Type or LiveJournal feed to see the comments.
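For reference, this is roughly what that looks like in an RSS item today; the URLs are made up for illustration, but the wfw namespace is the real one:

<item>
  <title>Example post</title>
  <link>http://example.com/blog/example-post</link>
  <!-- points to a separate RSS feed containing this entry's comments -->
  <wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://example.com/blog/example-post/comments.rss</wfw:commentRss>
</item>

An aggregator that understands the element can fetch the referenced feed and show the comments inline, which is what RSS Bandit does for .TEXT and dasBlog weblogs today.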


 

Categories: XML

In a post entitled A Plea to Microsoft Architects, Michael Earls writes

This post is in response to a post by Harry Pierson over at DevHawk...

It is abundantly frustrating to be keeping up with you guys right now.  We out here in the real world do not use Longhorn, do not have access to Longhorn (not in a way we can trust for production), and we cannot even begin to test out these great new technologies until version 1.0 (or 2.0 for those that wish to stay sane)...My job is to work on the architecture team as well as implement solutions for a large-scale commercial website using .NET.  I use this stuff all day every day, but I use the  1.1 release bits.

Here's my point, enough with the "this Whidbey, Longhorn, XAML is so cool you should stop whatever it is you are doing and use it".  Small problem, we can't.  Please help us by remembering that we're still using the release bits, not the latest technology... Oh yeah, we need more samples of current bits and less of XAML.

Remember, we're your customers and we love this new technology, but we need more of you to focus CURRENT topics on CURRENT RELEASE bits.  I don't want to read about how you used XAML and SOA to write a new version of the RSS wheel.  The RSS I have now is fine (short of the namespace that Harry mentions).  Leave it alone.

The only folks at Microsoft with Architect in their job title that blog, that I can think of, are Don Box, Chris Anderson and Chris Brumme, so I assume Michael is complaining about one or more of these three, although there may be other software architect bloggers at Microsoft that I am unaware of. The first point I'd note is that most people who blog at Microsoft do so without any official direction, so they blog about what interests them and what they are working on, not what MSDN, PSS or our documentation folks think we need more public documentation and guidance around. That said, architects at Microsoft usually work on next generation technologies since their job is to guide and supervise their design, so it is to be expected that when they blog about what they are working on it will be about next generation stuff. The people who work on current technologies and are most knowledgeable about them are the Program Managers, Developers and Testers responsible for the technology, not the architects who oversee and advise their design.

My advice to Michael would be to broaden his blog horizons and consider reading some of the hundreds of other Microsoft bloggers, many of whom blog about current technologies, instead of focusing on the folks who are designing stuff that'll ship in two or more years and complaining when they blog about said technologies.

This isn't to say I disagree with Michael's feedback; in fact, being a firm believer in Joel Spolsky's Mouth Wide Shut principle, I agree with most of it (except for the weird bit claiming that blogging about next generation stuff increases the perception that Microsoft is a monopoly). However, he and others like him should remember that most of us blogging are just talking about what we're working on, not trying to give people "version envy" because we get to run the next version of the .NET Framework or Windows years before they ship.

I have no idea how Chris Anderson, Don Box and other Microsoft architect bloggers will react to Michael's feedback but I hope they take some of it to heart.

[Update: Just noticed another Microsoft blogger with "architect" in his job title, Herb Sutter. Unsurprisingly he also blogs about the next release of the product he works on, not current technology.]
 

Categories: Life in the B0rg Cube

December 24, 2003
@ 05:09 AM

Joshua Allen writes

 Before discussing qnames in content, let's discuss a general issue with qnames that you might not have known about.  Take the following XML:

<?xml version="1.0" ?>
<root xmlns:p="http://foo.org">
  <p:elem att1="" att2="" ... />
  <p:elem att1="" att2="" ... xmlns:p="http://bar.org" />
  <x:elem att1="" att2="" xmlns:x="http://foo.org" />
</root>

Notice the first two elements, both ostensibly named "p:elem", but if we treat the element names as opaque strings, we'll get confused and think the elements are the same.  Luckily, we have this magical thing called a qname that uses namespace instead of prefix, and so we can note that the two element names are actually "{http://foo.org}elem" and "{http://bar.org}elem" -- different.  By the same token, if we compare the first and third element using opaque strings, we think that they are different ("p:elem" and "x:elem").  But if we look at the qnames, we see they are both "{http://foo.org}elem".
...
so what is the big deal for qnames in content?  Look at the following XML:

<?xml version="1.0" ?>
<root xmlns:x="urn:x" xmlns:p="http://www.foo.org" >
  <p:elem>here is some data: with a colon for no good reason</p:elem>
  <p:elem>x:address</p:elem>
  <p:elem xmlns:x="urn:y">x:address</p:elem>
</root>

Now, do the last two "p:elem" elements contain the same text, or different text?  If you compared using XSLT or XPath, what would be the result?  How about if you used the values in XSD key/keyref?  The answer is that XSLT and XPath have no way of knowing that you intend those last two elements to be qnames, so they will treat them as opaque strings.  With XSD, you could type the node as qname... Most APIs are smart enough to inject namespace declarations if necessary, so the first node would write correctly as:

<p:elem xmlns:p="http://www.foo.org">here is some data: with a colon for no good reason</p:elem>

But, since the DOM has no idea that you stuffed a qname in the element content, it's got no way to know that you want to preserve the namespace for x:

<p:elem xmlns:p="http://www.foo.org">x:address</p:elem>

There is really only one way to get around this, and this is for any API which writes XML to always emit namespace declarations for all namespaces in scope, whether they are used or not (or else understand enough about the XSD and make some guesses).  Some APIs do this, but it is not something that all APIs can be trusted to do, and it yields horribly cluttered XML output and other problems.
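Joshua's last point is easy to see in practice with the .NET DOM. Here's a minimal sketch: serializing just the child element keeps xmlns:p (the element name needs it) but silently drops the binding for "x":

using System;
using System.Xml;

class QNameInContent {
  static void Main() {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<root xmlns:p='http://www.foo.org' xmlns:x='urn:y'>" +
                "<p:elem>x:address</p:elem></root>");
    // The DOM emits xmlns:p because the element name uses it, but it has no
    // idea the text content is a QName, so xmlns:x is silently dropped.
    Console.WriteLine(doc.DocumentElement.FirstChild.OuterXml);
    // prints: <p:elem xmlns:p="http://www.foo.org">x:address</p:elem>
  }
}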

Joshua has only scratched the surface of the real problem, which is that there is no standard way to write out an XML infoset with the PSVI contributions added during validation. In plain English, there is no standard way to write out an XML document that has been validated using W3C XML Schema containing all the relevant type annotations plus other infoset augmentations. In the above example, the fact that the namespace declaration that uses the "x" prefix is not included in the output is not as significant as the fact that there is no way to tell that the type of p:elem's content is the xs:QName type.

However this doesn't change the fact that using QNames in content in an XML vocabulary is a bad idea. Specifically, I am talking about using the xs:QName type in your vocabulary. The semantics of this type are so absurd it boggles the mind. Below is the definition from the W3C XML Schema recommendation:

[Definition:]   QName represents XML qualified names. The ·value space· of QName is the set of tuples {namespace name, local part}, where namespace name is an anyURI and local part is an NCName. The ·lexical space· of QName is the set of strings that ·match· the QName production of [Namespaces in XML].

This basically says that text content of type xs:QName in an XML document, such as "x:address", actually is a namespace name/local name pair such as "{http://www.example.com}address". This instantly means that you cannot interpret this type without carrying around some sort of context (i.e. a list of namespace name<->prefix bindings), which makes it different from most other types defined in the W3C XML Schema recommendation because it has no canonical lexical representation. A value such as "x:address" is meaningless without knowing what XML document it came from and specifically what the namespace binding for the "x" prefix was at that particular scope.
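To make the context problem concrete, here is a minimal sketch of what a consumer has to do to interpret QName content in .NET; the document and names are invented, but XmlTextReader.LookupNamespace is the real mechanism for consulting the in-scope bindings:

using System;
using System.IO;
using System.Xml;

class ResolveQNameContent {
  static void Main() {
    string xml = "<root xmlns:x='http://www.example.com'><elem>x:address</elem></root>";
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    while (reader.Read()) {
      if (reader.NodeType == XmlNodeType.Text) {
        string qname = reader.Value.Trim();  // "x:address"
        int colon = qname.IndexOf(':');
        string prefix = (colon == -1) ? String.Empty : qname.Substring(0, colon);
        string local = qname.Substring(colon + 1);
        // Without the in-scope binding for "x" the value is meaningless;
        // LookupNamespace is what carries that context around.
        Console.WriteLine("{{{0}}}{1}", reader.LookupNamespace(prefix), local);
        // prints: {http://www.example.com}address
      }
    }
  }
}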

Of course, the existence of the QName type means you can do interesting things, like use a different prefix for a particular namespace in the schema than you use in the XML instance. For example, you can specify that the content of the <p:elem> element should be one of a:address or a:location but have x:address in the instance, which is fine as long as the "a" prefix is bound to the "http://www.example.com" namespace in the schema and the "x" prefix is bound to the same namespace in the instance document (see the sketch below). You can also ask interesting questions such as: what happens if I have a default value that is of type xs:QName but there is no namespace declaration for the namespace name at that scope? Does this mean that not only should a default value be inserted as the content of an element or attribute but also that a namespace declaration is created at the same scope if one does not exist?
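Here is a rough sketch of that first scenario; the element and namespace names are invented for illustration:

<!-- schema fragment: the "a" prefix is bound to http://www.example.com here -->
<xs:element name="elem" xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:a="http://www.example.com">
  <xs:simpleType>
    <xs:restriction base="xs:QName">
      <xs:enumeration value="a:address"/>
      <xs:enumeration value="a:location"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- instance: a different prefix bound to the same namespace still validates,
     because both "a:address" and "x:address" denote the value
     {http://www.example.com}address -->
<p:elem xmlns:p="http://www.foo.org" xmlns:x="http://www.example.com">x:address</p:elem>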

Fun stuff, not.


 

Categories: XML

Shannon J Hager writes

Jeff Key wants to end default buttons on Focus-Stealing Dialogs but I think the problem is bigger than that. I don't think ANYTHING should be able to steal my focus while typing. I have ranted about this before both in places where it could help (emails with MS employees) and in places where it can't (certain blogs). Not only is it annoying to suddenly find myself typing in an IM conversation with someone on AOL when less than half a word ago I was typing an invoice for a client, it is DANGEROUS for programs to be able to steal focus like this.

I agree. I didn't realize how much applications that steal focus irritate me until I used a friend's iBook running Mac OS X, which instead of letting applications steal your focus has them try to get your attention by hopping around at the bottom of the screen. I thought it was cute and a lot less intrusive than finding myself typing in a different window because some application decided it was so important that it was going to interrupt whatever I was doing.

An operating system that enforces application politeness, sweet.


 

Choosing a name for a product or software component that can stand the test of time is often difficult, and the name can become a source of confusion for users of the software if its usage outgrows that implied by its name. I have examples from both my personal life and my professional life.

RSS Bandit

When I chose this name I never considered that there might one day be another popular syndication format (i.e. ATOM) which I'd end up supporting. Given that Blogger, Movable Type, and LiveJournal are going to provide ATOM feeds and utilize the ATOM API for weblog editing/management, it is a foregone conclusion that RSS Bandit will support ATOM once the specifications are in slightly less flux, which should be in the next few months.

Once that happens the name "RSS Bandit" will be an anachronism, given that RSS will no longer be the only format supported by the application. In fact, the name may become a handicap in the future once ATOM becomes popular because there is the implicit assumption that I support the "old" and "outdated" syndication format, not the "shiny" and "new" one.

XPathDocument

In version 1.0 of the .NET Framework we shipped three classes that acted as in-memory representations of an XML document:

  1. XmlDocument - an implementation of the W3C Document Object Model (DOM) with a few .NET specific extensions [whose functionality eventually made it into later revisions of the spec]
  2. XmlDataDocument - a subclass of the XmlDocument which acts as an XML view of a DataSet
  3. XPathDocument - a read-only in-memory representation of an XML document which conforms to the XPath data model as opposed to the DOM data model upon which the XmlDocument is based. This class primarily existed as  a more performant data source for performing XSLT transformations and XPath queries

Going forward, various limitations of all of the above classes meant that we came up with a fourth class which we planned to introduce in Whidbey. After an internal review we decided that it would be too confusing to add yet another in-memory representation of an XML document to the mix and decided to instead improve on the ones we had. The XmlDataDocument is really a DataSet-specific class so it doesn't really fall into this discussion. We were left with the XmlDocument and the XPathDocument. Various aspects of the XmlDocument made it unpalatable for a number of the plans we had in mind, such as acting as a strongly typed XML data source and moving away from a tree-based DOM model for interacting with XML.

Instead we decided to go forward with the XPathDocument and add a bunch of functionality to it, such as the ability to bind it to a store, retrieve strongly typed values via integrated support for W3C XML Schema datatyping, track changes and write data to it using the XmlWriter.

The primary feedback we've gotten about the new improved XPathDocument from usability studies and WinFX reviews is that there is little chance that anyone who hasn't read our documentation would realize that the XPathDocument, not the XmlDocument, is the preferred in-memory representation of an XML document for certain scenarios. In v1.0 we could argue that the class was only of interest to people doing advanced stuff with XPath (or XSLT, which is significantly about XPath) but now the name doesn't jibe with its purpose as much. The same goes for the primary mechanism for interacting with the XPathDocument (i.e. the XPathNavigator), which should be the preferred mechanism for representing and passing data as XML in the .NET Framework going forward.
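For anyone who hasn't used them, this is roughly what the v1.x usage pattern looks like; the file name and query are invented for illustration:

using System;
using System.Xml.XPath;

class XPathDocumentDemo {
  static void Main() {
    // Read-only, XPath-data-model view of the document; a leaner choice than
    // XmlDocument when all you need is querying or transformation.
    XPathDocument doc = new XPathDocument("books.xml");
    XPathNavigator nav = doc.CreateNavigator();
    XPathNodeIterator it = nav.Select("/books/book/title");
    while (it.MoveNext())
      Console.WriteLine(it.Current.Value);
  }
}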

If only I had a time machine and could go back and rename the classes XmlDocument2 and XmlNavigator. :(


 

Categories: Life in the B0rg Cube | XML

December 23, 2003
@ 07:29 PM

I'm kind of embarrassed to write this but last week was the first time I'd installed a build of Whidbey (the next version of the .NET Framework) in about six months. I used to download builds on a daily basis at the beginning of the year when I was a tester working on XQuery but fell off once I became a PM. Given that certain bits were in flux I decided to wait until things were stable before installing Whidbey on my machine and writing a number of sample/test applications.

Over the next couple of weeks I'll be driving the refinement of some of the stuff we've been designing for the next version of System.Xml, and will most likely be blogging about various design issues we've had to contend with, as well as perhaps giving a sneak preview of some of our end user documentation, which will include answers to questions raised by some of the stuff that was shown at PDC, such as whether there is any truth to the claims that XmlDocument is dead.


 

Categories: Life in the B0rg Cube

Torsten and I (mostly Torsten) have been working on a feature which we hope will satisfy multiple feature requests in one shot. Screenshot and details available by clicking the link below.
 

Categories: RSS Bandit

I just spotted the following on the wiki Ward Cunningham set up to request advice as a new hire at Microsoft.

Take a running start and don't look back

  1. Recognize that your wonderful inventiveness is the most valuable thing you will own in a culture that values its employees solely by their latest contributions. In a spartan culture like this, you will rise quickly.

  2. Keep spewing ideas, even when those ideas are repeatedly misunderstood, implemented poorly, and excised from products for reasons that have nothing to do with the quality of the idea. When you give up on communicating your new ideas, you will just go insane waiting to vest.

  3. Be patient, or better yet, don't even look back. Don't try to track and control what people do with your ideas. It will just make you jaded and cynical. (Like many of us who have gone before :)

  4. Communicate by writing things down in compact and considered form. The most senior people, who can take your ideas the furthest fastest, are very busy. As an added side-benefit, when random program managers who just don't get it come around for the fortieth time, begging for explanations, you can provide them references to your wiki, blog, or papers for the thirty-seventh time.

  5. Don't count on the research division for anything but entertaining politics.

Have a good time, and as Don said, plan for the long-haul!

I've been in the B0rg Cube just shy of two years but the above advice rings true in more ways than one. It is a very interesting culture and with the wrong attitude one could end up being very cynical. However as with all things, the best thing to do is learn how the system works and learn how to work it. The five points above are a good starting point.   
 

Categories: Life in the B0rg Cube

There were a number of sessions I found particularly interesting either because they presented novel ways to utilize and process XML or because they gave an insightful glance at how others view the XML family of technologies. 

Imperative Programming with Rectangles, Triangles, and Circles - Erik Meijer
This was a presentation about a research language called Xen that experiments with various ways to reduce the Relational<->Objects<->XML (ROX) impedance mismatch by adding concepts and operators from the relational and XML (specifically W3C XML Schema) worlds into an object oriented programming language. The main thesis of the paper was that heavily used APIs and programming idioms eventually tend to be likely candidates for inclusion in the language. An example was given with the foreach operator in the C# language, which transformed the following regularly used idiom

IEnumerator e = ((IEnumerable)ts).GetEnumerator();
try {
  while(e.MoveNext()) {
    T t = (T)e.Current;
    t.DoStuff();
  }
} finally {
  IDisposable d = e as System.IDisposable;
  if(d != null) d.Dispose();
}

into

foreach(T t in ts) {
  t.DoStuff();
}

The majority of the presentation was about XML integration. Erik spent some time talking about the XML-to-object impedance mismatch and how cumbersome programming with XML could be. Either you write a bunch of code for walking trees manually or you query nodes with XPath, but then you are embedding one language into another and don't get type safety, etc. (if there is an error in my XPath query I can't tell until runtime). He pointed out that various XML<->object mapping technologies fall short because they don't map a rich enough set of W3C XML Schema constructs to relevant object structures, and even if they did, one then loses the power of being able to do rich XPath queries or XSLT/XQuery transformations. The XML integration in Xen basically came in three flavors: the ability to initialize classes from XML strings, support for W3C XML Schema constructs like union types and sequences in the language, and the ability to do XPath-like queries over the contents of the fields and properties of a class.
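The XPath point is worth illustrating. In C# today the query is just a string embedded in the host language, so the compiler can't check it and everything comes back untyped. A minimal sketch, with a made-up document:

using System;
using System.Xml;

class StringlyTypedQuery {
  static void Main() {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<order><item>glue</item><item>tape</item></order>");
    // A typo like "/ordr/item" still compiles; it just silently selects
    // nothing at runtime. The results are untyped strings either way.
    XmlNodeList items = doc.SelectNodes("/order/item");
    foreach (XmlNode item in items)
      Console.WriteLine(item.InnerText);
  }
}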

There were also a few other things like adding the constraint "not null" into the language (which would be a handy modifier for parameter names in any language given how often one must check parameters for null in method bodies) and the ability to apply the same method to all the members of a collection which seemed like valuable additions to a programming language independent of XML integration.

Thinking about it, I am unsure of the practicality of some features, such as being able to initialize objects from an XML literal in the code, especially since Xen only supported XML documents with schemas, although in some cases I could imagine such an approach being more palatable than using XQuery or XSLT 2.0 for constructing or querying strongly typed XML documents. I was also suspicious of the usefulness of being able to do wildcard queries (i.e. give me all the fields in class Foo), although this could potentially be used to get the string value of an XML element with mixed content.

The language also had integrated SQL-like querying with a "select" operator but I didn't pay much attention to this since I was only really interested in XML.

The meat of this presentation is available online in the paper entitled Programming with Circles, Triangles and Rectangles. The presentation was well received although sparsely attended (about two or three dozen people), and the most noteworthy feedback was from James Clark, who was so impressed he kept saying "I'm speechless" in between asking questions about the language. Sam Ruby was also impressed by the fact that not only was there a presentation but the demo, which involved compiling and running various samples, showed that you could implement such a language in the CLR and even integrate it into Visual Studio.

Namespace Routing Language (NRL) - James Clark
This was a presentation about a language for validating a single XML document against multiple schemas simultaneously. It was specifically aimed at validating documents that contain XML from multiple vocabularies (e.g. XML content embedded in a SOAP envelope, RDF embedded in HTML, etc.).

The core processing model of NRL is that it divides an XML document into sections, each containing elements from a single namespace, then each section can be validated using the schema for its namespace. There is no requirement that the same schema language is used, so one could validate one part of the document using RELAX NG and use W3C XML Schema for another (see the sketch below). There is also the ability to specify named modes, like XSLT, which allows you to match element names against a particular schema instead of just keying off the namespace name. This functionality could be used to validate interleaved documents (such as XHTML within an XSLT stylesheet) but I suspect that this will be easier said than done in practice.
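To make that concrete, here is a rough sketch of the shape of an NRL rules document. I'm writing this from memory of the spec, so treat the namespace URI and element names as approximate and consult the spec linked below for the real vocabulary:

<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <!-- SOAP envelope elements get checked against a RELAX NG schema -->
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.rng"/>
  </namespace>
  <!-- the embedded payload vocabulary gets checked with W3C XML Schema -->
  <namespace ns="http://example.com/payload">
    <validate schema="payload.xsd"/>
  </namespace>
</rules>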

All in all this was a very interesting talk and introduced some ideas I'd never have considered on my own.  

There is a spec for the Namespace Routing Language available online.


 

Categories: XML

December 16, 2003
@ 05:33 PM

The XML 2003 conference was a very interesting experience. Compared to the talks at XML 2002, I found the talks at XML 2003 to be of more interest and relevance to me as a developer building applications that utilize XML. The various hallway and lunchtime conversations I had with various people were particularly valuable. Below are the highlights from the various conversations I had with some XML luminaries at lunch and over drinks. Tomorrow I'll post about the various talks I attended.

CONVERSATIONS
James Clark: He gave two excellent presentations, one on his Namespace Routing Language (NRL) and the other about some of the implementation techniques used in his nxml-mode for Emacs. I asked whether the fact that he gave no talks about RELAX NG meant that he was no longer interested in the technology. He responded that there wasn't really anything more to do with the language besides shepherd it through the standardization process and evangelize it. However, given how entrenched support for W3C XML Schema was with major vendors, evangelization was an uphill battle.

I pointed out that at Microsoft we use XML schema language technologies for two things:

    1. Describing and enforcing the contract between producers and consumers of XML documents.
    2. Creating the basis for processing and storing typed data represented as XML documents.

The only widely used XML schema language that fits the bill for both tasks is W3C XML Schema. However W3C XML Schema is too complex yet doesn't have enough features for the former, and has too many features which introduce complexity for the latter case. In my ideal world, people would use something like RELAX NG for the former and XML-Data Reduced (XDR) for the latter. James asked if I saw value in creating a subset of RELAX NG which also satisfied the latter case, but I didn't think there would be a compelling argument for people who've already baked W3C XML Schema into the core of their being (e.g. XQuery, XML Web Services, etc.) to find interest in such a subset.

In fact, I pointed out that in designing for Whidbey (the next version of the .NET Framework) we originally had designed the architecture to have a pluggable XML type system, so that one could potentially generate Post Schema Validation Infosets (PSVI) from other schema languages, but realized that this was a case of YAGNI. First of all, only one XML schema language exists that can generate PSVIs, so creating a generic architecture makes no sense if there is no other XML schema language that could be plugged in to replace W3C XML Schema. Secondly, one of the major benefits of this approach I had envisioned was that one would be able to plug their own type systems into XQuery. This turned out to be more complicated than I thought because XQuery has W3C XML Schema deeply baked into it, and it would take more than genericizing at the PSVI level to make it work (we'd also have to genericize operators, type promotion rules, etc.); and once all that effort had been expended, any language that could be plugged in would have to act a lot like W3C XML Schema anyway. Basically, if some RELAX NG subset suddenly came into existence, it wouldn't add much that we don't already get from W3C XML Schema (except less complexity, but you could get the same from coming up with a subset of W3C XML Schema or following my various W3C XML Schema Best Practices articles on XML.com).

I did think that there would be some value to developers building applications on Microsoft platforms who need more document validation features than W3C XML Schema provides in having access to RELAX NG tools. This would be nice to have but isn't a showstopper preventing development of XML applications on Microsoft platforms (translation: Microsoft won't be building such tools in the foreseeable future). However if such tools existed I definitely would evangelize them to our users who needed more features than W3C XML Schema provides for their document validation needs.

Sam Ruby: I learned that Sam is in one of the "emerging technologies" groups at IBM. Basically he works on stuff that's about to become mainstream in a big way and helps it along. In the past this has included PHP, Open Source and Java (i.e. the Apache project), XML Web Services and now weblogging technologies. Given his track record, I asked him to give me a buzz whenever he finds some new technology to work on. : )

I told him that I felt syndication formats weren't the problem with weblogging technologies and he seemed to agree, but he pointed out that some of the problems they are trying to solve with ATOM make more sense in the context of using the same format for your blog editing/management API and your archival format. There were also various interpersonal conflicts and psychological baggage which needed to be discarded to move the technology forward, and a clean break seemed to be the best way. On reflection, I agreed with him.

I did point out that the top three problems I'd like to see fixed in syndication are one-click subscription, subscription harmonization and adding calendar events to feeds. I mentioned that I should have RFCs for the first two written up over the holidays, but the third is something I haven't thought about hard. Sam pointed out that instead of going the route of coming up with a namespaced extension element to describe calendar events in an RSS feed, perhaps a better option is the ATOM approach that uses link tags. Something like

   <link type="text/calendar" href="...">

In fact he seemed to have liked this idea so much it ended up in his presentation.

As Sam and I were finishing our meals, Sam talked about the immense effect blogging has had on his visibility. Before blogging he was well known in tight-knit technical circles, such as amongst the members of the Apache project, but now he knows people from all over the world working at diverse companies and regularly has people go "Wow, you're Sam Ruby, I read your blog". As he said this, the guy sitting across from us at the table said "Wow, you're Sam Ruby, I read your blog". Sam turned to me and said "See what I mean?"

The power of blogging...

Eve Maler: I spoke to her about a talk I'd seen on UBL given by Eduardo Gutentag and Arofan Gregory, where they talked about putting the polymorphic features of W3C XML Schema to good use in business applications. The specific scenario they described was the following:

Imagine a small glue supplier that provides glue to various diverse companies such as a shoe manufacturer, an automobile manufacturer and an office supplies company. This company uses UBL to talk to each of its customers, who also use UBL, but since the types for describing purchase orders and the like are not specific enough for them, they use the type derivation features of W3C XML Schema to create specific types (e.g. a hypothetical LineItem type from UBL is derived to AutomobilePart or ShoeComponent by the various companies). However, the small glue company can handle all the new types with the same code if it uses type-aware processing, such as the following XPath 2.0 or XQuery expression, which matches all instances of the LineItem type

element(*, LineItem)

The presenters then pointed out that there could be data loss if one of the customers extended the LineItem type by adding information that was pertinent to its business (e.g. priority, pricing information, preferred delivery options, etc.), since such code would not know about the extensions.
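A rough sketch of how that substitution looks in an instance document; the names are invented, but the mechanism is the standard xsi:type attribute:

<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:auto="http://example.com/auto-parts">
  <!-- element(*, LineItem) still matches this element because
       auto:AutomobilePart derives from LineItem... -->
  <LineItem xsi:type="auto:AutomobilePart">
    <Description>Windshield glue</Description>
    <!-- ...but generic LineItem code knows nothing about extension
         elements like this one, so the data can silently be lost -->
    <Priority>high</Priority>
  </LineItem>
</Order>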

This seems like a horrible idea and yet another reason why I view all the "object oriented" features of W3C XML Schema with suspicion.

Eve agreed that it probably was a bad idea to recommend that people process XML documents this way, then stated that she felt that calling such processing "polymorphic" didn't sit right with her since true polymorphism doesn't require subtype relationships. I agreed and disagreed with her. There are at least four kinds of polymorphism in programming language parlance, and the kind used above is subtype polymorphism. This is just one of the four (the others being coercion, overloading and parametric polymorphism), but the behavior above is polymorphism. From talking to Eve it seemed that she was more interested in parametric polymorphism because subtype polymorphism is not a loosely coupled approach. I pointed out that just using XPath expressions to match on predicates could be considered parametric polymorphism, since you are treating instances similarly even though they are of different types but satisfy the same constraints. I'm not sure she agreed with me. :)

Jon Udell: We discussed the online exchange we had about WinFS types and W3C XML Schema types. He apologized if he seemed to be coming on too strong in his posts and I responded that of the hundreds of articles and blog posts I'd read about the technologies unveiled at the recent Microsoft Professional Developer's Conference (PDC) that I'd only seen two people provide insightful feedback; his was the first and Miguel de Icaza's PDC writeup was the second. 

Jon felt that WinFS would be more valuable as an XML database as opposed to an object oriented database (I think the terms he used were "XML store" and "CLR store"), especially given his belief that XML enables the "Universal Canvas". I agreed with him but pointed out that Microsoft isn't a single entity, and even though some parts may think that XML is one step closer to giving us a universal data interchange format and thus universal data access, there are others who see XML as "that format you use for config files" and express incredulity when they hear about things like XQuery, because they wonder why anyone would need a query language for their config files. :)

Reading Jon's blog post about Word 11, XML and the Universal Canvas, it seems he's been anticipating a unified XML storage model for a while, which explains his disappointment that the WinFS unveiled at PDC was not it.

He also thought that the fact that so many people at Microsoft were blogging was fantastic. 


 

Categories: XML