When Bad Ideas Attack - Dare Obasanjo's weblog

December 24, 2003

@ 05:09 AM

Joshua Allen writes:

Before discussing qnames in content, let's discuss a general issue with qnames that you might not have known about. Take the following XML:
<?xml version="1.0" ?>
<root xmlns:p="http://foo.org">
<p:elem att1="" att2="" ... />
<p:elem att1="" att2="" ... xmlns:p="http://bar.org" />
<x:elem att1="" att2="" xmlns:x="http://foo.org" />
</root>

Notice the first two elements, both ostensibly named "p:elem". If we treat the element names as opaque strings, we'll get confused and think the elements are the same. Luckily, we have this magical thing called a qname that uses namespace instead of prefix, and so we can note that the two element names are actually "{http://foo.org}elem" and "{http://bar.org}elem" -- different. By the same token, if we compare the first and third element using opaque strings, we think that they are different ("p:elem" and "x:elem"). But if we look at the qnames, we see they are both "{http://foo.org}elem".
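As a quick illustration of the point above, here is a minimal sketch using Python's standard xml.etree.ElementTree, which reports element names in the same "{namespace}local" form. The elided attributes ("...") are dropped so the sample parses; nothing else about the sample is changed.

```python
import xml.etree.ElementTree as ET

# The sample above, minus the elided "..." so that it is well-formed.
DOC = """<root xmlns:p="http://foo.org">
<p:elem att1="" att2="" />
<p:elem att1="" att2="" xmlns:p="http://bar.org" />
<x:elem att1="" att2="" xmlns:x="http://foo.org" />
</root>"""

root = ET.fromstring(DOC)

# ElementTree exposes each tag as "{namespace}local", never as "prefix:local".
names = [child.tag for child in root]

first_equals_second = names[0] == names[1]  # same prefix, different namespace
first_equals_third = names[0] == names[2]   # different prefix, same namespace

print(names)
print(first_equals_second, first_equals_third)
```

Comparing the expanded names rather than the prefixed strings gives the answers described above: the first and second elements differ, the first and third match.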
...
so what is the big deal for qnames in content? Look at the following XML:

<?xml version="1.0" ?>
<root xmlns:x="urn:x" xmlns:p="http://www.foo.org" >
<p:elem>here is some data: with a colon for no good reason</p:elem>
<p:elem>x:address</p:elem>
<p:elem xmlns:x="urn:y">x:address</p:elem>
</root>

Now, do the last two "p:elem" elements contain the same text, or different text? If you compared using XSLT or XPath, what would be the result? How about if you used the values in XSD key/keyref? The answer is that XSLT and XPath have no way of knowing that you intend those last two elements to be qnames, so they will treat them as opaque strings. With XSD, you could type the node as qname... Most APIs are smart enough to inject namespace declarations if necessary, so the first node would write correctly as:

<p:elem xmlns:p="http://www.foo.org">here is some data: with a colon for no good reason</p:elem>

But, since the DOM has no idea that you stuffed a qname in the element content, it's got no way to know that you want to preserve the namespace for x:

<p:elem xmlns:p="http://www.foo.org">x:address</p:elem>

There is really only one way to get around this, and this is for any API which writes XML to always emit namespace declarations for all namespaces in scope, whether they are used or not (or else understand enough about the XSD and make some guesses). Some APIs do this, but it is not something that all APIs can be trusted to do, and it yields horribly cluttered XML output and other problems.
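The lost binding is easy to reproduce. The sketch below uses Python's standard xml.etree.ElementTree as one example of an API that emits only the namespace declarations it knows are used; other APIs may behave differently, and ElementTree may also rename the element's prefix on output.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:x="urn:x" xmlns:p="http://www.foo.org">
<p:elem>here is some data: with a colon for no good reason</p:elem>
<p:elem>x:address</p:elem>
<p:elem xmlns:x="urn:y">x:address</p:elem>
</root>"""

root = ET.fromstring(DOC)

# Take the second p:elem, whose text happens to look like a qname.
second = root[1]
second.tail = None  # drop the trailing newline that belongs to the parent

# Serialize the element on its own, as an API would when writing a fragment.
fragment = ET.tostring(second, encoding="unicode")
print(fragment)

# The element's own namespace is declared; the "x" binding is not,
# because the serializer sees "x:address" as plain text.
print("urn:x" in fragment)
```

The text "x:address" survives, but nothing in the output says what "x" was bound to.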

Joshua has only scratched the surface of the real problem, which is that there is no standard way to write out an XML infoset with the PSVI contributions added during validation. In plain English, there is no standard way to write out an XML document that has been validated using W3C XML Schema along with all the relevant type annotations and other infoset augmentations. In the above example, the fact that the namespace declaration for the "x" prefix is missing from the output is less significant than the fact that there is no way to tell that the type of p:elem's content is xs:QName.

However, this doesn't change the fact that using QNames in content in an XML vocabulary is a bad idea. Specifically, I am talking about using the xs:QName type in your vocabulary. The semantics of this type are so absurd it boggles the mind. Below is the definition from the W3C XML Schema recommendation:

[Definition:] QName represents XML qualified names. The ·value space· of QName is the set of tuples {namespace name, local part}, where namespace name is an anyURI and local part is an NCName. The ·lexical space· of QName is the set of strings that ·match· the QName production of [Namespaces in XML].

This basically says that text content of type xs:QName in an XML document, such as "x:address", actually is a namespace name/local name pair such as "{http://www.example.com}address". This instantly means that you cannot interpret this type without carrying around some sort of context (i.e. a list of namespace name<->prefix bindings), which makes it different from most other types defined in the W3C XML Schema recommendation because it has no canonical lexical representation. A value such as "x:address" is meaningless without knowing what XML document it came from and, specifically, what the namespace binding for the "x" prefix was at that particular scope.
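To make the dependence on context concrete, here is a rough sketch that resolves the text of each element in Joshua's second sample against the prefix bindings in scope. It uses the "start-ns" and "end-ns" events of xml.etree.ElementTree.iterparse; the helper names resolve and qname_values are made up for this illustration, and default-namespace handling is deliberately skipped.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<root xmlns:x="urn:x" xmlns:p="http://www.foo.org">
<p:elem>here is some data: with a colon for no good reason</p:elem>
<p:elem>x:address</p:elem>
<p:elem xmlns:x="urn:y">x:address</p:elem>
</root>"""

def resolve(text, scope):
    """Map 'prefix:local' to (namespace, local) using the innermost binding."""
    prefix, sep, local = text.partition(":")
    if not sep:
        return None  # unprefixed names are out of scope for this sketch
    for bound_prefix, uri in reversed(scope):
        if bound_prefix == prefix:
            return (uri, local)
    return None  # the prefix is not bound, so the text is not a usable qname

def qname_values(xml_bytes):
    """Return (text, resolved value) for every p:elem in the document."""
    scope = []  # stack of (prefix, uri) pairs currently in scope
    found = []
    events = ("start-ns", "end-ns", "end")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            scope.append(payload)
        elif event == "end-ns":
            scope.pop()
        elif payload.tag == "{http://www.foo.org}elem":
            found.append((payload.text, resolve(payload.text, scope)))
    return found

results = qname_values(DOC)
for text, value in results:
    print(repr(text), "->", value)
```

The last two elements carry identical text but resolve to different values, and the first element's stray colon resolves to nothing at all. None of that is visible to code that only sees the strings.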

Of course, the existence of the QName type means you can do interesting things like use a different prefix for a particular namespace in the schema than you use in the XML instance. You can specify that the content of the <p:elem> element should be one of a:address or a:location but have x:address in the instance, which is fine as long as the "a" prefix is bound to the "http://www.example.com" namespace in the schema and the "x" prefix is bound to the same namespace in the instance document. You can also ask interesting questions such as: what happens if I have a default value of type xs:QName but there is no namespace declaration for its namespace name at that scope? Does this mean that not only should the default value be inserted as the content of an element or attribute, but that a namespace declaration should also be created at the same scope if one does not exist?
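A small sketch of the schema-versus-instance case, assuming the bindings described above. The function resolve_qname and the two binding tables are hypothetical names for this illustration, not part of any schema processor.

```python
# Prefix bindings as they might be declared in the schema and in the instance.
SCHEMA_BINDINGS = {"a": "http://www.example.com"}
INSTANCE_BINDINGS = {"x": "http://www.example.com"}

def resolve_qname(lexical, bindings):
    """Turn 'prefix:local' into a (namespace name, local part) tuple."""
    prefix, local = lexical.split(":", 1)
    return (bindings[prefix], local)  # raises KeyError if the prefix is unbound

# The values the schema allows, resolved with the schema's own bindings.
allowed = {resolve_qname(q, SCHEMA_BINDINGS) for q in ("a:address", "a:location")}

# The value found in the instance, resolved with the instance's bindings.
value = resolve_qname("x:address", INSTANCE_BINDINGS)
is_valid = value in allowed
print(is_valid)

# The default-value question: the schema's lexical form cannot be resolved
# at a scope in the instance where its prefix has no binding.
try:
    resolve_qname("a:address", INSTANCE_BINDINGS)
    default_resolves = True
except KeyError:
    default_resolves = False
print(default_resolves)
```

The strings never match, yet the values do, and the same lexical form that is valid in the schema has no meaning once it is dropped into an instance scope that lacks the binding.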

Fun stuff, not.

Categories: XML
