In his blog post entitled "Namespaces in Xml - the battle to explain", Steven Livingstone wrote:

It seems that Namespaces is quickly displacing Xml Schema as the thing people "like to hate" - well, at least those that are contacting me now seem to accept Schema as "good".

Now, the concept of namespaces is pretty simple, but because it happens to be used explicitly (and is a more manual process) in Xml people just don't seem to get it. There were two core worries put to me - one calling it "a mess" and the other "a failing". The whole thing centered around having to know what namespaces you were actually using (or were in scope) when selecting given nodes. So in the case of SelectNodes(), you need to have a namespace manager populated with the namespaces you intend to use. In the case of Schema, you generally need to know the targetNamespace of the Schema when working with the XmlValidatingReader. What the guys I spoke with seemed to dislike is that you actually have to know what these namespaces are. Why bother? Don't use namespaces and just do your selects or validation.

Given that I am to some degree responsible for both classes mentioned in the above post, XmlNode (where SelectNodes() comes from) and XmlValidatingReader, I feel compelled to respond.

The SelectNodes() complaint is that people would like to evaluate XPath expressions over nodes without having to worry about namespaces. For example, given XML such as

<root xmlns="http://www.example.com">
 <child />
</root>

a call to SelectNodes() or SelectSingleNode() that returns the <child> element requires the following code:

  using System;
  using System.Xml;

  XmlDocument doc = new XmlDocument();
  doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>");
  XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
  nsmgr.AddNamespace("foo", "http://www.example.com");  // this is the tricky bit
  Console.WriteLine(doc.SelectSingleNode("/foo:root/foo:child", nsmgr).OuterXml);

whereas developers don't see why the code can't be something more along the lines of

  XmlDocument doc = new XmlDocument();
  doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>");
  // wished-for code: SelectSingleNode actually returns null here, so .OuterXml throws
  Console.WriteLine(doc.SelectSingleNode("/root/child").OuterXml);

which would be the case if there were no namespaces in the document.

The latter code sample does not work because the select methods on the XmlDocument class conform to the W3C XPath 1.0 recommendation, which is namespace aware. In XPath, the part of a path step that matches nodes by name is a node test, written as a qualified name, or QName for short. Syntactically, a QName is an optional prefix and a local name separated by a colon. The prefix must be mapped to a namespace URI in the expression context; it is not compared literally against the prefixes used in the document. Specifically, the spec states:

A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
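You can check the expansion rule quoted above directly against XmlDocument. The sketch below assumes .NET with C# 9 top-level statements. The prefixes "ex" and "bar" are arbitrary illustrative choices. The sketch shows three behaviours in turn. First, an unprefixed name test does not match elements in a default namespace. Second, the query's prefix never has to appear in the document. Third, an undeclared prefix is rejected with an XPathException.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// The first document from this post: both elements are in the default
// namespace http://www.example.com.
var doc = new XmlDocument();
doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>");

// An unprefixed name test expands to the null namespace URI, so it does not
// match elements that sit in a default namespace.
XmlNode unprefixed = doc.SelectSingleNode("/root/child");
Console.WriteLine(unprefixed == null ? "no match" : unprefixed.OuterXml);

// The query prefix is only a handle for a namespace URI. "ex" never appears in
// the document, yet the query matches because the URIs are the same.
var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("ex", "http://www.example.com");
XmlNode prefixed = doc.SelectSingleNode("/ex:root/ex:child", nsmgr);
Console.WriteLine(prefixed == null ? "no match" : prefixed.LocalName);

// A prefix with no declaration in the expression context is an error.
bool undeclaredThrew = false;
try
{
    doc.SelectSingleNode("/bar:root/bar:child", nsmgr);
}
catch (XPathException e)
{
    undeclaredThrew = true;
    Console.WriteLine(e.Message);
}
```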

There are a number of reasons for this, which are best illustrated with an example. Consider the following two XML documents:

<root xmlns="urn:made-up-example">
 <child xmlns="http://www.example.com"/>
</root>

<root>
 <child />
</root>

If /root/child were to match the <child> element in the original document, should it also match the <child> element in each of the two documents above? The three documents shown (including the first example) are completely different documents. There is no consistent, standards-compliant way to match against them using QNames in path expressions without explicitly pairing prefixes with namespaces.
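To make this concrete, the sketch below runs three QName-based queries against each of the three documents. It assumes .NET and C# 9 top-level statements, and the prefixes "ex" and "made" are hypothetical, chosen only for this example. Under XPath 1.0, each query should select a <child> element in exactly one document, because the expanded names of the elements differ from document to document.

```csharp
using System;
using System.Xml;

// The three documents from this post. Expanded names differ:
//   1: {http://www.example.com}root / {http://www.example.com}child
//   2: {urn:made-up-example}root   / {http://www.example.com}child
//   3: root / child, both in no namespace
string[] documents =
{
    "<root xmlns='http://www.example.com'><child /></root>",
    "<root xmlns='urn:made-up-example'><child xmlns='http://www.example.com'/></root>",
    "<root><child /></root>",
};

string[] queries = { "/root/child", "/ex:root/ex:child", "/made:root/ex:child" };

// matches[d, q] records whether query q selects a node in document d.
var matches = new bool[documents.Length, queries.Length];
for (int d = 0; d < documents.Length; d++)
{
    var doc = new XmlDocument();
    doc.LoadXml(documents[d]);

    // Hypothetical prefixes bound to the URIs used in the documents.
    var nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("ex", "http://www.example.com");
    nsmgr.AddNamespace("made", "urn:made-up-example");

    for (int q = 0; q < queries.Length; q++)
    {
        matches[d, q] = doc.SelectSingleNode(queries[q], nsmgr) != null;
        Console.WriteLine($"document {d + 1}, {queries[q]}: {matches[d, q]}");
    }
}
```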

The only way to give people what they want in this case would be to invent a proprietary, namespace-agnostic version of XPath. We do not plan to do this. However, I do have a tip that reduces the amount of code these examples take. The following code matches the <child> element in all three documents and is fully conformant with the XPath 1.0 recommendation:

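using System;
using System.Xml;
XmlDocument doc = new XmlDocument();
doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>");
// local-name() compares only the local part of each element's name, so no namespace manager is needed
Console.WriteLine(doc.SelectSingleNode("/*[local-name()='root']/*[local-name()='child']").OuterXml);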
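As a quick check of that claim, the sketch below runs the local-name() query against all three documents and prints the namespace URI of each match. It again assumes .NET and C# 9 top-level statements.

```csharp
using System;
using System.Xml;

string[] documents =
{
    "<root xmlns='http://www.example.com'><child /></root>",
    "<root xmlns='urn:made-up-example'><child xmlns='http://www.example.com'/></root>",
    "<root><child /></root>",
};

// local-name() ignores the namespace URI, so no XmlNamespaceManager is passed.
const string query = "/*[local-name()='root']/*[local-name()='child']";

int matched = 0;
foreach (string xml in documents)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    XmlNode child = doc.SelectSingleNode(query);
    if (child != null)
    {
        matched++;
        // The printed URIs show that the matches do not all share one expanded name.
        Console.WriteLine($"matched <{child.LocalName}> in namespace '{child.NamespaceURI}'");
    }
}
```

The trade-off is that the query now matches a <child> element from any namespace, including ones the query's author never intended. Adding a namespace-uri() test to each predicate restores that precision, at the cost of spelling out the URI again.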

Now on to the XmlValidatingReader issue. Assume we are given the following XML instance (example.xml) and schema (example.xsd):

<root xmlns="http://www.example.com">
 <child />
</root>

<xs:schema targetNamespace="http://www.example.com"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
       
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="child" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

The instance document can be validated against the schema using the following code

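using System;
using System.Xml;
using System.Xml.Schema;

XmlTextReader tr = new XmlTextReader("example.xml");
XmlValidatingReader vr = new XmlValidatingReader(tr);
vr.Schemas.Add(null, "example.xsd"); // null: take the target namespace from the schema
vr.ValidationType = ValidationType.Schema;
vr.ValidationEventHandler += new ValidationEventHandler(ValidationHandler);

while (vr.Read()) { /* do stuff or do nothing */ }
vr.Close();

// Reports each validation error or warning.
static void ValidationHandler(object sender, ValidationEventArgs e) => Console.WriteLine(e.Message);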

As you can see, you do not need to know the target namespace of the schema to perform schema validation using the XmlValidatingReader. However, many code samples in our SDK specify the target namespace where I passed null above when adding schemas to the Schemas property of the XmlValidatingReader. Passing null indicates that the target namespace should be obtained from the schema itself. This would have been clearer if we'd had an overload of the Add() method that took only the schema, but we didn't. Hindsight is 20/20.
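XmlValidatingReader was later marked obsolete in .NET Framework 2.0, in favor of XmlReader.Create with XmlReaderSettings. XmlSchemaSet replaced XmlSchemaCollection and keeps the same convention: passing null as the target namespace to Add() makes it read the targetNamespace from the schema. The sketch below swaps in those newer classes, so it is not the XmlValidatingReader code above. It inlines the schema from this post as a string and assumes C# 9 top-level statements. The CountErrors helper is a hypothetical name for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// The schema from this post, inlined so the sketch needs no files on disk.
const string xsd = @"<xs:schema targetNamespace='http://www.example.com'
           xmlns:xs='http://www.w3.org/2001/XMLSchema'
           elementFormDefault='qualified'>
  <xs:element name='root'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='child' type='xs:string' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

// null as the first argument: XmlSchemaSet takes the target namespace from the schema.
var schemas = new XmlSchemaSet();
XmlSchema added;
using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsd)))
{
    added = schemas.Add(null, schemaReader);
}
Console.WriteLine($"target namespace read from schema: {added.TargetNamespace}");

// Hypothetical helper: validates an instance and returns how many errors were reported.
int CountErrors(string xml)
{
    int errors = 0;
    var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema, Schemas = schemas };
    settings.ValidationEventHandler += (sender, e) =>
    {
        if (e.Severity == XmlSeverityType.Error) errors++;
        Console.WriteLine($"{e.Severity}: {e.Message}");
    };
    using XmlReader reader = XmlReader.Create(new StringReader(xml), settings);
    while (reader.Read()) { /* do stuff or do nothing */ }
    return errors;
}

int validErrors = CountErrors("<root xmlns='http://www.example.com'><child /></root>");
int invalidErrors = CountErrors("<root xmlns='http://www.example.com'><other /></root>");
Console.WriteLine($"valid instance: {validErrors} error(s); invalid instance: {invalidErrors} error(s)");
```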


 

Tuesday, 10 February 2004 00:16:03 (GMT Standard Time, UTC+00:00)
I thought that it was interesting that BizTalk 2004 seems to take the /*[local-name()='root']/*[local-name()='child'] approach when adding properties to the , though they actually make their XPath namespace aware using namespace-uri() as well, e.g. "/*[local-name()='root' and namespace-uri()='http://www.example.com']/*[local-name()='child' and namespace-uri()='http://www.example.com']". I'm not sure why they did that, except maybe it was too difficult to track ns prefixes, or maybe because the XPath statement can then be used out of context, without having to worry about ns bindings?
Thursday, 12 February 2004 13:09:16 (GMT Standard Time, UTC+00:00)
Great Stuff Dare - I will also forward this to the guys who were asking me about it all.