Matevz Gacnik points out a "Serious bug in System.Xml.XmlValidatingReader". He writes:

The schema spec and especially RFC 2396 state that xs:anyURI instance can be empty, but System.Xml.XmlValidatingReader keeps failing on such an instance.

To reproduce the error use the following schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AnyURI" type="xs:anyURI">
  </xs:element>
</xs:schema>

And this instance document:

<?xml version="1.0" encoding="UTF-8"?>
<AnyURI/>

There is currently no workaround for .NET FX 1.0/1.1. Actually Whidbey is the only patch that fixes this. :)

The schema validation engine in the .NET Framework uses the System.Uri class for parsing URIs. This class doesn't consider an empty string to be a valid URI, which is why our schema validation considers the above instance to be invalid according to its schema. However, the specs are not clear-cut on whether an empty value is valid, at least not without a bunch of sleuthing. As Michael Kay (XSLT working group member) and C. M. Sperberg-McQueen (chairman of the XML Schema working group) wrote on XML-DEV:

To: Michael Kay <michael.h.kay@ntlworld.com>
Subject: RE: [xml-dev] Can anyURI be empty?
From: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Date: 07 Apr 2004 10:49:51 -0600
Cc: xml-dev@lists.xml.org

On Wed, 2004-04-07 at 03:47, Michael Kay wrote:
> > If it couldn't, it would be wrong. An empty string is a valid URI.
>
> On this, like so many other things, RFC 2396 is a total disaster. An empty
> string is not valid according to the BNF syntax, but the RFC gives detailed
> semantics for what it means (detailed semantics, though very imprecise
> semantics).
>
> And the schema REC doesn't help. It has the famous note saying that the
> definition places "only very modest obligations" on an implementation, and
> it doesn't say what those obligations are.

Yes.  This is a direct result of our realization that
we have as much trouble understanding RFC 2396 as anyone
else.  The anyURI type imposes the obligations of
RFC 2396, whatever those are.  Any attempt to paraphrase
them on our part would lead, I fear, to an unsatisfactory
result: either we would make some mistake (like believing
that since the BNF does not accept the empty string,
it must not be legal)
or we would make no mistakes.  In
the one case, we'd be misleading our readers, and in
either case, we'd find ourselves mired in a never-ending
effort to prove that our paraphrase was, or was not,
correct. 

RFC 2396 is one of the fundamental specifications of the World Wide Web, yet it is vague and contradictory in a number of key places. Those of us implementing standards often have to go on gut feel or try to track down the spec authors whenever we bump into issues like this, but sometimes we miss them.
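To make the two readings concrete, here is a minimal sketch in Python. It is not the .NET implementation: the function name `is_valid_any_uri` and its `allow_empty` flag are hypothetical, and `urllib.parse.urlsplit` is only a loose stand-in for a real URI parser such as System.Uri. The only thing it is meant to show is that the whole disagreement comes down to one decision about the empty string.

```python
from urllib.parse import urlsplit


def is_valid_any_uri(value: str, allow_empty: bool) -> bool:
    """Hypothetical xs:anyURI check illustrating the two readings of the spec.

    allow_empty=False mimics a parser that rejects the empty string
    (the behavior described above for .NET 1.0/1.1).
    allow_empty=True treats the empty string as a legal URI reference.
    """
    if value == "":
        # The entire dispute lives in this branch.
        return allow_empty
    try:
        # Loose stand-in for a real parser; urlsplit accepts far more
        # than a strict RFC 2396 parser would (assumption for illustration).
        urlsplit(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for policy in (False, True):
        print("allow_empty =", policy, "->", is_valid_any_uri("", policy))
```

Under the strict policy the `<AnyURI/>` instance above fails validation; under the lenient one it passes, and nothing else about the check changes.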

All I can do is apologize to people like Matevz Gacnik, who have to bear the brunt of the lack of interoperability caused by vaguely written specifications implemented on our platform, and for the fact that a fix for this problem won't be available until Whidbey.


 

Sunday, April 11, 2004 7:31:42 AM (GMT Daylight Time, UTC+01:00)
If the spec is vaguely written, then how can a fix be forthcoming?
If it /is/ clear what to do, the only apology left is why it takes so long to update software in the 21st century.
Mike
Sunday, April 11, 2004 6:35:50 PM (GMT Daylight Time, UTC+01:00)
Mike,
Even if the spec seems contradictory and vague, we have to do our best to implement something satisfactory for our customers. In this case, it seems best to interpret the spec as allowing the empty string as a valid URI, especially since InfoPath has already shipped with this assumption.

You are right that the major part of my apology is that it is going to take so long for a fix to get into the hands of our customers.