Knowing the Limitations of XML Schema Validation

May 6, 2004

@ 04:19 PM

I recently stumbled on a blog posting by Phil Ringnalda called "a little chip in the concept", where he notes:

Still, I was a bit surprised when Xiven linked to a post to the validator mailing list, pointing out that the utterly wrong HTML <a href=""><b><a href=""></a></b></a>, which is reported as invalid in HTML, is ignored in XHTML. Nesting links is one of those basic, there's absolutely no way you can ever do this, things, but in XHTML if you put a nested link inside an inline element, the validator won't catch it. According to Hixie's answer, it's because the validator uses an XML DTD for XHTML, and an SGML DTD for HTML, and while you can say that a/b/a is wrong in an SGML DTD, you can't in an XML DTD. As he puts it, in XHTML it's XML-valid but non-compliant.

Phil has stumbled on just one of many limitations of XML schema languages. When people first encounter an XML schema language, they expect to be able to use it to declaratively describe all the rules of their vocabulary. This is rarely the case: every XML schema language is limited in the constraints it can express. For example, W3C XML Schema can't express a choice between attributes (either an uptime or a downtime attribute appears on an element), DTDs can't constrain the range of a text value (it must be an integer between 5 and 10), RELAX NG can't express identity constraints on numeric values (e.g. each book in the inventory must have a unique ISBN), and so on.
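Once a document has been parsed, constraints like these are easy to enforce in ordinary code. Below is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree`. The element and attribute names (`server`, `uptime`, `downtime`, `rating`, `inventory`, `book`, `isbn`) are hypothetical. They were picked only to mirror the three examples above and do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET


def check_constraints(doc: str) -> list[str]:
    """Return error messages for rules that, per the examples above, some
    schema languages cannot express. All vocabulary names are hypothetical."""
    root = ET.fromstring(doc)
    errors = []

    # Choice between attributes: exactly one of uptime/downtime on each <server>.
    for server in root.iter("server"):
        present = [name for name in ("uptime", "downtime") if name in server.attrib]
        if len(present) != 1:
            errors.append(f"server needs exactly one of uptime/downtime, found {present}")

    # Range on a text value: <rating> must hold an integer between 5 and 10.
    for rating in root.iter("rating"):
        text = (rating.text or "").strip()
        try:
            value = int(text)
        except ValueError:
            errors.append(f"rating {text!r} is not an integer")
            continue
        if not 5 <= value <= 10:
            errors.append(f"rating {value} is outside 5..10")

    # Identity constraint: every <book> in an <inventory> has a unique isbn.
    for inventory in root.iter("inventory"):
        seen = set()
        for book in inventory.iter("book"):
            isbn = book.get("isbn")
            if isbn is None:
                errors.append("book without isbn")
                continue
            if isbn in seen:
                errors.append(f"duplicate isbn {isbn}")
            seen.add(isbn)

    return errors
```

This code replaces none of the checks a schema can do. It only covers the gaps, so in practice it would run after ordinary schema validation.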

This means that developers designing XML applications or XML vocabularies should be very careful to determine which rules schema validation can actually check when they receive an input document. For some vocabularies, schema validation can check so little that it is better to enforce the constraints with custom code, or at the very least to augment schema validation with some custom checks.
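The nested-link rule from Phil's post is one example of such a custom check. Once the XHTML has been parsed as XML, a few lines of code can enforce it. The Python sketch below assumes well-formed XHTML in the standard XHTML namespace. It would run alongside DTD validation, not in place of it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
A_TAG = f"{{{XHTML_NS}}}a"  # ElementTree's {namespace}local-name form


def find_nested_links(xhtml: str) -> list[ET.Element]:
    """Return every XHTML <a> that has another <a> as an ancestor at any
    depth, so a/b/a is caught as well as a/a."""
    root = ET.fromstring(xhtml)
    nested = []

    def walk(elem: ET.Element, inside_link: bool) -> None:
        is_link = elem.tag == A_TAG
        if is_link and inside_link:
            nested.append(elem)
        # Children inherit "inside a link" from any link ancestor.
        for child in elem:
            walk(child, inside_link or is_link)

    walk(root, False)
    return nested
```

The boolean passed down the tree is the context that an XML DTD has no way to carry. A content model only ever sees an element's direct children.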

The fact is that many XML vocabularies are complex enough that their constraints aren't easily expressible in a conventional XML schema language. XML vocabulary designers and developers of XML applications should always be on the lookout for such cases, or they may make the wrong choice of validation framework for incoming XML documents.

Categories: XML


Monday, 10 May 2004 04:27:20 (GMT Daylight Time, UTC+01:00)

Actually, an XML schema could represent this concept. Just have two different kinds of b elements, one that can occur normally and contain whatever, and another that occurs as a descendant of a but can't contain a (as a child or descendant). You'd need to do this for every similar inline element (such as i).

Of course, this would require a combinatorial explosion of the XML schema types, but it could be done, at least for this particular example.

[Too bad mixed content is a second-class citizen in XQuery. You'd think that might limit its usefulness for processing XHTML. Maybe XHTML isn't paid enough attention by the XML specs?]

Michael Brundage
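Michael's suggestion amounts to giving one element name different types depending on where it appears. The sketch below models that idea in Python rather than in a real schema language. Each hypothetical type name maps the child elements it allows to the type each child receives, so a `b` inside an `a` gets the type `b_in_a`, which no longer allows `a`. The type names and the tiny tag set are illustrative assumptions. A full XHTML schema would need a parallel type for every inline element, and that is the combinatorial explosion he mentions.

```python
import xml.etree.ElementTree as ET

# Hypothetical type table: type name -> {allowed child tag: type given to that child}.
# "b" and "b_in_a" are two types for the same element name, as in the comment above.
TYPES = {
    "p":      {"a": "a", "b": "b"},
    "b":      {"a": "a", "b": "b"},
    "a":      {"b": "b_in_a"},   # no <a> directly inside a link
    "b_in_a": {"b": "b_in_a"},   # and no <a> inside a <b> that is inside a link
}


def valid(elem: ET.Element, type_name: str) -> bool:
    """Check elem's descendants against the context-dependent types."""
    allowed = TYPES[type_name]
    for child in elem:
        if child.tag not in allowed:
            return False
        if not valid(child, allowed[child.tag]):
            return False
    return True


def validate(doc: str) -> bool:
    root = ET.fromstring(doc)
    return root.tag == "p" and valid(root, "p")
```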

Monday, 10 May 2004 10:36:21 (GMT Daylight Time, UTC+01:00)

Funnily enough, IE6 doesn't have any trouble rendering nested hyperlinks. Although they look like one continuous hyperlink, any text within the inner link points at the inner link's href and any text in the outer link points at the outer link's href.

anonymouse

Monday, 10 May 2004 10:38:38 (GMT Daylight Time, UTC+01:00)

Couldn't you catch this quite easily using XSLT?

anonymouse
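An XSLT stylesheet for this check would probably just match the pattern `h:a//h:a`, with `h` bound to the XHTML namespace, and report each hit. The same descendant path also works in the limited XPath subset of Python's `xml.etree.ElementTree`. The sketch below uses that, which keeps every example here in one language. It is not an XSLT stylesheet, and the `h` prefix is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}


def nested_links_xpath(xhtml: str) -> list[ET.Element]:
    root = ET.fromstring(xhtml)
    # ".//h:a//h:a" selects any XHTML <a> that has an <a> ancestor, at any depth.
    # When links are nested three or more deep, ElementTree can yield the same
    # element more than once, so only the first occurrence of each is kept.
    unique = {}
    for elem in root.findall(".//h:a//h:a", NS):
        unique.setdefault(id(elem), elem)
    return list(unique.values())
```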


Dare Obasanjo's weblog

