September 8, 2004
@ 03:23 PM

Roger Costello recently started a discussion thread on the XML-DEV mailing list about the common misconceptions people have about XML document validation and schemas. He has since summarized the thread in his post Fallacies of Validation, version #3. His post begins:

The purpose of documenting the below "fallacies" is to identify erroneous common thought that many people have with regards to validation and its role in a system architecture.  Perhaps "assumptions" would be a better term to use than "fallacies".  In any case, the desire of this writeup (which is a compilation of discussions on the xml-dev list) is to provoke new ways of thinking about validation, and reject limiting and static views on validation. 

Fallacies of Validation

1. Fallacy of "THE Schema"

2. Fallacy of Schema Locality

3. Fallacy of Requisite Validation

4. Fallacy of Validation as a Pass/Fail Operation

5. Fallacy of a Universal Validation Language

6. Fallacy of Closed System Validation

7. Fallacy that Validation is Exclusively for Constraint Checking

I mostly agree with the fallacies as described in his post.

Fallacy #1 has been a favorite topic of Tim Ewald over the past year. It isn't necessarily true that there is one canonical schema for an XML vocabulary. Instead, the schema for the vocabulary may depend on the context in which the XML document is being used. A classic example of this is XHTML 1.0, which has three schemas (the Strict, Transitional, and Frameset DTDs) for a single format.
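
To make the XHTML point concrete, here is a minimal Python sketch in which the same document passes under one set of rules and fails under another. The element sets are a tiny hand-picked stand-in for the Strict and Transitional DTDs, not the real content models, and the function name disallowed_elements is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Heavily simplified stand-ins for two XHTML 1.0 DTDs (assumption: only
# element names are checked, and only a handful of them are listed).
STRICT_ELEMENTS = {"html", "head", "title", "body", "p", "em"}
TRANSITIONAL_ELEMENTS = STRICT_ELEMENTS | {"font", "center"}


def disallowed_elements(xml_text, allowed):
    """Return the sorted names of elements that `allowed` does not permit."""
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter() if el.tag not in allowed})


doc = (
    "<html><head><title>t</title></head>"
    "<body><p><font>hi</font></p></body></html>"
)

print(disallowed_elements(doc, STRICT_ELEMENTS))        # ['font']
print(disallowed_elements(doc, TRANSITIONAL_ELEMENTS))  # []
```

Which answer is "right" depends entirely on which schema the consumer of the document cares about.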

I consider Fallacy #2 to be more of a common mistake than a fallacy. Many people create validation systems around local assumptions, such as specific patterns or structures for addresses or telephone numbers, which work in a local system but break down in a global environment like the World Wide Web. This mistake isn't limited to XML validation; it applies to every arena where user input is validated before being stored or processed.
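
A quick Python sketch of the kind of rule I have in mind. The pattern and the function name is_valid_local_phone are hypothetical, chosen only to show how a locally reasonable constraint rejects perfectly good input from elsewhere.

```python
import re

# Hypothetical "local" rule: North American style numbers such as 425-555-0100.
LOCAL_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_local_phone(value):
    """True only if the whole string matches the local pattern."""
    return LOCAL_PHONE.fullmatch(value) is not None


print(is_valid_local_phone("425-555-0100"))      # True
print(is_valid_local_phone("+44 20 7946 0958"))  # False
```

The second number is a plausible UK number, yet the local rule treats it as garbage.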

Fallacy #3 is interesting to me because I wonder how often it occurs in the wild. Are there really that many people who believe they have to validate XML documents against a schema?

Fallacy #4 is definitely a good one. However, I disagree with the quotes he uses to buttress the main point for this fallacy. I especially don't like the fact that he uses a generalization from Rick Jelliffe about bugs in a few schema validators as a core part of his argument. The important point is that schema validation should not always be viewed as a PASS/FAIL operation, and in fact schema languages like W3C XML Schema go out of their way to define how one can view an XML document as being part valid, part invalid.
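
As a rough illustration of validation that yields more than a single yes or no, the Python sketch below reports an outcome per element. The labels valid, invalid, and notKnown are borrowed from the validity values W3C XML Schema records on each item after validation, but everything else (the element names, the checks, the validity_report function) is invented for the example and is not how a real schema processor works.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-element checks; the element names and rules are
# invented for illustration.
CHECKS = {
    "quantity": lambda text: text.isdigit(),
    "price": lambda text: text.replace(".", "", 1).isdigit(),
}


def validity_report(xml_text):
    """Map each child element name to 'valid', 'invalid', or 'notKnown'."""
    report = {}
    for child in ET.fromstring(xml_text):
        check = CHECKS.get(child.tag)
        if check is None:
            # No rule applies to this element, so nothing was assessed.
            report[child.tag] = "notKnown"
        else:
            report[child.tag] = "valid" if check(child.text or "") else "invalid"
    return report


order = "<order><quantity>3</quantity><price>abc</price><note>rush</note></order>"
print(validity_report(order))
# {'quantity': 'valid', 'price': 'invalid', 'note': 'notKnown'}
```

An application could still process the quantity from this order even though the price is bad, which is exactly what a plain PASS/FAIL result would hide.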

One size doesn't fit all is the message of Fallacy #5 to which I heartily cheer "Hear! Hear!". I agree 100%. There is no one XML schema language that satisfies every validation scenario.

I don't really understand Fallacy #6 without seeing some examples so I won't comment on it. I'll see if I can dig up the discussion threads about this on XML-DEV later.

Fallacy #7 is another one where I agree with the message but mostly disagree with how he argues the point. His examples are all variations of using schemas for constraint checking; they differ only in how the document is processed after constraint checking is done. To me, the prime example of the fact that schema validation is not just for constraint checking is that many technologies actually use schemas for creating typed XML documents or for translating XML from one domain to another (e.g. Object<->XML, Relational<->XML).
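
Here is a small Python sketch of a schema being used for typing rather than for rejecting documents. The TYPES table plays the role of the schema, and both it and the to_typed_dict function are hypothetical; real Object<->XML mapping tools are far more involved.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical "schema": maps an element name to the converter for its type.
TYPES = {
    "id": int,
    "price": float,
    "shipped": date.fromisoformat,
}


def to_typed_dict(xml_text):
    """Turn child elements into typed values; untyped elements stay strings."""
    root = ET.fromstring(xml_text)
    return {
        child.tag: TYPES.get(child.tag, str)(child.text or "")
        for child in root
    }


item = (
    "<item><id>42</id><price>9.5</price>"
    "<shipped>2004-09-08</shipped><name>pen</name></item>"
)
typed = to_typed_dict(item)
print(typed["id"] + 1)        # 43
print(typed["shipped"].year)  # 2004
```

Nothing here is checked for the sake of checking; the type information exists so the application gets an integer and a date instead of strings.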

Everything said, this was a good list. Excellent work from Roger as usual.