Mark Baker has a blog post entitled "Validation considered harmful", where he writes:

We believe that virtually all forms of validation, as commonly practiced, are harmful; an anathema to use at Web scale. Specifically, our argument is this;
Tests of validity which are a function of time make the independent evolution of software problematic.

Why? Consider the scenario of two parties on the Web which want to exchange a certain kind of document. Party A has an expensive support contract with BigDocCo that ensures that they’re always running the latest-and-greatest document processing software. But party B doesn’t, and so typically lags a few months behind. During one of those lags, a new version of the schema is released which relaxes an earlier stanza in the schema which constrained a certain field to the values “1”, “2”, or “3”; “4” is now a valid value. So, party B, with its new software, happily fires off a document to A as it often does, but this document includes the value “4” in that field. What happens? Of course A rejects it; it’s an invalid document, and an alert is raised with the human administrator, dramatically increasing the cost of document exchange. All because evolvability wasn’t baked in, because a schema was used in its default mode of operation; to restrict rather than permit.

This doesn't seem like a very good argument to me. Requiring that the XML documents you receive follow a certain structure or conform to certain constraints does not mean that your system cannot be flexible in the face of new versions. First of all, every system does some form of validation because it cannot process arbitrary documents. For example, an RSS reader cannot do anything reasonable with an XBRL or ODF document, no matter how liberal it is in what it accepts.

Now that we have accepted that certain levels of validation are no-brainers, the next question is what happens if there are no constraints on the values of elements and attributes in an input document. Let's say we have a purchase order format which in v1 has a <currency> element that can have a value of "U.S. dollars" or "Canadian dollars", and in v2 we support any valid currency. What happens if a v2 document is sent to a v1 client? Is it a good idea for such a client to muddle along even though it can't handle the specified currency?
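To make the question concrete, here is a minimal sketch in Python of what a v1 client might do. Only the <currency> element and its two v1 values come from the example above; the <purchaseOrder> and <amount> element names, the read_order_v1 function and the UnsupportedCurrency exception are assumptions made up for illustration. The client ignores elements it doesn't recognize, but refuses to continue once it reaches a currency it cannot handle.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The only currencies the hypothetical v1 client knows how to process.
V1_CURRENCIES = {"U.S. dollars", "Canadian dollars"}


class UnsupportedCurrency(ValueError):
    """Raised when an order names a currency this client cannot handle."""


def read_order_v1(xml_text):
    """Read a purchase order the way a tolerant v1 client might.

    Element names other than <currency> are assumed for this sketch.
    """
    root = ET.fromstring(xml_text)

    # The no-brainer check: this client only understands purchase orders.
    if root.tag != "purchaseOrder":
        raise ValueError("not a purchase order: <%s>" % root.tag)

    # Unknown child elements are never looked at, so extensions added
    # by a later version of the format pass through harmlessly.
    currency = (root.findtext("currency") or "").strip()
    amount_text = root.findtext("amount")
    if amount_text is None:
        raise ValueError("purchase order has no <amount>")

    # The check that matters: a v1 client cannot price an order in a
    # currency it does not know, so it must stop here, not muddle along.
    if currency not in V1_CURRENCIES:
        raise UnsupportedCurrency("cannot process currency %r" % currency)

    return {"currency": currency, "amount": Decimal(amount_text.strip())}
```

With this sketch, a v1 order carrying an extra element the client has never heard of is still accepted, while a v2 order priced in, say, "Euros" is rejected with UnsupportedCurrency. Whether that rejection happens in a schema or in code, the v1 client still cannot process the document.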

As in all things in software, there are no hard and fast rules as to what is right and what is wrong. In general, it is better to be flexible than rigid, as the success of HTML and RSS has shown us, but this does not mean that flexibility is acceptable in every situation. And it comes with its own set of costs, as the success of HTML and RSS has shown us. :)

Sam Ruby puts it more eloquently than I can in his blog post entitled Tolerance.


Saturday, December 16, 2006 7:34:07 PM (GMT Standard Time, UTC+00:00)
"What happens if a v2 document is sent to a v1 client? Is it a good idea for such a client to muddle along even though it can't handle the specified currency format?"

No, of course not. As I say later in the post;

"rule of thumb for software is to defer checking extension fields or values until you can’t any longer"

The problem with virtually all uses of validation that I've seen is that this document would be rejected long before it even got to the bit of software which cared about currency. I'm arguing against the use of validation as a "gatekeeper", not against the practice of checking values to see whether you can process them or not ... I thought it goes without saying that you need to do that! 8-O
Tuesday, December 19, 2006 4:50:10 PM (GMT Standard Time, UTC+00:00)
I agree with Mark. Dare completely missed the point when he argued for generic validation (who can argue against checking your data?). The real issue is validation as gatekeeper which is the way the term "validation" is used with XML. Checking your incoming data is something that all programs have always done (if they want to work), but validation as practiced with XML is truly detrimental because it attempts to isolate the data checking step to a declarative spec which cannot possibly do it either completely or flexibly; and versioning is a key showstopper with XML Validation. Here's my article on this: