I was reading an XML-Deviant column on XML.com entitled "Browser Boom" when I came across the following excerpt:

The inevitable association with Microsoft's CLI implementation is proving a source of difficulty for the Mono project. The principal author of Mono's XML support, Atsushi Eno, posted to the Mono mailing list on the problems of being conformant in Mono's XML parser implementation. More specifically, whose rules should Mono conform to. W3C or Microsoft?

MS XmlTextReader is buggy since it accepts XML declaration as element content (that violates W3C XML specification section 3 Logical Structures). ... However, there is another discussion that it is useful that new XmlTextReader (xmlText, XmlNodeType.Element, null) accepts XML declaration.

... that error-prone XmlTextReader might be useful (especially for people who already depends on that behavior)

... we did not always reject Microsoft badness; for example we are copying System.Xml.XmlCDataSection that violates W3C DOM interface hierarchy (!)

The root of the dilemma is similar to that which Mozilla and Opera are trying to manage in the browser world.

What I find interesting is that instead of pinging the MSFT XML folks (like myself) and filing a bug report, this spawned a dozen-message email discussion on whether Mono should be bug-compatible with the .NET Framework. Of course, if the Mono folks decide to be bug-compatible with this and other bugs in System.Xml, and we then fix them, causing breaking changes in some cases, will we see complaints about how Microsoft is out to get them by being backwards incompatible? Now that Microsoft has created the MSDN Product Feedback Center, they don't even have to track down the right newsgroup or the email address of a Microsoft employee to file the bug.

It's amazing to me how much work people cause for themselves, and the conspiracy theories they'd rather live in, instead of communicating with others.

Update: I talked to the developer responsible for the XmlTextReader class and she responded "This is by design. We allow XML declaration in XML fragments because of the encoding attribute. Otherwise the encoding information would have to be transferred outside of the XML and manually set into XmlParserContext."
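
To make that trade-off concrete, here is a minimal sketch in Python rather than .NET. It is an analogy under stated assumptions, not the XmlTextReader implementation: the fragment, the constant names, and the three helper functions are all hypothetical. It parses the same ISO-8859-1 bytes three ways: with the XML declaration attached, with no encoding information at all, and with the encoding handed to the parser out of band (roughly the role XmlParserContext plays in the quote above).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element content stored as ISO-8859-1 bytes.
FRAGMENT = "<name>caf\u00e9</name>".encode("iso-8859-1")
DECLARATION = b'<?xml version="1.0" encoding="iso-8859-1"?>'


def parse_with_declaration(fragment: bytes) -> str:
    """The encoding travels inside the XML, in the declaration."""
    return ET.fromstring(DECLARATION + fragment).text


def parse_without_declaration(fragment: bytes) -> str:
    """No declaration and no hint: the parser falls back to UTF-8."""
    return ET.fromstring(fragment).text


def parse_with_out_of_band_encoding(fragment: bytes, encoding: str) -> str:
    """The caller sets the encoding by hand on the parser instead."""
    parser = ET.XMLParser(encoding=encoding)
    return ET.fromstring(fragment, parser=parser).text


if __name__ == "__main__":
    print(parse_with_declaration(FRAGMENT))
    print(parse_with_out_of_band_encoding(FRAGMENT, "iso-8859-1"))
    try:
        parse_without_declaration(FRAGMENT)
    except ET.ParseError as err:
        # The lone 0xE9 byte is not valid UTF-8, so the parser rejects it.
        print("rejected:", err)
```

In this sketch the first and third calls recover the same text, while the second fails because nothing tells the parser how the bytes were encoded. That is the gap the declaration fills when a fragment is passed around on its own.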


 

Thursday, July 15, 2004 5:21:27 PM (GMT Daylight Time, UTC+01:00)
IMO, they've got their heads up their own, um, backsides. Their starting position for all work seems to be 'W3C Is God'. This isn't solely related to W3C projects - any published standard Is God, to most FOSS people.

Here we have a conflict between two published standards: W3C's XML DOM and ECMA/ISO's CLI. Obviously as the overarching organisation ISO wins ;-)

What they often miss is that extra non-'compliant' features are often useful, and extending the current standard to solve customer problems is useful. Such is the case with Managed C++ and the new C++/CLI. Neither is compliant with ISO/IEC 14882 - indeed, C++/CLI explicitly breaks the prohibition on creating new keywords (well, sort of - it has context-dependent keywords) without a double-underscore prefix. The original C++ Managed Extensions used __keywords and user reaction was, well, ugly.

Now, if we're talking about implementation defects, that's a different story. If the specification defines the semantics one way, and the implementation differs, you either have to fix the implementation or change the specification. Sometimes you see places in MSDN where the documentation says something like 'works as documented for Windows 2000, XP, Server 2003 and 9x, but this flag is interpreted the other way round on NT 4.0'.
Thursday, July 15, 2004 5:21:51 PM (GMT Daylight Time, UTC+01:00)
I think the main reason is that Microsoft has historically been fairly unresponsive to bug reports of this nature. Secondly, any fix made by Microsoft won't be in widespread use until the release of version 2.0 of the framework, while the compatibility issues still need to be handled in some way in the meantime.
Bear in mind that the particular issue has been there since pre-1.0. I haven't checked the last 2.0 yet. I'll go do that now.
If the Product Feedback Center means a Microsoft that is more responsive to bug reports, then that's awesome. I'll make a point of using it from now on.
ianm
Thursday, July 15, 2004 8:11:24 PM (GMT Daylight Time, UTC+01:00)
Hey Dare,

Atsushi is a guy. ;-)

Duncan.
Duncan mak
Thursday, July 15, 2004 8:11:58 PM (GMT Daylight Time, UTC+01:00)
Oh wait, I misread. Sorry.
Duncan mak
Sunday, July 18, 2004 8:08:03 AM (GMT Daylight Time, UTC+01:00)
It is fine if an API carries information that is not in the set identified by the W3C XML Infoset, of course. Insignificant whitespace, etc. But the encoding declaration is a little different, because by the time the document is parsed it says what the encoding *was*, which is probably not what it currently *is*. So the parsed document now lies.

(The most notorious instance of this kind of problem is Java's HTML parser, which throws an exception aborting parsing if there is a content-type META tag supplying an encoding, even though that is handled by a completely different layer.)

I think it would be best if APIs clearly signified any non-XML Infoset datums: for example, subclassing (or providing some distinguishing attribute for) PI into "Infoset PI" and "non-Infoset PI", subclassing text into "Significant text" and "non-significant text" or whatever. Or providing some top-level metadata carrying the original encoding, etc. The other difficulty with providing the original XML header declaration is that, because XML documents can contain entities, you really need to have all the encoding declarations from sub-entities, and then you need some signal to indicate that the entity ended; otherwise you cannot get round-tripping (and if round-tripping is not the point, why do it: a serialization hint?).
Rick Jelliffe